The predictability of COVID-19 mortality rates based on ex-ante economic, health and social indicators

The paper analyses the differences in COVID-19 mortality rates (MR) across 24 European countries. We explain MRs with the available, reliable ex-ante economic, health and social indicators pertaining to the year 2019 – i.e., before the outbreak of the pandemic. Using simple regression equations, we obtained statistically significant results for 11 such variables out of 28 attempts. Our best model with two ex-ante independent variables explains 0.76 of the variability of our ex-post dependent variable, the logarithm of Cumulative COVID Deaths. The estimated coefficient for the variable Density of Nurses shows that having one more nurse per 1,000 of population decreases cumulative COVID deaths by almost 15%. Similarly, one more unit of Consumption of Non-Prescribed Medicine decreases cumulative deaths by 5%. It seems that until now those European countries were successful in minimising fatalities where the population had a high level of health literacy, people pursued a healthier lifestyle and the healthcare systems worked with a relatively large nursing force already prior to the COVID pandemic.


INTRODUCTION
We know three things for sure about the pandemic unfolding before our eyes since early 2020, and there is one known unknown factor. What we can be certain of is that: (i) the coronavirus syndrome is an infectious disease caused by a newly discovered type of the coronavirus family. Most people infected with the virus experience mild to moderate respiratory illness and recover without requiring any special treatment. But in some cases, the disease can cause severe medical complications and lead to death. (ii) medical science is now much better prepared than ever before to prevent and treat a pandemic like COVID, and as a result (iii) this disease causes significantly fewer deaths, both in absolute numbers and relative to the population than similar health cataclysms of the past centuries.
What we do not understand, what we do not know, is: (iv) why the spread of the disease through human-to-human transmission, which potentially threatens the entire human race evenly, differs so considerably among countries.
The present paper addresses this last question with a novel, econometric approach. We shall refer to this as "the black box approach". Our country sample reflects the limitations of data availability on both sides of the regression analysis (two dependent and 28 independent variables), as well as our assumptions about cross-country comparability concerning commonly used definitions and reliable reporting. We selected 16 West European countries and 8 European transition economies (24 in total).
Looking back from a 21st century perspective, we can state without any doubt that the history of humankind was a history of communicable diseases. On the one hand, low-intensity infections have always been present in all countries (e.g., seasonal influenza, malaria), but on the other hand, epidemics have flared up dramatically from time to time and in some places, causing the deaths of thousands and millions. In this regard, we refer here to just two important historical examples. The bubonic plague of 1347-1351 killed 30-40% of the population of Western Europe, and in England the figure is thought to have been as high as 70%, while Eastern Europe largely escaped the pandemic. The death toll is estimated at between 50 and 200 million. [1] As is widely known, the last major pandemic to affect humanity as a whole was the 1918-1920 influenza (known at that time as the Spanish flu), when the H1N1 influenza virus was estimated to have infected 1/3 of the total human population in three years with a death toll of 50-100 million, or 2-4% of the world's population (Taubenberger – Morens 2006). The available estimates for individual countries are, not surprisingly, sporadic and not directly comparable. In the United States, the Spanish flu killed 1% of those infected (case fatality rate, CFR = 1%), [2] with a total of 675,000 deaths. In Ghana, 5% of the population died in the first two months after the outbreak, in Samoa in the Polynesian archipelago 20% of the population died, etc.
[1] Plague is transmitted between rodents by rodent fleas and can be transmitted to people when infected rodent fleas bite them. As with many zoonotic diseases, plague is a very severe disease in people, with case fatality rates (CFR) of 50-60% if left untreated. Today, bubonic plague is entirely curable with antibiotics, but preventive vaccination has not yet been developed. There are about 2000 cases of the disease worldwide every year, with Madagascar being the most affected country (WHO 2000).
[2] Case fatality rate (CFR) = Proportion of people who die from a specified disease among all individuals diagnosed with the disease over a certain period of time.
So far, the present coronavirus pandemic has caused 4-10 million deaths in 18 months, [3] a 0.05% loss of the world's population. This is two orders of magnitude less than the proportion the world experienced during the Spanish flu or the apocalyptical attacks of the bubonic plague bacteria (Table 1, column 5).
The etiology of the plague was not understood until the general acceptance of the germ theory in the 19th century. From this, scientists established that the cause of the striking disparities in loss of life between the two parts of Europe was straightforward: the germs were carried from Asia to Europe through sea trade, thus the landlocked part of Europe was fortunate to be spared from the deadly communicable bacterium Yersinia pestis. The actual cause of the Spanish flu, the H1N1 virus, was not identified until 1933; a proven safe vaccine against influenza was not available until 1945. By contrast, the pathogen responsible for COVID, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was isolated in China literally within days of the first clinically diagnosed cases, and it took less than 12 months to develop different types of the first generation of effective vaccines in the US, the UK, China and Russia.
According to the Worldometer database, which is updated daily, in July 2021 cumulative mortality per 100 people (i.e., COVID-related deaths as a percentage of the total population) was highest in the world in Peru and Hungary: 0.6 and 0.3, respectively. At that time, these two countries topped the "list of shame". [4] The United States came 21st with 0.2 and Sweden 36th with 0.1. India, with 1.4 billion inhabitants, came 107th with 0.03 per 100 inhabitants. In China, which is roughly the same size as India and ranked 200th on the list, there were only 3 (!) deaths per 1 million inhabitants; this figure does not appear to be reliable.
Unfortunately, there is a false perception of the pandemic in public opinion worldwide. The media has been bombarding people from the very beginning with absolute death figures, such as "4 million total deaths in the world" or "130 thousand deaths in the UK", without emphasising that the population size of different countries varies enormously. For example, the Hungarian public was overwhelmed by the high absolute number of COVID deaths in Italy and the UK, although proportionally the number of fatalities was higher in Hungary than in the other two countries. Not surprisingly, this false perception distorts the thinking horizon of policy experts as well, because they must take into consideration what people think, even if these thoughts are ill-founded.
It is a matter of ethical and/or political perspective whether the disparities in MRs among the above-mentioned countries and their deviation from the world average (0.05%) are considered dramatically significant or relatively insignificant. There are legitimate and sensible arguments for both positions. One can say that every death matters. But one can also defend the position that, on a per capita basis, small and changing country differences should not be allowed to influence the strategies of the public health system. Furthermore, there is also logic in what the leader of the Swedish anti-COVID public health policy said at the very outset, namely that within 1-2 years, the infection rates would be roughly the same in all countries of the world and herd immunity would eventually be reached at about the same time everywhere. [5] This may be the result soon, but so far, the statistics have not borne out this assumption. In the eyes of the public, the total rounded number of COVID deaths in Sweden (15 thousand), Hungary (30 thousand) and Israel (6,500) signifies massive disparities, given the trivial fact that these three countries are roughly equal in population size.
[3] The lower figure is the registered number, the higher figure is an expert estimate. On 21 May 2021, the WHO estimated that the proper number would be two to three times higher than the Worldometer's calculation based on officially reported national data.
[4] It is worth noting that among the top 20 countries of this shame-list, 10 countries can be categorised as post-communist economies (Bosnia-Herzegovina, Bulgaria, Croatia, Czechia, Hungary, Montenegro, North Macedonia, Slovakia, Slovenia, and Poland). It is, however, beyond the scope of the present paper to dig deeper into the causes of this striking phenomenon.
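The contrast between absolute and per capita figures can be made concrete with a short calculation. The sketch below uses the rounded death counts quoted above; the population figures are approximate values assumed for illustration only and are not taken from the paper, and the function name is hypothetical.

```python
# Rounded cumulative COVID death counts quoted in the text.
deaths = {"Sweden": 15_000, "Hungary": 30_000, "Israel": 6_500}

# Approximate 2021 populations: ASSUMPTIONS for illustration, not from the paper.
population = {"Sweden": 10.4e6, "Hungary": 9.7e6, "Israel": 9.3e6}


def deaths_per_million(death_count: float, inhabitants: float) -> float:
    """Convert an absolute death count into deaths per 1 million inhabitants."""
    if inhabitants <= 0:
        raise ValueError("inhabitants must be positive")
    return 1e6 * death_count / inhabitants


per_million = {
    country: deaths_per_million(deaths[country], population[country])
    for country in deaths
}

# Countries ordered from the highest to the lowest per capita death burden.
ranking = sorted(per_million, key=per_million.get, reverse=True)

for country in ranking:
    print(f"{country}: {per_million[country]:.0f} deaths per million")
```

Because the three populations are of similar size, the per capita ordering here coincides with the ordering of the absolute numbers; for countries of very different size the two orderings can diverge.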

MEASURING THE DISEASE BURDEN – DEPENDENT VARIABLES
The present paper puts the COVID-related official death numbers into focus by emphasising that more than 90% of the registered worldwide coronavirus cases end with recovery. [6] What really matters from a public health perspective is the loss of life, not the illness itself. There are several methods to measure the death burden. One way is to look at the CFR, as we did in column 6 of Table 1 above for the comparison of different epidemics in history and as Pažitný et al. (2021) do in the present Special Issue of this journal, using CFR as the sole dependent variable of their econometric model. We built our database from two other country-by-country metrics as ex-post dependent variables: (1) excess death, as reported by Eurostat, and (2) cumulated death, as reported by Worldometer.
1. The excess death indicator was created by Eurostat in April 2020 with the help of national statistical institutes. The number of deaths from all causes is compared with the expected number of deaths extrapolated from data of a certain period in the past. The indicator is expressed as the percentage rate of additional deaths in a month, compared to a base period (2016-2019). The higher the value, the more additional deaths have occurred compared to the base; conversely, a negative value means that fewer deaths happened in the month under review than in the base period (see Appendix).
2. Cumulated death is reported by Worldometer. It shows the total number of detected COVID-related death cases between the beginning of January 2020 and July 16, 2021 (our cut-off date) per 1 million population. [7]
To make the two variables directly comparable, the excess death data needed to be aggregated across time (i.e., across 16 months) because of the tremendous volatility of the monthly time series in many countries of the sample. [8] As a quick illustration of our findings, let us take the case of Poland, where the excess death in November 2020 was 97% higher than in the base period, the largest figure in the table. Or look at Slovenia, where in November 2020 excess mortality was 91% above the 2016-2019 base period, while three months later, in February 2021, the number of total deaths was 1 per cent below the base period average.
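The definition of the excess death indicator and its aggregation across months can be sketched as follows. This is a minimal illustration under stated assumptions: the expected number of deaths is taken to be the plain average of the same month in the base years, the aggregation compares summed observed deaths with summed expected deaths, and all death counts are hypothetical numbers chosen only to reproduce percentages of the kind quoted above. The function names are not from Eurostat.

```python
from statistics import mean


def excess_mortality_percent(observed_deaths, baseline_deaths):
    """Additional deaths in a month as a percentage of the base-period average.

    baseline_deaths holds the deaths in the same calendar month of each base
    year (e.g. 2016-2019); their plain average is ASSUMED to be the expected value.
    """
    expected = mean(baseline_deaths)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * (observed_deaths - expected) / expected


def aggregated_excess_percent(observed_by_month, expected_by_month):
    """Excess mortality over several months, comparing total observed deaths
    with total expected deaths (one possible way to aggregate)."""
    total_expected = sum(expected_by_month)
    if total_expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * (sum(observed_by_month) - total_expected) / total_expected


# Hypothetical counts: a month with 97% excess mortality.
november_base_years = [1000, 1020, 980, 1000]
print(excess_mortality_percent(1970, november_base_years))  # 97.0

# Hypothetical counts: one month at +91% and one at -1% give +45% in aggregate.
print(aggregated_excess_percent([1910, 990], [1000, 1000]))  # 45.0
```

The second call shows why aggregation matters: a single extreme month and a slightly negative month combine into a much more moderate figure.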
In column 2 of Table 2, the above-mentioned aggregation has already been done. The data are presented in alphabetical order of the countries to ease the readers' navigation. It is undeniable that the differences among the countries are significant. The worst results belong to Poland and the Czech Republic, in which until May 2021, 23% more people died from all causes than in the base period of 2016-2019. The best figures were reported from Norway, in which not more but 2% fewer people died during the COVID crisis compared to the base period. Column 3 shows the total number of COVID-caused death cases per 1 million population. In our sample, the highest number was reported from Hungary (a land-locked, middle-size country in Central Europe) and the lowest one from Iceland (essentially consisting of a remote, large main island at the juncture of the North Atlantic and Arctic Oceans). In columns 4 and 5 the absolute figures were converted into rankings, which allows us to compare the main messages of the two underlying data series. The Spearman correlation analysis confirms the readers' first visual inspection of Fig. 1: the two series are strongly correlated (ρ = 0.84).
[6] For the world as a whole, 173 million out of 190 million people, to be precise.
[7] Worldometer (formerly Worldometers) is a reference website that provides counters and real-time statistics for diverse topics. It is owned and operated by the data company Dadax. In 2020, the website attained popularity due to hosting statistics relating to the COVID pandemic. The underlying cumulated death data used in the present paper are almost identical with another frequently used database, Our World in Data, compiled from the daily data set provided by Johns Hopkins University. https://www.worldometers.info/coronavirus/
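The rank comparison described above amounts to computing the Pearson correlation of the two rank series. A minimal sketch using only the standard library is given below; the five data pairs are hypothetical and serve only to show the mechanics, and the function names are illustrative.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank series."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five countries (not the data of Table 2).
excess_death_percent = [23, 10, -2, 15, 5]
cumulated_deaths_per_million = [3000, 1500, 150, 1400, 900]
rho = spearman(excess_death_percent, cumulated_deaths_per_million)
print(round(rho, 2))  # 0.9
```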
Being assured by the similarity of the two dependent variable series, in the remaining part of the paper only the total COVID death number (Table 2, column 3) will be used.

GETTING AROUND THE BLACK BOX
The starting point of our research hypothesis was that COVID deaths per country are ultimately causally related to two groups of variables. On the one hand, we know the economic, health and social conditions in each country before the pandemic. We refer to all these data as ex-ante independent variables. Without exception, we used explanatory data for the year 2019 or before.
On the other hand, the measures implemented in each country to contain the transmission of the virus and to treat and save the lives of people who became ill obviously matter a lot, most likely more than the ex-ante variables. Without much intellectual effort, even average newspaper readers can think of hundreds of ex-post variables, where the ex-post restriction is meant to indicate that only what happened during the pandemic (i.e., after 1 January 2020) matters in the explanation of the dependent variable(s). Schematically, our modelling alternatives look like this:
Ex-ante independent variables → Ex-post dependent variable(s)
However, in our attempt to collect ex-post independent variables, we ended up with a confounding abundance. In retrospect, this was inevitable. [9] All countries of the world, including, of course, the countries selected in our sample, have applied a vast range of non-pharmaceutical interventions, but not at the same time and not with the same stringency. The most obvious example is the curfew, which was applied to some extent by most countries, but in some cases starting at 8 p.m., in others only at 10 p.m. or midnight. Some countries hermetically sealed their borders for long periods, but others did so only selectively and from time to time. Even the recommended distance for social distancing (1.5 m or 2.0 m) or the choice of face masks (cloth mask, surgical mask or N95 mask) varied from country to country and from time to time. In many countries, such as Germany and the United Kingdom, the regulations varied in many dimensions from one federal state to another and from one county to another, so that it is not possible to find representative variables for a single country. [10] The time elapsed since the beginning of the pandemic has been far from homogeneous in at least three dimensions. [11] Firstly, the chances of infection, hospitalization and ultimately death were fundamentally different before and after the start of mass vaccination in each European country.
Secondly, the hard fact is that there are currently no drugs targeting SARS-CoV-2 directly, specifically and effectively. As a second-best alternative, the research community and hundreds of clinicians worldwide are falling back on the existing repertoire of approved drugs to probe into their anti-coronavirus properties. Such practices, however, also vary within the countries. Thirdly, the SARS-CoV-2 virus, which causes the disease, cannot be considered a single biological entity. As was quickly recognised, the virus, first identified in China, mutated and is still mutating at a dizzying rate. A good example is what happened on the Diamond Princess, a ship carrying mostly Japanese tourists, in February 2020. One person carried the infection on board, where 400 of the 3,711 passengers and crew showed symptoms (9.7%), and 9 of them died (2.4%), while at least 24 new mutations developed (Yeh – Contreras 2021).
[9] Baum et al. (2021) came to the same conclusion when they tried to use the Global Health Security Index and the Epidemic Preparedness Index, both of which had been computed a few months before the COVID outbreak, to measure the preparedness of 195 countries of the world for epidemics or pandemics. As it turned out, while these studies correctly showed that the world as a whole was not prepared to combat "a fast-spreading respiratory disease", both studies failed to predict national COVID morbidity and mortality figures. In their ex-post study, the authors identified 10 factors which seemed to have contributed to this failure. Some of these factors will be shown in our study as well.
[10] Within Japan, a large regional disparity in COVID mortality was observed. The ratio of mortality rates in the most and least affected territorial units was 83 to 1 during the first wave of the pandemic (Osaki et al. 2021).
[11] WHO declared the outbreak of the pandemic on 11 March 2020.
Acta Oeconomica 71 (2021) S1, 53-71
At the time of this paper's cut-off date (16 July 2021), the Delta variant in the US had gone from making up just 2% of the cases as recently as mid-May 2021 to 80% by mid-July of the same year. The UK experienced a similar, earlier rise, and Delta now accounts for over 99% of the analysed cases there. In Brazil, the Gamma variant, which was first identified there, accounted for 96% of the analysed cases. In Chile, in addition to Gamma, the Lambda variant made up a significant share of analysed cases, etc. By our cut-off date, the Alpha variant (the original Wuhan virus) had been practically eradicated in most countries of the Earth. [12] In Fig. 2 below, these complicating matters are illustrated in more detail. The arrows show the "impossible strategy": to explain the variation of the dependent variables with the help of the ex-post independent variables (thick arrows), as well as the "getting around the black box" strategy, where the possible logical links leading from the ex-ante independent variables to the dependent variables are shown by the thinner arrows.
Taking the above complications into consideration, we did not even try to identify directly with econometric tools the causal links between ex-post independent variables and our two dependent variables at a country-by-country level, due to the vastly inhomogeneous content of the variables, such as the curfew (already mentioned) or the closing of schools, restaurants, shops, etc. When we decided to "get around the black box", we committed ourselves to indirect estimations. Let us consider two examples. It makes sense to suppose that the geographical characteristics of the countries (i.e., being landlocked or accessible through maritime transport) do matter for the spread of the coronavirus. In the same way, it makes sense to assume that the development of the healthcare sector (e.g., number of doctors) also matters. But there is a price to pay. We were convinced from the very beginning that our simple regressions would, in the best case, be capable of explaining in this indirect way only a small part of the variation of the COVID-death disparities among the 24 countries. In other words, even if we find statistically significant explanatory variables with ordinary least squares (OLS) regressions, always calculated with one independent variable at a time, it is unlikely that our regression equations will yield high R² values. But as John Maynard Keynes once allegedly quipped, "it is better to be roughly right than precisely wrong".
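A one-variable OLS regression of the kind described above, together with its R², can be sketched in a few lines. The data are hypothetical: they are constructed to lie exactly on a log-linear curve with a slope of -0.10, so the estimate recovers that slope and R² equals 1; real country data would of course scatter around the line. The function name and all numbers are assumptions for illustration, not the authors' code or data.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if s_xx == 0:
        raise ValueError("x must not be constant")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1.0 - ss_resid / ss_total
    return intercept, slope, r_squared


# Hypothetical ex-ante indicator (e.g. nurses per 1,000 inhabitants).
indicator = [4, 6, 8, 10, 12]
# Hypothetical deaths per million, generated from an exact log-linear relation.
deaths_per_million = [math.exp(8.0 - 0.10 * v) for v in indicator]

log_deaths = [math.log(d) for d in deaths_per_million]
intercept, slope, r_squared = simple_ols(indicator, log_deaths)

# With a logged dependent variable, a one-unit rise in the indicator changes
# deaths by 100 * (exp(slope) - 1) per cent.
percent_effect = 100.0 * (math.exp(slope) - 1.0)
print(f"slope={slope:.3f}, R2={r_squared:.3f}, effect={percent_effect:.1f}%")
```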

TESTING 28 INDEPENDENT VARIABLES ONE-BY-ONE
Altogether 28 independent variables were tested for 24 European countries. All data were taken from Eurostat, reflecting the state of affairs in 2019 (i.e., before the COVID pandemic) or the latest year before that. More than half of them (17 variables, to be precise) did not prove to be significant from the perspective of our preferred "raw" dependent variable (see column 3 in Table 2), which was converted to its logarithmic value, as is often the case in linear regression analysis. [13] Taking the logarithm was justified for multiple reasons. The distribution of Cumulative Deaths is not normal (see the histogram in Fig. 3 below). The logarithmic transformation brought it closer to a normal distribution, which increased the reliability of our estimates.
[12] Our World in Data: Biweekly digest, 16 July 2021.
[13] In this regard, we followed the technique applied in Osaki et al. (2021).
Fig. 2.
By taking the logarithm, the numbers shrank in absolute value, closer to the magnitudes of our independent variables. In addition, this procedure can mitigate heteroscedasticity. As a result, the estimated coefficients of the independent variables are interpreted in per cent terms. The general form of our ordinary least squares (OLS) model is log-linear: the logarithm of Cumulative COVID Deaths is regressed on a constant and one independent variable at a time. The list of the tested variables and the first results of the variable-by-variable regression analyses are presented in Table 3.
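A log-linear specification of this kind, and the per cent reading of its slope, can be written out as follows. The symbols (D for Cumulative COVID Deaths, X for the tested indicator, N for the number of countries with data) are illustrative assumptions and not necessarily the notation used by the authors.

```latex
% Simple log-linear model for country i, with one ex-ante indicator X_i
\ln(D_i) = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, N

% Effect of a one-unit increase in X on D, in per cent
\%\Delta D = 100 \left( e^{\beta_1} - 1 \right) \approx 100 \, \beta_1
\quad \text{for small } |\beta_1|
```

The approximation in the second line is why a small estimated coefficient can be read directly as a percentage change.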
In selecting independent variables, special attention was paid to picking indicators associated with chronic illnesses, because clinical experience in the European countries showed that comorbidities such as hypertension increase the likelihood of fatal COVID outcomes. It is also known that the severity of COVID is higher among smokers and obese men and women, and that living in nursing homes or staying in hospital for a long time for any reason increases the chances of COVID transmission. This linkage, at least in Europe, appears to be so strong that the Density of Hospital Beds (Variable 6) does not appear to have a negative association with COVID deaths, as we originally thought. The abundance of hospital beds seems to be the consequence of the aging population and the high share of people with chronic diseases.
The most significant variable was Consumption of Non-Prescribed Medicine [14] (28), the R² of which was 0.58 (meaning that the simple regression equation explains 58% of the dependent variable's variation). The independent variable with the second-highest explanatory power was Regular Exercising (24) with 0.51. Furthermore, two other healthy lifestyle-related variables had R² values above 0.2: Consumption of Prescribed Medicine [15] (27) (0.23) and Alcohol Consumption (23) (0.27). The high R² values of these four variables reflect the increased health consciousness of a considerable proportion of the European population, namely people with higher education and higher incomes. So, a health-conscious life pre-COVID was not in vain. At the same time, the variables Obesity (21) and Smoking (22) proved to be unimportant in explaining the cross-country variation of COVID's death burden. Simply said, it seems that among the highly developed sample countries, there is no divergence: fewer and fewer people smoke, and more and more people are obese everywhere. [16] This logic, however, does not apply to the Median Age variable (R² = 0.27), which is a demographic reality and not a matter of conscious health behaviour. What we found here is in line with the dominant view in the literature: as age increases, so does the probability of COVID mortality. It was perhaps even more surprising that variables such as Density of Physicians (4) and Health Expenditure per Capita (3) turned out to be statistically insignificant. If these results are at least "roughly right" (to be in line with Keynes), spending more money on health and healthcare was not the deciding factor when the so-far unknown virus arrived at the borders of our sample countries. [17] The economic development level reflected in the customary GDP per Head (1) figures was significant, but its estimated coefficient was negligibly small.
Notes: *: Opposite to expectation, #: Strong association with chronic diseases. Source: Authors' collection of data predominantly from Eurostat publications.
One of the probable lessons of the present pandemic is that the availability of trained nurses in a country's healthcare system is beneficial both in "normal" times and during a pandemic. It improves the comfort of hospitalized patients and alleviates the burden on the shoulders of the overburdened clinicians. The high R² value (0.37) of the Density of Nurses (5) variable thus can be explained.
The two most important independent variables were also plotted against our dependent variable on two separate scatter graphs to see whether the countries of the study form distinct groups in the upper left and lower right quadrants as separated by the median values of the two selected variables. And this seems to be the case. In Fig. 4, for example, the quadrant division clearly confirms our linear OLS regression results above. Health-conscious countries (here represented by Non-Prescribed Medicine Consumption (28)) performed better in terms of COVID deaths. Out of 19 countries, 7 unequivocally fell into the upper left quadrant (these are the poorly performing countries) while another 7 fell into the lower right quadrant (the countries with relatively low COVID death figures). Four Nordic countries (Iceland, Denmark, Norway and Finland) have squarely outperformed the rest of the sample. It sounds meaningful to conclude that the relatively high level of the use of non-prescribed medicines is a good indicator of health literacy and a healthy lifestyle (but also of higher incomes). A very similar message can be read from Fig. 5, where the relationship between COVID-related deaths and the Regular Exercising (24) variable is shown.
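The quadrant division by median values can be expressed as a small classification routine. The sketch below is an assumption about how such a split might be computed, with hypothetical data for four countries; points lying exactly on a median line are labelled separately, since the text counts only countries that fall unequivocally into a quadrant.

```python
from statistics import median


def quadrant_labels(x, y):
    """Label each (x, y) point by its position relative to the two medians.

    x is the ex-ante indicator (horizontal axis), y the dependent variable
    (vertical axis). Points on a median line get their own label.
    """
    median_x, median_y = median(x), median(y)
    labels = []
    for a, b in zip(x, y):
        if a == median_x or b == median_y:
            labels.append("on a median line")
            continue
        horizontal = "left" if a < median_x else "right"
        vertical = "upper" if b > median_y else "lower"
        labels.append(f"{vertical} {horizontal}")
    return labels


# Hypothetical data: indicator values and log cumulative deaths for four countries.
indicator = [10, 20, 30, 40]
log_deaths = [8.0, 7.5, 6.0, 5.0]
labels = quadrant_labels(indicator, log_deaths)
print(labels)
```

With a negative relationship, as in this made-up example, low-indicator countries gather in the upper left quadrant and high-indicator countries in the lower right one.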

TOWARDS BUILDING A COMBINED MODEL WITH MULTIPLE REGRESSIONS
The best model had an adjusted R² of 0.76 with two ex-ante independent variables, both of which are significant at 5%: Non-Prescribed Medicine Use and Number of Nurses. Two other models were also found to hold considerable explanatory power, but only at the 10% significance level. Adding Alcohol Consumption to the above-mentioned variables increased the adjusted R² to 0.81 and doing the same with Prescribed Medicine Use gave almost the same results. [18] The output tables of the three best models are presented below.
[16] It is noteworthy that according to our preliminary (unpublished) analysis of a worldwide sample of 109 countries, obesity and smoking are two out of the three most significant variables.
[17] In fact, the Baum et al. (2021) study already arrived at the same conclusion.
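The two model-comparison statistics mentioned here, the adjusted R² and the Akaike Information Criterion, can be sketched as below. The adjusted R² formula is standard. For the AIC, the code uses one common least-squares convention, n·ln(RSS/n) + 2p; statistical packages differ in additive constants, so this sketch should not be expected to reproduce the values reported in the paper. All input numbers are hypothetical.

```python
import math


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    degrees_of_freedom = n_obs - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / degrees_of_freedom


def aic_least_squares(residual_sum_of_squares, n_obs, n_parameters):
    """AIC under Gaussian errors, up to an additive constant (ASSUMED convention).

    n_parameters counts all estimated coefficients, including the intercept.
    """
    if residual_sum_of_squares <= 0 or n_obs <= 0:
        raise ValueError("RSS and n_obs must be positive")
    return n_obs * math.log(residual_sum_of_squares / n_obs) + 2 * n_parameters


# Hypothetical example: R2 = 0.80 from 24 observations and 2 predictors.
print(round(adjusted_r_squared(0.80, 24, 2), 3))  # 0.781
```

The adjusted R² penalises additional predictors, which is why adding a third variable raises it only when the gain in fit outweighs the lost degree of freedom.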

The best model
The best model explains 0.76 of the variability of Log Cumulative COVID Deaths. Both independent variables are significant at the standard 5% level. The estimated coefficient for the variable Density of Nurses (5) shows that having one more nurse per 1,000 of population decreases cumulative COVID deaths by almost 15%. Similarly, one more unit of Consumption of Non-Prescribed Medicine (28) decreases cumulative deaths by 5%.
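A two-variable model of this type can be estimated by solving the normal equations (XᵀX)b = Xᵀy. The following standard-library sketch does so with Gaussian elimination; the data are hypothetical and generated without noise from made-up coefficients (9.0, -0.10 and -0.03), so the fit recovers them exactly. Neither the numbers nor the function names come from the paper.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def multiple_ols(predictors, y):
    """OLS with an intercept; predictors is a list of equally long columns.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    design = [[1.0] + [column[i] for column in predictors] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * target for row, target in zip(design, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical indicators for six countries.
nurses = [4, 6, 8, 10, 12, 7]
medicine = [10, 30, 20, 40, 25, 35]
# Hypothetical log cumulative deaths, generated without noise.
log_deaths = [9.0 - 0.10 * n - 0.03 * m for n, m in zip(nurses, medicine)]

coefficients = multiple_ols([nurses, medicine], log_deaths)
print([round(c, 3) for c in coefficients])
```

In practice one would use an established statistics package, which also reports standard errors and significance levels; the sketch only shows where the point estimates come from.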

The second-best model
The second-best model of our study has an explanatory power of 0.81. Two out of three variables, namely variables (5) and (28), are significant at 5%, while the third variable, Consumption of Prescribed Medicine (27), is significant only at 10%. Similarly to our best model, the estimated coefficients for variables (5) and (28) are negative and are of almost the same magnitude as in the best model. At the same time, variable (27) has the opposite effect, meaning that consuming one more unit (per cent) of prescribed medicine leads to 2% more COVID deaths. This is consistent with the following interpretation: a higher level of prescribed medicine consumption is, to a large extent, the reflection of a higher prevalence of chronic diseases, which in turn is one of the circumstances increasing the probability of death in case of an eventual COVID infection.
[18] The Akaike Information Criterion of the three best models above in order of mention were as follows: 34.20, 31.04 and 31.91.

The third-best model
Finally, the third-best model has an R² of 0.81 and has three independent variables, just like the second-best model. Variables (5) and (28) are significant at 5% and have similar estimated coefficients to the two models presented before. The third independent variable is Alcohol Consumption (23), which is only significant at 10%, and its estimated coefficient suggests that consuming one more litre of pure alcohol increases cumulative COVID deaths by 14%.

CONCLUSION AND SUMMARY
This paper explains the country differences of COVID mortality rates in 24 European countries with econometric tools from the beginning of the pandemic until our cut-off date for information (16 July 2021). Since the quantification of the various country-specific anti-COVID health policy actions is impossible, we explain MRs with available, reliable, ex-ante economic, health and social indicators pertaining to the year 2019 – i.e., before the outbreak of the pandemic.
We built three models. Our best model with two ex-ante independent variables explains 0.76 of the variability of our ex-post dependent variable, the logarithm of Cumulative COVID Deaths. The estimated coefficient for the variable Density of Nurses shows that having one more nurse per 1,000 of population decreases cumulative COVID deaths by almost 15%. Similarly, one more unit of Consumption of Non-Prescribed Medicine decreases cumulative deaths by 5%. All in all, it seems that until now those European countries were successful in minimising fatalities where the population had a high level of health literacy, people pursued a healthier lifestyle and the healthcare systems worked with a relatively large nursing force already prior to COVID.
Our econometric estimates could be improved by the inclusion of a larger number of countries, but the price of doing so appears to us to be too high. If we went beyond the broadly defined borders of the European Union and tried to include 20-40 less developed countries, our main dependent variable used in this paper, the cumulative number of COVID-related deaths, would become less and less reliable. As we already mentioned above, outside of Europe, the real number of COVID deaths could be 2-3 times higher than the officially reported data. Thus, mixing reliable and unreliable data would be bound to lead (paraphrasing the words of Keynes) to "precisely wrong" estimates.
The monthly changes of excess mortality during the COVID pandemic in selected European countries, %
Source: Eurostat, accessed on July 16, 2021.