
Predicting COVID-19 Evolution during Mid-March Crisis

2020

Pedro Furtado
University of Coimbra, Polo II, Coimbra, Portugal
+35123700000

ABSTRACT
The coronavirus responsible for COVID-19 has stormed into our lives. Every human activity has been seriously hurt, and millions of people have been confined to their homes. As of March, people in Europe wonder whether confinement, closures and no-flight policies are effective, and how effective they are, despite the positive earlier example of China. In this paper we present an analysis focused specifically on detecting whether the curves of new daily cases are on a route to stabilization or are exploding. This required a set of data processing and analysis steps that we describe in detail. We conclude that, as of 22 March, the curves were on a trajectory of stabilization, with a possible decrease soon. We show why, and we also find a very probable correlation with confinement and other government policies.

CCS Concepts
• Applied computing ➝ Health informatics

Keywords
Epidemiology; Data analysis; Analysis of variation

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. This work is licensed under a Creative Commons Attribution International 4.0 License.
ICBBT 2020, May 22–24, 2020, Xi'an, China
© 2020 Copyright is held by the owner/author(s).
ACM ISBN 978-1-4503-7571-9/20/05.
DOI: https://doi.org/10.1145/3405758.3405771

1. INTRODUCTION
In January the world woke up to the news that China had a new outbreak of a deadly virus. The novel coronavirus, identified by Chinese authorities on January 7 and since named SARS-CoV-2, is a new strain that had not previously been identified in humans. Signs of infection include fever, cough, shortness of breath and breathing difficulties. In more severe cases it can lead to pneumonia, multiple organ failure and even death, particularly among older people and people with other health conditions. The Covid-19 death rate was estimated at around 3%, and the reproduction number at somewhere between 2 and 3.

At that early stage the virus was already a potential threat to the rest of the world. However, governments and health officials in most countries were slow to act because nobody foresaw that it would spread as it did. Around the beginning of February, patients started testing positive for Covid-19 in Italy. In just a month and a half the virus spread to other places in Italy, to other European countries and to the rest of the world, and in several places the daily numbers of new cases and deaths followed an exponential growth curve.

Initially, as the virus started spreading, governments and health authorities were slow to react, probably because of the huge damage that action would do to the economy. In particular, flights continued unabated and some countries had no significant checks or quarantine requirements. Even testing for the disease was initially restricted. Of course, all this changed as soon as the numbers skyrocketed and everyone watched on TV the overcrowded hospitals in Italy, the desperate health staff and daily death figures of more than 500 in Italy alone. One after the other, countries started closing schools, bars, restaurants, non-essential shops, social gatherings, flights and other activities, until finally states of exception were declared and people were confined to their homes as much as possible.

A new virus may be especially dangerous to humans if it is both sufficiently virulent and deadly and there is no immunity to it. Immunization eventually happens when the virus has infected a vast majority of the population or when the population is inoculated by vaccination. Until one or the other happens, this experience has shown that a pandemic such as Covid-19 can cause mayhem.

Estimates of the incubation period (the time between infection and the onset of symptoms) range from 1 to 14 days, with most infected people showing symptoms within five to six days. One of the problems of Covid-19 is that asymptomatic carriers infect others before they even realize they are infected. Moreover, the virus spreads easily and humans have no immunity to this new strain. This is how the virus was able to spread all over the world, and by mid-February the situation started to feel out of control.

Our current work was motivated precisely by the danger posed by the exponential growth of the SARS-CoV-2 virus among the population. Our goal was to analyze the available data to understand whether current government actions will have relevant effects. This matters because both the number of daily cases and the total number of cases are still increasing. People who are confined may therefore think the measures are not succeeding, which in turn can make them less careful about confinement and other important actions, such as washing their hands frequently after arriving home.

1.1 The Meaning of R0
The reproduction number (R0) is the average number of people who will be infected by a single infected person. For example, an R0 of 2 means each infected person will, on average, infect two other people. This single number carries significant meaning: it helps health organizations determine whether an outbreak will spread. If R0 is greater than 1, the disease will probably spread. Despite its relevance, R0 is hard to determine. Several groups used different methods to estimate R0 for SARS-CoV-2, and their estimates mostly fall within the range of 2 to 3.
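The threshold at 1 for the reproduction number R0 of Section 1.1 can be illustrated with a minimal Python sketch. It assumes discrete infection generations, a fully susceptible population and a constant R0, none of which holds for long in a real epidemic. The helper `new_infections_by_generation` is hypothetical and not part of the paper. Each generation is R0 times the previous one, so the counts grow geometrically for R0 above 1, stay flat at exactly 1, and die out below 1.

```python
def new_infections_by_generation(r0: float, generations: int, seed: float = 1.0) -> list[float]:
    """Expected new infections per generation under a fixed reproduction number.

    Simplifying assumptions: discrete generations, everyone is susceptible
    (no depletion, no immunity) and no intervention changes r0.
    """
    counts = [seed]
    for _ in range(generations):
        # Each infected person of the previous generation infects r0 others on average.
        counts.append(counts[-1] * r0)
    return counts


for r0 in (0.8, 1.0, 2.0, 3.0):
    counts = new_infections_by_generation(r0, generations=10)
    print(f"R0 = {r0}: generation 10 has {counts[-1]:,.1f} new infections, "
          f"{sum(counts):,.1f} in total")
```

With R0 = 2, the tenth generation alone has 1024 new infections, starting from a single case. It is the depletion of susceptibles, captured by SIR-type models, that eventually bends such a curve in practice.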
We take a data-driven approach to study the initial exponential growth of the SARS-CoV-2 virus in a helpless society. We also study the effect of social isolation and other mechanisms in turning that terrible evolution into a sigmoid (logistic) curve.

2. RELATED WORK
The mathematics of infectious diseases is reviewed in [7], and [3], [1], [4] and [2] are textbooks on the subject. The dynamics of an epidemic can be modelled as a set of differential equations, and one of the most famous early models is SIR, by Kermack and McKendrick [8]. The mnemonic SIR stands for Susceptible (S), Infectious (I) and Recovered (R). The three states are organized sequentially (S → I → R). Individuals transition between them, and the differential equations describe the change in the stock of each state per unit of time. Naturally, the changes in S, I and R sum to 0, since individuals only move between states (one more individual in one state corresponds to one fewer in the previous state).

Between S and I the transition rate is βI. Here β is the average number of contacts per person per unit of time, multiplied by the probability of disease transmission in a contact between a susceptible and an infectious subject. Between I and R the transition rate is γ, the number of recovered or dead during one time unit divided by the total number of infected in that same time interval. If the duration of the infection is denoted by D, then γ = 1/D, since an individual experiences one recovery in D units of time. The time each individual spends in each state is assumed to be an exponentially distributed random variable. With these differential equations it is possible to simulate the dynamics of a specific epidemic. In [5] the authors derived exact analytical solutions of the SIR model and of the SIR model with equal death and birth rates. SIR has given rise to numerous derived models; for example, in [6] the author presents and analyzes three basic epidemiological models.

SIR and related models can be applied to Covid-19 and used for simulation. However, the work we present here focuses solely on a data-driven analysis of growth rates. We analyze each country to understand whether it is evolving positively in terms of containing the current outbreak.

3. MODELING GROWTH AND RELIEF
3.1 Exponential and Sigmoid Growth
Mathematical models can be used to describe changes in a population accurately and to predict future changes. In population growth theory, r (the per-capita rate of increase, or rate of growth) is used to classify population growth. Growth is exponential if r is constant, and logistic if r decreases as the population grows. Given the population size N, equation (1) describes exponential growth, while equation (2) describes logistic growth:

$$\frac{dN}{dt} = rN \qquad (1)$$

$$\frac{dN}{dt} = r\,\frac{k - N}{k}\,N \qquad (2)$$

In (2), k is the carrying capacity, the maximum population size that a specific environment can support. The idea is that a population needs resources, such as energy and other nutrients, in order to reproduce, so its growth is restricted by the carrying capacity. Figure 1 illustrates the daily rate of change (dN/dt) and the population size (N) of exponential growth with growth rate r = 3. Figure 2 shows logistic growth with the same r and a carrying capacity k = 25000, meaning that the environment can only provide for up to 25000 specimens. In Figure 2 the curve initially looks like exponential growth, but the rate of increase decreases significantly as the population size nears k. Note also that, when growth is compounded once per time unit, the total population size after t time units of exponential growth is

$$N = n_0\,(r + 1)^{t} \qquad (3)$$

where $n_0$ is the initial number of cases.
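Equations (1) to (3) can be explored numerically with the minimal Python sketch below. It assumes forward-Euler integration, the r = 3 and k = 25000 used for Figures 1 and 2, and a hypothetical initial value n_0 = 1. The helper names are illustrative and not taken from the paper. Integrating (1) with one step per day reproduces equation (3) exactly, which is why (3) has the form (r + 1)^t rather than the continuous-time solution n_0 e^{rt}. With finer steps, the logistic curve never exceeds the exponential one and levels off just below k.

```python
from collections.abc import Callable


def exponential_rate(n: float, r: float) -> float:
    """Right-hand side of equation (1): dN/dt = r N."""
    return r * n


def logistic_rate(n: float, r: float, k: float) -> float:
    """Right-hand side of equation (2): dN/dt = r (k - N) / k N."""
    return r * (k - n) / k * n


def integrate(rate: Callable[[float], float], n0: float, days: int,
              steps_per_day: int = 1) -> list[float]:
    """Forward-Euler integration; returns N at the end of each day (day 0 = n0)."""
    dt = 1.0 / steps_per_day
    n = n0
    values = [n]
    for _ in range(days):
        for _ in range(steps_per_day):
            n += rate(n) * dt
        values.append(n)
    return values


r, k, n0, days = 3.0, 25000.0, 1.0, 10  # r and k as in Figures 1-2; n0 is an assumption

# Equation (3): with one Euler step per day, (1) becomes N_{t+1} = (1 + r) N_t.
daily_steps = integrate(lambda n: exponential_rate(n, r), n0, days)
equation_3 = [n0 * (r + 1) ** t for t in range(days + 1)]

# Near-continuous curves for Figures 1-2: small steps keep r * dt well below 1.
exponential = integrate(lambda n: exponential_rate(n, r), n0, days, steps_per_day=100)
logistic = integrate(lambda n: logistic_rate(n, r, k), n0, days, steps_per_day=100)

for t in range(days + 1):
    print(f"day {t:2d}: exponential {exponential[t]:>18,.0f}  logistic {logistic[t]:>8,.0f}")
```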
Our analysis focuses on the rate of change of new cases rather than on simulating an SIR-type model. A simpler derivative model that analyzes only this rate of change is therefore sufficient for our purposes.

Exponential and sigmoid growth are closely related to our work. Exponential growth describes one way in which a variable may increase over time: the rate of change is proportional to the value itself, so the function that models it is an exponential function of time (time is the exponent). As an example of its use, in [9] the authors study how cells can grow exponentially at a constant rate while remodeling their metabolism and gene expression. Exponential growth matters in this work because a virus will typically spread exponentially if no artificial immunization is available and the reproduction number is high enough, since each infected person can infect several others. This danger motivated our work. We take a data-driven approach to study the initial exponential growth of the virus in a helpless society, and we analyze the curves to find whether government measures and public alarm help ease the spread. In short, we ask what effect social isolation and other mechanisms may have in turning that terrible evolution into a sigmoid curve.

Figure 1. Exponential growth example (r = 3).
Figure 2. Logistic growth example (r = 3).

In the case of Covid-19, our objective is that the growth curve, initially exponential in shape, becomes a sigmoid. Here this is achieved by social isolation, confinement and closures, which break the chain of transmission and therefore reduce the environmental conditions favorable to the spread.

3.2 Turning the Tide
The first objective of epidemic containment should be to turn exponential growth into a sigmoid curve as soon as possible. This can be done either through immunization, which is currently unavailable for the Covid-19 pandemic (as of March 2020), or through the only viable existing alternative, social isolation and confinement. Social isolation and confinement aim to decrease r (the per-capita transmission rate). They are achieved by closing most activities, schools and industry included, except those that are absolutely necessary, such as hospitals, pharmacies, food production and supermarkets. If the reproduction number falls below 1, the epidemic is contained and the number of infected decreases over time. Figure 3 shows the three phases: (1) exponential growth, (2) sigmoid (logistic) growth, and (3) the decrease in the number of infected.

Figure 3. Turning an exponential growth curve into a sigmoid.

3.3 Data-driven Approach
Our data-driven approach evaluates whether containment is succeeding by analyzing the data. Data-driven approaches are possible today mainly because data is updated almost in real time. In our case we have daily data on the number of new cases (infected), deaths and recoveries all over the world, organized by country. Our methodology consisted of the following phases:

1. Data collection: accessing the sources and extracting the relevant daily data about Covid-19, e.g. through web scraping;
2. Organizing the data: the data is organized as time series over the days since the start of the pandemic, with detailed data for each affected country;
3. Pre-processing of the data: applying a set of functions to clean the data and prepare it for analysis. This is one of the most important steps, because there are a number of uncertainties in the way the various countries handle the process. In particular, countries may do more or less testing of suspected infections, and some days have many more reported cases than others, so the data needs cleaning;
4. Data transformation: transforming the input data into actionable knowledge. Here we concentrate on a single aspect, the curbing of exponential growth, and we planned the key analysis to evaluate this factor. If we can detect a sigmoid curve emerging soon, the pandemic is probably going to recede, at least partially. This simple analysis needs to be done for each country, which we did; however, for reasons of space, in this paper we only illustrate it for a few countries;
5. Analysis of the potential correlation between government measures (e.g. confinements, closures and others) and the evolution of the daily rate.

Using an experimental approach and the steps described above, we tested several possible sequences of data transformations. We settled on a specific sequence of transformations over the series of daily new cases, which is our main input for the analysis:

a. Smoothing daily new cases: the raw data is not smooth, for several reasons. For example, in many countries some days have many more tests than others, and sometimes there is a delay in reporting, so that a given day's cases may be counted on later days. As a result, the data has oscillations that need to be smoothed during pre-processing. Moving averages over the daily numbers smooth out these effects;
b. Obtaining the series of daily rates of change: from the moving averages we obtain the series of daily rates of change, a discrete and relative counterpart of dC/dt, where C is the number of cases on each day. This lets us analyze how the daily rate of change is evolving. For each day i, the rate is computed as $(C_i - C_{i-1})/C_{i-1}$, which yields a sequence of relative rates of change. Our objective is to observe whether these rates decrease as governments impose measures;
c. Cleaning the series of daily rates of change per country: since $C_{i-1}$ is 0 before the first case occurs, the first non-zero daily rate of change would be infinite (division by $C_{i-1} = 0$). Additionally, while there are very few cases (e.g. 3), the daily rates of change are very high. For instance, another 3 newly infected persons is a rate of change of 100%, even though it represents only 3 new patients. Since we are analyzing rates of change, it is important to filter out these huge initial changes;
d. Further smoothing of the daily rates of change prior to analysis: even after smoothing the series of daily new cases (a), the computed daily rates of change (b) still showed some oscillation. We smoothed it by applying moving averages to that series, which removes abrupt day-to-day variations and further eases the analysis;
e. Analysis of the curves: in this step we analyzed the series of daily rates of change for each country. In this paper we provide a few examples, showing how we did it and what we concluded for those countries.

4. CLEANING AND TRANSFORMATIONS
The daily raw data on each country's outbreak is frequently dirty, in the sense that the number of new cases can vary widely from one day to the next. Figure 4 shows this effect, plotting daily cases per country in February and March for a subset of countries.

4.1 Moving Average of Daily New Cases
Figure 5 shows the result of applying moving averages to the countries' data. To obtain this result we applied the moving average

$$d'_i = \frac{d_{i-1} + d_i + d_{i+1}}{3}$$

where d is the original series and d' is the new series. Comparing with Figure 4, we can see that some sudden variations, which are actually an artifact of the testing and reporting process, were smoothed out.
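Steps a to d of Section 3.3 can be sketched in Python as follows. The sketch uses the three-day moving average of Section 4.1 and the TRUNC rule described in Section 6 (removing the initial peak of the daily rates of change). It is a minimal illustration under stated assumptions, not the author's code:

- the function names and the data are hypothetical;
- the treatment of the first and last day in the moving average and the exact boundary of TRUNC are guesses;
- TRUNC is applied to the smoothed case counts, because those are the values the rates are computed from. With raw counts, the infinite rate created when smoothing spreads the first case onto the previous day would survive.

The defaults n_D = 25 and Thr = 1 match the parameters reported for Figure 7.

```python
import math
from collections.abc import Sequence


def moving_average3(d: Sequence[float]) -> list[float]:
    """Centred 3-day moving average d'_i = (d_{i-1} + d_i + d_{i+1}) / 3 (Section 4.1).

    Assumption: on the first and last day only the neighbours that exist are averaged.
    """
    out = []
    for i in range(len(d)):
        window = d[max(i - 1, 0):i + 2]
        out.append(sum(window) / len(window))
    return out


def daily_rates(c: Sequence[float]) -> list[float]:
    """Relative daily change (C_i - C_{i-1}) / C_{i-1} (step b).

    Day 0 has no predecessor and gets 0; a jump from 0 cases is infinite (step c).
    """
    rates = [0.0]
    for prev, cur in zip(c, c[1:]):
        if prev == 0:
            rates.append(math.inf if cur > 0 else 0.0)
        else:
            rates.append((cur - prev) / prev)
    return rates


def trunc(rates: Sequence[float], cases: Sequence[float],
          n_d: float = 25, thr: float = 1.0) -> list[float]:
    """TRUNC (step c): zero the rates from the first non-zero case i0 up to, but
    not including, the first index i1 with more than n_d cases and a rate below thr.
    Whether i1 itself should be zeroed is an assumption."""
    out = list(rates)
    i0 = next((i for i, x in enumerate(cases) if x > 0), None)
    if i0 is None:
        return out
    i1 = next((i for i in range(i0, len(cases))
               if cases[i] > n_d and rates[i] < thr), len(cases))
    for i in range(i0, i1):
        out[i] = 0.0
    return out


def prepare_rates(daily_new: Sequence[float], n_d: float = 25,
                  thr: float = 1.0) -> list[float]:
    cases = moving_average3(daily_new)      # step a: smooth daily new cases
    rates = daily_rates(cases)              # step b: relative daily rates of change
    rates = trunc(rates, cases, n_d, thr)   # step c: drop the initial peak
    return moving_average3(rates)           # step d: smooth the rates again


# Made-up daily new cases for one hypothetical country (not data from the paper).
daily_new = [0, 0, 3, 0, 6, 12, 30, 45, 80, 120, 150, 170, 180, 185]
print([round(x, 3) for x in prepare_rates(daily_new)])
```

For this made-up series, TRUNC zeroes two kinds of values. The first is the infinite first rate. The second is the rates between 1.0 and 2.0 on days with only a handful of cases. The first rate kept is 0.8125, on day 6, where the smoothed series reaches 29 cases.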
Figure 4. Daily cases for a set of countries.
Figure 5. Moving-averaged daily cases for countries.

5. COMPUTING THE DAILY RATES OF CHANGE
Given a country's series of daily new cases d, the series of daily rates of change is

$$d'_i = \frac{d_i - d_{i-1}}{d_{i-1}}$$

Figure 6 shows the daily rates of change for the same countries as Figure 5. This result is not yet ready for rate-of-change analysis because, for at least some countries, there is a huge initial value. This is an expected artifact of computing a relative value. As explained previously, a change from 0 to 3 is an infinite increase (because the initial value is zero), and a change from 3 to 6 is a 100% increase.

Figure 6. Daily rates of change for countries.

6. REMOVING TRAILING RATES OF CHANGE
This step, which we denote truncate (TRUNC), removes the first peak of the series of daily rates of change. Let d be a series in which $i_0$ is the index of the first non-zero case. The algorithm zeroes all values from $i_0$ up to the first index $i_1$ that has both more than $n_D$ daily cases and a rate of change below a specified threshold Thr; both parameters must be given. Figure 7 shows the series of Figure 6 after applying the algorithm with $n_D = 25$ and Thr = 1. With these values, elements are zeroed until the first day on which there are more than 25 daily cases and the daily rate of change has dropped below 100%. Note that in Figure 7 the maximum rate of change is 2 (200%), whereas in Figure 6 it was 17.5 (1750%!).

Figure 7. Truncated daily rates of change for countries.

7. ANALYSIS OF COUNTRIES
After the previous steps the dataset was ready for country analysis. It is not possible to show here the analysis for all 20 countries considered, so we concentrate on a few illustrative examples (e.g. we chose Italy because it is one of the epicenters). Figure 8, Figure 9 and Figure 10 show the analysis for Italy, Spain and the USA, respectively.

In most of the countries analyzed in our study, after one or more peaks, the daily rate of change is already decreasing or reversing its trend, but in most cases it is still above 0. The observed decrease can possibly be correlated with social distancing, since the spread of the disease has triggered huge alarm and authorities have been very vocal about the danger and the need to take shelter. Later, authorities also acted by closing non-essential institutions and making social closure and confinement mandatory. The decrease in the daily rate of change is also a sign of sigmoid behavior, an important improvement in conditions. Looking at Figure 3, this means most countries are in the sigmoid phase. As soon as this rate turns negative, the number of new daily cases starts decreasing, which is the third phase in Figure 3. As of March 22, the USA still had a daily rate of increase of new cases of around 25% (Figure 10), while Italy was at around 8% (Figure 8) and Spain at around 6% (Figure 9). The daily rates of change for China, the first country affected, are also shown in Figure 11. The next section correlates these evolutions with measures taken by the governments and concludes.

Figure 8. Daily rates of change for Italy.
Figure 9. Daily rates of change for Spain.
Figure 10. Daily rates of change for USA.
Figure 11. Daily rates of change for China.

8. CORRELATING WITH EVENTS
In this section we analyze policy milestone events to help explain the evolution further. Figure 12 shows that, in the case of Italy, February 25 marked the closure of 11 towns in the north of Italy, which were the main focus of the outbreak (confining 50K people). The rate of change still increased from 30% to 35% for 2 days, but then reversed and decreased at a reasonably fast rate. Later, when the curve was flattening at around 20%, further political actions followed: first the confinement of the north (March 8), then the confinement of the whole country (March 11). These actions are the most probable cause of the further and faster decrease to the current 8% daily rate of change. As of March 22, Italy had more than 5000 new cases per day, still increasing at the 8% rate. This rate of increase was slowing, however, and was on a trajectory to soon cross the 0 axis, at which point the number of new daily cases would start decreasing.

Figure 12. Daily rates of change correlated with events, Italy.

For Spain, Figure 13 shows that the rate of change increased significantly until around March 11. Several factors are very plausible causes for the decrease in the rate of change from March 11 on: the public alarm caused by the rapid increase in cases and deaths, the alerts from the government and health officials, and the start of government actions to restrict and close activities. Then, on March 14, the Spanish government declared a state of alarm and confinement. The figure shows that the daily rate of change decreased very significantly, reaching 6% on March 22, and was expected to keep decreasing.

Figure 13. Daily rates of change correlated with events, Spain.

In both cases discussed above (Figure 12 and Figure 13) it may seem surprising that the curves were already going down when the most stringent government measures kicked in. However, some days before ordering closures and lockdowns, governments were already asking people to stay at home. They were also quarantining people and introducing other strong sanitary actions on a daily basis, which raised public awareness very quickly. In essence, people also got scared and drastically reduced their social contacts and activities, even before the lockdowns.

The case of China, shown in Figure 11, is more difficult to analyze because, at a certain point in time, the number of infected was adjusted. We can assume that, if those adjusted numbers are moved to earlier dates, the causal link between lockdown and improvement also becomes apparent.

9. BRIEF ON ANALYSIS OF OTHER COUNTRIES
We have analyzed more than 20 countries using the same approach shown for the illustrative examples. Most of the evidence further confirms the relevant link between government measures and public alarm, on one hand, and a significant reduction of the rate of increase, later followed by an actual decrease of the daily rate, on the other. Our future work will include forecasting the future curve to determine the expected eradication of the disease in each country.

Finally, note that the analysis is only as good as the raw data collected. Many factors affect the veracity of the raw data itself, e.g. the availability of Covid-19 testing in a given country, wrong diagnoses, reporting delays and many other details.

10. CONCLUSIONS
In this work we proposed a data-driven approach to analyze the evolution of Covid-19 cases in several countries during the current pandemic. We have shown how we prepared and analyzed the data. Most importantly, we have shown a degree of correlation between the data and important government measures such as mandatory confinement and closures. The results provide important evidence of the relevance of those measures, of the slowing of exponential growth and of the future decay trajectory of daily new cases.

11. REFERENCES
[1] Altizer, S., Nunn, C. (2006). Infectious diseases in primates: behavior, ecology and evolution. Oxford Series in Ecology and Evolution. Oxford: Oxford University Press. ISBN 0-19-856585-2.
[2] Anderson, R. M. (1982). Population Dynamics of Infectious Diseases: Theory and Applications. London, New York: Chapman and Hall. ISBN 0-412-21610-8.
[3] Bailey, N. T. (1975). The mathematical theory of infectious diseases and its applications (2nd ed.). London: Griffin. ISBN 0-85264-231-8.
[4] Brauer, F., Castillo-Chávez, C. (2001). Mathematical Models in Population Biology and Epidemiology. New York: Springer. ISBN 0-387-98902-1.
[5] Harko, T., Lobo, F. S., Mak, M. K. (2014). Exact analytical solutions of the Susceptible-Infected-Recovered (SIR) epidemic model and of the SIR model with equal death and birth rates. Applied Mathematics and Computation. 236: 184–194. doi:10.1016/j.amc.2014.03.030.
[6] Hethcote, H. W. (1989). Three Basic Epidemiological Models. In Applied Mathematical Ecology, Biomathematics 18. Berlin: Springer. pp. 119–144. doi:10.1007/978-3-642-61317-3_5. ISBN 3-540-19465-7.
[7] Hethcote, H. W. (2000). The Mathematics of Infectious Diseases. SIAM Review. 42 (4): 599–653. doi:10.1137/s0036144500371907.
[8] Kermack, W. O., McKendrick, A. G. (1927). A Contribution to the Mathematical Theory of Epidemics. Proceedings of the Royal Society A. 115 (772): 700–721. doi:10.1098/rspa.1927.0118.
[9] Slavov, N., Budnik, B. A., Schwab, D., Airoldi, E. M., van Oudenaarden, A. (2014). Constant Growth Rate Can Be Supported by Decreasing Energy Flux and Increasing Aerobic Glycolysis. Cell Reports. 7 (3): 705–714. doi:10.1016/j.celrep.2014.03.057. ISSN 2211-1247.