There’s a saying that I learned back in the 1980s while studying probability, statistics, and mathematical modeling: “Torture numbers enough and they’ll confess to anything.” It was right up there with “correlation does not imply causation” and “GIGO.” (GIGO stands for garbage in, garbage out.)
As most of you know by now, I’ve been skeptical of the catastrophic projections of the expected progression of the WuFlu. I do not think that the figures presented are an intentional “hoax,” though I suspect that some people and institutions, particularly the major media, have an ideological reason to exaggerate the danger. But I suspect that the bigger problem is the limited amount of information presently available, even to the most sophisticated modelers.
Mathematical modeling is tricky. It is sensitive to a number of choices. The first choice is the mathematical formula selected to model the phenomenon. Typically, there are a great many formulas that might be selected. The second choice (or, more often, the second and third and fourth and fifth) is the value selected for each of the various parameters of that formula.
I’m going to present several hypothetical models of WuFlu spread, to demonstrate how difficult it is to differentiate between such models in the early period. These examples are not intended to be an accurate prediction of the actual progress of the WuFlu. They are intended to demonstrate how it is possible to select several alternative formulas, adjust the parameters so that each appears to be a good fit in the early period, and yet generate predictions that can vary enormously in as little as 2-4 weeks.
I hope that this will be of interest to some of you.
I. What We Expect
A general rule of epidemiology is that disease outbreaks follow an “S-Curve.” This is called Farr’s Law (here and here are two papers describing the rule; I’ll add a brief technical discussion in the comments). If you want to impress your friends, you can use the 50-cent term for an S-Curve, which is “Sigmoid Function.”
Another model assumes “exponential growth,” which means a constant percentage increase each day. For example, an exponential growth model may increase by 10% daily, or 33% daily, or more.
It turns out that, in the early period of many S-Curves, the graph is pretty close to an exponential growth curve. Here is an example:
Notice that in this example, the curves look quite similar until Day 36, then diverge sharply. The S-Curve, in this case, is calibrated to reach a final level of 500,000. I had to truncate the exponential growth curve at Day 41, or the S-Curve graph would have ended up looking like a straight line at the bottom of the graph. This is because, by Day 63 in this graph, the exponential growth function would be approximately 477 million, almost 1,000 times greater than this particular S-Curve.
The S-Curve is a generic term for curves shaped like the one in the graph above. There are several mathematical formulas that follow this general shape (some described at the Wikipedia entry for S-Curves, here).
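For readers who want to see the comparison in numbers rather than in a graph, here is a minimal Python sketch. It rests on assumptions not stated in this section: the exponential curve starts at 10 on Day 1 and grows 33% daily (values that reproduce the 477 million figure for Day 63), and the S-Curve is a plain logistic with the same start and early growth rate and a final level of 500,000. It is not necessarily the curve used for the graph above, so the day on which the two visibly diverge will differ.

```python
import math

START = 10.0          # assumed value on Day 1
DAILY_GROWTH = 0.33   # assumed 33% daily increase
CEILING = 500_000.0   # final level of the S-curve, as in the text


def exponential(day: int) -> float:
    """Constant 33% daily growth, starting from START on Day 1."""
    return START * (1.0 + DAILY_GROWTH) ** (day - 1)


def logistic(day: int) -> float:
    """Plain logistic curve with the same start and early growth rate."""
    rate = math.log(1.0 + DAILY_GROWTH)
    t = day - 1
    return CEILING / (1.0 + (CEILING / START - 1.0) * math.exp(-rate * t))


for day in (1, 8, 15, 22, 29, 36, 43, 63):
    e, s = exponential(day), logistic(day)
    # ratio near 1.0 means the two curves are practically indistinguishable
    print(f"Day {day:2d}: exponential {e:15,.0f}   S-curve {s:10,.0f}   ratio {s / e:.3f}")
```

In the first couple of weeks the ratio stays very close to 1, which is the point of the section: early data cannot tell the two curves apart.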
II. An Example – Evaluation of Three Models
I’ve previously posted graphs showing reported WuFlu cases by country, with my most recent posts focusing on cases per million. In this example, I use the actual data for reported cases by country, per million, in Italy and the US, through yesterday’s reporting (Saturday, March 21, 2020). Each graph starts on the day when the country reaches 10 or more cases per million.
I’ve created three separate mathematical models of the increase in cases per million, which will be labeled Model 1, 2, and 3. I will provide further information about their characteristics later.
Note that these are simplified models for illustration purposes. A true model would be much more complex, and would account for variables such as the rate of infection, the degree of contact among members of the population, the time lag between infection and the onset of symptoms, and others.
A note on logarithmic scale: I’ll be showing some graphs in normal (linear) scale, and some in logarithmic scale. Logarithmic scale is a bit counter-intuitive, until you’re used to it. The trick is to notice that the increment between intervals on the vertical axis (the y-axis) is not fixed, but grows as you look higher up the axis. So instead of the vertical hash-marks being at 1, 2, 3, and so on, they are at 1, 10, 100, and so on.
The logarithmic scale is not meant to mislead, but it can be misleading if you are not accustomed to it. It has two advantages (at least) in examining phenomena with rapid growth: (1) it makes it easier to differentiate the curves in the early period (the left side of the graph), and (2) it makes it easier to discern whether growth is exponential, because exponential growth appears linear in a logarithmic scale. A disadvantage of logarithmic scaling is that it makes large differences in the late period (the right side of the graph) appear smaller than they actually are.
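The two properties of the logarithmic scale described above can be checked with a few lines of Python. This is only a sketch; the starting value of 10 and the 33% daily growth rate are assumptions chosen for illustration.

```python
import math

START = 10.0          # assumed starting value
DAILY_GROWTH = 0.33   # assumed 33% daily increase

# Two weeks of exponential growth.
values = [START * (1.0 + DAILY_GROWTH) ** n for n in range(14)]

# On a linear axis, each day's step is bigger than the last.
linear_steps = [b - a for a, b in zip(values, values[1:])]

# On a log axis, every step has the same height, log10(1.33),
# which is why exponential growth plots as a straight line.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]

# Late-period compression: a doubling takes the same vertical distance
# on a log axis whether it is 10 -> 20 or 500,000 -> 1,000,000.
small_gap = math.log10(20) - math.log10(10)
large_gap = math.log10(1_000_000) - math.log10(500_000)

print("linear steps:", [round(s, 1) for s in linear_steps])
print("log steps:   ", [round(s, 4) for s in log_steps])
print("gap 10->20:", round(small_gap, 4), " gap 500k->1M:", round(large_gap, 4))
```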
On to the models. Here is the graph for the first week, in logarithmic scale:
Notice that all 3 models seem pretty accurate in this period, with Model 2 notable for underpredicting in the first few days. Here is how the same information looks in linear scale:
The data in this graph is exactly the same as the first, but the scale is different. Notice that: (1) it is harder to differentiate between the curves in the first few days, and (2) the differences at the end of the week look a bit bigger than in the logarithmic graph.
Now let’s move forward to the second week. The data for the US ends after the first week, because we are only 7 days into the time series (i.e. the US first exceeded 10 cases per million on March 15). Here is the graph for the second week, in linear scale:
Notice that all 3 models are significantly over-predicting the course of the disease in Italy (dark blue). Model 2 is the worst, over-estimating by a factor of about 5. Model 1 is the best, but even this model is about twice the actual figure.
Lesson 1: A model that looks accurate for the first week can be quite inaccurate just one week later.
On to the third week. Here is the graph, first in logarithmic scale:
Notice how things have changed. All three models continue to significantly over-predict the course of the disease in Italy (dark blue), but Model 2 is curving downward, and is now the most accurate after three weeks, though it was the least accurate after two weeks.
Also notice that none of the three models appears to wildly overstate the actual case number in Italy – but only because we’ve switched to a logarithmic scale. Here is how the same data looks in linear scale:
In the linear scale, the huge divergence between Italy’s actual reports, and all three models, becomes clear. Model 2 is now the best, but is about 3 times higher than the actual figure. Model 3 is the worst, approximately 7 times the actual figure, with Model 1 in the middle.
Lesson 2: The model that looked the worst last week can look the best just one week later.
Three weeks is about all of the data that we have for Italy so far. But let’s run the projections forward for another three weeks, to week 6. Here is how it looks in logarithmic scale:
Notice how Model 2 has leveled off, and Model 1 has grown to surpass Model 3 (on day 36). Italy’s trend line is still below all three models, and curving down slightly, though it appears to be on a path to surpass Model 2 in the next few days. Still, in the logarithmic scale, none of the models look wildly incorrect.
But look at it in the linear scale:
In this scale, the extent of the vast over-estimates generated by Model 1 and Model 3 is apparent. Notice that Model 1 is now predicting about 1.2 million cases per million – in other words, 120% of the population would have had the WuFlu after 6 weeks. (This is theoretically possible, I suppose, if it turns out that there are high levels of re-infection, but it seems extremely unlikely.)
Notice further that the projections generated by Model 1 and Model 3 are so huge, by the end of Week 6, that the result of Model 2 appears to be a flat line at the bottom of the graph, and the actual reported figures in the US and Italy are not even visible. By the end of 6 weeks, Model 1 projects about 600 times as many cases as Model 2, and Model 3 projects about 250 times as many cases as Model 2.
Lesson 3: Projections over periods as short as 6 weeks can lead to drastically different results, which can easily be wrong by a factor of 100 or even 1,000.
III. How The Magic Trick Is Done
Model 1 is an exponential growth model, assuming a daily growth rate of 33%. This is the number that has been pretty widely used in a number of media sources. It is pretty accurate in the very early period, and the US remains above this trend line after the first 7 days. But Model 1 leads to an enormous estimate of the number of cases after just 6 weeks – about 120% of the entire population. Moreover, it is already quite wrong as to Italy – Model 1 projects that Italy would have over 7,000 cases (per million) yesterday, while the true figure is about 885 cases (per million). That is about 8 times the number of cases that Italy has actually reported, after about 3 ½ weeks.
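The Model 1 figures quoted above can be reproduced with a short Python sketch. The assumptions here are that Day 1 is the first day at 10 cases per million, that Italy's latest report falls on Day 24, and that the end of week 6 is Day 42.

```python
def model_1(day: int, start: float = 10.0, daily_growth: float = 0.33) -> float:
    """Cases per million under constant 33% daily growth (Day 1 = start)."""
    return start * (1.0 + daily_growth) ** (day - 1)


ITALY_REPORTED = 885.0   # cases per million, as quoted in the text
ITALY_DAY = 24           # assumed day index of Italy's latest report

projected_italy = model_1(ITALY_DAY)
overshoot = projected_italy / ITALY_REPORTED
week_6 = model_1(42)

print(f"Model 1 on Day {ITALY_DAY}: {projected_italy:,.0f} per million")
print(f"Ratio to Italy's reported figure: {overshoot:.1f}x")
print(f"Model 1 on Day 42: {week_6:,.0f} per million "
      f"({week_6 / 1_000_000:.0%} of the population)")
```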
Models 2 and 3 are examples of the generalized logistic function (Wikipedia entry here), which takes the form:

Y(t) = A + (K − A) / (C + Q·e^(−B·t))^(1/v)
If that makes your eyes glaze over, I don’t blame you. Notice that there are six adjustable parameters (labeled A, B, C, K, Q, and v), each of which can be set independently. The parameter K is the upper asymptote (when C=1, which I assumed in these two models). Thus, I was able to input, into my formula, a pre-determined maximum number of cases (per million). The independent variable is t (time), and the “e” in the formula is not a parameter, but is the transcendental number e (about 2.71828).
Model 2 was designed to have an upper bound of 2,000 cases per million (i.e. 0.2% of the population). Remember that this was the model that gave the greatest overestimate of actual cases (in Italy) after the first two weeks.
Model 3 was designed to have an upper bound of 500,000 cases per million (i.e. 50% of the population).
It was pretty easy for me to adjust the parameters of these models to be fairly accurate over the first week or two, compared to actual reported cases in Italy.
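To make the idea concrete, here is a minimal Python sketch of two generalized logistic curves with the upper bounds of Model 2 and Model 3. It assumes the form Y(t) = A + (K − A) / (C + Q·e^(−B·t))^(1/v) from the Wikipedia entry. Every parameter other than K is hypothetical: A = 0, C = 1, v = 1, Q set so that each curve starts at 10 cases per million, and B set to match 33% daily growth early on. These are not the values actually used for Models 2 and 3, so the numbers will not match the graphs.

```python
import math


def generalized_logistic(t, A, K, B, Q, C=1.0, nu=1.0):
    """Y(t) = A + (K - A) / (C + Q * exp(-B * t)) ** (1 / nu)."""
    return A + (K - A) / (C + Q * math.exp(-B * t)) ** (1.0 / nu)


def make_model(upper_bound, start=10.0, daily_growth=0.33):
    """Build a curve with a chosen ceiling (hypothetical calibration)."""
    Q = upper_bound / start - 1.0        # with A=0, C=1, nu=1 this gives Y(0) = start
    B = math.log(1.0 + daily_growth)     # early growth of roughly 33% per day

    def model(t):
        return generalized_logistic(t, A=0.0, K=upper_bound, B=B, Q=Q)

    return model


model_2 = make_model(2_000.0)      # ceiling of 0.2% of the population
model_3 = make_model(500_000.0)    # ceiling of 50% of the population

for t in (0, 6, 13, 20, 41):       # t = days since the first day
    a, b = model_2(t), model_3(t)
    print(f"t={t:2d}: model_2 {a:10,.1f}   model_3 {b:12,.1f}   ratio {b / a:7.1f}")
```

With these hypothetical settings the two curves differ by only a few percent at the end of the first week, yet by a factor of more than 100 at the end of week 6, even though the only thing that changed is the ceiling K.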
IV. What Difference Does It Make
The problem, at present, is that we are being presented with quite alarming projections of the progress of the WuFlu. It appears that our President, and other leaders, are making major decisions on the basis of such information. Specifically, there was a report released by Imperial College London (here), which included a projection of approximately 2.2 million deaths in the US (page 7). It predicted “deaths per day per 100,000 population” in a graph (page 7), with this prediction passing 5 around mid-May, and peaking at about 17 in early June (you have to estimate these figures from the graph, which will be shown a bit further on).
With a total US population of about 330 million, this implies about 16,500 deaths per day beginning about 8 weeks after release of the report (7 weeks from now), peaking at about 56,000 deaths per day about 11 weeks after release of the report (10 weeks from now).
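The conversion from the graph's units to daily deaths is simple arithmetic, sketched here in Python. The rates of 5 and 17 per 100,000 are the values read off the Imperial College graph as described above, and the 330 million population is the rounded figure used in the text.

```python
US_POPULATION = 330_000_000   # rounded figure used in the text


def deaths_per_day(rate_per_100k: float, population: int = US_POPULATION) -> float:
    """Convert 'deaths per day per 100,000 population' into deaths per day."""
    return rate_per_100k * population / 100_000


mid_may = deaths_per_day(5)    # rate read off the graph for mid-May
peak = deaths_per_day(17)      # rate read off the graph for early June

print(f"Mid-May: about {mid_may:,.0f} deaths per day")
print(f"Peak:    about {peak:,.0f} deaths per day")
```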
Is this at all realistic? We have no idea. I don’t think that the doctors who performed the study have any idea. But – they can plug certain assumptions into a model, and out come the results.
It would be nice if the Imperial College report provided a prediction about the daily number of new reported cases, which would allow us to assess, over time, whether reality is following the predictions. The report does predict that 81% of the US population would be infected (page 6), though it does not state when this would occur. The graph shows the number of daily deaths will be declining by mid-June, so presumably, we will be approaching the 81% infection rate by that time – about 90 days from now.
The Imperial College model is much more sophisticated than mine – but it has an even greater number of parameters, none of which are known with confidence. Small changes to any of those parameters might cause huge swings in the predicted spread of the disease.
As an example, I present my Model 4. This is another generalized logistic function, tweaked to accomplish 2 things: (1) to match the disease progression in Italy thus far, and (2) to achieve the 81% ultimate infection rate predicted by the Imperial College report in approximately 90-120 days.
Here are the graphs, starting with the first 24 days (through yesterday, March 21, for Italy):
Is that a great fit or what?
Now here is the projection through 14 weeks – which will be around June 21 in the US:
Remember that this graph is total cases per million, and the model is designed to reach 810,000. With a population of about 330 million, this predicts about 267 million cases over the next 3 months or so. As of yesterday (March 21), there were about 27,000 reported cases in the US and about 54,000 in Italy. Notice that you can’t even see the figures for Italy (data for 24 days) or the US (data for 7 days), because they cannot be distinguished from zero on this scale.
This leads to our final lesson:
Lesson 4: In most complicated mathematical models, you can tweak the parameters to show almost any result that you want.
As a final demonstration that my Model 4 is quite similar to the Imperial College report, I’ve graphed the expected number of daily deaths. Again, their model is more complex than mine, but my simple assumptions are: 14-day lag between case onset and death; 0.82% death rate. Here is my graph, and the one from the Imperial College report (page 7, Fig 1A), side-by-side:
Pretty uncanny, isn’t it? My graph shows peak death rate a bit higher than the Imperial College report for the US (though very similar for the UK). The total number of estimated deaths in both models is the same, 2.2 million (for the US). Note that this graph shows total daily deaths per 100,000 – so you can multiply the values on the graph by about 3,300 to get the actual projections, which peak at around 75,000 per day in my model and about 56,000 per day in theirs.
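The step from a case curve to a daily death curve can be sketched in Python using the two stated assumptions (14-day lag, 0.82% death rate). The case curve below is a plain logistic rising to 810,000 per million, with a hypothetical starting value and growth rate. It is not the actual Model 4, so the height and timing of the peak will differ from the graph, but the total comes out the same because it depends only on the ceiling and the death rate.

```python
import math

US_POPULATION = 330_000_000
FINAL_CASES_PER_MILLION = 810_000.0   # 81% ultimately infected
DEATH_RATE = 0.0082                   # 0.82% of cases
LAG_DAYS = 14                         # days from case onset to death

# Hypothetical curve parameters, chosen only for illustration.
START_CASES_PER_MILLION = 10.0
GROWTH_RATE = 0.14


def cumulative_cases(day: int) -> float:
    """Cumulative cases per million; treated as zero before day 0."""
    if day < 0:
        return 0.0
    q = FINAL_CASES_PER_MILLION / START_CASES_PER_MILLION - 1.0
    return FINAL_CASES_PER_MILLION / (1.0 + q * math.exp(-GROWTH_RATE * day))


def daily_deaths_per_100k(day: int) -> float:
    """New cases from LAG_DAYS earlier, times the death rate."""
    new_cases = cumulative_cases(day - LAG_DAYS) - cumulative_cases(day - LAG_DAYS - 1)
    return DEATH_RATE * new_cases / 10.0   # per million -> per 100,000


days = range(200)
per_100k = [daily_deaths_per_100k(d) for d in days]
total_cases = FINAL_CASES_PER_MILLION * US_POPULATION / 1_000_000
total_deaths = sum(per_100k) * US_POPULATION / 100_000
peak_day = max(days, key=lambda d: per_100k[d])

print(f"Total cases:  {total_cases:,.0f}")
print(f"Total deaths: {total_deaths:,.0f}")
print(f"Peak on day {peak_day}: {per_100k[peak_day]:.1f} deaths per 100,000")
```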
The point of this post is not to make projections about the spread of the WuFlu, or of the ultimate death toll. The point is to demonstrate how easy it is to put together a mathematical model that will show anything that you might want to show, and that matches the data collected to date.
I want to emphasize again that the Imperial College model is far more complex than mine. I do not have enough information to evaluate it. However, it is projecting an extraordinary spread for this disease, and I am very skeptical of this projection. As demonstrated in Section II, such a projection could easily be wrong by a factor of 100 to 1,000.