The term ‘model’ is much in the news, and I’m not talking about @RightAngles’ trade. It’s the term apparently favored by the media for a general area that also goes by names like cybernetics, system dynamics, advanced statistics, simulation, and control theory. Having some academic and professional background in the domain, I’m going to attempt an (inevitably simplified) sketch of its limits, so you can be smarter than the average journalist.
So, simplifying, as warned: There are two types of models. One is broadly statistical in approach. The other attempts to be more mechanistic.
And there are two major uses of models. One is descriptive: What’s going on here? The other is control: What can we do about it?
The first type, the statistical model, may also be labeled curve fitting, black-box modeling, deep learning, stochastic modeling, and more. It means taking as large a sample as possible of system inputs over time, along with the correlated outputs, and building a statistical description of how they relate to one another.
The farthest the mass media go into this territory is the canonical bell curve: “Here is the distribution of salaries for purple-humped clerics. Here is the distribution for green-crested clerics. They are different -> discrimination!” Having tried to explain the output of complex statistical models to state-level legislators, I have a bit of empathy.
In our current situation, the best-known statistical model is the IHME model, used by both media and government to estimate where the pandemic is headed and, importantly, what resources will be required to meet it. IHME is only slightly more complex than the standard bell curve; it uses something called a logistic or S-curve. Statistical modeling is also widely used in another domain temporarily shoved off the front pages: climate.
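If you want to see how simple that really is, here’s a toy sketch in Python of fitting a logistic S-curve to cumulative case counts. Every number in it is made up for illustration; none of this comes from IHME’s actual code.

```python
# A minimal sketch of IHME-style curve fitting: fit a logistic (S-curve)
# to cumulative counts, then read off the projection. Data are invented.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Logistic curve: K is final size, r the growth rate, t0 the inflection day."""
    return K / (1.0 + np.exp(-r * (t - t0)))

days = np.arange(0, 30)
observed = 10000 / (1 + np.exp(-0.25 * (days - 20)))          # synthetic "cumulative cases"
observed += np.random.default_rng(0).normal(0, 50, days.size)  # reporting noise

params, _ = curve_fit(logistic, days, observed, p0=[20000, 0.1, 15])
K, r, t0 = params
print(f"Projected final size: {K:.0f}, peak growth around day {t0:.1f}")
```

Note how few knobs there are: a final size, a growth rate, and an inflection date. That’s essentially the whole forecast.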
Why use this? It’s easy to get running – just start watching and recording what’s going on. No need for fancy experiments to isolate cause and effect – which might not be possible anyway – just watch the trends. You can refine things as you go along and get more data. Note that IHME is doing exactly that as more data comes in from states and from countries that are further along in the pandemic. These are very compelling arguments when you are under the gun for forecasts and lives depend on it.
What can go wrong? Just a few things…
Biased or inaccurate sampling. All statistical techniques depend on having a representative sample of the domain in question. What happens if some of the data going into the model has been deliberately perturbed (*cough* China *cough*)? What happens if your sampling space, say South Korea or northern Italy, has economic or social practices that differ from where you are attempting to forecast, North America? Nothing good.
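A toy illustration of why this bites, with all numbers invented: suppose severe cases are far more likely to show up in your testing data than mild ones.

```python
# Sampling bias in miniature: estimate average severity from a sample that
# over-represents severe (hospitalized) cases. All distributions invented.
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical population: 95% mild cases (severity ~1), 5% severe (severity ~8).
severity = np.where(rng.random(100_000) < 0.95,
                    rng.normal(1.0, 0.3, 100_000),
                    rng.normal(8.0, 1.0, 100_000))

# Biased sample: severe cases are 20x more likely to be tested and recorded.
weights = np.where(severity > 4, 20.0, 1.0)
sample = rng.choice(severity, size=2_000, p=weights / weights.sum())

print(f"True mean severity:     {severity.mean():.2f}")
print(f"Biased-sample estimate: {sample.mean():.2f}")  # badly inflated
```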
Under-sampling and over-extrapolation. These often go together. An unbiased statistical model may be good in areas where you have lots of data, but fall apart outside that sample space – quite a problem if you are trying to forecast extreme conditions. Climate models are notorious for this, using techniques like principal components analysis on limited historical records and attempting to extrapolate the results into extreme conditions of CO2 and temperature.
Overfitting and incorrect model assumptions. Again, these often go together. It’s an aphorism in the field that you can fit an elephant with enough parameters, meaning roughly that you can always pile on fudge factors to conceal the fact that your underlying system concept is wrong. Hockey sticks come to mind. Simplified statistical epidemic models may fall apart if we try to restart the economy before the virus reaches a steady state.
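Here are the last two pitfalls in miniature, with invented data: a 12-parameter polynomial matches every observed point perfectly, then goes haywire the moment you step outside the sample space.

```python
# "Fitting an elephant": a high-degree polynomial nails every in-sample point,
# then explodes under extrapolation. Data invented for illustration.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 12)
y = np.log1p(x) + rng.normal(0, 0.05, x.size)  # true process: a gentle log curve

line = np.polyfit(x, y, 1)        # 2 parameters: slightly underfits
elephant = np.polyfit(x, y, 11)   # 12 parameters: zero in-sample error

x_future = 15.0  # step beyond the observed range
print("truth at x=15:      ", np.log1p(x_future))
print("linear model says:  ", np.polyval(line, x_future))
print("12-parameter model: ", np.polyval(elephant, x_future))  # wildly wrong
```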
Hidden variables and lack of understanding. These are not the same thing, but they will both destroy attempts to use a statistical model for control purposes. Something you can’t currently observe (asymptomatic carriers?) may turn out to be a major driving variable. If you don’t really know what’s in the black box, attempts to drive its inputs to create desired outputs may not go well, particularly when there are inevitable time delays between taking an action and seeing its results.
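To see how time delays wreck control of a black box, here’s a toy feedback loop with invented parameters: the controller reacts to what it can see now, but its actions only take effect several steps later, so it overshoots and oscillates instead of settling.

```python
# A toy control loop, purely illustrative: correcting toward a target based
# on current observations, while effects land only after a delay.
DELAY = 5       # steps between taking an action and seeing its effect
GAIN = 0.6      # how hard we correct toward the target
TARGET = 100.0

state = 0.0
pending = [0.0] * DELAY  # actions "in the pipeline," not yet visible

for step in range(30):
    error = TARGET - state           # we react to what we can see now...
    pending.append(GAIN * error)     # ...but the effect lands DELAY steps later
    state += pending.pop(0)
    print(f"step {step:2d}: observed state = {state:7.1f}")  # overshoots, oscillates
```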
The second type is also known as mechanistic modeling, or just plain science. This is where you attempt to understand cause and effect in some detail, going into internal processes of the system as necessary, and build a mathematical replica. If you’re doing climatology, you’ll model things like carbon fixing by plants depending on temperature and CO2 levels. If you’re doing epidemiology, you’ll have things like social network density and incubation periods. In the current situation, the best-known model of this type comes from Imperial College London. It’s a simulation constructed after the H1N1 pandemic, embedding a detailed model of epidemic spread that was retrospectively tested against the pandemic records.
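The textbook starting point for mechanistic epidemic models is the SIR system: compartments for susceptible, infected, and recovered people, with explicit cause-and-effect flows between them. The sketch below is the generic version with invented parameters, not anything taken from the Imperial College code.

```python
# The classic SIR compartment model, the simplest mechanistic epidemic model.
# beta and gamma below are invented for illustration, not real estimates.
def sir(beta, gamma, s0, i0, days, dt=0.1):
    s, i, r = s0, i0, 0.0
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt   # contacts between S and I
        recoveries = gamma * i * dt          # I moving to R
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
    return s, i, r

# Population normalized to 1; R0 = beta/gamma = 2.5 here.
s, i, r = sir(beta=0.25, gamma=0.1, s0=0.999, i0=0.001, days=365)
print(f"After a year: {s:.1%} never infected, {r:.1%} recovered")
```

Every term corresponds to a mechanism you can argue about: beta bundles up contact rates and transmissibility, gamma is the recovery rate. That’s the appeal over a black box.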
Why use this? In a word, understanding. If you have some validation of cause and effect mechanisms, you are on firmer ground trying to reach beyond your previous experiences, and in coming up with control strategies, which are both perilous with purely statistical models.
What could go wrong? Just a few things…
Taking the model out of context. What worked very well for Carboniferous forests may not when we have things like managed tree farms. H1N1 is all well and good, but the Wuflu isn’t actually a flu and propagates differently.
Time, we have no time! Understanding takes time and often controlled experiments, and often neither is nor ever will be available before decisions must be made.
Incompleteness. There are very few simulations of any complexity that are completely mechanistic; there’s always some statistical modeling buried in there. The Imperial College model doesn’t actually have all the churches, schools, and airports described. Instead, it has a ‘synthetic population’ generated in accordance with a statistical description. Components like that are subject to all the problems described above for statistical models.
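To make ‘synthetic population’ concrete, here’s a minimal sketch of the idea: individuals drawn from statistical descriptions rather than enumerated from real records. Every distribution here is invented; the real model’s descriptions are far richer.

```python
# Generating a synthetic population from statistical descriptions.
# Household-size and age distributions below are invented for illustration.
import random

random.seed(3)

def make_household():
    size = random.choices([1, 2, 3, 4, 5], weights=[28, 35, 15, 15, 7])[0]
    return [{"age": random.choices(["child", "adult", "senior"],
                                   weights=[19, 62, 19])[0]}
            for _ in range(size)]

population = [make_household() for _ in range(10_000)]
people = sum(len(h) for h in population)
print(f"{people} synthetic people in 10,000 households")
```

If those input distributions are biased or under-sampled, the synthetic population inherits the problem, no matter how mechanistic the rest of the simulation is.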
There’s no neat conclusion to this post. None of the models being tossed about are completely right or wrong. They are incomplete. This might give you some sympathy, beyond what the MSM spin will ever provoke, for those modelers being sweated by decision-makers who have trillions of dollars and thousands of lives on the line.