Although the day is young.
The bottom line for me is that there is a paradox of computer models. If you understand why a computer model gets the results that it does, then you do not need a computer model. And if you do not understand why it gets the results that it does, then you cannot trust the results. If you are using a computer to try to figure out causal structure, you are using it wrong.
This is from Arnold Kling, “Epistemology,” April 26.
The whole thing is well worth reading. Arnold establishes his street cred with the rest of the post.
READER COMMENTS
William Connolley
Apr 26 2020 at 12:21pm
This isn’t intelligent. Compare: if you understand how your computer factored a 100-digit number into its constituent primes, then you don’t need a computer to do it.
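For illustration, here is a minimal sketch (mine, not the commenter's) of the textbook factoring procedure, trial division. The algorithm is completely understood, yet it is hopeless to run by hand for large inputs, and genuinely 100-digit numbers need far more sophisticated methods still; the point is that understanding the procedure does not remove the need for the machine. The function name and test value are just examples.

```python
def trial_division(n: int) -> list[int]:
    """Return the prime factors of n in non-decreasing order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is prime
    return factors

print(trial_division(600851475143))  # [71, 839, 1471, 6857]
```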
David Henderson
Apr 26 2020 at 12:26pm
Good counterexample but you need to consider Arnold’s point in context.
William Connolley
Apr 26 2020 at 4:34pm
I’m not entirely sure what context you mean. Is it forecasting? He says “In all of this work with models, no one ever trusted a model to do forecasting…”. But again that fails: numerical weather predictions are (a) skillful; (b) better than humans; and (c) understood, in terms of how they function.
Matthias Goergens
Apr 26 2020 at 9:31pm
Similar for all kinds of engineering. Be that structural engineering of dams, or engineering of planes or engines. Or even financial engineering.
One of the most unexpected discoveries when humanity was first beginning its love affair with electronic computers was how much time we were going to spend debugging. In some sense, debugging is exploring the unexpected consequences of the code that you just wrote, and so presumably know and understand very well.
BC
Apr 26 2020 at 4:25pm
Fair point, but Kling’s point is still intelligent. One can understand all the assumptions and implications in a set of modeling equations, i.e., “understand why a computer model gets the results that it does”, but still need the computer to get the numerical results. A computer is just a fancy calculator, and generating model results is like adding a long list of numbers with a calculator. In that sense, Kling’s first statement is not really correct.
The second statement, “And if you do not understand why it gets the results that it does, then you cannot trust the results,” is apropos. Indeed, most policymakers looking at modeling results, say a curve of projected fatalities or infections over time, are not looking at or understanding the underlying assumptions in the modeling equations, which is actually the important part. Kling mentions a 200+ equation Fed model. That’s a lot of (possibly interdependent) underlying assumptions. Focusing on model results rather than underlying assumptions is like looking at the resulting sum on a calculator without understanding how one decided on the components to add together.
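To make that concrete, here is a minimal sketch (mine, not any model cited in the thread) of a textbook SIR epidemic model. The three update rules are simple enough to understand completely, yet producing the day-by-day curve still means grinding through thousands of arithmetic steps, which is exactly the "fancy calculator" role. The parameter values are purely illustrative.

```python
def sir(beta=0.3, gamma=0.1, s0=0.999, i0=0.001, days=200, steps_per_day=10):
    """Return daily (S, I, R) population fractions via simple Euler steps."""
    dt = 1.0 / steps_per_day
    s, i, r = s0, i0, 0.0
    daily = []
    for step in range(days * steps_per_day):
        new_inf = beta * s * i * dt   # susceptible -> infected
        new_rec = gamma * i * dt      # infected -> recovered
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        if step % steps_per_day == 0:
            daily.append((s, i, r))
    return daily

trajectory = sir()
peak_day, peak = max(enumerate(t[1] for t in trajectory), key=lambda x: x[1])
print(f"peak infected fraction {peak:.3f} around day {peak_day}")
```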
Roger McKinney
Apr 26 2020 at 9:44pm
Calculation is not the same as modeling. Of course computers can calculate faster than humans. So can the old adding machines. But have you seen a computer make a forecasting model? Computers can't tell if a model is good or not, only how closely the forecast matches the data. But they can't tell you if you're missing important data or if the data and assumptions are junk.
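To illustrate that last point (my own sketch, not Roger's): the only thing the computer can report is how closely a fit matches the data. Below, data generated from a straight line plus noise is fit with both a line and a ninth-degree polynomial; the polynomial reports the smaller in-sample error, and nothing in that number says whether the model, the data, or the assumptions are any good. All numbers are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.2, size=x.size)  # truly linear plus noise

for degree in (1, 9):
    coeffs = np.polyfit(x, y, degree)
    rmse = np.sqrt(np.mean((y - np.polyval(coeffs, x)) ** 2))
    print(f"degree {degree}: in-sample RMSE = {rmse:.3f}")
# The degree-9 fit reports the smaller error; that tells you nothing
# about whether it is the better model of the underlying process.
```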
Alan Goldhammer
Apr 26 2020 at 12:23pm
Anyone with a decent desktop computer and minimal software can do disease modeling. I stopped reading modeling papers four weeks ago and, unless something really is unique, do not even cite this stuff in my daily newsletter. For any model, one can find at least one (and sometimes more) that directly contradicts it. I decided this was just a huge waste of time (as are the hundreds of posts on the Internet objecting in one way or another). The same thing goes for objections to some of the field epidemiology results that are coming out. I’m still waiting for constructive comments from critics rather than nitpicks about why a sample size or test kit is no good. What would you have us do???
The ironic thing is that one of the first models I read, from a Greco/Italian group, has pretty much nailed what has been happening in Italy since early March.
Policy decisions are going to be difficult to make in the absence of information, as the Rumsfeld Doctrine points out. Thanks to our state-by-state approach to getting people back to work, we will soon find out what does and doesn’t work.
Personally, I’m spending my time constructively with OHDSI group projects that have already generated two good preprints looking across datasets. They have a huge study planned now: Project SCYLLA. See the April 23 entry! Also, to toot my own horn, so to speak, I’ve just put up a new paradigm, “Clinical Trials During a Pandemic,” on my COVID-19 website.
@David – I know you unfortunately won one of the two bets on COVID-19 mortality but I don’t remember what the other one was (100K???).
David Henderson
Apr 26 2020 at 12:28pm
You wrote:
That’s probably right. Plus we have cross-country data.
You wrote:
You remembered correctly. It was 100K. And, fortunately, I’m becoming increasingly confident that I’ll lose.
Dylan
Apr 27 2020 at 7:40am
What makes you think this? If I recall correctly, that bet was through the end of the year, right? And, when you include excess-death estimates, it looks like we may have as many as 80,000 deaths so far. I’m surprised that you think this is going to slow down enough for you to lose your bet. I forget what source you agreed to use; do you expect it won’t be adjusted by the end of the year to include probable COVID-19 deaths?
robc
Apr 27 2020 at 9:43am
Do the excess deaths count?
Shouldn’t those be put in the ledger (so to speak) on the shutdown side and not the virus side? I don’t think deaths due to government policy can be blamed on the virus.
Some fraction of them (30%, 40%?) can maybe be applied to the virus, depending on what you think the split is between voluntary and mandatory shutdown-related deaths.
And even then, deaths due to actions people take because of the virus are different from deaths from the virus itself and shouldn’t be counted in the virus column either.
Dylan
Apr 27 2020 at 10:20am
Rob,
I think you’re misunderstanding. The point is that the official numbers only count those that have actually been tested. People dying outside of hospitals are generally not tested. The FT did an analysis of the data from 14 countries which suggests that the official numbers under-report true coronavirus deaths significantly, and the actual numbers of deaths may be 60% higher.
I know two people in the city who likely died of the virus, but it’s unclear whether either is counted in the official statistics. One was a 28-year-old woman who had developed mild coronavirus symptoms and stayed at home as suggested; her symptoms then took a sudden turn, and she ended up dying at home without ever going to a hospital. Another acquaintance was also fairly young (late 40s, I think). He had been diagnosed and in the hospital for a while, but was feeling better and was released, and then died of a heart attack at home the next day. It’s not clear whether he is being counted in the official statistics either.
Agreed that you can’t just look at the total excess-death numbers and conclude that those are all due to coronavirus, but it also isn’t clear in which direction the bias runs. Other causes of mortality, like traffic accidents and probably seasonal flu, have likely decreased due to the lockdowns. So excess-death figures may underestimate the true number of virus deaths.
Alan Goldhammer
Apr 27 2020 at 10:51am
The very early deaths in Santa Clara County were only discovered by post-mortem testing eight weeks after death. IIRC, the deaths were reported as complications from pneumonia, and tests of tissue showed SARS-CoV-2 infection. I suspect there are more in that category that may never be put into the “official” COVID-19 count.
We’ll see how the gradual reopening of things goes. We certainly do not want a repeat of what happened in Hokkaido. The State of Georgia will be instructive to policymakers.
Scott Sumner
Apr 26 2020 at 1:57pm
Yes, that’s a great quote you spotted.
Roger McKinney
Apr 26 2020 at 9:34pm
Models give the illusion of science and control. They’re intended to fool the vast majority who know nothing about modeling.
James
Apr 27 2020 at 2:48am
The quoted paragraph seems confused. There is a very extensive literature showing that model-based forecasts are always at least as accurate as, and often more accurate than, judgement-based forecasts.
If you are pressed for time, just read this paper to get a sense of the situation: https://courses.washington.edu/pbafhall/514/514%20Readings/clinical%20versus%20actuarial.pdf
As long as people need point estimates, they will be well served by using models to generate those forecasts.
John Alcorn
Apr 27 2020 at 2:12pm
Nobel laureate Daniel Kahneman, in his book Thinking, Fast and Slow, updates and confirms the findings of Paul Meehl and fellow researchers about predictive accuracy:
However, Dr. Kahneman also points out that simple formulas + good data + common sense go a long way:
IMO, clear reasoning + simple formulas + good data + common sense = Arnold Kling’s forte!
robc
Apr 27 2020 at 2:36pm
An example of a simple weighted formula is the Marcel prediction system for baseball.
http://tangotiger.net/marcel/
The idea was that it was so simple a monkey could do it. It is the minimal effort at projecting a player’s results in the upcoming season, based on a weighting of the previous three seasons and reversion to the mean. If you follow through some of the links, you can see a discussion with Nate Silver (back when he primarily did baseball work) on how his projections are only marginally better than Marcel’s. The big difference is that Marcel doesn’t even try for minor leaguers.
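For the curious, a rough sketch of the flavor of such a projection (my own illustration; the real Marcel system at the link also handles age adjustment and playing time, and the weights and regression amount below are illustrative rather than Tango’s published values):

```python
def weighted_projection(seasons, league_rate, weights=(5, 4, 3), regress_pa=1200):
    """seasons: list of (rate, plate_appearances), most recent season first."""
    num = 0.0  # weighted successes
    den = 0.0  # weighted opportunities
    for (rate, pa), w in zip(seasons, weights):
        num += w * rate * pa
        den += w * pa
    # Reversion to the mean: blend in a chunk of league-average performance.
    num += league_rate * regress_pa
    den += regress_pa
    return num / den

# Example: a hitter's on-base percentage over the last three seasons.
player = [(0.380, 600), (0.350, 550), (0.400, 500)]
print(f"projected OBP: {weighted_projection(player, league_rate=0.320):.3f}")
```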
John Alcorn
Apr 27 2020 at 2:49pm
Thx for the pointer. Fascinating!
John Alcorn
Apr 27 2020 at 2:33pm
The Wikipedia entry, “Computer simulation,” distinguishes models and simulations:
The computer, per se, isn’t the problem. The model and data (or empirical assumptions) are the issues.
John Alcorn
Apr 27 2020 at 2:36pm
Xavier Gabaix (MIT) and David Laibson (Harvard) propose criteria for assessing a model’s quality, in their paper, “The Seven Properties of Good Models.” They explain: