Cargo Cult Science

 

The first principle [of science] is that you must not fool yourself — and you are the easiest person to fool. –– Richard Feynman, from his 1974 commencement address at Caltech

South Sea Island Infrastructure Project

In 1974, Richard Feynman gave the commencement address at Caltech, in which he cautioned the audience to understand, and be wary of, the difference between real science and what he called “Cargo Cult” science. The lecture was a warning to students that science is a rigorous field that must remain grounded in hard rules of evidence and proof. Feynman went on to explain that science is extremely hard to get right, that tiny details matter, and that it is always a struggle to keep personal bias and motivated reasoning out of the process.

It’s not the good will, the high intelligence, or the expertise of scientists that makes science work as the best tool for discovering the nature of the universe. Science works for one simple reason: it relies on evidence and proof. It requires hypotheses, predictions, and confirmation of theories through careful experiment and empirical results. It requires excruciating attention to detail, and a willingness to abandon an idea when an experiment shows it to be false. Failure to follow the uncompromising rules of science opens the door to bias, groupthink, politically motivated reasoning, and other failures.

Science is the belief in the ignorance of experts. — Richard Feynman

As an example of how unconscious bias can influence even the hardest of sciences, Feynman recounted the story of the Millikan oil drop experiment. The purpose of the experiment was to measure the charge of the electron. This was a difficult quantity to measure with the technology of the time, and Millikan got a result that was slightly too low because of an experimental error: he used the wrong value for the viscosity of air in his calculations. That was the result he published.

Now, a slightly incorrect result is not a scandal. Even the best scientists get it wrong once in a while, which is why the standard protocol is to publish all data and methods so that other scientists can attempt to replicate the results. Millikan duly published his methods along with the slightly incorrect result, and others began doing oil drop experiments themselves.

As others published their own findings, an interesting pattern emerged: the first published results after Millikan’s were also low, just not by quite as much. The next generation of results was again too low, but slightly higher than the last. This pattern continued for some time until the experiments converged on the true value.

Why did this happen? There was nothing about the experiment that should lead to a consistently low answer. If it were just a hard measurement to make, you would expect experimental results to be randomly distributed around the real value. What Feynman realized was that psychological bias was at work: Millikan was a great scientist, and no one truly expected him to be wrong. So when other scientists found results that differed significantly from his, they assumed they had made some fundamental error and threw the results out. But when randomness in the measurement produced a value closer to Millikan’s, they assumed it was a better result. They were filtering the data until the result was at least close enough to Millikan’s that the discrepancy seemed ‘acceptable’. And when that result was added to the body of knowledge, it made the next generation of researchers a little more willing to settle on a slightly larger, but still low, result.
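To make the filtering mechanism concrete, here is a toy simulation in Python. It is a sketch under invented assumptions, not a model of the historical data: the true value is normalized to 1.0, the first published value is 1% low, each measurement carries Gaussian noise, and each generation of experimenters keeps only the measurements that fall within a fixed tolerance of the most recently published value. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_anchoring(true_value=1.0, anchor=0.99, noise=0.01,
                       tolerance=0.005, generations=8,
                       trials_per_lab=200, seed=42):
    """Toy model of anchoring bias (hypothetical parameters throughout).

    Each generation takes unbiased noisy measurements but keeps only those
    within `tolerance` of the most recently published value, then publishes
    the mean of what it kept.
    """
    rng = random.Random(seed)
    published = [anchor]
    for _ in range(generations):
        last = published[-1]
        # Every raw measurement is centered on the true value...
        raw = [rng.gauss(true_value, noise) for _ in range(trials_per_lab)]
        # ...but results that disagree too much with the last value are discarded.
        kept = [x for x in raw if abs(x - last) <= tolerance]
        if not kept:
            kept = raw  # if nothing agrees, fall back to reporting everything
        published.append(statistics.mean(kept))
    return published


for generation, value in enumerate(simulate_anchoring()):
    print(f"generation {generation}: {value:.4f}")
```

Every individual measurement is unbiased, yet each published value stays below 1.0 and creeps upward only gradually, because the filter rejects the results that would have pulled the average to the true value quickly.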

Note that no one was motivated by money, or politics, or by anything other than a desire to be able to replicate a great man’s work. They all wanted to do the best job they could and find the true result. They were good scientists. But even the subtle selection bias caused by Millikan’s stature was enough to distort the science for some time.

The key thing to note about this episode is that eventually they did find the real value, but not by relying on the consensus of experts or the gravitas and authority of a great scientist. No, the science was pulled back to reality only because of the discipline of constant testing and because the scientific question was falsifiable and experimentally determinable.

Failure to live up to these standards (the rigor of controlled double-blind tests, of predictions followed by tests of those predictions, and of other concrete ways of testing whether a proposition is true) means you’re not practising science, no matter how much data you have, how many letters you have after your name, or how much money is wrapped up in your scientific-looking laboratory. At best, you are practising cargo-cult science, or what Friedrich Hayek, in his Nobel speech, called ‘scientism’: adopting the trappings of science to bolster an argument while ignoring or glossing over the rigorous discipline at the heart of true science.

This brings us back to cargo cults. What is a cargo cult, and why is it a good metaphor for certain types of science today? To see why, let’s step back in time to World War II, and in particular the war against Japan.

The Pacific Cargo Cults

During World War II, the Allies set up forward bases in remote areas of the South Pacific. Some of these bases were installed on islands populated by locals who had never seen modern technology and who knew nothing of the strange people coming to their islands. They watched as men arrived in strange steel boats and began to cut down the jungle and flatten the ground. To the islanders, it may have looked like an elaborate religious ritual.

In due time, after the ground was flat and lights had been installed along its length, men with strange disks over their ears spoke into a little box in front of their mouths, uttering incantations. Amazingly, after each incantation a metal bird would descend from the sky and land on the magic line of flat ground. These birds brought great wealth to the people – food they had never seen before, tools, and medicines. Clearly the new God had great power.

Years after the war ended and the strange metal birds stopped coming, modern people returned to these islands and were astonished by what they saw: ‘runways’ cut from the jungle by hand, huts with bamboo poles for antennas, and locals wearing pieces of carved wood around their ears and speaking into wooden ‘microphones’, imploring the great cargo god of the sky to bring back the metal birds.

Ceremony for the new Tuvaluan Stimulus Program

Understand, these were not stupid people. They were good empiricists. They painstakingly watched and learned how to bring the cargo birds. If they had been skilled in modern mathematics, they might even have built mathematical models exploring the correlations between certain words and actions and the frequency of cargo birds appearing. If they had sent explorers out to other islands, they could have confirmed their beliefs: every island with a big flat strip and people with devices on their heads was being visited by the cargo birds. They might have found that longer strips bring even larger birds, and used that data to predict that an island with a huge strip would have the biggest birds.

Blinded With Science!

There’s a lot of “science” that could have been done to validate everything the cargo culters believed. There could be a strong consensus among the most learned islanders that their cult was the ‘scientific’ truth. And they could have backed it up with data, and even some simple predictions. For example, the relationship between runway length and bird size, the fact that the birds only come when it’s not overcast, or that they tended to arrive on a certain schedule. They might even have been able to dig deeply into the data and find all kinds of spurious correlations, such as a relationship between the number of birds on the ground and how many were in the sky, or the relationship between strange barrels of liquid on the ground and the number of birds that could be expected to arrive. They could make some simple short-term predictions around this data, and even be correct.

Then one day, the predictions began to fail. The carefully derived relationships meticulously measured over years failed to hold. Eventually, the birds stopped coming completely, and the strange people left. But that wasn’t a problem for the island scientists: They knew the conditions required to make the birds appear. They meticulously documented the steps taken by those first strangers on the island to bring the birds in the first place, and they knew how to control for bird size by runway length, and how many barrels of liquid were required to entice the birds. So they put their best engineers to work rebuilding all that with the tools and materials they had at hand – and unexpectedly failed.

How did all these carefully derived relationships fail to predict what would happen? Let’s assume these people had advanced mathematics. They could calculate p-values, do regression analysis, and had most of the other tools of science. How could they collect so much data and understand so much about the relationships between all of these activities, and yet be utterly incapable of predicting what would happen in the future and be powerless to control it?

The answer is that the islanders had no theory for what was happening, had no way of testing their theories even if they had had them, and were hampered by being able to see only the tiniest tip of an incredibly complex set of circumstances that led to airplanes landing in the South Pacific.

Imagine two island ‘scientists’ debating the cause of their failure. One might argue that they didn’t have metal, and bamboo wasn’t good enough. Another might argue that his recommendation for how many fake airplanes should be built was ignored, and the fake airplane austerity had been disastrous. You could pore over the reams of data and come up with all sorts of ways in which the recreation wasn’t quite right, and blame the failure on that. And you know what? This would be an endless argument, because there was no way of proving any of these propositions. Unlike Millikan, they had no test for the objective truth.

And in the midst of all their scientific argumentation as to which correlations mattered and which didn’t, the real reason the birds stopped coming was utterly opaque to them: The birds stopped coming because some people sat on a gigantic steel ship they had never seen, anchored in the harbor of a huge island they had never heard of, and signed a piece of paper agreeing to end the war that required those South Pacific bases. And the signing itself was just the culmination of a series of events so complex that even today historians argue over it. The South Sea Islanders were doomed to hopeless failure because what they could see and measure was a tiny collection of emergent properties caused by something much larger, very complex and completely invisible to them. The correlations so meticulously collected were not describing fundamental, objective properties of nature, but rather the side-effects of a temporary meta-stability of a constantly changing, wholly unpredictable and wildly complex system.

The Modern Cargo Cults

Today, entire fields of study are beginning to resemble a modern form of cargo cult science. We like to fool ourselves into thinking that, because we are modern, ‘scientific’ people, we could never do anything as stupid as putting coconut shells on our ears and believing that we could communicate through them with metal birds in the sky. But that’s exactly what some are doing in the social sciences, in macroeconomics, and to some extent in climate science and in some areas of medicine. And these sciences share a common characteristic with the South Sea islanders’ study of the metal birds: they are attempts to understand, predict, and control large complex systems by examining their emergent properties and the relationships between them.

No economist can hope to understand the billions of decisions made every day that contribute to change in the economy. So instead, they choose to aggregate and simplify the complexity of the economy into a few measures like GDP, consumer demand, CPI, aggregate monetary flows, etc. They do this so they can apply mathematics to the numbers and get ‘scientific’ results. But like the South Sea islanders, they have no way of proving their theories and a multitude of competing explanations for why the economy behaves as it does with no objective way to solve disputes between them. In the meantime, their simplifications may have aggregated away the information that’s actually important for understanding the economy.

You can tell that these ‘sciences’ have gone wrong by examining their track record of prediction (dismal), and by noticing that there does not seem to be steady progress of knowledge, but rather fads and factions that ebb and flow with the political tide. In my lifetime I have seen various economic theories be discredited, rediscovered, discredited once more, then rise to the top again. There are still communist economics professors, for goodness’ sake. That’s like finding a physics professor who still believes in phlogiston theory. And these flip-flops have nothing to do with the discovery of new information or new techniques, but merely with which economic faction happens to have random events work slightly in favor of its current model, or whose theories give the most justification for political power.

As Nate Silver pointed out in his excellent book “The Signal and the Noise,” economists’ predictions of future economic performance are no better than chance once you get away from the immediate short term. Annual surveys of macroeconomists return predictions that do no better than what you’d get throwing darts at a dartboard. When economists like Christina Romer have the courage to make concrete predictions of the effects of their proposed interventions, they turn out to be wildly incorrect. And yet, these constant failures never seem to falsify their underlying beliefs. Like the cargo cultists, they’re sure that all they need to do is comb through the historical patterns in the economy and look for better information, and they’ll surely be able to control the beast next time.

Other fields in the sciences are seeing similar results. Climate is a complex system with millions of feedbacks. It adapts and changes by its own rules, which we can’t begin to fully grasp. So instead we look to the past for correlations and then project them, along with our own biases, into the future. And so far, the predictive track record of climate models is very underwhelming.

In psychology, Freudian psychoanalysis was an unscientific, unfalsifiable theory based on extremely limited evidence. However, because it was being pushed by a “great man” who commanded respect in the field, it enjoyed widespread popularity in the psychology community for many decades despite there being no evidence that it worked. How many millions of dollars did hapless patients spend on Freudian psychotherapy before we decided it was total bunk? Aversion therapy, which treats a variety of ills by putting the patient through trauma or discomfort, has been used for decades despite very little clinical evidence that it works. Ulcers were long thought to be caused by stress, until the bacterium Helicobacter pylori was shown to cause most of them. Facilitated communication was a fad that enjoyed widespread support for far too long.

A string of raw facts; a little gossip and wrangle about opinions; a little classification and generalization on the mere descriptive level; a strong prejudice that we have states of mind, and that our brain conditions them: but not a single law in the sense in which physics shows us laws, not a single proposition from which any consequence can causally be deduced. This is no science, it is only the hope of a science.

— William James, “Father of American psychology”, 1892

These fields are adrift because there are no anchors to keep them rooted in reality. In real science, new theories are built on a bedrock of older theories that have withstood many attempts to falsify them, and which have proven their ability to describe and predict the behavior of the systems they represent. In cargo cult sciences, new theories are built on a foundation of sand — of other theories that themselves have not passed the tests of true science. Thus they become little more than fads or consensus opinions of experts — a consensus that ebbs and flows with political winds, with the presence of a charismatic leader in one faction or another, or with the accumulation of clever arguments that temporarily outweigh the other faction’s clever arguments. They are better described as branches of philosophy, and not science — no matter how many computer models they have or how many sophisticated mathematical tools they use.

In a cargo cult science, factions build around popular theories, and people who attempt to discredit them are ostracised. Ad hominem attacks are common. Different theories propagate to different political groups. Data and methods are often kept private or disseminated only grudgingly. Because there are no objective means to falsify theories, they can last indefinitely. Because the systems being studied are complex and chaotic, there are always new correlations to be found to ‘validate’ a theory, but rarely a piece of evidence to absolutely discredit it. When an economist makes a prediction about future GDP or the effect of a stimulus, there is no identical ‘control’ economy that can be used to test the theory, and the real economy is so complex that failed predictions can always be explained away without abandoning the underlying theory.

There is currently a crisis of non-reproducibility in these areas of study. In 2015, Nature reported on a large collaborative project that tried to replicate studies from 98 peer-reviewed psychology papers; only 39 of the replication attempts succeeded. Furthermore, 97 percent of the original studies had reported statistically significant results, while only 36 percent of the replication studies found statistically significant results. This is abysmal, and says a lot about the state of this “science.”
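One mechanism that can produce a gap like the one above is filtering on statistical significance, the same kind of selection seen in the Millikan story. The toy simulation below is illustrative only and is not the method the 2015 project used: it assumes a small true effect, small samples, a two-sided z-test with known noise, and that only results with p < 0.05 get ‘published’. Each published study is then rerun once with fresh data. All names and parameters are hypothetical.

```python
import math
import random
import statistics


def p_value(sample, sigma=1.0):
    """Two-sided z-test of 'true mean is 0', assuming the noise sigma is known."""
    z = statistics.fmean(sample) * math.sqrt(len(sample)) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def run_study(effect, n, rng):
    """One hypothetical study: n noisy observations of a small true effect."""
    return [rng.gauss(effect, 1.0) for _ in range(n)]


def publication_filter(studies=5000, effect=0.2, n=20, alpha=0.05, seed=7):
    rng = random.Random(seed)
    published = []
    for _ in range(studies):
        sample = run_study(effect, n, rng)
        if p_value(sample) < alpha:  # only 'significant' results get published
            published.append(statistics.fmean(sample))
    # Rerun each published study once with fresh data; 'success' here just
    # means p < alpha again (direction is ignored for simplicity).
    successes = sum(
        1 for _ in published if p_value(run_study(effect, n, rng)) < alpha
    )
    return len(published), successes / len(published), statistics.fmean(published)


count, replication_rate, mean_published_effect = publication_filter()
print(f"{count} published, {replication_rate:.0%} replicated, "
      f"mean published effect {mean_published_effect:.2f} (true effect 0.2)")
```

By construction every published study is significant, yet only a small fraction of the reruns are, and the average published effect is well above the true effect of 0.2.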

This is not to say that science is impossible in these areas, or that it isn’t being done. All the areas I mentioned have real scientists working in them using the real methods of science. It’s not all junk. Real science can help uncover characteristics and behaviors of complex systems, just as the South Sea Islanders could use their observations to learn concrete facts, such as that the number of barrels of fuel oil on hand indicated how many aircraft might arrive. In climate science, there is real value to be had in studying the relationships between various aspects of the climate system, so long as we recognize that what we are seeing is subject to change and that what is unseen may represent the vast majority of interactions.

The complex nature of these systems and our inability to carry out concrete tests means we must approach them with great humility and understand the limits of our knowledge and our ability to predict what they will do. And we have to be careful to avoid making pronouncements about truth or settled science in these areas, because our understanding is very limited and likely to remain so.

Science alone of all the subjects contains within itself the lesson of the danger of belief in the infallibility of the greatest teachers of the preceding generation.

— Richard Feynman

Published in General

There are 103 comments.

  1. drlorentz Member
    @drlorentz

    iWe: Help me out in an argument with a family member: is a chaotic system, by definition, impossible to accurately (quantitatively) model so as to precisely predict the future?

    The system can be modeled. The trouble is that the evolution of the system is critically dependent on the initial conditions. In plain English, that means if you change an input slightly, you might end up with a very different state later. Since there’s uncertainty about the starting values (nothing is known with perfect precision), the model doesn’t help much. You can make a double pendulum at home to see how that works.

    I have one of these things on my desk at work. Where it goes is not practically predictable because it depends on precisely where you start. You can’t make it follow the same path twice. We know all the equations to model this thing but still can’t predict its motion.
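The sensitivity described here is easy to see numerically. The double pendulum’s equations are long, so this sketch swaps in the logistic map x → 4x(1 − x), a standard one-line chaotic system; the function names, the starting value, and the 1e-10 nudge are arbitrary choices for illustration.

```python
def logistic_orbit(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r*x*(1 - x); r = 4 is fully chaotic."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def first_divergence(x0, nudge=1e-10, tolerance=0.1, steps=60):
    """First step at which runs from x0 and x0 + nudge differ by more than tolerance."""
    a = logistic_orbit(x0, steps=steps)
    b = logistic_orbit(x0 + nudge, steps=steps)
    for step, (u, v) in enumerate(zip(a, b)):
        if abs(u - v) > tolerance:
            return step
    return None  # the runs stayed close for the whole window


print(first_divergence(0.2))
```

The two runs agree to many decimal places at first, but their gap roughly doubles per step on average, so a difference in the tenth decimal place grows to order 0.1 within a few dozen steps. Knowing the rule exactly does not help if the starting point is known only approximately.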

    Climate modeling has other problems, including the following:

    1. We don’t know all the phenomenology to put in the models: unknown unknowns.
    2. The models have to approximate some things because they are too hard to treat exactly.
    3. The models have a coarse resolution that smooths over local variability.

    Even if all these problems were solved, the chaos issue would remain.

    • #91
  2. iWe Coolidge
    @iWe

    Thank you, JW.

    For what it is worth, my mother likes to point out that it is pretty easy to accurately predict what a billiard ball does when another ball strikes it. It is much harder to predict what a kitten will do in the same situation.

    • #92
  3. Tuck Inactive
    @Tuck

    drlorentz:

    Bigfoot: First, kill all the butterflies.

    Even before the lawyers?

    It’s for the children!

    • #93
  4. Tuck Inactive
    @Tuck

    anonymous:

    iWe: It is much harder to predict what a kitten will do in the same situation.

    I have long said, “There is no game which cannot be improved by replacing the ball with a cat.”

    Fore!

    That’s evil.  HA HA….

    • #94
  5. Owen Findy Inactive
    @OwenFindy

    drlorentz: The system can be modeled. The trouble is that the evolution of the system is critically dependent on the initial conditions. In plain English, that means if you change an input slightly, you might end up with a very different state later.

    Something you didn’t mention, though, is that, for one to know the value of a chaotic system at time t, the system has to have gone through all the intervening steps. Isn’t that correct? You can’t plug an initial value into a formula and have it spit out the value at t; the system has to evolve through all time steps until it reaches t before you can know what its value at t is. Which means it is inherently unpredictable.

    • #95
  6. Owen Findy Inactive
    @OwenFindy

    Owen Findy: Which means it is inherently unpredictable.

    OTOH…isn’t the prediction attempted in climate science statistical/probabilistic, which may partly get around the impossibility of predicting a chaotic system’s behavior the way I described it?

    • #96
  7. drlorentz Member
    @drlorentz

    Owen Findy:

    Something you didn’t mention, though, is that, for one to know the value of a chaotic system at time t, the system has to have gone through all the intervening steps. Isn’t that correct? You can’t plug an initial value into a formula and have it spit out the value at t; the system has to evolve through all time steps until it reaches t before you can know what its value at t is. Which means it is inherently unpredictable.

    Consider a system that is completely deterministic like an ideal double pendulum. The equations of motion are known exactly. So there is a model that describes the system without approximation. Since the equations are nonlinear, they probably have to be solved numerically (i.e., using a computer). While that involves approximation, that’s not the cause of chaotic behavior.

    The computer solves the equations by stepping through time from 0 to t, passing through many intermediate time steps, to come up with a prediction for the state at time t. The unpredictability stems from the fact that a slight change in the starting condition results in a large change in the final state. Since the starting condition cannot be known precisely in the real world, the model is not useful for predicting the behavior of the system.

    Models of non-chaotic systems do not suffer from this problem even though they make some approximations of the real system and suffer from numerical roundoff.
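The time-stepping described above, and the contrast between chaotic and non-chaotic models, can be sketched with a classical fourth-order Runge-Kutta integrator. This is an illustration, not a climate model: it compares the Lorenz equations (with the classic parameters σ = 10, ρ = 28, β = 8/3) against a simple harmonic oscillator, starting each system twice from points one billionth apart. The step size, run length, and helper names are assumptions chosen for the example.

```python
def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for the system state' = f(state)."""
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz's 1963 convection model: nonlinear and chaotic for these parameters."""
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]


def oscillator(state):
    """Simple harmonic oscillator x'' = -x, written as two first-order equations."""
    x, v = state
    return [v, -x]


def max_separation(f, start, nudge=1e-9, dt=0.01, steps=5000):
    """Step two copies forward in time, the second nudged by `nudge` in its
    first coordinate, and return the largest coordinate gap seen."""
    a = list(start)
    b = [start[0] + nudge] + list(start[1:])
    worst = 0.0
    for _ in range(steps):
        a, b = rk4_step(f, a, dt), rk4_step(f, b, dt)
        worst = max(worst, max(abs(u - v) for u, v in zip(a, b)))
    return worst


print("oscillator:", max_separation(oscillator, [1.0, 0.0]))
print("Lorenz:    ", max_separation(lorenz, [1.0, 1.0, 1.0]))
```

Both systems are solved with the same method and the same step size, so numerical approximation is not what separates them: the oscillator’s two runs stay about a billionth apart, while the Lorenz runs end up separated by a gap of order one or larger.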

    • #97
  8. drlorentz Member
    @drlorentz

    Owen Findy:

    Owen Findy: Which means it is inherently unpredictable.

    OTOH…isn’t the prediction attempted in climate science statistical/probabilistic, which may partly get around the impossibility of predicting a chaotic system’s behavior the way I described it?

    The models are run many times, and there are many models, to try to address the problem. That’s why the model prediction graphs are spaghetti plots. In the end, you have to average the results to come up with a prediction. The trouble is that chaotic systems do not respond well to averaging: final states are often wildly separated, so the average may deviate greatly from the final state that’s actually realized.
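Here is a small sketch of why averaging can mislead for a chaotic system. It reuses the logistic map x → 4x(1 − x) as a stand-in (an assumption; real climate ensembles are vastly more complex): a thousand runs start within a millionth of each other, and after 50 steps their states are spread across the whole interval. For this particular map, long-run states pile up near 0 and 1, so the ensemble average lands near 0.5, a value that only a small fraction of runs are anywhere near. Names and parameters are hypothetical.

```python
import random
import statistics


def logistic_step(x, r=4.0):
    """One step of the logistic map, used here as a stand-in chaotic model."""
    return r * x * (1.0 - x)


def ensemble_after(steps=50, members=1000, x0=0.3, spread=1e-6, seed=3):
    """Final states of many runs started within +/-spread of x0."""
    rng = random.Random(seed)
    finals = []
    for _ in range(members):
        x = x0 + rng.uniform(-spread, spread)
        for _ in range(steps):
            x = logistic_step(x)
        finals.append(x)
    return finals


finals = ensemble_after()
mean = statistics.fmean(finals)
near_mean = sum(abs(x - mean) < 0.05 for x in finals) / len(finals)
print(f"ensemble mean {mean:.2f}; {near_mean:.0%} of runs end within 0.05 of it")
```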

    Lorenz was trying to make a simple simulation of the atmosphere when he found chaos in his model. Even for this simple system, the model is not useful for predicting the final state if there’s any uncertainty about the initial state.

    Chaos is not the only issue with climate models. Unlike a simple system like a double pendulum or the Lorenz equations, climate models have the additional problems I listed before.

    • #98
  9. Mark Wilson Inactive
    @MarkWilson

    Dan Hanson: So when other scientists found their results were significantly different from his, they would assume that they had made some fundamental error and throw the results out. But when randomness in the measurement resulted in a measurement closer to Millikan’s, they assumed that it was a better result. They were filtering the data until the result reached a value that was at least close enough to Millikan’s that the error was ‘acceptable’.

    A few years ago I read an article on Judith Curry’s blog arguing that the validation of General Circulation Models (climate models) was heavily influenced by this effect as well. Models are tuned until their predictions agree with other models’ predictions, producing a big self-reinforcing common mode failure. And since there is only one, massive, long-term experiment (reality) to validate all the models, the scientific corrective process is not working efficiently.

    • #99
  10. Mark Wilson Inactive
    @MarkWilson

    Midget Faded Rattlesnake: The claim, “I trust my instruments so much that, when their data contradicts established results, I believe my instruments and not the established results,” is a pretty big claim. […]

    True, this is a case where new data rectified established knowledge. But it works both ways – we also rely on established knowledge to interpret new data – and I think it working both ways is a good thing.

    Sounds like a job for … Bayes Theorem!
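The Bayes suggestion can be made concrete. In this sketch the hypothesis is ‘the established result is correct’ and the evidence is ‘my instrument disagrees with it’. The numbers are invented for illustration: a 95% prior, a 20% chance the instrument disagrees even when the established result is right, and a 90% chance it disagrees when the result is wrong. It also assumes each new disagreement is independent of the previous ones.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    numerator = p_evidence_if_true * prior
    p_evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / p_evidence


def after_disagreements(prior, k, p_disagree_if_true=0.2, p_disagree_if_false=0.9):
    """Belief in the established result after k independent disagreeing measurements."""
    belief = prior
    for _ in range(k):
        belief = posterior(belief, p_disagree_if_true, p_disagree_if_false)
    return belief


for k in range(4):
    print(k, round(after_disagreements(0.95, k), 3))
```

With these numbers, one disagreement lowers confidence in the established result from 95% to about 81%, so the new data is not enough on its own; three independent disagreements bring it to about 17%. That is the ‘works both ways’ point: prior knowledge shapes how new data is read, but enough new data can still overturn it.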

    • #100
  11. Mark Wilson Inactive
    @MarkWilson

    JimGoneWild: This is exactly what happened in Obama’s first term. They had economists’ models that you would input spending here or there and the model would spit out a percent increase in GDP or a certain decrease in unemployment.

    Of course, none of it came to be.

    It was even worse than that. A few months down the road, the CBO was asked to assess whether the stimulus had worked. They reported that the stimulus worked exactly as predicted. The basis for that statement was not measurements, but literally rerunning the same model.

    “According to our model, this is what will happen.”

    became

    “According to our model, this is what did happen.”

    • #101
  12. Midget Faded Rattlesnake Member
    @Midge

    Mark Wilson:

    Midget Faded Rattlesnake: The claim, “I trust my instruments so much that, when their data contradicts established results, I believe my instruments and not the established results,” is a pretty big claim. […]

    True, this is a case where new data rectified established knowledge. But it works both ways – we also rely on established knowledge to interpret new data – and I think it working both ways is a good thing.

    Sounds like a job for … Bayes Theorem!

    ;-)

    • #102
  13. Z in MT Member
    @ZinMT

    I am officially calling this my favorite post ever on Ricochet.

    • #103