A Scandal in Political Science – An Update

 

Michael LaCour, a UCLA graduate student in political science, has been accused of faking results for a paper that he published in the journal Science. Two days ago, Science retracted the article.

Four days ago, in my Ricochet post, I explained why I believe LaCour faked the results of a second paper. Much of my explanation was speculation. For instance, I described how the confidence intervals for his estimates of media slants didn’t follow the pattern that one would have expected. I said in that post that I would have given 10:1 odds that the results were fake; that is, I was about 90% sure that this second paper was also an instance of fraud.

A few days ago, Emory political scientist Gregory Martin posted a note examining the same paper by LaCour. Approximately a year ago, Martin and his coauthor, Ali Yurukoglu, had asked LaCour for the computer code that he wrote for the paper. LaCour sent them the code, but Martin was unable to replicate the results that LaCour reported. Perhaps most shocking, Martin noticed the following problem: In his paper, LaCour wrote that all of his media data came from the UCLA Closed Caption (LACC) archive of news-show transcripts. The paper reported estimates for 60 news shows. Martin noticed, however, that 14 of those shows do not actually appear in the LACC data set.

Although one could imagine innocent explanations for the problems Martin found, the evidence was stunning. After seeing Martin’s results, I became about 99% sure that LaCour faked results in the second paper.

Late yesterday, Martin added to his report two paragraphs explaining additional evidence of fraud by LaCour. The evidence in these new paragraphs makes me approximately 99.99% sure that LaCour faked his results.

I’ll describe the new evidence in a moment. But first, here’s some background. A few years ago, while I was a professor in the UCLA political science department, I had a hallway conversation with my colleague Jeff Lewis. Lewis noted a potential problem with the media-bias paper I had written with Jeff Milyo. Our method assumed that observations are independent (a standard assumption in statistical models), and Lewis gave some reasons why that assumption probably does not hold in our setting. He described a way to alter our method to correct for the potentially untrue assumption.

I also remember Lewis telling me that he had described his alteration to a PhD class that he was teaching, or possibly to a single student. I think I remember that he also told me that he incorporated his model (i.e. the altered version of my and Milyo’s model) into a homework problem for his class.

I’m now almost certain that the statistical model that LaCour describes on the last page of his appendix is the exact model that Lewis constructed.

Lewis constructed his model to be run on the data that Milyo and I gathered. Those data involve citations to think tanks by members of Congress and media outlets. In contrast, LaCour’s paper uses data on the “loaded political phrases” used by members of Congress and by the media outlets in the LACC archive.

Here now is some of the new evidence that Martin reports. First, the data file that LaCour’s code references is labeled “/Users/michaellacour/Dropbox/MediaBias/Groseclose/counts.csv.” Curiously, the file’s path contains my name.

Approximately two or three years ago, I gave LaCour the data from my and Milyo’s project. LaCour’s method appears to use those data. But my and Milyo’s data are based on think-tank citations, whereas LaCour’s purported results are based on loaded political phrases. If LaCour really executed the method that he says he executed, it would make no sense to use my and Milyo’s data.

Second, LaCour writes in his appendix that his method assumes that the slant of each of the news shows that he analyzes has a “prior distribution” with mean equal to 50.06. Meanwhile, in our media-bias paper Milyo and I report that our estimate of the average voter’s ideology (on the “adjusted ADA” scale) is 50.06. LaCour must have gotten that number from our paper. However, it makes no sense for LaCour to use that number. The slants that he reports are on the “DW-Nominate” scale, which runs from -1 to 1. It is impossible for those slants to have a mean of 50.06.
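For concreteness, here is a small back-of-the-envelope sketch of that scale mismatch. It is purely illustrative and is not LaCour’s code; I assume a Normal prior only for the sake of the calculation, and the standard deviations below are hypothetical. Whatever plausible spread one picks, essentially none of a prior centered at 50.06 falls inside the DW-Nominate range of -1 to 1:

    # Illustrative only -- not LaCour's code. How much probability mass can a
    # Normal prior centered at 50.06 (an adjusted-ADA-scale number) place on
    # the DW-Nominate interval [-1, 1]? The standard deviations are hypothetical.
    from scipy.stats import norm

    PRIOR_MEAN = 50.06            # the adjusted-ADA figure LaCour's appendix cites
    LOW, HIGH = -1.0, 1.0         # the endpoints of the DW-Nominate scale

    for sd in (5.0, 10.0, 20.0):  # hypothetical prior standard deviations
        prior = norm(loc=PRIOR_MEAN, scale=sd)
        mass = prior.cdf(HIGH) - prior.cdf(LOW)
        print(f"sd = {sd:>4}: prior mass inside [-1, 1] = {mass:.1e}")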

Third, as Martin reports, LaCour’s code instructs his computer to compute slants for 20 news outlets. This happens to be the number of outlets that Milyo and I examined. In contrast, LaCour reports slant estimates for 60 outlets.

In sum, the code that LaCour claimed to use – and sent to Martin – does not make sense for the results he publishes in his paper. In contrast, the code makes perfect sense for the method that Jeff Lewis described to me that day in the hallway.

I strongly believe that something like the following happened: LaCour completely faked the media-slant results that he reports in the second paper and did not really write any computer code to estimate those results. When Martin and Yurukoglu asked him for his code, he sent the best substitute he could think of—the code that Jeff Lewis wrote for a different problem.


There are 34 comments.

  drlorentz (@drlorentz):

    Claire Berlinski:

    Ball Diamond Ball: Most people who should know better, and perhaps most scientists, cannot correctly describe a p-value.

    I don’t think I’d be the only one here who would be interested in reading a definition of the concept.

    Might I be able to tempt you to write a post explaining this?

    I’ll leave it to BDB to explain p-values, but I’d add a few comments:

    1. p-values are almost exclusively used in social sciences. That’s because the effects measured in social sciences have relatively low significance. This is not a value judgment; it’s a technical term that tells you how likely the result is due to chance instead of reflecting a real effect. Natural scientists often give their uncertainties as multiples of the standard deviation, which are typically much more stringent. For example, the Higgs boson was a 5 sigma effect, which translates into a p-value of about 0.0000003 (a quick conversion is sketched after this list). Typical p-values for social science work are 0.05 or 0.01. Big difference.
    2. The validity of p-values depends on an important assumption about the underlying statistics of the situation, namely that the underlying statistics are Gaussian. If this sounds esoteric and irrelevant to everyday life, consider that this erroneous assumption was a principal cause of the 2008 financial crisis.
    3. Social scientists often use a software package called ‘R’. Anecdotally I can report that many people using R have no idea what it’s doing. They just turn the crank.
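    For concreteness, here is a minimal Python sketch of the sigma-to-p-value conversion in point 1, assuming the one-tailed Gaussian convention used in particle physics:

        # One-tailed Gaussian tail probabilities for a few sigma levels.
        from scipy.stats import norm

        for sigma in (1.64, 2.33, 5.0):
            p = norm.sf(sigma)  # upper-tail probability beyond `sigma` standard deviations
            print(f"{sigma} sigma -> p = {p:.1e}")

        # Output: 1.64 sigma gives p of about 0.05, 2.33 sigma about 0.01, and
        # 5 sigma about 2.9e-07, i.e. roughly the 0.0000003 quoted for the Higgs.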
    • #31
  The Reticulator (@TheReticulator):

    drlorentz: p-values are almost exclusively used in social sciences. That’s because the effects measured in social sciences have relatively low significance. This is not a value judgment; it’s a technical term that tells you how likely the result is due to chance instead of reflecting a real effect. Natural scientists often give their uncertainties as multiples of the standard deviation, which are typically much more stringent. For example, the Higgs boson was a 5 sigma effect, which translates into a p-value of about 0.0000003. Typical p-values for social science work are 0.05 or 0.01. Big difference.

    Also in ecology and agriculture, to name a couple of others. As far as I know they are used in most biological sciences, all of whose practitioners would be highly offended to be lumped in with social scientists.

    But the reasons are similar. There is a lot of variation in their data. Maybe physiology is different. Just guessing about that.

    • #32
  drlorentz (@drlorentz):

    The Reticulator: Also in ecology and agriculture, to name a couple of others. As far as I know they are used in most biological sciences, all of whose practitioners would be highly offended to be lumped in with social scientists.

    But the reasons are similar. There is a lot of variation in their data. Maybe physiology is different. Just guessing about that.

    Biology is a mixed bag. Papers in medicine often use p-values of 0.05 or 0.01, especially in epidemiology. Those studies are much like social science, since they involve large surveys of people and there are many confounding factors. On the other hand, some genomics work uses much lower p-value thresholds; for example, the guidelines for PLOS Genetics call for p < 0.00000005.
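    Going the other way, here is a short sketch (same one-tailed Gaussian convention as in the earlier comment) of roughly how many standard deviations those thresholds correspond to:

        # Convert p-value thresholds to equivalent Gaussian sigma levels (one-tailed).
        from scipy.stats import norm

        for p in (0.05, 0.01, 5e-8):  # 5e-8 is the genome-wide threshold cited above
            print(f"p = {p:g} -> about {norm.isf(p):.1f} sigma")

        # 0.05 and 0.01 sit near 1.6 and 2.3 sigma; 5e-8 is roughly 5.3 sigma.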

    Low significance comes with the territory when you explore complex phenomena. In The Bell Curve, Herrnstein & Murray remarked on the typically low correlation values found in social science research.  I feel for my brethren in those fields; they have a tough job.

    • #33
  PsychLynne (@PsychLynne):

    drlorentz: Social scientists often use a software package called ‘R’. Anecdotally I can report that many people using R have no idea what it’s doing. They just turn the crank.

    The two professors who provided most of my stats training wouldn’t let us use menus. We had to write code for everything, precisely to train us not to just turn the crank.

    I work a lot with standard deviations and standard error of measurement because of all the neuropsych testing I’ve done.  While we don’t use p-values per se, we report confidence intervals around the scores.

    What’s interesting there is that a score that is a standard deviation away from the mean is generally considered statistically significant, as well as clinically significant. However, in testing, just as in research, you can do a zillion post-hoc analyses that the test or study wasn’t powered to handle and wind up with some pretty dramatic statements that sound impressive but have almost no clinical significance for someone’s intellectual functioning. (Another lesson from my great stats professors.)
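    As a generic illustration of the kind of confidence interval described above, here is a classical-test-theory sketch (not any particular battery’s procedure; the scale, reliability, and score below are hypothetical):

        # Generic classical-test-theory sketch: a 95% confidence interval around an
        # observed score using the standard error of measurement,
        # SEM = SD * sqrt(1 - reliability). All numbers are hypothetical.
        import math

        SCALE_MEAN, SCALE_SD = 100, 15  # a common standardized-score scale
        reliability = 0.90              # hypothetical test reliability
        observed = 82                   # hypothetical observed score

        sem = SCALE_SD * math.sqrt(1 - reliability)
        low, high = observed - 1.96 * sem, observed + 1.96 * sem
        print(f"SEM = {sem:.2f}; 95% CI around {observed} = [{low:.1f}, {high:.1f}]")
        print(f"More than one SD below the scale mean: {observed < SCALE_MEAN - SCALE_SD}")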

    • #34