Over the past few days a scandal has begun to plague political science. A UCLA graduate student, Michael LaCour, appears to have faked a data set that was the basis for an article that he published in the highly prestigious journal Science. I have examined a second paper by LaCour. As I’ll explain, I’m convinced that it also is the product of faked results.
The Science article purportedly showed that personalized, door-to-door canvassing is effective at changing political views. LaCour and his co-author, Don Green of Columbia University, enlisted members of an LGBT organization at UCLA to contact voters who had earlier indicated on a survey that they opposed gay marriage. The article shows, based on follow-up surveys, that the LGBT door-to-door canvassing had a significant effect in shifting voters toward pro-gay-marriage views.
Two graduate students at UC Berkeley, however, had significant difficulty replicating the study. They called the private firm that LaCour had supposedly enlisted to conduct his survey, and the firm said that it had conducted no such survey. LaCour had also given the grad students the name of an employee of the survey firm with whom he said he had worked. The firm said that it had no record of such an employee ever working there.
After confronting his coauthor, Green requested that Science retract the article. LaCour still stands by his results. Science, faced with this dilemma, has not (yet) retracted the paper. Instead, it has issued an “editorial expression of concern.”
On Friday, a reporter emailed to ask me about two papers that LaCour has written and whether he faked results in those papers. One, entitled “Improving Media Measurement: Evidence from the Field,” was published in the journal Political Communication. The results seem sound. I believe that LaCour and his coauthor did not fake those results. The other paper, “The Echo Chambers are Empty: Direct Evidence of Balanced, Not Biased Exposure to Mass Media,” is an unpublished working paper, which, according to a footnote in the paper, was presented at the most recent Midwest Political Science Association conference.
The paper examines the news “diet” of voters. It concludes that the news diet of Republicans hardly differs from that of Democrats. In contrast to conventional wisdom, voters do not, primarily, get their news from “echo chambers.”
I could find no problem with the main results of the paper. However, to derive those results, LaCour writes a section describing a way to measure media bias, which he uses to classify a media outlet as “conservative,” “liberal,” or “centrist.” I found many problems with this section, and I am highly confident that LaCour faked the results for this section.
The section builds upon a method proposed by economists Matthew Gentzkow and Jesse Shapiro. These researchers construct a list of “loaded political phrases” (e.g. “death tax” vs. “estate tax”). They then examine congressional speeches to determine the frequencies with which legislators of different ideologies use the phrases. They compute similar frequencies for media outlets and then, after comparing the two sets of frequencies, draw conclusions such as “The New York Times is approximately as liberal as a speech by Sen. Joe Lieberman” and “The Washington Times is approximately as liberal as a speech by Sen. Susan Collins.”
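As a rough illustration of the phrase-ranking step, the sketch below scores phrases with the standard Pearson chi-squared statistic for a 2x2 table (party by "this phrase" versus "all other phrases"). It is an assumption that this matches the statistic Gentzkow and Shapiro use in every detail. The function name and the toy counts are invented for illustration and are not data from either paper.

```python
def pearson_chi2(count_dem: int, count_rep: int,
                 total_dem: int, total_rep: int) -> float:
    """Pearson chi-squared for a 2x2 table: party x (this phrase / other phrases)."""
    a, b = count_dem, count_rep                            # uses of this phrase
    c, d = total_dem - count_dem, total_rep - count_rep    # uses of all other phrases
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

# Toy counts, invented for illustration only (hypothetical, not real data).
totals = {"dem": 10_000, "rep": 10_000}
phrase_counts = {
    "estate tax": (120, 30),   # (Democratic uses, Republican uses)
    "death tax": (10, 140),
    "tax policy": (80, 80),
}

scores = {
    phrase: pearson_chi2(dem, rep, totals["dem"], totals["rep"])
    for phrase, (dem, rep) in phrase_counts.items()
}

# Phrases used very unevenly by the two parties get the highest scores.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A phrase used equally by both parties scores zero, so it would never make a "most partisan" list. The more lopsided the usage, the higher the score.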
Many Ricochet readers will recognize this method’s similarity to one that I devised with University of Missouri economist Jeff Milyo. Indeed, Milyo and I devised our method first, and Gentzkow and Shapiro give full credit to Milyo and me for inspiring much of theirs.
LaCour introduces a method that is almost identical to Gentzkow and Shapiro’s method. In at least some minor ways, however, he adds innovations of his own. I do not believe that he really understands Gentzkow and Shapiro’s method, much less the innovations that he seemingly introduces. Nor do I believe that he actually carried out the method he describes. That is, I do not believe he actually wrote the code that would be necessary to execute his method and, accordingly, I don’t believe that he really computed the estimates that he reports from his method.
I do not have proof of my beliefs, only strong evidence. Proof would be simple—we ask LaCour to provide the code he wrote as well as the output from his computer. But until we see that proof—or LaCour’s refusal to provide such proof—I am willing to speculate. The following are key pieces of evidence that lead me to my (speculative) conclusion, that LaCour faked at least some of the results of his paper.
Perhaps the strongest evidence is illustrated in LaCour’s Figure 1. The figure lists his “point estimate” of how liberal or conservative various media outlets are. With each point estimate, the figure also shows a line segment indicating the confidence interval for the estimate. A rule of thumb among statisticians is that the confidence interval for any parameter estimate will be narrower when the researcher has more data with which to compute that estimate. More specifically, the rule of thumb is the following: if a researcher uses N data points to compute an estimate, then the width of the confidence interval for that estimate will be proportional to one over the square root of N.
Now consider LaCour’s Figure 1. Note that some of the outlets he examines are on air for several hours per week, while others are on air for an hour or less per week. For instance, Meet the Press airs only 30 minutes per week, while Fox & Friends airs approximately 15 hours per week. (During 2006, the period for which LaCour examined media outlets, Fox & Friends aired 3 hours per day. It’s not clear whether LaCour included the weekend edition of Fox & Friends in his data set. If so, then it airs approximately 21 hours per week.)
Because of this fact, LaCour’s data set should have approximately 30 times as many observations for Fox & Friends as it has for Meet the Press (15 hours versus half an hour per week). Consequently, the confidence interval for Meet the Press should be approximately 5.5 (i.e. approximately the square root of 30) times wider than the confidence interval for Fox & Friends. Instead, the confidence interval for Meet the Press is actually narrower than that for Fox & Friends. In general, the figure should have shown a pattern whereby the few-hours-per-week shows, such as Meet the Press, Face the Nation, The Beltway Boys, and Real Time with Bill Maher, have much wider confidence intervals than the many-hours-per-week shows, such as The Rush Limbaugh Show, The Mark Levin Show, and Fox & Friends. Instead, there does not seem to be any pattern at all distinguishing the two types of shows.
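To make the arithmetic concrete, here is a minimal sketch. It assumes that the number of observations is proportional to weekly airtime and that interval width scales as one over the square root of N. The function name is illustrative, and the airtime figures are the ones quoted above, not values taken from LaCour’s data.

```python
import math

def expected_ci_width_ratio(hours_small: float, hours_large: float) -> float:
    """Expected ratio of CI widths: low-airtime show over high-airtime show.

    Assumes observations are proportional to airtime and that the
    confidence-interval width is proportional to 1 / sqrt(N).
    """
    if hours_small <= 0 or hours_large <= 0:
        raise ValueError("airtime must be positive")
    return math.sqrt(hours_large / hours_small)

# Weekly airtime figures quoted in the text (hours per week).
meet_the_press = 0.5
fox_weekdays = 15.0        # 3 hours per day, weekdays only
fox_with_weekends = 21.0   # if the weekend edition is included

ratio_weekdays = expected_ci_width_ratio(meet_the_press, fox_weekdays)
ratio_weekends = expected_ci_width_ratio(meet_the_press, fox_with_weekends)

print(f"weekdays only: {ratio_weekdays:.2f}x wider")
print(f"with weekends: {ratio_weekends:.2f}x wider")
```

Under these assumptions the Meet the Press interval should be roughly 5.5 times as wide as the Fox & Friends interval, or about 6.5 times if the weekend edition is counted.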
In Appendix A of the paper, LaCour gives details of his method. Two pages of the appendix describe the part of his method that he borrows from Gentzkow and Shapiro (while giving due credit to Gentzkow and Shapiro). The remaining third page describes (albeit sparsely) how he extends the Gentzkow-Shapiro method. By my estimate, approximately 90% of the first two pages is copied word for word from Gentzkow and Shapiro’s article.
While this might appear to be plagiarism, I would not call it that. LaCour is upfront about borrowing from Gentzkow and Shapiro. However, I do believe that it is evidence that he does not really understand the Gentzkow-Shapiro method. I believe that if LaCour had really understood the Gentzkow-Shapiro method, he would have written a description in his own words. Instead, his cut-and-paste approach is clumsy and at times poorly informs the reader about what Gentzkow and Shapiro did. Also strange is the fact that LaCour usually does not copy the sentences with technical details. Instead, he mostly copies the sentences that focus on the intuition of Gentzkow and Shapiro’s method. I think this is another piece of evidence that LaCour did not fully understand Gentzkow and Shapiro’s method: he copied only the (non-technical) sentences that he understood.
When copying the sentences from Gentzkow and Shapiro, LaCour sometimes makes transcription mistakes that are telling. For instance, when he describes the “Chi-squared” statistic that Gentzkow and Shapiro construct, instead of writing the Greek letter “Chi,” as Gentzkow and Shapiro wrote, he substitutes “x.” While this may appear to be an innocent typo, those who are well-trained in statistics understand that this is often the sign of a person who is not familiar with a “Chi-squared” distribution. If a researcher is not really familiar with the Chi-squared distribution, I don’t see how he or she could fully understand the Gentzkow-Shapiro method. And if LaCour does not fully understand the Gentzkow-Shapiro method, I do not see how he could have executed the method that he describes in his paper.
Yet another instance involves LaCour’s Table 2. Here he lists the 30 loaded political phrases that Democrats said most often and the 30 loaded political phrases that Republicans said most often. According to LaCour’s paper, his media data come from only one year, 2006. Meanwhile, the title of his Table 2 is “Most Partisan Phrases from 109th Congressional Record (2005-07).” The sixty phrases that he lists are identical to, and are listed in the same order as, the 60 two-word phrases that Gentzkow and Shapiro list in their Table 1. Strangely, Gentzkow and Shapiro’s list is constructed with data only from 2005. Assuming LaCour did not make a typo, this means the 2005 data and the 2005-07 data produce identical lists.
Although I have not checked the data, I strongly doubt this is really true. If so, it means that LaCour did not really do the analysis he said he did. Instead, it appears that he simply copied the lists that Gentzkow and Shapiro published in their article. Of course, it is possible that LaCour made a typo—that is, where he wrote “2005-07” he really meant “2005.” But if this is the case, why would he select 2005 as the year for his congressional data, while selecting 2006 as his year for media data?
Some other problems involve the third page of his appendix, where he describes his extension of the Gentzkow-Shapiro method. Unlike the first two pages of the appendix, where he summarizes the Gentzkow-Shapiro method, this page is not written in complete sentences. Instead, it is a series of bullet points with fragments of sentences. It is consequently very difficult to follow. It appears to be a form of “Bayesian analysis”; however, nowhere does LaCour use the word “Bayesian.” Further, there are a number of strange irregularities in the description. I strongly doubt that he fully understands how to do a Bayesian statistical analysis. Consequently, I do not believe that he really executed the method that he says he executed.
In his defense, there are some other pieces of evidence in the paper that suggest I’m wrong. One is that, whereas Gentzkow and Shapiro construct a list of two- and three-word phrases, LaCour (purportedly) constructs a list of only two-word phrases. If he really just made up his results, this choice is a little odd. The simplest and easiest thing to do would be to say that he followed Gentzkow and Shapiro’s method as closely as possible.
Another piece of evidence is his estimate of the slant of the Thom Hartmann show. Hartmann is an avowed progressive. By my casual observation, his show is at least as left-wing as the average primetime MSNBC show. However, according to LaCour’s estimates, the Thom Hartmann show is conservative. This is clearly an anomalous result, one that might suggest an inaccuracy with LaCour’s method. However, if a researcher simply makes up his results, it would be most natural for him not to make up any anomalous results. A retort to this line of thinking is “Ahh, but if you’re making up results, the smart thing to do is to throw in one or a few anomalous results so that people won’t suspect that you’re making up the results.” But if the latter is the case, and you were LaCour, wouldn’t you point out the anomalous result? In the prose of his paper, LaCour does not even mention the Hartmann result, much less that it is anomalous.
Nevertheless, on net I think the bulk of the evidence suggests that LaCour faked at least some of the results of this second paper. Not only would I be willing to bet on this conclusion, I would be willing to give 10:1 odds on it. Still, I’m not certain, and I would be hesitant to give 100:1 odds. And I would refuse to give 1,000:1 odds.
Regardless, I am certain that LaCour faked the results of the original paper—the one published in Science. I predict that UCLA will refuse to award him a PhD, and I predict that Princeton will retract the assistant professorship that it offered him. I predict that UCLA or Princeton or both will conduct an investigation. I suspect that they will find that LaCour faked results in a few papers, not just one.
I also believe that there are lots of similar, yet so far undetected, cases in political science. Over the past five or ten years I have noticed more and more papers written by young political scientists (grad students and assistant professors) that claim to use extremely fancy and complex statistical techniques, yet the authors do not seem to fully understand the techniques that they claim to use. Their descriptions of their statistical methods are often as opaque as the LaCour appendix that I discuss above.
I won’t be surprised if, because of LaCour, journal editors and other researchers begin to request computer code and output of such papers. I won’t be surprised if we find a few more cases where the results have been completely fabricated. My hunch is that within political science there are about a half dozen additional Michael LaCours—researchers who are perceived to be solid and talented scholars yet have built that perception partly upon faked results.
Regardless of my latter speculation, the LaCour case is a genuine scandal for political science. Although there have probably been bigger scandals in political science, I cannot think of one.
The scandal could not come at a worse time for the field. On Wednesday, the House of Representatives passed H.R. 1806, an authorization bill for the National Science Foundation. The bill, written before the LaCour scandal appeared in the news, recommends funding levels for the NSF. Rather than allowing NSF bureaucrats to allocate funding across various divisions and programs within NSF, the bill specifies this allocation. Its recommended allocation for the Social, Behavioral, and Economic Sciences division at NSF is a 45% cut from its previous year’s funding level. Further, the bill specifies that this division can only spend money on certain areas of research—e.g. projects that will help to improve the U.S.’s economy or to help its national defense. These specifications would bar the vast majority of political science research from receiving any NSF funding.
The American Political Science Association has encouraged its members to lobby Congress to oppose the cuts. APSA and several progressive commentators have called the cuts part of the “anti-science” agenda of Republicans. In truth, however, H.R. 1806 increases the overall budget of NSF by 3.4%. Anyone with basic math skills can deduce that this necessarily means that the non-social-science divisions receive on average an increase in funds. Indeed, some areas of the NSF receive a significant increase. For instance the “Biological Sciences” division receives an increase of 14%. The “Mathematics and Physical Sciences” division receives an increase of 12%.
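The deduction can be checked with a short sketch. The bill’s figures (a 3.4% overall increase and a 45% cut to the social-science division) come from the text above. The division’s share of the prior NSF budget is not given here, so the shares tried below are purely hypothetical, as is the function name.

```python
def implied_other_growth(sbe_share: float,
                         overall_growth: float = 0.034,
                         sbe_growth: float = -0.45) -> float:
    """Average growth of all non-SBE divisions implied by the totals.

    Solves  s*(1 + sbe_growth) + (1 - s)*(1 + g) = 1 + overall_growth  for g,
    where s is the SBE division's share of the prior-year budget.
    """
    if not 0.0 < sbe_share < 1.0:
        raise ValueError("sbe_share must be strictly between 0 and 1")
    return (overall_growth - sbe_share * sbe_growth) / (1.0 - sbe_share)

# Hypothetical prior-year budget shares for the SBE division (not from the bill).
for share in (0.01, 0.03, 0.05, 0.10):
    g = implied_other_growth(share)
    print(f"SBE share {share:.0%}: other divisions grow {g:.1%} on average")
```

Whatever share is assumed, the implied average growth for the other divisions exceeds the overall 3.4%, which is the point made above.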
(To see some examples of the “anti-science” claims, I encourage readers to peruse the alert that APSA has issued on its website or to do a search on Twitter for the key words “#NoToHR1806” or “#Stand4Science.” I suspect that readers will, like me, find these claims highly deceptive.)
It is very rare for political scientists to have our results mentioned alongside results from the “hard” sciences. But that’s exactly what happened when the journal Science published the Green-LaCour article. That the article is fraudulent is a mild blow to the prestige of the discipline. Although I doubt it will have much impact on the NSF-funding issue, it surely doesn’t help the case that political scientists are trying to make to members of Congress.