The Problem with Field Experiments

Field experiments are spectacular tools for discovering what works in the mess of the real world at a given point in time. But that same messy reality often makes them blunt and inefficient tools for investigating message effectiveness and political psychology in general.

A fascinating recent field experiment testing the political effectiveness of Facebook ads illustrates many of the problems of this approach.

A big problem with field tests is that they typically cannot disentangle who was actually exposed to a treatment and who was not. Field tests entangle the mode of delivery with the message content, seriously degrading one's ability to identify differences in message effectiveness. Essentially, they confound tests of mode of delivery with tests of message content.

An example might help . . .

Joe is in the treatment group, and his Facebook page loads the ad treatment 10 times over three days (or he was left a flier at his house), but Joe might not have actually seen the ad (or flier). Maybe Joe has become very good at screening out these ubiquitous ads so that they don’t register even for the most fleeting moments (or perhaps Joe’s wife threw away the flier before he saw it). Joe was never meaningfully treated with the message because the mode of delivery was extremely ineffective for him.

Paul, on the other hand, tends to keep an eye out for Facebook ads for deals on his home beer-brewing hobby. Paul briefly scans the ads, and does see, and register fairly effectively, the ads on his page (or perhaps his wife hands him the flier to read). Paul was meaningfully treated with the message because the mode of delivery was relatively effective for him.

Both Joe and Paul are in the treatment groups, but only Paul was exposed to the treatment in a meaningful sense. And yet the analysis of the ad effects must include both Joe and Paul as being treated (this is called “intent to treat” analysis) because we can’t know who was effectively exposed to the ad (self-reports of exposure are notoriously problematic). 

Now, for a statistical test of the ad impacts, all the “Joes” and all the “Pauls” are lumped together in the “treatment” groups and compared to the control group. What that means is that any impact from being exposed to the message content will be diluted by the fact that none of the “Joes” were really exposed to the message. 
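The dilution described above is easy to see in a quick simulation. All of the numbers below are hypothetical, chosen only for illustration: if just 30% of the treatment group are "Pauls" who actually see the ad, an intent-to-treat comparison recovers only about 30% of the true persuasion effect among the exposed.

```python
import random

random.seed(42)

N = 10_000          # voters per arm (hypothetical)
COMPLIANCE = 0.30   # share of "Pauls" actually exposed to the ad (assumed)
EFFECT = 0.05       # true persuasion effect on those exposed (assumed)
BASE = 0.50         # baseline candidate support in the control group (assumed)

def outcome(treated: bool) -> int:
    """Return 1 if the voter supports the candidate, 0 otherwise."""
    p = BASE
    # Only the exposed subset ("Pauls") receives any persuasion effect.
    if treated and random.random() < COMPLIANCE:
        p += EFFECT
    return 1 if random.random() < p else 0

control = sum(outcome(False) for _ in range(N)) / N
treat = sum(outcome(True) for _ in range(N)) / N

itt = treat - control  # what the field experiment actually measures
print(f"ITT estimate:         {itt:.3f}")
print(f"Effect on the exposed: {EFFECT:.3f} "
      f"(diluted toward ~{COMPLIANCE * EFFECT:.3f} in the ITT comparison)")
```

With these assumed numbers, a 5-point effect on exposed voters shows up as roughly a 1.5-point intent-to-treat difference, which a modestly sized field experiment can easily fail to detect.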

In the case of the Facebook experiment, the researchers tested three different messages:

The first ad merely sought to build the candidate's name recognition and identify him as a proud resident of the area:

[Name of candidate] for [Name of office]

My family is one of few younger families to move to [region]. Find out why!

[Picture of candidate with his family]

A second ad included a more explicit character appeal stressing the candidate's business experience and military service:

[Name of candidate] for [Name of office]

I spent 12 years in [branch of the military] and grew a [region] small business. Connect with me today!

[Picture of candidate smiling and holding his campaign sign]

Finally, a third ad sought to appeal to voters on a salient policy issue by stressing the candidate's desire to improve farming in the state:

[Name of candidate] for [Name of office]

Farming is crucial to [state]'s economy. [Candidate's first name]'s 4 WAYS to improve farming in [state] today!

[Picture of candidate dressed nicely and giving a speech to a small crowd]

The candidate’s constituency includes a large number of people connected to the farming industry, and thus the candidate expected this to be a particularly salient issue.

The study found no significant effects from the "intent to treat" of voters with these Facebook ads, and no differences in effectiveness among the three messages.

Here’s the problem . . . we have no way of knowing whether these ads failed because the mode of delivery was ineffective, because the messages were all ineffective, or both. We have no way of knowing whether message 1, 2, or 3 would be more effective if a voter were actually exposed to it, rather than merely intended to be exposed. The field test most likely wasted a test of message content by choosing a mode of delivery that was so ineffective.

In contrast, if voters are randomly assigned to these three messages within an online survey experiment, we can ensure to a much, much greater extent that both Joe and Paul are meaningfully exposed to the messages. We can prevent them from advancing in the survey until a certain amount of time has passed, ask them a question about the content, or instruct them in advance to pay attention.

These are all artificial aspects of message delivery compared with the real world, but they ensure that we can identify the actual effect of being meaningfully exposed to a message. We can also test for specific mode/message effects and interactions . . . is a particular message more or less effective when delivered as an audio-only radio ad, a TV/web ad, or a print flier? How do specific visuals moderate the impacts?
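Crossing modes with messages like this is simply a factorial design. A minimal sketch of the random assignment step, using hypothetical labels for the modes and messages:

```python
import itertools
import random

random.seed(7)

# Hypothetical factor levels; the three message labels follow the ads
# described above, and the modes are illustrative only.
modes = ["radio", "tv_web", "print"]
messages = ["recognition", "character", "policy"]

# Every mode/message combination is a cell in the 3 x 3 design.
cells = list(itertools.product(modes, messages))

def assign(participant_ids):
    """Randomly assign each participant to one mode/message cell."""
    return {pid: random.choice(cells) for pid in participant_ids}

assignments = assign(range(12))
for pid, (mode, msg) in sorted(assignments.items()):
    print(f"participant {pid}: {mode} / {msg}")
```

Simple random assignment with `random.choice` is the barest version; in practice one would typically use blocked or balanced assignment so each cell receives an equal number of participants, which improves the precision of the mode-by-message interaction estimates.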

Furthermore, we can identify the impact of messages on subgroups that might be very difficult to reach through specific modes of delivery. 

All methods have pros and cons, but the much-greater control that we have over “lab” experiments offers a much more efficient means of investigating political psychology.