Michael Jensen, Autonomous University of Barcelona, Department of Political Science
Brian Kesser, IGMAS Technologies, Inc., Woodland Hills, CA, USA
Introduction
The number of followers has figured centrally among the metrics used to gauge the success of a political campaign's use of social media. Furthermore, these figures have been used reliably to predict election outcomes, as candidates with more social media followers than their opponents have generally been victorious in American elections. However, the existence of services that create social media followers raises the possibility that a political campaign may engage in astroturfing to create the illusion of support among certain segments of a political system. Additionally, if followers can be faked, they can also be harnessed to plant memes and drive political conversations. As scholars are beginning to learn how to collect and analyse large volumes of digital artefacts, it is likewise important to develop criteria to distinguish authentic from astroturf communications, for two reasons. First, astroturf can distort inferences, as automated or planted actors and communications fail to correspond with the wider human organization of political and social systems. For instance, this can produce significant consequences for semantic polling, as bots can be programmed to flood Twitter and Facebook with messages of a particular valence. Second, these distortions have consequences for the democratic operation of political campaigns, as planted online communications may influence voter preferences.
The 2011 Spanish General Election
The Twitter following of the two main candidates for president in the 2011 Spanish general election presents an interesting case for the study of political astroturfing. Over the course of the campaign, the Partido Popular's Mariano Rajoy saw considerable growth in his Twitter following, besting his rival, Alfredo Pérez Rubalcaba, by considerable margins even though survey data indicated low levels of support for Rajoy among users of social media. This was also in spite of the fact that communications via Twitter were, relative to Rubalcaba, unfavourable towards Rajoy. This paper takes a systematic look at the Twitter followers of the two main candidates for president. It addresses the following four research questions:
1. In the context of the 2011 Spanish election for the Congress of Deputies, what percentage of the Twitter followers of the two leading presidential candidates are likely astroturf rather than biographical humans?
2. To what extent is robotic message delivery and re-tweeting likely occurring?
3. What messages and links have been sent from these accounts over the course of the electoral campaign and is there evidence of coordination between these accounts and the official campaigns?
4. Are astroturf messages more vulnerable to rebuttal than other messages?
Methods:
Data Collection
There are two sources of data for this paper. The first source, a panel survey, consists of a nationally representative sample of over 1700 Spanish internet users, conducted during the campaign period and after the election. This survey includes questions regarding respondents' online political engagement, whether they are following a candidate or campaign via social media channels, and a series of attitudinal questions regarding respondents' evaluation of the campaigns, probability to vote, and political preferences. These data establish the behavioural context, identifying the population of individuals who follow candidates via social media channels such as Facebook and Twitter. The second source, a corpus of campaign-related tweets and follower counts, was collected from the Twitter application programming interface (API) over the 56 days between the formal closure of parliament on September 26th, 2011 and the election held on November 20th, 2011. The followers of the two presidential candidates, Rajoy and Rubalcaba, were collected 3 months after the election to determine whether the accounts remained active beyond the period of campaign activity. The data were collected using a series of scripts written in R and Python to track follower counts, search for campaign-related posts, download follower IDs, and look up their profiles. The collection of user profiles was carried out using two computers on separate continents over the same weekend so as to enable simultaneous data collection while minimizing the load on Twitter's servers. This database contains nearly 350,000 followers characterized along 22 dimensions.
Data Analysis
The data analysis is conducted using a series of statistical analyses and text mining routines in R and Python.
1. Regression analyses based on panel data. These regressions will identify the factors which motivate individuals to follow campaigns via social media channels and construct a profile of new social media users between the pre- and post-election panels.
2. Identification of dubious followers. For each candidate, the overall percentage of dubious followers is estimated using criteria such as the account creation date, whether the account has a default profile, the number and diversity of its posts, and evaluations of its campaign engagement.
3. Meme diffusion. The most common links and terminological clusters posted during the campaign on Facebook or Twitter are identified. The repetition of these links and clusters is charted over time, and the role of bots in that repetition is assessed through simultaneous identical posts or re-tweets.
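As an illustration of step 2, the following is a minimal sketch of how the listed criteria could be combined into a per-candidate share of dubious followers. It is not the scripts used in this study: the `Follower` field names, the 0.5 diversity threshold, and the rule of requiring at least two signals are all assumptions made for the example. Only the campaign dates are taken from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

# Campaign window as stated in the paper: closure of parliament to election day.
CAMPAIGN_START = date(2011, 9, 26)
CAMPAIGN_END = date(2011, 11, 20)


@dataclass
class Follower:
    """One follower profile. Field names are hypothetical, not Twitter API names."""
    user_id: int
    created: date            # account creation date
    default_profile: bool    # True if the profile was never customised
    statuses: List[str]      # texts of the posts collected for this account


def dubious_signals(f: Follower, min_diversity: float = 0.5) -> Dict[str, bool]:
    """Evaluate each criterion separately so it can be inspected or re-weighted."""
    n = len(f.statuses)
    distinct = len(set(f.statuses))
    return {
        "created_during_campaign": CAMPAIGN_START <= f.created <= CAMPAIGN_END,
        "default_profile": f.default_profile,
        "no_posts": n == 0,
        # Assumed threshold: fewer than half of the posts are distinct texts.
        "low_diversity": n > 0 and distinct / n < min_diversity,
    }


def is_dubious(f: Follower, min_signals: int = 2) -> bool:
    """Assumed decision rule: flag an account when several signals coincide."""
    return sum(dubious_signals(f).values()) >= min_signals


def dubious_share(followers: List[Follower]) -> float:
    """Estimated fraction of a candidate's followers flagged as dubious."""
    if not followers:
        return 0.0
    return sum(is_dubious(f) for f in followers) / len(followers)
```

Keeping the signals separate from the decision rule makes it straightforward to report each criterion on its own, which matters here because no single signal (for example, a recently created account) is proof of astroturfing.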
Results
Conclusions
At this point we can provide some preliminary observations based on the survey evidence and some of the data collected from Twitter. The survey evidence suggests that while interest in the election is a factor behind individuals following campaigns online, support for either of the main candidates was not. These results also hold true for whether or not a person uses social media. There are two areas that suggest that, relative to Rubalcaba's Twitter following, Rajoy's is more likely to contain fake followers: the date on which their followers' accounts were created and the level of identical statuses among their respective groups of followers.
First, contrary to what one would expect from the survey evidence, a high proportion of Rajoy's followers had accounts created during the electoral campaign. Second, Rajoy's following contained a higher percentage of accounts with zero tweets, suggesting relatively inactive accounts, as well as exactly identical tweets, suggesting automated tweeting.
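The two activity indicators mentioned above, zero-tweet accounts and exactly identical tweets, can be computed along the following lines. This is a hedged sketch rather than the procedure used in the study: the input structures (a mapping from account ID to tweet count, and from account ID to most recent status text) and the function names are assumptions, and matching is exact apart from collapsing whitespace.

```python
from collections import defaultdict
from typing import Dict, Set


def zero_tweet_share(tweet_counts: Dict[int, int]) -> float:
    """Fraction of accounts that have never posted."""
    if not tweet_counts:
        return 0.0
    return sum(1 for c in tweet_counts.values() if c == 0) / len(tweet_counts)


def normalise(text: str) -> str:
    """Collapse runs of whitespace only, so matching stays effectively exact."""
    return " ".join(text.split())


def identical_status_share(latest_status: Dict[int, str]) -> float:
    """Among accounts with a status, the fraction whose text is also
    posted verbatim by at least one other account."""
    groups: Dict[str, Set[int]] = defaultdict(set)
    for user_id, text in latest_status.items():
        if text:  # skip accounts with no status text
            groups[normalise(text)].add(user_id)
    posting = sum(len(ids) for ids in groups.values())
    shared = sum(len(ids) for ids in groups.values() if len(ids) > 1)
    return shared / posting if posting else 0.0
```

Computing both shares separately for each candidate's follower list allows the kind of side-by-side comparison described in this section. Exact matching is deliberately conservative: it will also count ordinary re-posts of a slogan by genuine supporters, so the result is an indicator rather than a count of bots.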