Merve Alanyali, University of Warwick
Tobias Preis, Helen Susannah Moat
In recent years, news reports have described a number of prominent outbursts of protest in countries around the world, in some cases leading to political change. Much media attention has focused on the increasing use of social media to coordinate these protests and to provide instantly available reports on them. As a result of improved connectivity, posts to social media sites are steadily shifting from solely text-based reports to the sharing of visual media such as photographs and videos. Here, we explore whether the data created through such widespread use of online services may offer a valuable new source of measurements of behaviour during protests.
Specifically, we investigate whether data on photographs uploaded to the photo sharing website Flickr can be used to identify protest outbreaks around the world. We analyse a large corpus of metadata on the 25 million geotagged photographs taken and uploaded to Flickr in 2013. For each geotagged photograph, we retrieve data on both the time and the place at which the photograph was taken. For each week, and for each of the 244 countries and regions used in our analyses, we determine how many photographs were taken and uploaded with the word ‘protest’ in the title, the photograph description or a photograph tag. To increase the coverage of our analyses, we also include the word ‘protest’ translated into 33 further languages. The overall number of photos taken and uploaded to Flickr may differ between countries and regions. To account for this, for each country and region in our analysis, we normalise the number of photographs tagged with a word signifying ‘protest’ by the total number of geotagged photographs taken and uploaded to Flickr in that area during the corresponding week.
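The weekly normalisation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields (`country`, `week`, `text`), the whitespace tokenisation and the toy records are all assumptions made for illustration, with `text` standing in for the combined title, description and tags of a photograph.

```python
from collections import Counter


def normalised_protest_share(photos, protest_words):
    """Return, per (country, week), the share of photos whose text contains a protest word.

    `photos` is an iterable of dicts with the hypothetical keys
    'country', 'week' and 'text' (title, description and tags combined).
    """
    totals = Counter()   # all geotagged photos per (country, week)
    tagged = Counter()   # photos containing a protest word per (country, week)
    for photo in photos:
        key = (photo["country"], photo["week"])
        totals[key] += 1
        tokens = set(photo["text"].lower().split())
        if tokens & protest_words:
            tagged[key] += 1
    # Normalise by the total number of photos in that area and week.
    return {key: tagged[key] / totals[key] for key in totals}


# Made-up toy records, for illustration only.
photos = [
    {"country": "AA", "week": 1, "text": "street protest downtown"},
    {"country": "AA", "week": 1, "text": "sunset over the harbour"},
    {"country": "BB", "week": 1, "text": "protesto na avenida"},
    {"country": "BB", "week": 1, "text": "beach"},
    {"country": "BB", "week": 1, "text": "football match"},
    {"country": "BB", "week": 1, "text": "market stalls"},
]

shares = normalised_protest_share(photos, {"protest", "protesto"})
print(shares[("AA", 1)])  # 0.5
print(shares[("BB", 1)])  # 0.25
```

Dividing by the weekly total for each area means that a country with heavy Flickr usage does not appear more protest-prone simply because more photographs are uploaded there.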
To determine whether changes in the number of protest-tagged photographs taken and uploaded to Flickr relate to changes in the number of protest outbreaks, we require data on when and where protests have occurred. Such ground truth data can be difficult to obtain. Most studies of civil unrest therefore rely on data from newspaper reports as a proxy for ground truth. Following this approach, we determine how many protest-related articles for each of the 244 countries and regions were published in the online edition of The Guardian in each week of 2013.
We deem an article protest-related if it is tagged with the word ‘protest’, and we deem an article to cover news related to one of the countries and regions analysed if it is tagged with that country or region’s name. To account for differences in The Guardian’s coverage of news in different places, we also determine the total number of articles published in each week and tagged with each place’s name. To determine whether we can find statistical evidence of a relationship between the number of ‘protest’ labelled photographs taken and uploaded to Flickr and reports of protests in The Guardian, we consider both datasets at weekly granularity.
To quantify the link between the data mined from Flickr and protest-related articles in The Guardian, we build a logistic regression panel model. To account for unobserved differences in coverage between countries and regions and between weeks, we include both as fixed effects. Our results suggest that a greater normalised number of ‘protest’ labelled Flickr photographs in a given week and area corresponds to a greater proportion of The Guardian articles about that country or region being tagged with the word ‘protest’ (Flickr predictor: β = 2.95, SE = 0.31, z = 9.48, N = 12932, p < 0.001). The odds ratio corresponding to an increase of 0.1 in the normalised number of ‘protest’ tagged Flickr photos is 1.34. This implies that, with the country or region and week effects held fixed, increasing the normalised number of ‘protest’ tagged Flickr pictures by 0.1 increases the odds of a protest-related The Guardian article by 34%.
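The reported odds ratio follows from the coefficient: in a logistic regression, raising a predictor by an amount d multiplies the odds by exp(β × d). The short check below uses only the coefficient and the increment given in the text; the variable names are illustrative.

```python
import math

beta = 2.95   # Flickr predictor coefficient reported in the text
delta = 0.1   # increase in the normalised number of protest-tagged photos

# In a logistic regression, a change of `delta` in a predictor
# multiplies the odds by exp(beta * delta).
odds_ratio = math.exp(beta * delta)
percent_change = (odds_ratio - 1) * 100

print(round(odds_ratio, 2))   # 1.34
print(round(percent_change))  # 34
```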
For comparison, we construct a simple baseline model that captures differences in protest frequency between countries and regions, and between weeks, by building a logistic regression panel model that leaves out the Flickr predictor. We find that the model including data on the normalised number of ‘protest’ labelled Flickr photographs accounts for more variance in the proportion of The Guardian articles tagged with the word ‘protest’ than this simple baseline model (McFadden R² for baseline model = 0.34, McFadden R² for Flickr model = 0.35, χ²(1) = 84.48, p < 0.001, likelihood ratio test). Our results are in line with the striking hypothesis that data on photographs uploaded to Flickr may contain signs of protest outbreaks. Our findings underline the potential value of photographs uploaded to the Internet as a source of global, cheap and rapidly available measurements of human behaviour in the real world.
DOI: 10.1371/journal.pone.0150466