Can we predict electoral outcomes from Wikipedia traffic? https://ensr.oii.ox.ac.uk/can-we-predict-electoral-outcomes-from-wikipedia-traffic/ Tue, 06 Dec 2016 15:34:31 +0000

As digital technologies become increasingly integrated into the fabric of social life, their ability to generate large amounts of information about the opinions and activities of the population increases. The opportunities in this area are enormous: predictions based on socially generated data are much cheaper than conventional opinion polling, offer the potential to avoid classic biases inherent in asking people to report their opinions and behaviour, and can deliver results much more quickly and be updated more rapidly.

In their article published in EPJ Data Science, Taha Yasseri and Jonathan Bright develop a theoretically informed prediction of election results from socially generated data combined with an understanding of the social processes through which the data are generated. They can thereby explore the predictive power of socially generated data while enhancing theory about the relationship between socially generated data and real world outcomes. Their particular focus is on the readership statistics of politically relevant Wikipedia articles (such as those of individual political parties) in the time period just before an election.

By applying these methods to a variety of different European countries in the context of the 2009 and 2014 European Parliament elections they firstly show that the relative change in number of page views to the general Wikipedia page on the election can offer a reasonable estimate of the relative change in election turnout at the country level. This supports the idea that increases in online information seeking at election time are driven by voters who are considering voting.
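To make the "relative change" comparison concrete, here is a minimal sketch with entirely hypothetical page-view and turnout numbers (the paper's actual data, countries, and time windows differ):

```python
# Minimal sketch (hypothetical numbers): comparing the relative change in page
# views to a country's general election article with the relative change in
# turnout between two election cycles.

def relative_change(previous, current):
    """Relative change between two election cycles (page views or turnout)."""
    return (current - previous) / previous

# Hypothetical page-view totals for the "European Parliament election" article
# in the run-up to each election.
views_2009, views_2014 = 120_000, 150_000
# Hypothetical turnout figures (percent) for the same country.
turnout_2009, turnout_2014 = 43.0, 45.5

print(f"Relative change in page views: {relative_change(views_2009, views_2014):+.2%}")
print(f"Relative change in turnout:    {relative_change(turnout_2009, turnout_2014):+.2%}")
```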

Second, they show that a theoretically informed model based on previous national results, Wikipedia page views, news media mentions, and basic information about the political party in question can offer a good prediction of the overall vote share of the party in question. Third, they present a model for predicting change in vote share (i.e., voters swinging towards and away from a party), showing that Wikipedia page-view data provide an important increase in predictive power in this context.
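As a rough illustration of the kind of model described here (not the paper's exact specification), the sketch below fits a simple linear regression of vote share on previous results, Wikipedia page views, news mentions, and a new-party indicator. The feature names and numbers are made up for illustration.

```python
# Illustrative sketch only: predicting a party's vote share from its previous
# national result, Wikipedia page views, news media mentions, and a basic
# party attribute. Data and feature set are hypothetical, not the paper's.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one party: [previous vote share (%), log page views in the week
# before the election, news mentions, is_new_party]
X = np.array([
    [28.0, 10.2, 340, 0],
    [21.5,  9.8, 290, 0],
    [ 4.0, 11.1, 120, 1],   # a new party attracting disproportionate attention
    [12.3,  9.1, 180, 0],
])
y = np.array([27.1, 20.4, 9.8, 11.9])  # observed vote share (%)

model = LinearRegression().fit(X, y)
print("Coefficients:", dict(zip(
    ["previous_share", "log_page_views", "news_mentions", "is_new_party"],
    model.coef_.round(3),
)))
```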

This relationship is exaggerated in the case of newer parties — consistent with the idea that voters don’t seek information uniformly about all parties at election time. Rather, they behave like ‘cognitive misers’, being more likely to seek information on new political parties with which they do not have previous experience and being more likely to seek information only when they are actually changing the way they vote.

In contrast, there was no evidence of a ‘media effect’: there was little correlation between news media mentions and overall Wikipedia traffic patterns. Indeed, the news media and Wikipedia appeared to be biased towards different things: with the news favouring incumbent parties, and Wikipedia favouring new ones.

Read the full article: Yasseri, T. and Bright, J. (2016) Wikipedia traffic data and electoral prediction: towards theoretically informed models. EPJ Data Science. 5 (1).

We caught up with the authors to explore the implications of the work.

Ed: Wikipedia represents a vast amount of not just content, but also user behaviour data. How did you access the page view stats — but also: is anyone building dynamic visualisations of Wikipedia data in real time?

Taha and Jonathan: Wikipedia makes its page view data available for free (in the same way as it makes all of its information available!). You can find the data here, along with some visualisations.
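For readers who want to explore the data themselves, here is a minimal sketch of pulling daily page-view counts from the Wikimedia Pageviews REST API. Note that this API only covers data from mid-2015 onwards; the 2009 and 2014 figures used in the paper come from the older raw page-count dumps. The article title, dates, and User-Agent string below are placeholders.

```python
# Sketch: fetch daily page-view counts for one article from the Wikimedia
# Pageviews REST API (data available from mid-2015 onwards only).
import requests

def daily_views(project, article, start, end):
    url = (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{project}/all-access/user/{article}/daily/{start}/{end}"
    )
    resp = requests.get(url, headers={"User-Agent": "pageview-demo/0.1 (research)"})
    resp.raise_for_status()
    return {item["timestamp"]: item["views"] for item in resp.json()["items"]}

# Placeholder article and date range.
views = daily_views("en.wikipedia.org", "European_Parliament", "20240501", "20240531")
print(sum(views.values()), "views in May 2024")
```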

Ed: Why did you use Wikipedia data to examine election prediction rather than the (I suppose more fashionable) Twitter? How do they compare as data sources?

Taha and Jonathan: One of the big problems with using Twitter to predict things like elections is that contributing on social media is a very public thing, and people are quite conscious of this. For example, some parties are seen as unfashionable, so people might not make their voting choice explicit. Hence social media might seem to be saying one thing overall, whereas people are actually thinking another.

By contrast, looking for information online on a website like Wikipedia is an essentially private activity, so there aren’t these social biases. In other words, on Wikipedia we have direct access to transactional data on what people do, rather than what they say or prefer to say.

Ed: How did these results and findings compare with the social media analysis done as part of our UK General Election 2015 Election Night Data Hack? (long title..)

Taha and Jonathan: The GE2015 data hack looked at individual politicians. We found that having a Wikipedia page is becoming increasingly important — over 40% of Labour and Conservative Party candidates had an individual Wikipedia page. We also found that this was highly correlated with Twitter presence — being more active on one network also made you more likely to be active on the other one. And we found some initial evidence that social media reaction was correlated with votes, though there is a lot more work to do here!

Ed: Can you see digital social data analysis replacing (or maybe just complementing) opinion polling in any meaningful way? And what problems would need to be addressed before that happened: e.g. around representative sampling, data cleaning, and weeding out bots?

Taha and Jonathan: Most political pundits are starting to look at a range of indicators of popularity — for example, not just voting intention, but also ratings of leadership competence, economic performance, etc. We can see good potential for social data to become part of this range of popularity indicators. However, we don’t think it will replace polling just yet; the use of social media is limited to certain demographics. Also, the data collected from social media are often very shallow, not allowing for validation. In the case of Wikipedia, for example, we only know how many times each page is viewed, but we don’t know by how many people and from where.

Ed: You do a lot of research with Wikipedia data — has that made you reflect on your own use of Wikipedia?

Taha and Jonathan: It’s interesting to think about this activity of getting direct information about politicians — it’s essentially a new activity, something you couldn’t do in the pre-digital age. I know that I personally [Jonathan] use it to find out things about politicians and political parties — it would be interesting to know more about why other people are using it as well. This could have a lot of impacts. One thing Wikipedia has is a really long memory, in a way that other means of getting information on politicians (such as newspapers) perhaps don’t. We could start to see this type of thing becoming more important in electoral politics.

[Taha]: Since my research has mostly focused on Wikipedia edit wars between human and bot editors, I have naturally become more cautious about the information I find on Wikipedia. When it comes to sensitive topics, such as politics, Wikipedia is a good place to start, but not a great place to end the search!


Taha Yasseri and Jonathan Bright were talking to blog editor David Sutcliffe.

Edit wars! Examining networks of negative social interaction https://ensr.oii.ox.ac.uk/edit-wars-examining-networks-of-negative-social-interaction/ Fri, 04 Nov 2016 10:05:06 +0000
Network of all reverts done in the English language Wikipedia within one day (January 15, 2010). Read the full article for details.
While network science has significantly advanced our understanding of the structure and dynamics of the human social fabric, much of the research has focused on positive relations and interactions such as friendship and collaboration. Considerably less is known about networks of negative social interactions such as distrust, disapproval, and disagreement. While these interactions are less common, they strongly affect people’s psychological well-being, physical health, and work performance.

Negative interactions are also rarely explicitly declared and recorded, making them hard for scientists to study. In their new article on the structural and temporal features of negative interactions in the community, Milena Tsvetkova, Ruth García-Gavilanes and Taha Yasseri use complex network methods to analyze patterns in the timing and configuration of reverts of article edits on Wikipedia. In large online collaboration communities like Wikipedia, users sometimes undo or downrate contributions made by other users, most often to maintain and improve the collaborative project. However, previous research acknowledges that these actions can also be social in nature, implying negative social interactions.

The authors find evidence that Wikipedia editors systematically revert the same person, revert back their reverter, and come to defend a reverted editor. However, they don’t find evidence that editors “pay forward” a revert, coordinate with others to revert an editor, or revert different editors serially. These interactions can be related to the status of the editors. Even though the individual reverts might not necessarily be negative social interactions, their analysis points to the existence of certain patterns of negative social dynamics within the editorial community. Some of these patterns have not been previously explored and certainly carry implications for Wikipedia’s own knowledge collection practices — and can also be applied to other large-scale collaboration networks to identify the existence of negative social interactions.
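To illustrate the kinds of patterns described above, here is a minimal sketch (not the authors' pipeline, which uses large-scale temporal network analysis of Wikipedia revert data) of counting repeated and reciprocated reverts from a list of (reverter, reverted) editor pairs. The data below are made up.

```python
# Sketch: count how often an editor reverts the same person repeatedly, and
# how often a reverted editor "reverts back" their reverter. Hypothetical data.
from collections import Counter

reverts = [
    ("alice", "bob"), ("alice", "bob"),   # alice reverts bob twice
    ("bob", "alice"),                     # bob reverts back his reverter
    ("carol", "dave"), ("erin", "dave"),
]

pair_counts = Counter(reverts)
repeated = sum(1 for pair, n in pair_counts.items() if n > 1)
reciprocated = sum(1 for (a, b) in pair_counts if (b, a) in pair_counts) // 2

print(f"{repeated} editor pair(s) with repeated reverts")
print(f"{reciprocated} reciprocated revert pair(s)")
```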

Read the full article: Tsvetkova, M., García-Gavilanes, R. and Yasseri, T. (2016) Dynamics of Disagreement: Large-Scale Temporal Network Analysis Reveals Negative Interactions in Online Collaboration. Scientific Reports 6. doi:10.1038/srep36333

We caught up with the authors to explore the implications of the work.

Ed: You find that certain types of negative social interactions and status considerations interfere with knowledge production on Wikipedia. What could or should Wikipedia do about it — or is it not actually a significant problem?

Taha: We believe it is an issue to consider. While the Wikipedia community might not be able to directly cope with it, as negative social interactions are intrinsic to human societies, an important consequence of our report would be to use the information in Wikipedia articles with extra care — and also to bear in mind that Wikipedia content might carry more subjectivity compared to a professionally written encyclopaedia.

Ed: Does reverting behaviour correlate with higher quality articles (i.e. with a lot of editorial attention) or simply with controversial topics — i.e. do you see reverting behaviour as generally a positive or negative thing?

Taha: In a different project we looked at the correlation between controversy and quality. We observed that controversy, up to a certain level, is correlated with higher article quality, particularly as far as the completeness of the article is concerned. However, articles with very high controversy scores started to show lower quality. In short, a certain amount of controversy helps articles become more complete, but too much controversy is a bad sign.

Ed: Do you think your results say more about the structure of Wikipedia, the structure of crowds, or about individuals?

Taha: Our results shed light on some of the most fundamental patterns in human behavior. It is one of the few examples in which a large dataset of negative interactions is analysed and the dynamics of negativity are studied. In this sense, this article is more about human behavior in interaction with other community members in a collaborative environment. However, because our data come from Wikipedia, I believe there are also lessons to be learnt about Wikipedia itself.

Ed: You note that by focusing on the macro-level you miss the nuanced understanding that thick ethnographic descriptions can produce. How common is it for computational social scientists to work with ethnographers? What would you look at if you were to work with ethnographers on this project?

Taha: One of the drawbacks of big data analysis in computational social science is the limited depth of the analysis. We lack any demographic information about the individuals we study. We can draw conclusions about the community of Wikipedia editors in a certain language, but that is by no means specific enough. An ethnographic approach, which would benefit our research tremendously, would go deeper in analyzing individuals and studying the features and attributes which lead to certain behavior. For example, we report, at a high level, that “status” determines editors’ actions to a good extent, but of course the mechanisms behind this observation can only be explained through ethnographic analysis.

Ed: I guess Wikipedia (whether or not unfairly) is commonly associated with edit wars — while obviously also being a gigantic success: how about other successful collaborative platforms — how does Wikipedia differ from Zooniverse, for example?

Taha: There is no doubt that Wikipedia is a huge success and probably the largest collaborative project in the history of mankind. Our research mostly focuses on its dark side, but it does not question its success and value. Compared to other collaborative projects, such as Zooniverse, the main difference is in the management model. Wikipedia is managed and run by the community of editors, with very little top-down management. In Zooniverse, for instance, the overall structure of the project is designed by a few researchers and the crowd can only act within a pre-determined framework. For more comparisons of this sort, I suggest looking at our HUMANE project, in which we provide a typology and comparison for a wide range of Human-Machine Networks.

Ed: Finally — do you edit Wikipedia? And have you been on the receiving end of reverts yourself?

Taha: I used to edit Wikipedia much more. And naturally I have had my own share of reverts, at both ends!


Taha Yasseri was talking to blog editor David Sutcliffe.
