data science, digital politics, smart cities...|jonathan.bright@oii.ox.ac.uk

Understanding news story chains using information retrieval and network clustering techniques

I have a new draft paper out with my colleague Tom Nicholls, entitled Understanding news story chains using information retrieval and network clustering techniques. In it we address what we perceive as an important technical challenge in news media research: how to group together articles that all address the same individual news event. This challenge is unmet by most current approaches to unsupervised machine learning as applied to the news, which tend to focus on the broader (also important!) problem of grouping articles into topic categories. It is in general a difficult problem, as we are looking for what are typically small “chains” of content on the same event (e.g. four or five different articles) amongst a corpus of tens of thousands of articles, most of which are unrelated to each other.

Our approach makes use of algorithms and insight drawn from the fields of both information retrieval [IR] and network clustering to develop a novel unsupervised method of news story chain detection. IR techniques (which are used to build things like search engines) in particular haven’t been employed much in the social sciences, where the focus has been more on machine learning. But these algorithms were much closer to our problem, as connecting small numbers of news stories is quite similar to the task of searching a huge corpus of documents in response to a specific user query.
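To make the general idea concrete, here is a minimal sketch of how IR-style similarity scoring and a simple network grouping step can be combined. It is not the algorithm from the paper: TF-IDF with cosine similarity stands in for the retrieval step, connected components (via union-find) stand in for network clustering, and the function names, the 0.3 threshold and the example headlines are all invented for illustration.

```python
import math
import re
from collections import Counter

# Invented headlines, used only to illustrate the idea.
EXAMPLE_HEADLINES = [
    "Minister resigns over budget leak scandal",
    "Budget leak scandal: minister resigns after pressure",
    "Local football club wins cup final",
    "Football club celebrates cup final victory",
    "Heavy snow closes schools across the north",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one length-normalised TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        vec = {t: c * math.log(n_docs / doc_freq[t]) for t, c in term_freq.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def story_chains(docs, threshold=0.3):
    """Link documents whose similarity reaches the (assumed) threshold,
    then return the connected components as sorted lists of indices."""
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    print(story_chains(EXAMPLE_HEADLINES))  # [[0, 1], [2, 3], [4]]
```

Comparing every pair of documents like this would not scale to tens of thousands of articles, which is one reason search-engine-style indexing is attractive for the real problem.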

The resulting algorithm works pretty well, though it is very difficult to validate properly because of the nature of the data! We use it to pull out a couple of interesting first-order descriptive statistics about news stories in the UK; for example, the graphic above shows the typical evolution of news stories after the publication of an initial article.

Just a draft at the moment so all feedback welcome!

January 31st, 2018 | News, Python, Research, Social Science Computing

Does Campaigning on Social Media Make a Difference?

I’ve got a new draft paper out with a host of colleagues here at the OII entitled Does Campaigning on Social Media Make a Difference? Evidence from candidate use of Twitter during the 2015 and 2017 UK Elections. There’s an enormous volume of research on the activities of politicians on social media, especially around election time, but not a lot of it has actually addressed whether this activity ‘makes a difference’, i.e. helps to win votes. Part of the reason for this is that measuring ‘campaign effects’ is quite difficult (unless you can convince campaigns themselves to participate in field experiments), and most of the data is purely cross-sectional, which creates a host of causality problems in this type of context.

Our study improves the situation by taking advantage of the fact that the UK has recently had two general elections in quick succession, and a considerable proportion of politicians (around 800 in fact) fought in both of them. This allowed us to create a panel dataset of politician social media use (in particular their Twitter activity) and electoral outcomes, which allows for much stronger causal claims (essentially we look at whether a change in the level of social media use by candidates was correlated with a change in vote share outcomes, controlling for factors such as the party they belong to).
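The logic of comparing changes rather than levels can be sketched in a few lines. This is only an illustration of a first-difference comparison, not the model from the paper: it ignores the controls (such as party), the field names are hypothetical, and the three candidate records are made-up numbers, not results.

```python
import math


def differences(panel):
    """Per-candidate change in tweets and in vote share between elections."""
    d_tweets = [row["tweets_2017"] - row["tweets_2015"] for row in panel]
    d_share = [row["share_2017"] - row["share_2015"] for row in panel]
    return d_tweets, d_share


def slope_and_correlation(xs, ys):
    """OLS slope of ys on xs and the Pearson correlation between them."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / var_x, cov / math.sqrt(var_x * var_y)


# Made-up toy records (hypothetical field names, invented values).
TOY_PANEL = [
    {"tweets_2015": 100, "tweets_2017": 300, "share_2015": 30.0, "share_2017": 32.0},
    {"tweets_2015": 50, "tweets_2017": 50, "share_2015": 20.0, "share_2017": 20.0},
    {"tweets_2015": 400, "tweets_2017": 200, "share_2015": 45.0, "share_2017": 43.0},
]

if __name__ == "__main__":
    d_tweets, d_share = differences(TOY_PANEL)
    slope, corr = slope_and_correlation(d_tweets, d_share)
    print(f"slope={slope:.4f} points per extra tweet, correlation={corr:.2f}")
```

Because each candidate is compared with themselves across the two elections, anything about them that stays fixed between 2015 and 2017 drops out of the comparison, which is what makes the panel design stronger than a purely cross-sectional one.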

The results were pretty interesting: we found a large amount of Twitter activity, spread throughout the country (see graphics), which supports the idea that social media use is now a normal part of political campaigns. However, the level of effort did vary quite a lot, and this allowed us to explore our key interest, where we did indeed find that increasing Twitter activity correlates with increased levels of votes, even in this pretty strong panel data design. So, some good supporting evidence that politicians aren’t wasting their time on social media!

January 10th, 2018 | Politics and Democracy, Social Media