data science, digital politics, smart cities...

Understanding news story chains using information retrieval and network clustering techniques

I have a new draft paper out with my colleague Tom Nicholls, entitled "Understanding news story chains using information retrieval and network clustering techniques". In it we address what we see as an important technical challenge in news media research: how to group together articles that all address the same individual news event. Most current applications of unsupervised machine learning to the news do not meet this challenge, as they tend to focus on the broader (also important!) problem of grouping articles into topic categories. It is in general a difficult problem, as we are looking for what are typically small "chains" of content on the same event (e.g. four or five different articles) within a corpus of tens of thousands of articles, most of which are unrelated to each other.

Our approach draws on algorithms and insights from the fields of both information retrieval [IR] and network clustering to develop a novel unsupervised method of news story chain detection. IR techniques (which are used to build things like search engines) in particular haven't been much employed in the social sciences, where the focus has been more on machine learning. But these algorithms were much closer to our problem, as connecting small numbers of news stories is quite similar to the task of searching a huge corpus of documents in response to a specific user query.
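To make the general idea concrete, here is a minimal sketch of how an IR-style similarity score and a simple graph grouping step can be combined. It is not the method from the paper: TF-IDF with cosine similarity stands in for the retrieval step, connected components stand in for the network clustering step, and the function names, the 0.2 threshold and the example headlines are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Turn a list of token lists into sparse TF-IDF dictionaries."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def story_chains(texts, threshold=0.2):
    """Group texts into chains: link pairs above the similarity
    threshold, then return the connected components of that graph."""
    docs = [t.lower().split() for t in texts]
    vecs = tfidf_vectors(docs)
    parent = list(range(len(texts)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(texts)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Invented headlines, purely for illustration.
example_articles = [
    "minister resigns over expenses scandal",
    "expenses scandal forces minister to quit",
    "storm floods coastal town",
    "coastal town floods after storm",
    "new phone launched today",
]
chains = story_chains(example_articles)
print(chains)
```

Comparing every pair of articles like this is quadratic in the size of the corpus, so it only works for toy examples; part of the appeal of IR machinery is that it avoids that all-pairs comparison on large corpora.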

The resulting algorithm works pretty well, though it is very difficult to validate properly because of the nature of the data! We use it to pull out a couple of interesting first-order descriptive statistics about news stories in the UK. For example, the graphic above shows the typical evolution of news stories after the publication of an initial article.

Just a draft at the moment so all feedback welcome!

January 31st, 2018 | News, Python, Research, Social Science Computing | 0 Comments

The Social News Gap: New article in Journal of Communication

I have a new article out in the Journal of Communication which analyses which types of news get shared the most. Based on articles published by BBC News, the research shows that even though readership drives sharing in general, certain types of articles lend themselves more to being shared than others.

Figure 2

The graphic above gives a glimpse of some of the results, by visualising the relationship between reading and sharing for different categories of news article. We can see that reading and sharing are not in a linear relationship: rather, some types of article are well shared but not well read, and vice versa. For example, stories about technology and social welfare seem to be shared more, whilst stories about violent crime and accidents are shared less. This creates a social "news gap" (following Boczkowski and Mitchelstein's traditional news gap) whereby people's preferences for sharing and their preferences for reading diverge. I suggest that, as more and more people start to consume news on social media, the implications of this become potentially more profound, as social media starts to filter out certain types of news whilst emphasising others.


July 1st, 2016 | News, Research, Social Media | 0 Comments

The History of Social News

I am giving a presentation tomorrow at the IJPP conference here in Oxford. It's being hosted by the Reuters Institute, who are world leaders in the study of contemporary news organisations, and I'm really excited to be going.

Together with Scott Hale I am giving a presentation on the "history" of social news. We have an eight-year dataset (2002-2010) consisting of links to millions of news articles, which we have used to trace the beginnings of social media news sharing. We are interested to know whether the types of news being shared have changed over time as social media platforms have reached a mass audience; we're also interested in whether site design changes (such as bringing in sharing buttons) have had a major impact.

Twitter - Facebook Comparison

The project is at an early stage but the results are pretty interesting so far (to me). To give one tidbit, we show that in this large-scale dataset there is only a weak correlation between sharing on Twitter and Facebook at the article level, with Twitter tending to share more sports news than Facebook (see image).
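For readers who want to see what an article-level correlation means in practice, here is a small sketch. It assumes a Pearson correlation over per-article share counts, which may not be the measure we end up using, and the share counts below are invented for illustration rather than taken from our dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical share counts: one entry per article, same order in both lists.
twitter_shares = [120, 15, 300, 40, 8, 75]
facebook_shares = [30, 200, 90, 10, 150, 60]

r = pearson(twitter_shares, facebook_shares)
print(f"article-level correlation: {r:.2f}")
```

A value near 1 would mean the two platforms share the same articles in the same proportions; a value near 0 means that knowing how much an article was shared on one platform tells you little about the other.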

September 16th, 2015 | News, Research, Social Media | 0 Comments

Measuring Online News Consumption

Ofcom has just released a report on measuring online news consumption and supply, which I contributed to. It tackles the question of how to meaningfully measure the size of a news outlet's audience in the digital age. This is a key issue for a regulator, for example when deciding whether to allow a takeover, and it's also one that's far from clear now that a lot of news consumption takes place online.

Ofcom - Measuring Online News Consumption and Supply

The report examines all sorts of different metrics which regulators could use, from the number of visitors to the website and time on page to the amount of social sharing. It also highlights that, while the best metric isn't clear, the detail offered is considerably better than what could be achieved in the offline age, and hence the digital environment also presents the opportunity to understand audience behaviour as never before.

Read the report here.

November 10th, 2014 | News | 2 Comments

QR codes on ballot papers


I was asked to provide a brief comment on this BBC Oxford article about the insertion of QR codes onto ballot papers by a political party in the south east. A really smart idea (and the party is pretty interesting as well), though also one which challenges something about the way we think politics ought to work: should people still be deciding as they hold the ballot paper in their hand?

May 16th, 2014 | News | 8 Comments