data science, digital politics, smart cities...

Postcode sector counts of alcohol points of sale from OpenStreetMap data

I have a new article out in the journal Health & Place entitled OpenStreetMap data for alcohol research: Reliability assessment and quality indicators, written in conjunction with a number of people here at the OII and elsewhere. My colleague David Humphreys at SPI got me interested in the area when he told me about how difficult it was to construct local area indicators of alcohol availability in the UK, and how this was hampering research in the field. I wanted to see whether data in OpenStreetMap could fix the problem, as in general I’m pretty interested in the extent to which web data can be used as a valid proxy measurement for real life quantities of interest. Stefano de Sabbata, Sumin Lee and Bharath Ganesh all contributed to the analysis.

We did a few different things in the article: we conducted a validation of a random sample of 2,000 licenses we knew to exist, we used OSM data to replicate a previous study in the area (E.A. Richardson et al. 2015 Is local alcohol outlet density related to alcohol-related morbidity and mortality in Scottish cities? Health Place, 33, 172-180), and we used a technique developed by Stefano to measure the ‘quality’ of OSM data in a given area. We showed that OSM is about 50% complete in terms of the amount of data it contains (in the specific case of alcohol licenses), and also that we could use the quality indicators to find areas with more complete alcohol data.

Quality of OpenStreetMap Data in Britain

Alongside the article, we are also releasing more general estimates of alcohol outlet prevalence across Britain, which are drawn from OpenStreetMap. We thought they might be of use to other researchers working on the spatial availability of alcohol. They are a simple count of alcohol points of sale within each postcode sector in the UK, according to the data in OSM (see the paper for details of how they were counted). We’re also releasing an accompanying quality metric for each postcode sector so researchers can judge how far the OSM data should be trusted (again, see the paper for details of how it is constructed). The spatial distribution of the quality metric in the UK is mapped above. Feel free to reach out to me if you have any questions!
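As a rough illustration of how the counts and the quality metric might be used together, here is a minimal Python sketch that keeps only the postcode sectors above a chosen quality threshold. Everything specific in it is an assumption made for the example: the column names (`postcode_sector`, `outlet_count`, `quality_index`), the sample values, the idea that a higher score means better data, and the 0.5 cut-off. Check the released file and the paper for the real layout and scale.

```python
import csv
import io

# Hypothetical sample rows: the column names and values are invented for
# illustration and are NOT taken from the released dataset.
SAMPLE_CSV = """postcode_sector,outlet_count,quality_index
OX1 1,12,0.82
OX1 2,3,0.35
EH1 1,20,0.91
"""


def load_sectors(handle):
    """Read sector rows and convert the count and quality columns to numbers."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "postcode_sector": row["postcode_sector"],
            "outlet_count": int(row["outlet_count"]),
            "quality_index": float(row["quality_index"]),
        })
    return rows


def filter_by_quality(rows, min_quality):
    """Keep sectors whose quality score is at least min_quality.

    Assumes a higher score means more trustworthy OSM data.
    """
    return [r for r in rows if r["quality_index"] >= min_quality]


sectors = load_sectors(io.StringIO(SAMPLE_CSV))
trusted = filter_by_quality(sectors, min_quality=0.5)  # arbitrary threshold
for r in trusted:
    print(r["postcode_sector"], r["outlet_count"], r["quality_index"])
```

To run this on the real file, swap `io.StringIO(SAMPLE_CSV)` for an opened file handle and adjust the column names to match.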

Get the estimates themselves here: OSM GB Alcohol Outlet Counts and Quality Index

The full reference for the paper:

J Bright, S De Sabbata, S Lee, B Ganesh, DK Humphreys. 2018. OpenStreetMap data for alcohol research: Reliability assessment and quality indicators. Health & Place 50, 130-136

and another related paper using the same dataset:

J Bright, S De Sabbata, S Lee. 2018. Geodemographic biases in crowdsourced knowledge websites: Do neighbours fill in the blanks? GeoJournal, 83, 3, 427–440

This research was partially funded by a grant from the ESRC (Grant no. ES/M010058/1).

Posted June 11th, 2018 | Research, Smart Cities, Social Web

Predicting elections with Wikipedia data: new article in EPJ Data Science

Taha Yasseri and I have a new article out in EPJ Data Science which looks at the subject of electoral prediction using page view data from Wikipedia. Forecasting electoral results with some form of novel internet data is really a growth area in the literature at the moment, with a huge number of research teams trying out different approaches. However, I think our paper nevertheless makes a novel contribution in a couple of respects. First, our model is theory driven rather than taking a machine learning approach, by which I mean that we try to theorise the mechanism generating Wikipedia page view data and how that relates to electoral outcomes, rather than simply looking at a range of indicators to see if any of them offers any predictive power. Second, we test a reasonably large set of electoral results: a group of around 60 parties in the European Parliament elections in 2014, whereas many other studies look at prediction only in the case of one election.

We found a number of things: we are able to show that the majority of online information seeking happens in the couple of days before the election (left hand panel in the figure); we are also able to show that page views do seem to offer indicators of a number of things happening in the election, such as turnout levels (right hand panel in the figure) and overall electoral results. Wikipedia was particularly good at predicting the emergence of small parties which were shooting to prominence (something which has become a feature of European politics in the last decade), even if it did tend to overstate their final result.

In future work, we intend to extend the analysis to more countries and more types of information seeking.

Posted August 26th, 2016 | Politics and Democracy, Research, Social Web

The real component of virtual learning

Monica Bulger, Cristobal Cobo and I have a new paper out in Information, Communication and Society where we investigate real world meetings organised by MOOC users. These meetings are sort of contradictory as of course one of the advantages of MOOCs is that they are online and can be accessed anywhere without the need to travel; yet lots of users are kind of building in this face to face component themselves, all over the world (see the map). We asked whether this was because they felt they were missing something from the MOOC experience (and were therefore sort of recreating classrooms) or whether it was more of an excuse to network and socialise (hence recreating the after school social experience). We find evidence for both motivations though the former is stronger.

Meetup - Map

These meetings show important potential to address one of the strongest criticisms of MOOCs, which is that they are only for the really self-motivated and that many people drop out: by creating local learning communities, perhaps motivation can increase. Yet this also cuts against the idea of global learning: it was clear, for obvious reasons, that most meetings take place in big cities in the developed world. Those in rural areas or developing countries simply have fewer people to meet with.

Posted July 28th, 2015 | Research, Social Web

Information Seeking Behaviour and Election Predictions

My colleague Taha Yasseri and I recently received a grant from the Fell Fund to extend our work on information seeking behaviour around election time, which has allowed us to bring Eve Ahearn on board on the project. Over the next few months we’re going to be really expanding the number of elections we cover in the research, and also looking at different types of information seeking signal. We’ll also be firing up the project’s research blog, which we started a few months ago. Eve has just put up the first post on subjectivity in data collection.

Posted February 9th, 2015 | Research, Social Web

Can electoral popularity be predicted using socially generated big data?

New article published with Taha Yasseri in IT – Information Technology. A short piece making the case for theoretically informed social media predictions, which is part of a larger project we are running with support from the Fell Fund over the next year or so. Read it here:

Posted October 1st, 2014 | Research, Social Web

#indyref on Wikipedia

My colleague Taha Yasseri and I are currently working on a Fell Fund project on social media data and election prediction, looking especially at data from Google and Wikipedia (first paper out soon; will also be presenting on that at IPP 2014 which should be great). As part of that we thought we’d have a bit of fun looking at Scotland’s independence referendum on Wikipedia.

For election prediction the method is relatively straightforward: examine readership stats on the party Wikipedia pages of the country in question, and see which page is read the most (of course that doesn’t correspond straight away to election results – would that life were so simple – and the idea of the project is to see what corrections and biases need to be accounted for to make it work). It isn’t quite so clear how to do that for Scotland, but (just for fun really) we compared the following pages:


First we look at the UK and Scotland: it is interesting how Scotland has leapfrogged the UK in the last days of the independence campaign. Points to a yes victory?


In terms of flags, though, the Union Jack is well ahead of the Saltire, peaking in the last few days. Is it a last minute outbreak of unionism?


In terms of national dishes, meanwhile, Haggis has been dominating Fish and Chips for the full period of the campaign, with interest in Haggis especially spiking in the last couple of days.

Well, one of these graphs will predict the winner of the referendum: we just don’t know which one 😉 More seriously, I think it’s interesting how most of these terms are spiking in the days before the vote, showing again how the social web really responds to political events.
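For anyone who wants to try the same kind of comparison, the basic idea (see which page is read the most over a given window) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the daily view counts are invented, the `view_shares` helper is hypothetical, and none of the corrections for bias that the project is really about are included.

```python
def view_shares(daily_views, last_n_days):
    """Return each page's share of total views over the final last_n_days.

    daily_views maps a page title to a list of daily counts, oldest first.
    """
    totals = {
        page: sum(views[-last_n_days:])
        for page, views in daily_views.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No views at all in the window: report zero shares, avoid dividing by 0.
        return {page: 0.0 for page in totals}
    return {page: total / grand_total for page, total in totals.items()}


# Made-up daily view counts (oldest first), NOT real Wikipedia statistics.
daily_views = {
    "Scotland": [100, 120, 150, 300, 500],
    "United Kingdom": [200, 210, 220, 250, 300],
}

shares = view_shares(daily_views, last_n_days=2)
most_read = max(shares, key=shares.get)
print(most_read, round(shares[most_read], 3))
```

Changing `last_n_days` shows how sensitive the "winner" is to the window chosen, which is one reason raw readership should not be read as a forecast.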

UPDATE: Taha has passed me the comparison of the Yes and No campaign pages, as below. Yes for a narrow win following months of No dominance – you heard it here first.


Posted September 18th, 2014 | Social Web

Python and Social Media Data for the Social Sciences

In July I gave two short workshops at the OII’s Summer Doctoral Programme and also at the Digital Humanities at Oxford Summer School. I had two great groups of bright PhD students and postdocs to teach. The sessions were only two hours long, and it’s a big challenge to teach some meaningful programming skills in such a period to complete beginners (in the end, I decided to walk them through a small example project of getting news articles from an RSS feed and checking how many times they have been shared on Facebook, providing most of the code myself). I also rely on lots of technology which I can’t fully control, which is a risk (I want to teach people to connect to things like the Facebook API, which means I need to rely on getting python working on their machine, on their machine connecting to the internet through the visitor wifi, and on the FB API being up and running during class). But the tech worked, mostly, and the overall experience was really positive.
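To give a flavour of that kind of example project, here is a minimal sketch of its shape. It is not the workshop code. It parses an inline sample RSS feed with Python's standard library, and the share count lookup is a hypothetical stand-in (`fake_share_lookup`) that returns invented numbers, since a real version would need a live API call, credentials and a working connection.

```python
import xml.etree.ElementTree as ET

# Inline sample feed so the sketch runs offline; a real script would
# download the feed text from a news site instead.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example news</title>
    <item><title>First story</title><link>https://example.org/a</link></item>
    <item><title>Second story</title><link>https://example.org/b</link></item>
  </channel>
</rss>
"""


def parse_items(rss_text):
    """Extract the title and link of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


def fake_share_lookup(url):
    """Hypothetical stand-in for a social media API call (made-up numbers)."""
    made_up = {"https://example.org/a": 42, "https://example.org/b": 7}
    return made_up.get(url, 0)


def count_shares(items, lookup):
    """Map each article link to the share count returned by lookup."""
    return {item["link"]: lookup(item["link"]) for item in items}


items = parse_items(SAMPLE_RSS)
share_counts = count_shares(items, fake_share_lookup)
for item in items:
    print(item["title"], share_counts[item["link"]])
```

Passing the lookup in as a function keeps the parsing testable without a network connection, which matters when the class depends on visitor wifi.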


In the future, however, I strongly believe that social science needs a better way of integrating computer programming skills into undergraduate and postgraduate teaching, so that these doctoral level workshops can be more about mastering skills and less about training beginners. So I suppose the hope is that in a few years I won’t need to teach such courses any more, even if I do enjoy them.

Why do MOOC users meet face to face?

Last week Monica Bulger, Cristobal Cobo and I presented a paper at the ICA’s pre-conference on higher education innovation. Monica and Cris are the experts in this area and did most of the heavy lifting, but I was pleased to take part, mainly out of a professional curiosity about how Massive Open Online Courses may or may not be changing the face of higher education. In the paper we looked in particular at patterns of offline meetups amongst the users of these online courses, using data from the Meetup API (my role being to facilitate data gathering and manipulation). Meetup have an open and generous stance to API data, and after a bit of coding I was able to extract information on several thousand face to face meetings of students taking part in Coursera courses in over 100 countries around the world.

Meetup - Map

More clicks on Wordle produced a word cloud of the titles of each meetup, which I can’t resist because it looks so nice even if it probably isn’t a good way of doing science.

Word Cloud - Titles

What does it all mean? Beyond showing the impressive worldwide reach of Coursera, and the fact that people like face to face interaction when they are learning, we are still deciding, to be honest. Suggestions welcome.


Over the last couple of months I’ve been involved with the “euandi” project run by Alex Trechsel at the European University Institute. euandi is a voting advice application designed to offer information to users about the extent to which their political preferences overlap with those of political parties standing in the upcoming European Parliament elections. The application is, from the perspective of a political scientist, pretty cool – you can visualize your position in political “space” and also look at which other areas of Europe have political views which align with your own, at a super low level of granularity (see pictures below). Apparently there’s a place for me in every country though I’d be best off in Sweden. Who knew? 🙂

My political europe


It’s been a really interesting project to be a part of – there were over 100 people in the team spread across all 28 member states, so I was a relatively small cog in the machine. A couple of things have stuck in my mind. It is first of all pretty difficult to position parties accurately. A lot of questions on the profiler were quite nuanced (e.g. I would support green energy even at the cost of higher energy prices, social programmes at the expense of higher taxes, etc.). However, contemporary political parties won’t ever present this nuance: green energy is presented as a way of lowering costs, social programmes can be maintained without tax, etc. Is this something that turns people off contemporary politics, or has it always been this way? Not sure.

My political space

Second, like all such applications euandi presents a purely issue based view of politics – no room for questions of competence, trustworthiness etc. Lots of people are surprised when using it that they are placed with an apparently minor or radical party (and of course many “far right” parties have very left leaning / socialist policies in terms of labour law, employment protection, etc.). Hence the results need to be handled with care and don’t directly replace knowledge of the political system.

Can we boost turnout with such mechanisms, or do they only appeal to those already interested in politics? I think there must be something in it, especially for those who are undecided or who perhaps want to find out about a minor party. Nevertheless I think it’s also pretty clear by now that e-democracy isn’t going to lead to a turnout revolution: rather IMO it’s about nuancing and informing the decisions of those who are already interested.


Posted April 28th, 2014 | Research, Social Web

Can social data be used to predict elections?

I’ve just started a new research blog with my colleague Taha Yasseri. Two aims: we want to know if and when social data might be useful in election prediction; we want to see if this knowledge teaches us anything about the political process. It’s also interesting to experiment with the idea of blogging research rather than going the usual journal route (though I imagine a paper or two will result anyway). Much quicker, rougher, but definitely satisfies my urge to do things quickly. We hope it will make the finished output better as well.

all-wikipedia-euelections article-2

The above image is an excerpt from the first post, on electoral information seeking in 19 different countries. We find that, essentially, people look for information much more after the election has already finished than before, probably in response to the election itself as a media event.

I’ll be cross posting a bit more as the blog develops.