data science, digital politics, smart cities...|jonathan.bright@oii.ox.ac.uk

Postcode sector counts of alcohol points of sale from OpenStreetMap data

I have a new article out in the journal Health & Place entitled OpenStreetMap data for alcohol research: Reliability assessment and quality indicators, written in conjunction with a number of people here at the OII and elsewhere. My colleague David Humphreys at SPI got me interested in the area when he told me how difficult it was to construct local area indicators of alcohol availability in the UK, and how this was hampering research in the field. I wanted to see whether data in OpenStreetMap could fix the problem, as in general I’m pretty interested in the extent to which web data can be used as a valid proxy measurement for real-life quantities of interest. Stefano de Sabbata, Sumin Lee and Bharath Ganesh all contributed to the analysis.

We did a few different things in the article: we checked a random sample of 2,000 licenses we knew to exist against OSM, we used OSM data to replicate a previous study in the area (E.A. Richardson et al. 2015. Is local alcohol outlet density related to alcohol-related morbidity and mortality in Scottish cities? Health & Place, 33, 172–180), and we used a technique developed by Stefano to measure the ‘quality’ of OSM data in a given area. We showed that, in the specific case of alcohol licenses, OSM is about 50% complete, and also that we could use the quality indicators to find areas with more complete alcohol data.
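A completeness figure like this comes from checking how many licenses in a random validation sample can be found in OSM. Below is a minimal sketch of that arithmetic, assuming you only have two counts (sample size and number matched). It uses a Wilson score interval to express sampling uncertainty; the interval choice and the function name `completeness_estimate` are illustrative assumptions, not the method described in the paper.

```python
import math


def completeness_estimate(n_found: int, n_sampled: int, z: float = 1.96):
    """Share of known licenses found in OSM, with a Wilson score interval.

    n_found:   licenses in the validation sample that were matched to an
               OSM feature (how a "match" is defined is up to the analyst)
    n_sampled: size of the random validation sample
    z:         normal quantile for the interval (1.96 is roughly 95%)
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if n_sampled <= 0 or not 0 <= n_found <= n_sampled:
        raise ValueError("need n_sampled > 0 and 0 <= n_found <= n_sampled")
    p = n_found / n_sampled
    denom = 1 + z**2 / n_sampled
    centre = (p + z**2 / (2 * n_sampled)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1 - p) / n_sampled + z**2 / (4 * n_sampled**2)
    )
    return p, centre - half_width, centre + half_width


# Purely illustrative count, not a figure from the paper.
print(completeness_estimate(1000, 2000))
```

With a made-up match count of 1,000 out of 2,000, this gives a point estimate of 0.5 and an interval of roughly 0.478 to 0.522, which shows how tight the uncertainty is for a sample of that size.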

Quality of OpenStreetMap Data in Britain

Alongside the article, we are also releasing more general estimates of alcohol outlet prevalence across Britain, drawn from OpenStreetMap. We thought they might be of use to other researchers working on the spatial availability of alcohol. They are a simple count of alcohol points of sale within each postcode sector in the UK, according to the data in OSM (see the paper for details of how they were counted). We’re also releasing an accompanying quality metric for each postcode sector, so researchers can judge how far the OSM data can be trusted (again, see the paper for details of how it is constructed). The spatial distribution of the quality metric in the UK is mapped above. Feel free to reach out to me if you have any questions!
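To illustrate the kind of aggregation involved, here is a minimal sketch that assumes each outlet already carries a full postcode. In practice many OSM features lack an `addr:postcode` tag, so a spatial join against sector boundaries would usually be needed, and the paper’s actual counting rules may differ. The sketch also shows one way the quality metric might be used to keep only better-mapped sectors. The function names, the 0 to 1 quality scale and the threshold are hypothetical.

```python
from collections import Counter


def postcode_sector(postcode: str) -> str:
    """Map a full UK postcode to its sector, e.g. 'OX1 3JS' -> 'OX1 3'.

    The sector is the outward code plus the first digit of the inward code.
    """
    compact = postcode.replace(" ", "").upper()
    # A full postcode has at least 5 characters once spaces are removed,
    # and its inward code (the last 3 characters) starts with a digit.
    if len(compact) < 5 or not compact[-3].isdigit():
        raise ValueError(f"not a full UK postcode: {postcode!r}")
    outward, inward = compact[:-3], compact[-3:]
    return f"{outward} {inward[0]}"


def count_outlets_by_sector(outlet_postcodes):
    """Count alcohol points of sale per postcode sector."""
    return Counter(postcode_sector(pc) for pc in outlet_postcodes)


def trusted_counts(counts, quality, min_quality):
    """Keep only sectors whose (hypothetical) quality score >= min_quality.

    Sectors with no quality score are treated as 0.0 and dropped.
    """
    return {s: n for s, n in counts.items() if quality.get(s, 0.0) >= min_quality}


# Hypothetical inputs, for illustration only.
counts = count_outlets_by_sector(["OX1 3JS", "ox13bg", "OX2 6NN"])
quality = {"OX1 3": 0.8, "OX2 6": 0.2}
print(counts)
print(trusted_counts(counts, quality, min_quality=0.5))
```

Here the two differently formatted Oxford postcodes fall into the same sector `OX1 3`, and the threshold filter drops `OX2 6` because its made-up quality score is low.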

Get the estimates themselves here: OSM GB Alcohol Outlet Counts and Quality Index

The full reference for the paper:

J Bright, S De Sabbata, S Lee, B Ganesh, DK Humphreys. 2018. OpenStreetMap data for alcohol research: Reliability assessment and quality indicators. Health & Place, 50, 130–136.

and another related paper using the same dataset:

J Bright, S De Sabbata, S Lee. 2018. Geodemographic biases in crowdsourced knowledge websites: Do neighbours fill in the blanks? GeoJournal, 83(3), 427–440.

This research was partially funded by a grant from the ESRC (Grant no. ES/M010058/1).

June 11th, 2018 | Research, Smart Cities, Social Web

Estimating local commuting patterns from geolocated Twitter data

Over the last decade or so there has been an explosion of research interest in measuring (and forecasting) traffic and commuting patterns. Part of this is driven by ever-increasing human mobility: in 2016 alone, people in the UK travelled a collective 800 billion kilometres, more than 60% of which was by car, and congestion on these networks costs billions of pounds a year. But the research agenda is also being driven by the emergence of a wide variety of new forms of data, which build on and supplement more traditional magnetic loop technologies: data re-purposed from mobile phone records, data collected through IoT-enabled smart sensors, and traces freely contributed to social media platforms. These data sources offer huge potential to improve on existing methods of data collection, such as the much-hated transport census (see picture).

As part of a research project entitled NEXUS: Real Time Data Fusion and Network Analysis for Urban Systems (funded by InnovateUK), a team of researchers at the OII and I have been looking into some of these possibilities. Our first paper on the subject, entitled “Estimating Local Commuting Patterns from Geolocated Twitter Data”, has just been published in EPJ Data Science. The paper addresses the extent to which we can use geolocated Twitter data to estimate commuting flows between local authorities (you can have a play with some of the underlying data using the map below, which shows census commuting figures and Twitter-based estimates for local authorities around Britain).
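The general idea of turning geolocated tweets into commuting flows is to assign each user a home area and a work area, then count users per (home, work) pair. The sketch below illustrates this with a deliberately simple, hypothetical rule: home is the most frequent area for night-time tweets, and work is the most frequent area for weekday office-hours tweets. It is not the heuristic used in the paper, and the hour windows and function names are assumptions.

```python
from collections import Counter
from datetime import datetime


def assign_home_work(tweets):
    """Guess (home, work) for one user from (timestamp, area) pairs.

    Hypothetical rule: home is the most common area among tweets sent
    between 20:00 and 07:00; work is the most common area among tweets
    sent on weekdays between 09:00 and 17:00. Returns None if either
    window contains no tweets.
    """
    night = Counter(area for ts, area in tweets if ts.hour >= 20 or ts.hour < 7)
    office = Counter(
        area for ts, area in tweets if ts.weekday() < 5 and 9 <= ts.hour < 17
    )
    if not night or not office:
        return None
    return night.most_common(1)[0][0], office.most_common(1)[0][0]


def commuting_flows(tweets_by_user):
    """Aggregate users into counts of (home area, work area) commuters."""
    flows = Counter()
    for tweets in tweets_by_user.values():
        pair = assign_home_work(tweets)
        if pair is not None:
            flows[pair] += 1
    return flows


# Made-up example: user "a" tweets from Oxford at night and from Cherwell
# on a weekday morning; user "b" only tweets at night, so is left out.
tweets_by_user = {
    "a": [
        (datetime(2017, 3, 6, 22, 0), "Oxford"),
        (datetime(2017, 3, 7, 23, 30), "Oxford"),
        (datetime(2017, 3, 7, 10, 15), "Cherwell"),
    ],
    "b": [(datetime(2017, 3, 6, 21, 0), "Reading")],
}
print(commuting_flows(tweets_by_user))
```

Users without tweets in both windows are simply dropped here. This keeps the rule transparent, but it also means the resulting flows only cover the subset of users active at both times of day.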

We draw two main conclusions from the paper. First, we show that by using heuristics to assign individuals who post geolocated tweets to home and work areas, we can use Twitter to produce accurate representations of the overall structure of commuting in mainland Great Britain. These estimates improve considerably on other ‘low information’ methods of estimating commuting flows (in particular, we compared them to the popular radiation model). Second, and probably most importantly, we show that these results are not particularly sensitive to demographic characteristics. When looking at commuting flows broken down by gender, age group and social class, we found that Twitter still offered reasonable estimates for all of these sub-categories. We think this is important because a key concern about using social media data for this type of proxy estimation is that the ‘demographic bias’ in social media users (who are often younger, better educated and wealthier than the population average) might also produce biased predictions (for example, better prediction of the travel patterns of younger people). We show that, at least in our context, this is not the case.
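For reference, the radiation model (Simini et al., 2012) estimates the flow from area i to area j as T_ij = T_i m_i n_j / ((m_i + s_ij)(m_i + n_j + s_ij)). Here T_i is the number of commuters leaving i, m_i and n_j are the populations of i and j, and s_ij is the population living closer to i than j is, excluding i and j themselves. The sketch below implements that formula with straight-line distances. It also computes the common part of commuters (CPC), a widely used score for comparing estimated flows with census flows, where 1 means identical and 0 means no overlap. The paper’s exact variant of the model and its accuracy metric are not described in this post, so treat both as illustrative assumptions. The same comparison could be repeated for flows broken down by gender, age group or social class.

```python
import math


def radiation_flows(pops, coords, outflows):
    """Radiation-model estimate of flows for every ordered pair of areas.

    pops[i]:     population of area i
    coords[i]:   (x, y) location of area i (straight-line distance assumed)
    outflows[i]: total number of commuters leaving area i (T_i)
    """
    n = len(pops)
    flows = {}
    for i in range(n):
        dist = [math.dist(coords[i], coords[k]) for k in range(n)]
        for j in range(n):
            if j == i:
                continue
            # s_ij: population within distance r_ij of i, excluding i and j.
            s_ij = sum(
                pops[k] for k in range(n) if k not in (i, j) and dist[k] <= dist[j]
            )
            m_i, n_j = pops[i], pops[j]
            flows[(i, j)] = outflows[i] * m_i * n_j / (
                (m_i + s_ij) * (m_i + n_j + s_ij)
            )
    return flows


def common_part_of_commuters(observed, estimated):
    """CPC = 2 * sum(min(obs, est)) / (sum(obs) + sum(est)) over OD pairs."""
    pairs = set(observed) | set(estimated)
    shared = sum(min(observed.get(p, 0), estimated.get(p, 0)) for p in pairs)
    total = sum(observed.values()) + sum(estimated.values())
    return 2 * shared / total if total else 1.0


# Toy example with three areas on a line (made-up numbers).
est = radiation_flows([1000, 500, 2000], [(0, 0), (1, 0), (3, 0)], [100, 50, 200])
print(round(est[(0, 1)], 2), round(est[(0, 2)], 2))
print(common_part_of_commuters({(0, 1): 10, (0, 2): 30}, {(0, 1): 20, (0, 2): 20}))
```

In the toy CPC call the two flow tables share 30 of their 40 commuters each, giving a score of 0.75. With a finite set of areas, the radiation estimates leaving i do not sum exactly to T_i. Some studies apply a normalisation for this, which is omitted here for brevity.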

What’s next? There is plenty more to explore in this research area: looking at whether predictions can be made more granular, or perhaps whether sentiment from social media can be worked in, or whether other platforms can also contribute. We will also start to work on some other data sources, making use of some of the exciting datasets being made available by places like the ADRN and CDRC.

Graham McNeill, Jonathan Bright and Scott A Hale (2017) Estimating local commuting patterns from geolocated Twitter data. EPJ Data Science, 6, 24.
https://doi.org/10.1140/epjds/s13688-017-0120-x

October 25th, 2017 | Research, Smart Cities, Social Media