data science, digital politics, smart cities...

Convex Hulls with ggplot

I found this code buried in an old Google Groups discussion and thought I would repost it. As with everything ggplot-related, hat tip to the incredible Hadley Wickham.

Often it’s nice to break down scatter plots by a third variable, especially if it’s categorical. However, if you have lots of categories, the space occupied by the different groups isn’t always easy to see:


Drawing the ‘convex hull’ of each group (the smallest convex polygon that encloses all of its points) makes the groups easier to see. To make the hulls appear in ggplot, first compute them with the following code:

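library(plyr)  # provides ddply()
# chull() returns the row indices of the points lying on the convex hull
find_hull <- function(df) df[chull(df$X, df$Y), ]
hulls <- ddply(somedata, "CategoryName", find_hull)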

Here X and Y are the names of the x and y variables on your scatter plot, and CategoryName is the column that defines the groups. With ‘hulls’ in place, you can simply add the following layer to your ggplot command (with a bit of alpha so you can still see the points):

geom_polygon(data = hulls, alpha = 0.2)
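
To see the pieces working together, here is a minimal end-to-end sketch. It assumes the plyr and ggplot2 packages are installed, and it uses the built-in iris data as a stand-in for ‘somedata’, with Sepal.Length, Sepal.Width and Species playing the roles of X, Y and CategoryName. Those choices are purely illustrative.

```r
library(plyr)     # ddply()
library(ggplot2)

# Return only the rows of df that lie on its convex hull.
# chull() gives the row indices of the hull points in clockwise order.
# (iris columns are assumptions standing in for X and Y.)
find_hull <- function(df) df[chull(df$Sepal.Length, df$Sepal.Width), ]

# Apply find_hull to each category and stack the results
hulls <- ddply(iris, "Species", find_hull)

# Mapping both colour and fill to the category means the polygon layer
# inherits the grouping from the main ggplot() call.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                      colour = Species, fill = Species)) +
  geom_point() +
  geom_polygon(data = hulls, alpha = 0.2)

print(p)
```

Because the polygon layer takes its aesthetics from the main ggplot() call, the hull data frame needs to keep the same column names as the original data, which it does here since find_hull only subsets rows.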


Is it better? On the one hand, you can clearly see the groups. On the other hand, outliers may give a distorted impression of what is going on, because a single extreme point stretches the hull of its whole group. Perhaps something that trimmed outliers would work better.
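
One way to act on that last thought, offered as a rough sketch and not as anything from the original discussion: drop the points furthest from each group's centroid before computing the hull. The helper name find_trimmed_hull and the keep = 0.9 cut-off are hypothetical choices, iris is again only example data, and distance from the centroid is just one of several possible definitions of an outlier.

```r
# Convex hull of the points closest to the group centroid.
# x, y: names of the coordinate columns; keep: fraction of points to retain.
# (Hypothetical helper, not part of any package.)
find_trimmed_hull <- function(df, x, y, keep = 0.9) {
  # Euclidean distance of every point from the group's centroid
  d <- sqrt((df[[x]] - mean(df[[x]]))^2 + (df[[y]] - mean(df[[y]]))^2)
  # Keep only the points within the chosen distance quantile
  trimmed <- df[d <= quantile(d, keep), , drop = FALSE]
  # Hull of what is left
  trimmed[chull(trimmed[[x]], trimmed[[y]]), , drop = FALSE]
}

# Base R split/lapply stands in for ddply here, so no extra packages are needed
trimmed_hulls <- do.call(rbind, lapply(
  split(iris, iris$Species),
  find_trimmed_hull, x = "Sepal.Length", y = "Sepal.Width"
))
```

The result can be passed to geom_polygon(data = trimmed_hulls, alpha = 0.2) in place of ‘hulls’. Note that the distances are computed in the raw units of the two variables, so if they are on very different scales it may make sense to standardise them first.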

June 10th, 2013 | R | 0 Comments

New Post at the Oxford Internet Institute

This month I started a new position as a Research Fellow at the Oxford Internet Institute, a department of the University of Oxford. I will be working largely on ‘big data’ approaches to political science, looking especially at political communication and parliamentary behaviour.

The OII is a fantastic institute with a really interesting mix of researchers. In my opinion it is at the cutting edge of thinking about how to bring the opportunities offered by new technology into social science research, and I am really excited to be there.

June 6th, 2013 | Research, Teaching | 0 Comments