
Convex Hulls with ggplot

I found this code buried in an old Google Groups discussion and thought it was worth reposting. As with everything ggplot-related, hat tip to the incredible Hadley Wickham.

Often it’s nice to break down scatter plots by a third variable, especially if it’s categorical. However, if you have lots of categories, the space occupied by the different groups isn’t always straightforward to see:


A boundary drawn around the outermost points in each group (a ‘convex hull’, the smallest convex polygon that contains all of the group’s points) makes it easier to see the groups. To make them appear in ggplot you can use the following code:

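library(plyr)  # provides ddply()
# chull() returns the row indices of the points on the hull, in clockwise order
find_hull <- function(df) df[chull(df$X, df$Y), ]
hulls <- ddply(somedata, "CategoryName", find_hull)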

Here X and Y are the names of the columns holding the x and y variables of your scatter plot, somedata is your data frame and CategoryName is the column that defines the groups. With ‘hulls’ in place you can simply add the following to your ggplot command (with a bit of alpha so you can still see the points):

geom_polygon(data = hulls, alpha = 0.2)
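
To see the pieces working together, here is a minimal end-to-end sketch. It is not the plot from this post: it assumes the built-in iris data set, with Sepal.Length and Sepal.Width standing in for X and Y and Species standing in for CategoryName. The x, y, colour and fill mappings are set in ggplot() so that geom_polygon() inherits them.

```r
library(plyr)     # ddply()
library(ggplot2)

# Return the rows of df that lie on its convex hull.
# The column names are assumptions tied to the iris example.
find_hull <- function(df) df[chull(df$Sepal.Length, df$Sepal.Width), ]

# One set of hull points per species
hulls <- ddply(iris, "Species", find_hull)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                      colour = Species, fill = Species)) +
  geom_point() +
  geom_polygon(data = hulls, alpha = 0.2)  # translucent hull per species

print(p)
```

Because chull() returns the hull points in order around the boundary, geom_polygon() can connect them directly without any extra sorting.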


Is it better? On the one hand, you can clearly see the groups. On the other hand, outliers can give a distorted impression of what is going on, because a single extreme point stretches the whole hull. Perhaps something that trimmed outliers would be a bit better.
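
One rough way to try the trimming idea is to drop the points farthest from each group's centre before computing the hull. The sketch below is only an illustration under assumptions: find_trimmed_hull and its keep argument are hypothetical names, the cut-off of 90% is arbitrary, and plain Euclidean distance from the centroid is a crude outlier measure that depends on the scale of the two axes. It uses base R only and the built-in iris data.

```r
# Hypothetical helper: convex hull of the `keep` fraction of points
# that lie closest to the group's centroid.
find_trimmed_hull <- function(df, x, y, keep = 0.9) {
  # Euclidean distance of each point from the centroid of the group
  d <- sqrt((df[[x]] - mean(df[[x]]))^2 + (df[[y]] - mean(df[[y]]))^2)
  # Drop the points beyond the `keep` quantile of distance
  core <- df[d <= quantile(d, keep), , drop = FALSE]
  # Hull of the remaining points, in order around the boundary
  core[chull(core[[x]], core[[y]]), , drop = FALSE]
}

# One trimmed hull per species, using split/lapply instead of ddply
trimmed_hulls <- do.call(
  rbind,
  lapply(split(iris, iris$Species), function(g) {
    find_trimmed_hull(g, "Sepal.Length", "Sepal.Width")
  })
)
```

The result can be passed to geom_polygon(data = trimmed_hulls, alpha = 0.2) in the same way as ‘hulls’. The points that were trimmed still appear in the scatter plot, they just fall outside the shaded area.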

June 10th, 2013 | R
