ggplot2: Split Data Range into Multiple Chart Series
One of the techniques I frequently use when exploring the relationship between product revenues and profitability is to draw a scatterplot of the data, with one chart series per product. Creating this chart in Excel involves a lot of tedious manual work.
Jon Peltier proposes a VBA loop to simplify the process of creating the chart series: read the first column of the range, grouping rows together by item. The resulting chart is seen below:
Now, let’s create the same chart using ggplot2.
First, the data. I will be using the same dataset so that the two charts, created with different tools, can be compared directly.
> rdata <- read.table(text = "
City X Y
Atlanta 4 15
Atlanta 5 18
Boston 6 16
Boston 6 16
Boston 7 12
Boston 11 11
Chicago 10 13
Chicago 13 10
Chicago 15 8
Detroit 10 9
Detroit 15 5
Detroit 13 3
Detroit 14 6
", header = TRUE)
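As a quick check that the import worked, and to mirror the grouping step of the VBA loop, the rows can be split by city in base R. This is only a sketch: the dataset is re-entered inline (with just the City, X and Y columns) so the snippet runs on its own, and `series` and `counts` are names introduced here for illustration.

```r
# Dataset from the post, re-entered so this snippet is self-contained
rdata <- read.table(text = "
City X Y
Atlanta 4 15
Atlanta 5 18
Boston 6 16
Boston 6 16
Boston 7 12
Boston 11 11
Chicago 10 13
Chicago 13 10
Chicago 15 8
Detroit 10 9
Detroit 15 5
Detroit 13 3
Detroit 14 6
", header = TRUE)

# One data frame of X/Y pairs per city: the counterpart of one chart series per item
series <- split(rdata[, c("X", "Y")], rdata$City)

# Number of points that end up in each series
counts <- sapply(series, nrow)
print(counts)
```

ggplot2 does not need this explicit split: mapping City to an aesthetic such as colour or shape groups the rows implicitly.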
Once the data has been imported into R, the ggplot2 library needs to be loaded and the base plot object defined, mapping City to colour, shape and label.
> library(ggplot2)
> p <- ggplot(rdata, aes(x = X, y = Y, colour = City,
      shape = City, label = City))
Draw the default scatterplot:
> p1 <- p + geom_point() + xlab(NULL) + ylab(NULL)
The legend can be replaced by labeling each observation. Obviously, if there are many observations this might not be practical.
Add labels and remove the legend:
> p2 <- p1 + geom_text(hjust = -0.1, vjust = 0.5) +
      theme(legend.position = "none")
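When there are too many observations to label individually, one option is to label each city once. The sketch below is an assumption-laden illustration, not part of the original chart: it places a single label at the mean X and Y of each city, rebuilds the dataset inline so it runs on its own, and uses the hypothetical names `centres` and `p_centres`.

```r
library(ggplot2)

# Dataset from the post, rebuilt inline so the snippet is self-contained
rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

# Mean X and Y per city: one label position per series (an illustrative choice)
centres <- aggregate(cbind(X, Y) ~ City, data = rdata, FUN = mean)

# Points come from the full data, labels from the per-city summary
p_centres <- ggplot(rdata, aes(x = X, y = Y, colour = City, shape = City)) +
  geom_point() +
  geom_text(data = centres, aes(label = City), vjust = -1) +
  theme(legend.position = "none")

print(p_centres)
```

A mean position can land on top of a point or between clusters, so the label offset may need adjusting for other datasets.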
Shapes and colours have been set automatically. Wonderful. We can see that the labels don’t fit in the plot area. Manual adjustment of axis limits is needed to work around this small problem.
Define the maximum axis limit:
> maxl <- max(rdata$X, rdata$Y)
Set the minimum and maximum limits of the x-axis and y-axis:
> p3 <- p2 + scale_x_continuous(limits = c(0, maxl)) +
      scale_y_continuous(limits = c(0, maxl))
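With this dataset the largest X is 15 and the largest Y is 18, so `maxl` is 18 and both axes run from 0 to 18. As a hedged alternative, not the approach used in this post, `coord_cartesian()` sets the visible range without touching the scales. The difference only matters when some observations fall outside the limits: scale limits turn such values into missing values, while `coord_cartesian()` merely zooms. The dataset is rebuilt inline and `p_zoom` is a name introduced here.

```r
library(ggplot2)

# Dataset from the post, rebuilt inline so the snippet is self-contained
rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

# Same upper bound for both axes, taken from the larger of the two maxima
maxl <- max(rdata$X, rdata$Y)

# Zoom the visible area instead of restricting the scales
p_zoom <- ggplot(rdata, aes(x = X, y = Y, colour = City, shape = City, label = City)) +
  geom_point() +
  geom_text(hjust = -0.1, vjust = 0.5) +
  coord_cartesian(xlim = c(0, maxl), ylim = c(0, maxl))

print(p_zoom)
```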
Now, only a few more formatting adjustments, and we will have a chart ready to be used.
> formatted <- p3 + scale_colour_brewer(palette = "Set1") +
      theme(panel.background = element_rect(fill = NA, colour = "grey"),
            panel.grid.minor = element_blank(),
            panel.grid.major = element_blank())
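For convenience, the steps above can be collected into one script. This is a consolidated sketch rather than a new method: it assumes a current ggplot2 release, where theme settings go through `theme()`, `element_rect()` and `element_blank()`, and it rebuilds the dataset inline so it can be run as-is.

```r
library(ggplot2)

# Dataset from the post, rebuilt inline so the script is self-contained
rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

# Common upper limit for both axes
maxl <- max(rdata$X, rdata$Y)

formatted <- ggplot(rdata, aes(x = X, y = Y, colour = City,
                               shape = City, label = City)) +
  geom_point() +                                   # one series per city
  geom_text(hjust = -0.1, vjust = 0.5) +           # label to the right of each point
  xlab(NULL) + ylab(NULL) +                        # no axis titles
  scale_x_continuous(limits = c(0, maxl)) +        # leave room for the labels
  scale_y_continuous(limits = c(0, maxl)) +
  scale_colour_brewer(palette = "Set1") +
  theme(legend.position = "none",                  # labels replace the legend
        panel.background = element_rect(fill = NA, colour = "grey"),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_blank())

print(formatted)
```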
Is it possible to change the size of the points? They seem too small.
The plots above use the default sizes, but you can set the point size explicitly.
Try, for example:
formatted + geom_point(size=4)
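Note that adding `geom_point(size = 4)` to a finished plot draws a second point layer on top of the existing one rather than resizing it. A sketch of the alternative, setting the size when the point layer is first created, is shown here; the dataset is rebuilt inline and `p_big` is a name introduced for illustration.

```r
library(ggplot2)

# Dataset from the post, rebuilt inline so the snippet is self-contained
rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

# size is a fixed setting, not a mapping, so it goes outside aes()
p_big <- ggplot(rdata, aes(x = X, y = Y, colour = City, shape = City)) +
  geom_point(size = 4)

print(p_big)
```

The labels, axis limits and theme settings can then be added to `p_big` in the same way as before.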