ggplot2: Split Data Range into Multiple Chart Series

March 16, 2009

One of the techniques I frequently use when exploring the relationship between product revenues and profitability is to draw a scatterplot of the data, with each item plotted as its own series. Creating this chart in Excel involves a lot of tedious manual work.

Jon Peltier proposes a VBA loop to simplify the process of creating the chart series: the code reads the first column of the range and groups rows with the same item name into one series. The resulting chart is shown below:

https://learnr.wordpress.com/wp-content/uploads/2009/03/vbachartbyname2.png

Now, let’s create the same chart using ggplot2.

First, the data. I will be using the same dataset so that the two charts, created with different tools, can be compared directly.

> rdata <- read.table(textConnection("
City X Y
Atlanta 4 15
Atlanta 5 18
Boston 6 16
Boston 6 16
Boston 7 12
Boston 11 11
Chicago 10 13
Chicago 13 10
Chicago 15 8
Detroit 10 9
Detroit 15 5
Detroit 13 3
Detroit 14 6"), header = TRUE)
> closeAllConnections()
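
As a hedged alternative, more recent versions of R let read.table() take the data through its text argument, so no connection has to be opened or closed by hand. The sketch below assumes such a version of R, reads the same 13 rows, and adds two quick sanity checks that are not part of the original post:

```r
# Read the same dataset through the `text` argument of read.table().
# Assumption: an R version in which read.table() supports `text`.
rdata <- read.table(header = TRUE, text = "
City X Y
Atlanta 4 15
Atlanta 5 18
Boston 6 16
Boston 6 16
Boston 7 12
Boston 11 11
Chicago 10 13
Chicago 13 10
Chicago 15 8
Detroit 10 9
Detroit 15 5
Detroit 13 3
Detroit 14 6")

# Sanity checks: column types, and the number of rows per city
str(rdata)
table(rdata$City)
```

Checking the row counts per city is a cheap way to confirm that every series will have the points you expect before any plotting starts.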

Once the data has been imported into R, load the ggplot2 package and define the base plot object with its aesthetic mappings.

> library(ggplot2)
> p <- ggplot(rdata, aes(x = X, y = Y, colour = City,
     shape = City, label = City))

Draw the default scatterplot:

> p1 <- p + geom_point() + xlab(NULL) + ylab(NULL)
https://learnr.wordpress.com/wp-content/uploads/2009/03/split_data_range_p1.png
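
Note that assigning the plot to p1 does not draw anything by itself. A minimal sketch of how the plot could be displayed and saved follows; the data frame is rebuilt inline so the example runs on its own, and out_file is a hypothetical path chosen for illustration:

```r
library(ggplot2)

# Same 13 observations as the rdata table above, built inline so the
# example is self-contained.
rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

p1 <- ggplot(rdata, aes(x = X, y = Y, colour = City, shape = City)) +
  geom_point() + xlab(NULL) + ylab(NULL)

# Draw the plot on the active graphics device
print(p1)

# Save it to disk; out_file is a hypothetical location for this sketch
out_file <- file.path(tempdir(), "split_data_range_p1.pdf")
ggsave(out_file, plot = p1, width = 6, height = 4)
```

At the interactive console, typing p1 on its own line has the same effect as print(p1).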

The legend can be replaced by labeling each observation. Obviously, if there are many observations this might not be practical.

Add labels & remove legend:

> p2 <- p1 + geom_text(aes(hjust = -0.1, vjust = 0.5)) +
     opts(legend.position = "none")
https://learnr.wordpress.com/wp-content/uploads/2009/03/split_data_range_p2.png
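
When there are too many observations to label each one, one possible compromise is a single label per city, placed at the mean position of that city's points. This is only a sketch of that idea, not something from the original chart: the centres data frame and the p_lab name are made up for illustration, and the code assumes a current ggplot2 release, where theme() has taken over the role of opts().

```r
library(ggplot2)

# Same 13 observations as the rdata table above, built inline so the
# example is self-contained.
rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

# One label position per city: the mean X and mean Y of its points
# (`centres` is a hypothetical helper table for this sketch)
centres <- aggregate(cbind(X, Y) ~ City, data = rdata, FUN = mean)

p_lab <- ggplot(rdata, aes(x = X, y = Y, colour = City, shape = City)) +
  geom_point() +
  # The text layer uses the summary table instead of the full data
  geom_text(data = centres, aes(label = City), vjust = -1) +
  # theme() is the current replacement for opts()
  theme(legend.position = "none")
```

The text layer inherits the x, y and colour mappings from the base plot, so each label is drawn in its city's colour without any extra work.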

Shapes and colours have been set automatically. Wonderful. We can see that the labels don’t fit in the plot area. Manual adjustment of axis limits is needed to work around this small problem.

Define max axis limits:

> maxl <- max(rdata$X, rdata$Y)

Set the minimum and maximum limits of the x-axis and y-axis:

> p3 <- p2 + scale_x_continuous(limits = c(0,
     maxl)) + scale_y_continuous(limits = c(0,
     maxl))
https://learnr.wordpress.com/wp-content/uploads/2009/03/split_data_range_p3.png
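
One thing to keep in mind with scale limits is that observations falling outside them are dropped from the plot. That is harmless here because the range from 0 to maxl covers every point, but if you only want to change the visible window, coord_cartesian() is an alternative that keeps all of the data. A small sketch, with the data rebuilt inline and p_zoom as a hypothetical name:

```r
library(ggplot2)

# Same 13 observations as the rdata table above, built inline so the
# example is self-contained.
rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

# Largest value on either axis; for this dataset it comes from Y
maxl <- max(rdata$X, rdata$Y)

# Zoom the visible window without removing any observations
p_zoom <- ggplot(rdata, aes(x = X, y = Y, colour = City, shape = City)) +
  geom_point() +
  coord_cartesian(xlim = c(0, maxl), ylim = c(0, maxl))
```

Because the largest X value is smaller than maxl, using the same upper limit on both axes leaves some empty space on the right, which is what gives the labels room.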

Now, only a few more formatting adjustments, and we will have a chart ready to be used.

> formatted <- p3 + scale_colour_brewer(palette = "Set1") +
     opts(panel.background = theme_rect(colour = "grey")) +
     opts(panel.grid.minor = theme_line(colour = NA)) +
     opts(panel.grid.major = theme_line(colour = NA))
https://learnr.wordpress.com/wp-content/uploads/2009/03/split_data_range_formatted.png
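
The code in this post was written against the ggplot2 version current in 2009. In later releases opts(), theme_rect() and theme_line() were replaced by theme(), element_rect() and element_line(), so the snippets above will not run unchanged on a recent installation. The following is a sketch of what I believe to be the closest modern equivalent of the whole chart; the name formatted_new and the explicit fill = NA (to keep the panel background unfilled) are my assumptions.

```r
library(ggplot2)

# Same 13 observations as the rdata table above, built inline so the
# example is self-contained.
rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

maxl <- max(rdata$X, rdata$Y)

formatted_new <- ggplot(rdata, aes(x = X, y = Y, colour = City,
                                   shape = City, label = City)) +
  geom_point() +
  geom_text(hjust = -0.1, vjust = 0.5) +      # labels to the right of points
  xlab(NULL) + ylab(NULL) +
  scale_x_continuous(limits = c(0, maxl)) +
  scale_y_continuous(limits = c(0, maxl)) +
  scale_colour_brewer(palette = "Set1") +
  theme(
    legend.position  = "none",                                 # was opts(legend.position = "none")
    panel.background = element_rect(fill = NA, colour = "grey"), # was theme_rect(colour = "grey")
    panel.grid.minor = element_blank(),                        # was theme_line(colour = NA)
    panel.grid.major = element_blank()
  )
```

Collecting all theme settings in a single theme() call is a matter of taste; chaining several theme() calls, as the original does with opts(), works as well.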

Comments
  1. March 28, 2009 6:58 pm

    Is it possible to change the size of the points? They seem too small.

    • learnr
      March 28, 2009 7:23 pm

      The above plots use the default point size, but you can set the size explicitly.
      Try, for example:
      formatted + geom_point(size = 4)
