
ggplot2: Split Data Range into Multiple Chart Series

March 16, 2009

One of the techniques I frequently use when exploring the relationship between product revenues and profitability is to draw a scatterplot of the data. Creating this chart in Excel involves a lot of tedious manual work.

Jon Peltier proposes a VBA loop to simplify creating the chart series: it reads the first column of the range and groups rows with the same item into one series. The resulting chart is shown below:

http://learnr.files.wordpress.com/2009/03/vbachartbyname2.png?w=600
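The grouping step that the VBA loop performs (collecting all rows that share a value in the first column into one series) corresponds to base R's `split()`. A minimal sketch, assuming the same 13 rows used in this post, rebuilt inline as a data frame; the name `series` is only illustrative:

```r
# Same values as the table imported in this post, built directly as a data frame
rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

# Group rows by the first column: one X/Y data frame per city,
# roughly what each chart series holds in the Excel version
series <- split(rdata[, c("X", "Y")], rdata$City)

# Report how many points end up in each series
for (name in names(series)) {
  cat(name, ":", nrow(series[[name]]), "points\n")
}
```

ggplot2 performs this grouping internally as soon as a discrete variable is mapped to an aesthetic such as colour or shape, which is why no explicit loop is needed below.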

Now, let’s create the same chart using ggplot2.

First, the data. I will use the same dataset so that the two charts, created with different tools, can be compared directly.

> rdata <- read.table(text = "
 City X Y
 Atlanta 4 15
 Atlanta 5 18
 Boston 6 16
 Boston 6 16
 Boston 7 12
 Boston 11 11
 Chicago 10 13
 Chicago 13 10
 Chicago 15 8
 Detroit 10 9
 Detroit 15 5
 Detroit 13 3
 Detroit 14 6", header = TRUE)

Once the data has been imported into R, load the ggplot2 library and create the base plot object, mapping City to colour, shape and label:

> library(ggplot2)
> p <- ggplot(rdata, aes(x = X, y = Y, colour = City,
     shape = City, label = City))

Draw the default scatterplot:

> p1 <- p + geom_point() + xlab(NULL) + ylab(NULL)
http://learnr.files.wordpress.com/2009/03/split_data_range_p1.png?w=600
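To confirm that mapping City to colour and shape really splits the data into separate series, the computed layer data can be inspected with `layer_data()`, which includes a `group` column. A minimal sketch, assuming a recent ggplot2 release; the data frame is rebuilt inline with the same values:

```r
library(ggplot2)

rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

p1 <- ggplot(rdata, aes(x = X, y = Y, colour = City, shape = City)) +
  geom_point() + xlab(NULL) + ylab(NULL)

# layer_data() returns the data the first layer will draw;
# the group column holds one id per city, i.e. one "series" each
points_data <- layer_data(p1)
table(points_data$group)
```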

The legend can be replaced by labelling each observation directly, although this may not be practical when there are many observations.

Add labels and remove the legend:

> p2 <- p1 + geom_text(hjust = -0.1, vjust = 0.5) +
     theme(legend.position = "none")
http://learnr.files.wordpress.com/2009/03/split_data_range_p2.png?w=600
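When there are too many observations to label each one, a common compromise is to label each series only once. A minimal sketch, assuming a recent ggplot2 release; placing each city's label at its last row is an arbitrary choice made here for illustration, and `last_points` and `p_direct` are hypothetical names:

```r
library(ggplot2)

rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

# Keep only the last observation of each city to carry its label
last_points <- do.call(rbind, lapply(split(rdata, rdata$City), tail, n = 1))

p_direct <- ggplot(rdata, aes(x = X, y = Y, colour = City, shape = City)) +
  geom_point() +
  # The text layer uses its own, smaller data frame: one label per city
  geom_text(data = last_points, aes(label = City), hjust = -0.1, vjust = 0.5) +
  theme(legend.position = "none")
```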

Shapes and colours have been assigned automatically. Wonderful. However, some labels don't fit inside the plot area, so the axis limits need to be adjusted manually.

Define the maximum axis limit (the largest value on either axis):

> maxl <- max(rdata$X, rdata$Y)

Set the x-axis and y-axis limits to run from 0 to that maximum:

> p3 <- p2 + scale_x_continuous(limits = c(0, maxl)) +
     scale_y_continuous(limits = c(0, maxl))
http://learnr.files.wordpress.com/2009/03/split_data_range_p3.png?w=600
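Setting limits on the scales removes any observation that falls outside them (ggplot2 warns about the removed rows), which is harmless here because every value lies between 0 and `maxl`. If you only want to zoom or pad the view, `coord_cartesian()` sets the visible range without discarding data. A minimal sketch of that swapped-in alternative, assuming a recent ggplot2 release; `p3_alt` is a hypothetical name:

```r
library(ggplot2)

rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

# Largest value on either axis, used for the same 0..maxl range on both axes
maxl <- max(rdata$X, rdata$Y)

# coord_cartesian() zooms the panel instead of censoring data,
# so values outside the range would be clipped from view, not removed
p3_alt <- ggplot(rdata, aes(x = X, y = Y, colour = City, shape = City, label = City)) +
  geom_point() +
  geom_text(hjust = -0.1, vjust = 0.5) +
  coord_cartesian(xlim = c(0, maxl), ylim = c(0, maxl)) +
  theme(legend.position = "none")
```

Note that `coord_cartesian()` adds a little padding around the range by default, so the panel extends slightly beyond 0 and `maxl`.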

Now, only a few more formatting adjustments are needed, and the chart is ready to use:

> formatted <- p3 + scale_colour_brewer(palette = "Set1") +
     theme(panel.background = element_rect(colour = "grey", fill = NA),
           panel.grid.minor = element_blank(),
           panel.grid.major = element_blank())
http://learnr.files.wordpress.com/2009/03/split_data_range_formatted.png?w=600
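If the same formatting is applied to several charts, the theme settings can be stored once and added to each plot. A minimal sketch, assuming a recent ggplot2 release in which `theme()` and the `element_*()` functions replace the older `opts()` and `theme_*()` calls; `clean_theme` is a hypothetical name:

```r
library(ggplot2)

# Formatting bundled into one reusable theme object:
# grey panel border, transparent panel fill, no grid lines
clean_theme <- theme(
  panel.background = element_rect(colour = "grey", fill = NA),
  panel.grid.minor = element_blank(),
  panel.grid.major = element_blank()
)

rdata <- data.frame(
  City = rep(c("Atlanta", "Boston", "Chicago", "Detroit"), times = c(2, 4, 3, 4)),
  X = c(4, 5, 6, 6, 7, 11, 10, 13, 15, 10, 15, 13, 14),
  Y = c(15, 18, 16, 16, 12, 11, 13, 10, 8, 9, 5, 3, 6)
)

# The stored theme is added like any other plot component
formatted <- ggplot(rdata, aes(x = X, y = Y, colour = City, shape = City)) +
  geom_point() +
  scale_colour_brewer(palette = "Set1") +
  clean_theme
```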
6 Comments
  1. March 28, 2009 6:58 pm

    Is it possible to change the size of the points? They seem too small.

    • learnr
      March 28, 2009 7:23 pm

      The above plots use the default point size, but you can set the size explicitly.
      Try, for example:
      formatted + geom_point(size = 4)
