ggplot2: Labelling Data Series and Adding a Data Table
Stephen Few has posted on his website a few design examples on how to improve the presentation of quantitative information.
One of the examples is depicting the average monthly temperature in three cities.
This post tries to replicate the graph in ggplot2 and demonstrates how to label data series and how to add a data table to the plot.
The first step after importing the data is to convert it from wide format to long format, and replace the long month names with abbreviations, after which it is time to have a first look at the data.
library(ggplot2)
df <- structure(list(
  City = structure(c(2L, 3L, 1L), .Label = c("Minneapolis", "Phoenix", "Raleigh"), class = "factor"),
  January = c(52.1, 40.5, 12.2), February = c(55.1, 42.2, 16.5),
  March = c(59.7, 49.2, 28.3), April = c(67.7, 59.5, 45.1),
  May = c(76.3, 67.4, 57.1), June = c(84.6, 74.4, 66.9),
  July = c(91.2, 77.5, 71.9), August = c(89.1, 76.5, 70.2),
  September = c(83.8, 70.6, 60), October = c(72.2, 60.2, 50),
  November = c(59.8, 50, 32.4), December = c(52.5, 41.2, 18.6)),
  .Names = c("City", "January", "February", "March", "April", "May", "June", "July",
             "August", "September", "October", "November", "December"),
  class = "data.frame", row.names = c(NA, -3L))
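library(reshape)  # melt() with the variable_name argument comes from the reshape package
dfm <- melt(df, variable_name = "month")
# Factor levels follow the column order (January to December), so they can be renamed by position
levels(dfm$month) <- month.abb
p <- ggplot(dfm, aes(month, value, group = City, colour = City))
(p1 <- p + geom_line(size = 1))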
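To make the wide-to-long step concrete, here is a minimal base R sketch of roughly what the reshaping does. It uses only a two-month subset of the data above, and the names `wide`, `long` and `month_cols` are made up for this illustration; it is not how melt() is implemented.

```r
# Two-month subset of the post's data, in wide format (one column per month).
wide <- data.frame(
  City     = c("Phoenix", "Raleigh", "Minneapolis"),
  January  = c(52.1, 40.5, 12.2),
  February = c(55.1, 42.2, 16.5)
)

month_cols <- setdiff(names(wide), "City")

# Long format: one row per City/month pair, stacked column by column.
long <- data.frame(
  City  = rep(wide$City, times = length(month_cols)),
  month = factor(rep(month_cols, each = nrow(wide)), levels = month_cols),
  value = unlist(wide[month_cols], use.names = FALSE)
)

# The levels are in column order, so renaming them by position is safe here.
levels(long$month) <- month.abb[seq_along(month_cols)]

print(long)
```

The positional renaming only works because the factor levels keep the original column order; if the levels were sorted alphabetically, month.abb would be assigned to the wrong months.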
A few plot elements need changing: use the black-and-white theme, format the y-axis labels, add a plot title, and remove the gridlines, axis titles and plot background.
dgr_fmt <- function(x, ...) {
  parse(text = paste(x, "*degree", sep = ""))
}
none <- theme_blank()
p2 <- p1 + theme_bw() +
  scale_y_continuous(formatter = dgr_fmt, limits = c(0, 100), expand = c(0, 0)) +
  opts(title = expression("Average Monthly Temperatures (" * degree * "F)")) +
  opts(panel.grid.major = none, panel.grid.minor = none) +
  opts(legend.position = "none") +
  opts(panel.background = none) +
  opts(panel.border = none) +
  opts(axis.line = theme_segment(colour = "grey50")) +
  xlab(NULL) + ylab(NULL)
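The dgr_fmt helper is the least obvious part of this step, so here is a small base R sketch of what it returns. The function body is copied from the post; the variable `axis_labels` is only for this illustration.

```r
# Formatter from the post: turns axis break values into plotmath expressions.
dgr_fmt <- function(x, ...) {
  parse(text = paste(x, "*degree", sep = ""))
}

axis_labels <- dgr_fmt(c(0, 25, 50))

# The result is an expression vector with one unevaluated call per break.
print(is.expression(axis_labels))
print(deparse(axis_labels[[2]]))
```

Because the labels are expressions rather than character strings, the plotting system draws `degree` as a degree sign instead of printing the word.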
Next add the reference lines.
(p3 <- p2 +
  geom_vline(xintercept = c(2.9, 5.9, 8.9, 11.9), colour = "grey85", alpha = 0.5) +
  geom_hline(yintercept = 32, colour = "grey80", alpha = 0.5) +
  annotate("text", x = 1.2, y = 35, label = "Freezing", colour = "grey80", size = 4) +
  annotate("text", x = c(1.5, 4.5, 7.5, 10.5), y = 97,
           label = c("Winter", "Spring", "Summer", "Autumn"), colour = "grey70", size = 4))
And finally the series labels. Note that a different dataset is used containing only the positions of labels.
(p4 <- p3 +
  geom_text(data = dfm[dfm$month == "Dec", ], aes(label = City), hjust = 0.7, vjust = 1))
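The label layer works because the subset keeps exactly one row per city, namely its December value, so each name is drawn once at the right-hand end of its line. A minimal base R sketch of that subsetting, using a hypothetical two-month data frame called `long_demo` with values taken from the data above:

```r
# Hypothetical long-format excerpt (November and December only).
long_demo <- data.frame(
  City  = rep(c("Phoenix", "Raleigh", "Minneapolis"), times = 2),
  month = factor(rep(c("Nov", "Dec"), each = 3), levels = c("Nov", "Dec")),
  value = c(59.8, 50, 32.4, 52.5, 41.2, 18.6)
)

# Keep only the last month: one row, and therefore one label position, per city.
label_rows <- long_demo[long_demo$month == "Dec", ]
print(label_rows)
```

Passing such a subset as the `data` argument of a single layer leaves the lines themselves untouched, since they still use the full dataset from the main ggplot() call.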
The original graph also includes a data table with all the values. It is possible to include a table of values on the plot using grid.text; however, using geom_text() allows for more flexibility. Essentially, all the values are plotted on a graph and all the background elements are then removed.
library(grid)  # unit() comes from the grid package
data_table <- ggplot(dfm, aes(x = month, y = factor(City), label = format(value, nsmall = 1), colour = City)) +
  geom_text(size = 3.5) + theme_bw() +
  scale_y_discrete(formatter = abbreviate, limits = c("Minneapolis", "Raleigh", "Phoenix")) +
  opts(panel.grid.major = none, legend.position = "none", panel.border = none,
       axis.text.x = none, axis.ticks = none) +
  opts(plot.margin = unit(c(-0.5, 1, 0, 0.5), "lines")) +
  xlab(NULL) + ylab(NULL)
Now the only step remaining is to set up the viewports and combine the two plots into one.
Layout <- grid.layout(nrow = 2, ncol = 1, heights = unit(c(2, 0.25), c("null", "null")))
grid.show.layout(Layout)
vplayout <- function(...) {
  grid.newpage()
  pushViewport(viewport(layout = Layout))
}
subplot <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)
mmplot <- function(a, b) {
  vplayout()
  print(a, vp = subplot(1, 1))
  print(b, vp = subplot(2, 1))
}
mmplot(p4, data_table)
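To see how the two-row layout divides the page without needing the plots themselves, here is a small grid-only sketch. It draws placeholder rectangles where the plot and the table would go, on an off-screen PDF device; the names `lay` and `top_share` and the placeholder texts are made up for this illustration.

```r
library(grid)

pdf(NULL)  # off-screen device, so nothing is written to disk

# Same proportions as the post: "null" units share the available height 2 : 0.25.
lay <- grid.layout(nrow = 2, ncol = 1, heights = unit(c(2, 0.25), "null"))

grid.newpage()
pushViewport(viewport(layout = lay))

# Top cell: where the line chart would be printed.
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))
grid.rect(gp = gpar(fill = "grey90"))
grid.text("main plot goes here")
popViewport()

# Bottom cell: where the data table would be printed.
pushViewport(viewport(layout.pos.row = 2, layout.pos.col = 1))
grid.rect(gp = gpar(fill = "grey70"))
grid.text("data table goes here")
popViewport()

popViewport()
invisible(dev.off())

# Share of the page height taken by the top cell.
top_share <- 2 / (2 + 0.25)
print(top_share)
```

Changing the second height (0.25 here) is the simplest way to give the table more or less room relative to the chart.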
Very nice! I didn't think that it was possible to add a table below a plot… Your method is clever, thanks.
Beautiful!!
Thank you very much.
Nice!
You may be interested to try the directlabels package on R-forge which provides ggplot2 functions to replace the legend with coloured text alongside the curves as you did here.
The command for adding the reference lines produces the error:
Error: When _setting_ aesthetics, they may only take one value. Problems: label
Could you please correct the code?
Getting the same bug here… Error: When _setting_ aesthetics, they may only take one value. Problems: label
The code is correct; you need to load the grid library beforehand for it to work.
library(grid)
I still have this issue when I’m trying to plot my own graph, even after loading grid. Is there something new I might be missing?
What sort of error messages do you get?
Damn… nicely done. I’m going to trace through this example as I need to do something similar for some of my graphs.
awesome!
Hi, I was following this great example and found that you have to make two changes if you work with R 2.15.0:
library(scales)  # math_format() is provided by the scales package
p2 <- p1 + theme_bw() + scale_y_continuous(labels=math_format(.x * degree)) +
opts(title = expression("Average Monthly Temperatures (" * degree * "F)")) +
opts(panel.grid.major = none, panel.grid.minor = none) + opts(legend.position = "none") +
opts(panel.background = none) + opts(panel.border = none) + opts(axis.line = theme_segment(colour = "grey50")) +
xlab(NULL) + ylab(NULL)
p3 <- p2 + geom_vline(xintercept = c(2.9, 5.9, 8.9, 11.9), colour = "grey85", alpha = 0.5) +
geom_hline(yintercept = 32, colour = "grey80", alpha = 0.5) +
annotate("text", x = 1.2, y = 35, label = "Freezing", colour = "grey80", size = 4) +
annotate("text", x = c(1.5), y = 97, label = c("Winter"), colour = "grey70", size = 4) +
annotate("text", x = c(4.5), y = 97, label = c("Spring"), colour = "grey70", size = 4) +
annotate("text", x = c(7.5), y = 97, label = c("Summer"), colour = "grey70", size = 4) +
annotate("text", x = c(10.5), y = 97, label = c("Autumn"), colour = "grey70", size = 4)
and
data_table <- ggplot(dfm, aes(x = month, y = factor(City), label = format(value, nsmall = 1), colour = City)) +
geom_text(size = 3.5) + theme_bw() +
scale_y_discrete(labels = abbreviate, limits = c("Minneapolis", "Raleigh", "Phoenix")) +
opts(panel.grid.major = none, legend.position = "none", panel.border = none, axis.text.x = none, axis.ticks = none) +
opts(plot.margin = unit(c(-0.5, 1, 0, 0.5), "lines")) + xlab(NULL) + ylab(NULL)
LearnR, thanks for your cool tutorials.
/ambarrio
# for ggplot2 0.9.2 release
library(ggplot2)
library(reshape)
library(grid)
df <- structure(list(City = structure(c(2L, 3L, 1L), .Label = c("Minneapolis", "Phoenix", "Raleigh"), class = "factor"), January = c(52.1, 40.5, 12.2), February = c(55.1, 42.2, 16.5), March = c(59.7, 49.2, 28.3), April = c(67.7, 59.5, 45.1), May = c(76.3, 67.4, 57.1), June = c(84.6, 74.4, 66.9), July = c(91.2, 77.5, 71.9), August = c(89.1, 76.5, 70.2), September = c(83.8, 70.6, 60), October = c(72.2, 60.2, 50), November = c(59.8, 50, 32.4), December = c(52.5, 41.2, 18.6)), .Names = c("City", "January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"), class = "data.frame", row.names = c(NA, -3L))
dfm <- melt(df, variable_name = 'month')
levels(dfm$month) <- month.abb
p <- ggplot(dfm, aes(month, value, group = City, colour = City))
p1 <- p + geom_line(size = 1)
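## formatter function
dgr_fmt <- function(x, ...) {
  parse(text = paste(x, "*degree", sep = ''))
}
## end dgr_fmt
## blank element used in the theme() calls below (element_blank() replaces theme_blank())
none <- element_blank()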
p2 <- p1 + theme_bw() +
  scale_y_continuous(labels = dgr_fmt, limits = c(0, 100), expand = c(0, 0)) +
  labs(title = expression("Average Monthly Temperatures (" * degree * "F)"), x = NULL, y = NULL) +
  theme(panel.grid.major = none, panel.grid.minor = none, legend.position = "none",
        panel.background = none, panel.border = none,
        axis.line = element_line(colour = "grey50"))
p3 <- p2 +
  geom_vline(xintercept = c(2.9, 5.9, 8.9, 11.9), colour = "grey85", alpha = 0.5) +
  geom_hline(yintercept = 32, colour = "grey80", alpha = 0.5) +
  annotate("text", x = 1.2, y = 35, label = "Freezing", colour = "grey80", size = 4) +
  annotate("text", x = c(1.5, 4.5, 7.5, 10.5), y = 97,
           label = c("Winter", "Spring", "Summer", "Autumn"), colour = "grey70", size = 4)
p4 <- p3 +
  geom_text(data = dfm[dfm$month == "Dec", ], aes(label = City), hjust = 0.7, vjust = 1)
data_table <- ggplot(dfm, aes(x = month, y = factor(City), label = format(value, nsmall = 1), colour = City)) +
  geom_text(size = 3.5) + theme_bw() +
  scale_y_discrete(labels = abbreviate, limits = c("Minneapolis", "Raleigh", "Phoenix")) +
  theme(panel.grid.major = none, legend.position = "none", panel.border = none,
        axis.text.x = none, axis.ticks = none,
        plot.margin = unit(c(-0.5, 1, 0, 0.5), "lines")) +
  labs(x = NULL, y = NULL)
Layout <- grid.layout(nrow = 2, ncol = 1, heights = unit(c(2, 0.25), c("null", "null")))
grid.show.layout(Layout)
vplayout <- function(...) {
grid.newpage()
pushViewport(viewport(layout = Layout))
}
subplot <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)
mmplot <- function(a, b) {
vplayout()
print(a, vp = subplot(1, 1))
print(b, vp = subplot(2, 1))
}
mmplot(p4, data_table)
That was a great example. Thanks for sharing with us. I am trying to add minor grid lines to the data table but am having a hard time with it. Is it because the x and y axes are not continuous?
data_table + theme(panel.grid.minor.x = element_line(size = 2, color = "black"))
data_table + theme(panel.grid.minor.y = element_line(size = 2, color = "black"))
Great post. Thanks for sharing with us. I am trying to add minor grid lines to the data_table but they don't show up on the graph. Any suggestions on how to do it?
data_table + theme(panel.grid.minor.x = element_line(size = 2, color = "white"))
data_table + theme(panel.grid.minor.y = element_line(size = 2, color = "white"))
This seems correct to me, so maybe try a different colour?
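One likely reason the lines never appear is that discrete scales in ggplot2 have no minor breaks, so the panel.grid.minor settings have nothing to draw. A possible workaround, sketched here on a small stand-in for the table data (the names `demo`, `v_lines`, `h_lines` and `demo_table` are made up for this example), is to draw the separators explicitly at the half-way positions between categories:

```r
library(ggplot2)

# Small stand-in for the post's long-format data (values copied from the post).
demo <- data.frame(
  month = factor(rep(c("Jan", "Feb", "Mar"), each = 3), levels = c("Jan", "Feb", "Mar")),
  City  = rep(c("Phoenix", "Raleigh", "Minneapolis"), times = 3),
  value = c(52.1, 40.5, 12.2, 55.1, 42.2, 16.5, 59.7, 49.2, 28.3)
)

# Discrete positions are the integers 1, 2, 3, ..., so cell borders sit half-way between them.
v_lines <- seq(1.5, nlevels(demo$month) - 0.5, by = 1)
h_lines <- seq(1.5, length(unique(demo$City)) - 0.5, by = 1)

demo_table <- ggplot(demo, aes(x = month, y = City, label = format(value, nsmall = 1))) +
  geom_vline(xintercept = v_lines, colour = "grey70") +
  geom_hline(yintercept = h_lines, colour = "grey70") +
  geom_text(size = 3.5) +
  theme_bw() +
  theme(panel.grid = element_blank())

print(demo_table)
```

The same two geom_vline() and geom_hline() layers could be added to data_table itself, with the sequences extended to cover all twelve months.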
Great post. I can't seem to get the table to line up with the plot, though, without a great deal of manual fussing with the margins for the table (it either takes up too little or too much room). Do you know how to set these programmatically?
Great post, it was really useful. Nonetheless, some functions are outdated. I have updated the code with some minor changes, in case it is helpful for someone:
Thanks Markelgl – very helpful post! Everything worked, except I had to change function(…) to function(), and function(x, …) to function(x) for it to work in my R session (RStudio, version 3.4.4).