ggplot2 Version of Figures in “Lattice: Multivariate Data Visualization with R” (Part 5)
July 15, 2009
This is the 5th post in a series attempting to recreate the figures in Lattice: Multivariate Data Visualization with R (R code) with ggplot2.
Previous parts in this series: Part 1, Part 2, Part 3, Part 4.
Chapter 5 – Scatter Plots and Extensions
Topics covered:
- The standard scatter plot
- Using subscripts
- Using the type argument
- Variants for large data
- Scatter plot matrix
- Parallel coordinate plot
Figure 5.1
> library(lattice) > library(ggplot2) |
lattice
> pl <- xyplot(lat ~ long | cut(depth, 2), data = quakes) > print(pl) |
ggplot2
> quakes$Depth <- with(quakes, cut(depth, 2)) |
> pg <- ggplot(quakes, aes(long, lat)) + geom_point(shape = 1) + + facet_grid(~Depth) + opts(aspect.ratio = 1) > print(pg) |
Figure 5.2
lattice
> pl <- xyplot(lat ~ long | cut(depth, 3), data = quakes, + aspect = "iso", pch = ".", cex = 2, type = c("p", + "g"), xlab = "Longitude", ylab = "Latitude", + strip = strip.custom(strip.names = TRUE, var.name = "Depth")) > print(pl) |
ggplot2
> quakes$Depth <- with(quakes, cut(depth, 3)) |
> pg <- ggplot(quakes, aes(long, lat)) + geom_point() + + facet_grid(~Depth, labeller = label_both) + coord_equal() + + labs(x = "Longitude", y = "Latitude") > print(pg) |
Figure 5.3
lattice
> pl <- xyplot(lat ~ long, data = quakes, aspect = "iso", + groups = cut(depth, breaks = quantile(depth, ppoints(4, + 1))), auto.key = list(columns = 3, title = "Depth"), + xlab = "Longitude", ylab = "Latitude") > print(pl) |
ggplot2
> quakes$Depth <- with(quakes, cut(depth, breaks = quantile(depth, + ppoints(4, 1)), include.lowest = TRUE)) |
> pg <- ggplot(quakes, aes(long, lat, colour = Depth)) + + geom_point() + coord_equal() + labs(x = "Longitude", + y = "Latitude") + opts(title = "Depth") > print(pg) |
Figure 5.4
lattice
> depth.col <- gray.colors(100)[cut(quakes$depth, 100, + label = FALSE)] > depth.ord <- rev(order(quakes$depth)) |
> pl <- xyplot(lat ~ long, data = quakes[depth.ord, ], + aspect = "iso", type = c("p", "g"), pch = 21, fill = depth.col[depth.ord], + cex = 2, xlab = "Longitude", ylab = "Latitude") > print(pl) |
ggplot2
> pg <- ggplot(quakes, aes(long, lat, colour = factor(cut(quakes$depth, + 100, label = FALSE)))) + geom_point(size = 4) + geom_point(size = 4, + shape = 1, colour = "steelblue", alpha = 0.4) + labs(x = "Longitude", + y = "Latitude") + scale_colour_grey() + theme_bw() + + opts(legend.position = "none") + coord_equal() > print(pg) |
Figure 5.5
lattice
> quakes$Magnitude <- equal.count(quakes$mag, 4) > quakes$color <- depth.col > quakes.ordered <- quakes[depth.ord, ] |
> pl <- xyplot(lat ~ long | Magnitude, data = quakes.ordered, + aspect = "iso", fill.color = quakes.ordered$color, + cex = 2, panel = function(x, y, fill.color, ..., + subscripts) { + fill <- fill.color[subscripts] + panel.grid(h = -1, v = -1) + panel.xyplot(x, y, pch = 21, fill = fill, ...) + }, xlab = "Longitude", ylab = "Latitude") > print(pl) |
ggplot2
> fn <- function(data = quakes$mag, number = 4, ...) { + intrv <<- as.data.frame(co.intervals(data, number, + ...)) + mag <- sort(unique(data)) + intervals <- ldply(mag, function(x) { + t(as.numeric(x < intrv$V2 & x > intrv$V1)) + }) + tmp <- melt(cbind(mag, intervals), id.var = 1) + tmp[tmp$value > 0, 1:2] + } > quakes.ordered <- merge(quakes, fn()) > intrv <- with(intrv, paste(V1, V2, sep = "-")) > quakes.ordered <- rename(quakes.ordered, c(variable = "magnitude")) > quakes.ordered$magnitude <- factor(quakes.ordered$magnitude, + labels = intrv) |
> pg <- ggplot(quakes.ordered, aes(long, lat, colour = factor(cut(depth, + 100, label = FALSE)))) + geom_point(size = 4) + facet_grid(~magnitude, + labeller = label_both) + scale_colour_grey() + theme_bw() + + labs(x = "Longitude", y = "Latitude") + opts(legend.position = "none") + + coord_equal() > print(pg) |
Note |
Custom wrapper function fn() used to break the data into intervals. |
Figure 5.6
lattice
> depth.breaks <- do.breaks(range(quakes.ordered$depth), + 50) > quakes.ordered$color <- level.colors(quakes.ordered$depth, + at = depth.breaks, col.regions = gray.colors) |
> pl <- xyplot(lat ~ long | Magnitude, data = quakes.ordered, + aspect = "iso", groups = color, cex = 2, panel = function(x, + y, groups, ..., subscripts) { + fill <- groups[subscripts] + panel.grid(h = -1, v = -1) + panel.xyplot(x, y, pch = 21, fill = fill, ...) + }, legend = list(right = list(fun = draw.colorkey, + args = list(key = list(col = gray.colors, at = depth.breaks), + draw = FALSE))), xlab = "Longitude", ylab = "Latitude") > print(pl) |
ggplot2
> pg <- ggplot(quakes.ordered, aes(long, lat, colour = depth)) + + geom_point(size = 4) + facet_grid(~magnitude, labeller = label_both) + + coord_equal() + scale_colour_gradient(low = "grey30", + high = "grey90") + labs(x = "Longitude", y = "Latitude") + + theme_bw() > print(pg) |
Figure 5.7
lattice
> types.plain <- c("p", "l", "o", "r", "g", "s", "S", "h", + "a", "smooth") > types.horiz <- c("s", "S", "h", "a", "smooth") > horiz <- rep(c(FALSE, TRUE), c(length(types.plain), length(types.horiz))) > types <- c(types.plain, types.horiz) > set.seed(2007041) > x <- sample(seq(-10, 10, length = 15), 30, TRUE) > y <- x + 0.25 * (x + 1)^2 + rnorm(length(x), sd = 5) |
> pl <- xyplot(y ~ x | gl(1, length(types)), xlab = "type", + ylab = list(c("horizontal=TRUE", "horizontal=FALSE"), + y = c(1/6, 4/6)), as.table = TRUE, layout = c(5, + 3), between = list(y = c(0, 1)), strip = function(...) { + panel.fill(trellis.par.get("strip.background")$col[1]) + type <- types[panel.number()] + grid.text(lab = sprintf("\"%s\"", type), x = 0.5, + y = 0.5) + grid.rect() + }, scales = list(alternating = c(0, 2), tck = c(0, + 0.7), draw = FALSE), par.settings = list(layout.widths = list(strip.left = c(1, + 0, 0, 0, 0))), panel = function(...) { + type <- types[panel.number()] + horizontal <- horiz[panel.number()] + panel.xyplot(..., type = type, horizontal = horizontal) + })[rep(1, length(types))] > print(pl) |
ggplot2
No direct support - one would need to draw 15 separate graphs and combine these into one using grid.page() |
Figure 5.8
> data(Earthquake, package = "MEMSS") |
lattice
> pl <- xyplot(accel ~ distance, data = Earthquake, panel = function(...) { + panel.grid(h = -1, v = -1) + panel.xyplot(...) + panel.loess(...) + }, xlab = "Distance From Epicenter (km)", ylab = "Maximum Horizontal Acceleration (g)") > print(pl) |
ggplot2
> pg <- ggplot(Earthquake, aes(distance, accel)) + geom_point() + + geom_smooth(method = "loess", se = FALSE) + xlab("Distance From Epicenter (km)") + + ylab("Maximum Horizontal Acceleration (g)") > print(pg) |
Figure 5.9
lattice
> pl <- xyplot(accel ~ distance, data = Earthquake, type = c("g", + "p", "smooth"), scales = list(log = 2), xlab = "Distance From Epicenter (km)", + ylab = "Maximum Horizontal Acceleration (g)") > print(pl) |
ggplot2
> pg <- pg + scale_x_log2() + scale_y_log2() > print(pg) |
Figure 5.10
> library(locfit) |
lattice
> Earthquake$Magnitude <- equal.count(Earthquake$Richter, + 3, overlap = 0.1) > coef <- coef(lm(log2(accel) ~ log2(distance), data = Earthquake)) |
> pl <- xyplot(accel ~ distance | Magnitude, data = Earthquake, + scales = list(log = 2), col.line = "grey", lwd = 2, + panel = function(...) { + panel.abline(reg = coef) + panel.locfit(...) + }, xlab = "Distance From Epicenter (km)", ylab = "Maximum Horizontal Acceleration (g)") > print(pl) |
ggplot2
> Earthquake2 <- merge(Earthquake, fn(Earthquake$Richter, + 3, overlap = 0.1), by.x = "Richter", by.y = "mag", + all.x = TRUE) |
> pg <- ggplot(Earthquake2, aes(distance, accel)) + facet_grid(~variable, + labeller = label_both) + geom_smooth(method = "lm", + se = F, fullrange = T, colour = "steelblue", size = 1) + + geom_smooth(method = "locfit", formula = y ~ x, se = F) + + geom_point() + scale_x_log2() + scale_y_log2() + + xlab("Distance From Epicenter (km)") + ylab("Maximum Horizontal Acceleration (g)") > print(pg) |
Note |
Custom wrapper function fn() used to break the data into intervals. See Figure 5.5. |
Figure 5.11
> data(SeatacWeather, package = "latticeExtra") |
lattice
> pl <- xyplot(min.temp + max.temp + precip ~ day | month, + ylab = "Temperature and Rainfall", data = SeatacWeather, + type = "l", lty = 1, col = "black") > print(pl) |
ggplot2
> p.precip <- ggplot(SeatacWeather, aes(day)) + facet_grid(~month) + + geom_line(aes(y = min.temp)) + geom_line(aes(y = max.temp)) + + ylab("Temperature and Rainfall") > pg <- p.precip + geom_line(aes(y = precip)) > print(pg) |
Figure 5.12
> maxp <- max(SeatacWeather$precip, na.rm = TRUE) |
lattice
> pl <- xyplot(min.temp + max.temp + I(80 * precip/maxp) ~ + day | month, data = SeatacWeather, lty = 1, col = "black", + ylab = "Temperature and Rainfall", type = c("l", + "l", "h"), distribute.type = TRUE) > print(pl) |
ggplot2
> pg <- p.precip + geom_linerange(aes(ymin = 0, ymax = 80 * + precip/maxp)) > print(pg) |
Figure 5.13
lattice
> pl <- update(trellis.last.object(), ylab = "Temperature (Fahrenheit) \n and Rainfall (inches)", + panel = function(...) { + panel.xyplot(...) + if (panel.number() == 2) { + at <- pretty(c(0, maxp)) + panel.axis("right", half = FALSE, at = at * + 80/maxp, labels = at) + } + }) > print(pl) |
ggplot2
ggplot2 does not support the addition of a secondary axis. |
Figure 5.14
> library(hexbin) > data(gvhd10, package = "latticeExtra") |
lattice
> pl <- xyplot(asinh(SSC.H) ~ asinh(FL2.H) | Days, gvhd10, + aspect = 1, panel = panel.hexbinplot, .aspect.ratio = 1, + trans = sqrt) > print(pl) |
ggplot2
> pg <- ggplot(gvhd10, aes(asinh(FL2.H), asinh(SSC.H), + fill = sqrt(..count..))) + geom_hex() + facet_wrap(~Days, + nrow = 2) + opts(legend.position = "none") > print(pg) |
Figure 5.15 – Scatter Plot Matrix
lattice
> pl <- splom(USArrests) > print(pl) |
ggplot2
> pg <- plotmatrix(USArrests) > print(pg) |
Note |
plotmatrix function is still at experimental stage. |
Figure 5.16
lattice
> pl <- splom(~USArrests[c(3, 1, 2, 4)] | state.region, + pscales = 0, type = c("g", "p", "smooth")) > print(pl) |
ggplot2
There is currently no easy way of achieving the same in ggplot2 |
Figure 5.17
lattice
> pl <- splom(~data.frame(mpg, disp, hp, drat, wt, qsec), + data = mtcars, groups = cyl, pscales = 0, varnames = c("Miles\nper\ngallon", + "Displacement\n(cu. in.)", "Gross\nhorsepower", + "Rear\naxle\nratio", "Weight", "1/4 mile\ntime"), + auto.key = list(columns = 3, title = "Number of Cylinders")) > print(pl) |
ggplot2
> pg <- plotmatrix(with(mtcars, data.frame(mpg, disp, hp, + drat, wt, qsec))) > print(pg) |
Note |
plotmatrix function is still at experimental stage. Colour mapping is a planned future feature. |
Figure 5.18
lattice
> pl <- parallel(~mtcars[c(1, 3, 4, 5, 6, 7)] | factor(cyl), + mtcars, groups = carb, key = simpleKey(levels(factor(mtcars$carb)), + points = FALSE, lines = TRUE, space = "top", + columns = 3), layout = c(3, 1)) > print(pl) |
ggplot2
> mtcars <- namerows(mtcars, col.name = "car") > df <- melt(mtcars[c(-8:-10)], id.var = c("cyl", "carb", + "car")) > dfm <- ddply(df, .(variable), transform, rng = rescaler(value, + type = "range")) |
> pg <- ggplot(dfm, aes(group = car, colour = factor(carb))) + + geom_line(aes(variable, rng)) + facet_grid(~cyl) + + coord_flip() > print(pg) |
Figure 5.19
lattice
> pl <- parallel(~asinh(gvhd10[c(3, 2, 4, 1, 5)]), data = gvhd10, + subset = Days == "13", alpha = 0.01, lty = 1) > print(pl) |
ggplot2
> df <- gvhd10[gvhd10$Days == "13", c(1:5)] > df$id <- seq_along(df[, 1]) > df <- melt(df, id.vars = c("id")) > df$variable <- factor(df$variable, levels = names(gvhd10)[c(3, + 2, 4, 1, 5)]) > df <- ddply(df, .(variable), transform, value = rescaler(asinh(value), + type = "range")) |
> pg <- ggplot(df, aes(value, variable, group = id)) + + geom_path(alpha = 0.01) + theme_bw() > print(pg) |
Note |
Built-in black and white theme is used, otherwise the thin grey lines would be invisible on a grey background. |
19 Comments
leave one →
Trackbacks
- links for 2009-07-16 | dekay.org
- ggplot2 Version of Figures in “Lattice: Multivariate Data Visualization with R” (Part 6) « Learning R
- ggplot2 Version of Figures in “Lattice: Multivariate Data Visualization with R” (Part 10) « Learning R
- R scatterplot matrix with nonparametric density | Question and Answer
This is so great! You seem to learn R very fast! There was one minor problem with 5.14:
> pg print(pg)
Error in eval(expr, envir, enclos) : object ‘..count..’ not found
Strange, I don’t seem to get this error.
Nice comparisons. Aspect ratio is important. Is there not an aspect=”iso”, for example, in ggplot?
You can set aspect ratio using one of the two options:
opts(aspect.ratio = 1)
or
coord_equal(ratio = 1)
Thanks. In your plots above, you are using “opts(aspect.ratio = 1)” to mimic lattice plots that use “aspect=’iso'”. This is not correct (ok, it would be if the values in both variables spanned the same range). “opts(aspect.ratio = 1)” only causes the bounding box of each facet to be a square. The idea behind aspect=’iso’, however, is, quoting the documentation: “where the relation between physical distance on the device and distance in the data scale are forced to be the same for both axes”.
So the alternative you gave me – “coord_equal(ratio = 1)” is the ggplot way to get isometric scales. But playing around with this I see that it still forces the facets to be square, extending the axis limits of whatever axis comes up short. This is very strange and constraining behavior if you ask me.
Great Job! I wonder if at the end of the series you are going to collect all the ‘episodes’ in a single pdf so one can read it offline easily.
That’s a good idea. I need to think how to best accomplish it.
I suggest to use the literate programming approach that allows to produce both the data analysis (the images in you case) and the Report at the same time.
I use it extensively for reporting taking advantage of the R Sweave function:
http://www.stat.uni-muenchen.de/~leisch/Sweave/
If you need some help don’t hesitate to ask! 🙂
Very good suggestion!
Nice work! It would help if you could cut & paste your code without the “>” symbols to facilitate our cutting/pasting into an R session…
In Windows version of R you could paste the code using “Paste commands only” option from the right-click menu. This option gets rid of the “>” symbols.
Thanks for the tutorials, they are great:
I was wondering if you could help me, how I can modify the number of tickmarcks in a ggplot in the x-axis?. I have 10 scatterplots placed in two rows, and the tickmarcks of the x-axis are overlapping.
Regards,
Roberto
p <- ggplot(bdhier, aes(lescudo.dorsal, lperna2b)) +
opts(axis.test.x= theme_text(angle = 90, size=4, colour = "black"))+
opts(aspect.ratio = 1)+
facet_wrap(~pop.num,nrow=2) + geom_point() +
labs(x= "Log Dorsal Plate length (mm)", y = "Log Leg 2 length (mm)")
One option would be to specify the breaks/labels manually.
Have a look, there was a thread on a similar topic on ggplot2 mailing list just last week.
This is a great website! thanks. I am currently searching for a trick to obtain lat/long degrees in the labels of a ggplot(of a spatial grid) x and y axis e.g. labels like in the spplot of a spatial data.frame() like 5◦S etc. Any ideas how to do this with ggplot?
Sorry, I am not that familiar with spplot, and thus cannot really help.