Excel Charts Blog posted a video tutorial on how to create a circumplex, rose or doughnut chart in Excel. Apparently this type of chart is very popular in the consulting industry, hence the name “Consultants’ Chart”. It is not difficult to make this chart in Excel 2010, but it involves countless clicks and formulas to format both the source data and the chart itself.

In ggplot2 the same can be achieved with around 10 lines of code, as can be seen below.
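A minimal sketch of the idea, assuming a small hypothetical dataset with one value per category (not the data from the Excel tutorial): bars of width 1 are drawn side by side and then wrapped around the centre with coord_polar(), which turns each bar into a wedge.

```r
library(ggplot2)

# Hypothetical data: one value per category
df <- data.frame(
  category = factor(LETTERS[1:8], levels = LETTERS[1:8]),
  value    = c(3, 5, 2, 8, 6, 4, 7, 1)
)

# Bars of width 1 touch each other; coord_polar() bends them into adjoining wedges
p <- ggplot(df, aes(x = category, y = value, fill = category)) +
  geom_bar(stat = "identity", width = 1, colour = "white") +
  coord_polar() +
  theme_minimal() +
  theme(legend.position = "none", axis.title = element_blank())

print(p)
```

The white outline separates neighbouring wedges; dropping coord_polar() shows the underlying ordinary bar chart.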

Waterfall charts are often used for analytical purposes in the business setting to show the effect of sequentially introduced negative and/or positive values. Sometimes waterfall charts are also referred to as cascade charts.

In the next few paragraphs I will show how to plot a waterfall chart using ggplot2.
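The core of a waterfall chart is a running total. A minimal sketch, assuming hypothetical step names and amounts: each bar is drawn with geom_rect() between the cumulative total before its step (start) and the cumulative total after it (end), and is coloured by whether the step increases or decreases the total. This is one possible approach, not necessarily the one used in the original post.

```r
library(ggplot2)

# Hypothetical steps and amounts (positive = increase, negative = decrease)
df <- data.frame(
  step   = c("Start", "Sales", "Services", "Fixed costs", "Variable costs", "Taxes"),
  amount = c(1000, 400, 150, -600, -300, -200),
  stringsAsFactors = FALSE
)
df$id    <- seq_len(nrow(df))           # numeric x position of each bar
df$end   <- cumsum(df$amount)           # running total after each step
df$start <- c(0, head(df$end, -1))      # running total before each step
df$type  <- ifelse(df$amount >= 0, "increase", "decrease")

# Each bar floats between the totals before and after its own step;
# pmin/pmax keep ymin below ymax for the decreasing steps
p <- ggplot(df, aes(fill = type)) +
  geom_rect(aes(xmin = id - 0.45, xmax = id + 0.45,
                ymin = pmin(start, end), ymax = pmax(start, end))) +
  scale_x_continuous(breaks = df$id, labels = df$step) +
  labs(x = NULL, y = "Amount", fill = NULL)

print(p)
```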

“How to change the order of legend labels” is a question that gets asked relatively often on the ggplot2 mailing list. A variation of this question is how to change the order of series in stacked bar and line plots.

While these two questions seem related, they are in fact separate: the legend is controlled by scales, whereas stacking is controlled by the order of values in the data.

Recently I spent some time getting my head around this, and below is a quick recap.
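A minimal sketch of the distinction, assuming hypothetical data with three series stacked over two years: the stacking order is changed in the data by reordering the factor levels, while the legend order is set independently through the breaks argument of the fill scale. In ggplot2 2.2.0 and later, stacking follows the grouping derived from the factor levels rather than the row order, so reordering the levels is the dependable way to change it.

```r
library(ggplot2)

# Hypothetical data: three series stacked in each of two years
df <- data.frame(
  year   = rep(c("2009", "2010"), each = 3),
  series = rep(c("a", "b", "c"), times = 2),
  value  = c(1, 2, 3, 2, 3, 4)
)

# Stacking order: a property of the data, changed here by reordering the factor levels
df$series <- factor(df$series, levels = c("c", "b", "a"))

p <- ggplot(df, aes(x = year, y = value, fill = series)) +
  geom_bar(stat = "identity") +
  # Legend order: a property of the scale, set independently through 'breaks'
  scale_fill_discrete(breaks = c("a", "b", "c"))

print(p)
```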

Plotting time series with dates on the x-axis and times of day on the y-axis can be a bit tricky in ggplot2. However, a simple workaround easily overcomes this problem.
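One common workaround, sketched below with hypothetical POSIXct timestamps (it may differ from the trick in the original post): split each timestamp into a calendar date for the x-axis and a time of day moved onto one arbitrary common date (here 1970-01-01), so that only the clock time varies along the y-axis and scale_y_datetime() can label it.

```r
library(ggplot2)

# Hypothetical event timestamps spread over several days
ts <- as.POSIXct(c("2010-01-04 08:15:00", "2010-01-04 17:40:00", "2010-01-05 09:05:00",
                   "2010-01-06 12:30:00", "2010-01-06 22:10:00"), tz = "UTC")

df <- data.frame(
  date = as.Date(ts, tz = "UTC"),   # calendar day, used on the x-axis
  # Clock time moved onto one common date, so only the time of day varies on the y-axis
  time = as.POSIXct(paste("1970-01-01", format(ts, "%H:%M:%S")), tz = "UTC")
)

p <- ggplot(df, aes(x = date, y = time)) +
  geom_point() +
  scale_y_datetime(date_labels = "%H:%M") +   # show only hours and minutes
  labs(x = "Date", y = "Time of day")

print(p)
```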

A post on the FlowingData blog demonstrated how to quickly make a heatmap using R base graphics.
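In ggplot2, the usual building block for a heatmap is geom_tile() applied to data in long format, with one row per cell. A minimal sketch, assuming a small hypothetical matrix (rows are players, columns are statistics); each column is rescaled to the 0-1 range so that variables with different ranges share a single colour scale, whereas base R's heatmap() handles this through its scale argument.

```r
library(ggplot2)

# Hypothetical matrix: rows are players, columns are statistics
m <- matrix(c(25.1, 18.3, 30.2, 12.0,
              5.2, 9.8, 4.1, 11.3,
              7.5, 2.3, 6.1, 3.9),
            nrow = 4,
            dimnames = list(player = paste0("P", 1:4), stat = c("pts", "reb", "ast")))

# Long format: one row per cell, with columns player, stat and Freq
long <- as.data.frame(as.table(m))

# Rescale each statistic to 0-1 so that all columns share one colour scale
long$scaled <- ave(long$Freq, long$stat, FUN = function(v) (v - min(v)) / (max(v) - min(v)))

p <- ggplot(long, aes(x = stat, y = player, fill = scaled)) +
  geom_tile(colour = "white") +
  scale_fill_gradient(low = "white", high = "steelblue") +
  labs(x = NULL, y = NULL, fill = "Scaled value")

print(p)
```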

This post shows how to achieve a very similar result using ggplot2.

The Statistical Algorithms blog attempted to recreate in ggplot2 a graph depicting the growing colour selection of Crayola crayons (original graph via FlowingData).

He also asked the following questions: Is there an easier way to do this? How can I make the axes more like the original? What about the white lines between boxes and the gradual change between years? The sort order is also different.

I will present my version in this post, trying to address some of these questions.
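A minimal sketch of two of these points, assuming hypothetical years and illustrative hex codes rather than the real Crayola data: white lines between the boxes come from drawing each tile with colour = "white", and a sort order within each year can be imposed by ordering the colours by hue, computed with col2rgb() and rgb2hsv() from base R's grDevices package. scale_fill_identity() makes ggplot2 use the hex codes directly as fill colours.

```r
library(ggplot2)

# Hypothetical snapshots: the colours available in two years (hex codes are illustrative)
crayons <- data.frame(
  year   = c(1903, 1903, 1903, 1949, 1949, 1949, 1949, 1949),
  colour = c("#ED0A3F", "#0066FF", "#3AA655",
             "#ED0A3F", "#0066FF", "#3AA655", "#FF8833", "#8359A3"),
  stringsAsFactors = FALSE
)

# Sort order: within each year, order the colours by hue (0-1) from rgb2hsv()
crayons$hue <- rgb2hsv(col2rgb(crayons$colour))["h", ]
crayons <- crayons[order(crayons$year, crayons$hue), ]
crayons$pos <- ave(crayons$hue, crayons$year, FUN = seq_along)   # stacking position

p <- ggplot(crayons, aes(x = factor(year), y = pos, fill = colour)) +
  geom_tile(colour = "white", width = 0.9) +   # white lines between the boxes
  scale_fill_identity() +                       # use the hex codes as fill colours
  labs(x = NULL, y = "Number of colours")

print(p)
```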

Just before Christmas, ggplot2 version 0.8.5 was released, closely following the release of version 0.8.4 a week or so earlier. Whilst both versions included numerous bug fixes (25 in 0.8.4 and 17 in 0.8.5), version 0.8.5 also incorporated some new features.

As ggplot2 is all about graphical display, I went through the list of new features, and below is a visual example of each one, mostly plotted using the code examples included in the respective bugtracker issues.

Sometimes it is preferable to label data series instead of using a legend. This post demonstrates one way of using labels instead of legend in a ggplot2 plot.

```r
library(ggplot2)
# dfm: long-format data with columns month, value and City (not shown here)
p <- ggplot(dfm, aes(month, value, group = City, colour = City)) +
  geom_line(linewidth = 1) +        # size = 1 before ggplot2 3.4.0
  theme(legend.position = "none")   # opts() in older ggplot2 versions
p + geom_text(data = dfm[dfm$month == "Dec", ], aes(label = City), hjust = 0.7, vjust = 1)
```

Adding the labels requires manually calculating their positions, which are then passed on to geom_text(). To move the labels around, the code would need manual adjustment, because the label positions have to be recalculated.

This problem is easily solved with the help of the directlabels package by Toby Dylan Hocking, which “is an attempt to make direct labeling a reality in everyday statistical practice by making available a body of useful functions that make direct labeling of common plots easy to do with high-level plotting systems such as lattice and ggplot2”.

```r
# directlabels is now also available from CRAN: install.packages("directlabels")
install.packages("directlabels", repos = "http://r-forge.r-project.org")
library(directlabels)
```

The above plot can be reproduced with one line of code.

```r
direct.label(p, list(last.points, hjust = 0.7, vjust = 1))
```

In addition to several predefined positioning functions, one can also write one's own positioning function. For example, the following places rotated labels at the starting value of each series.

```r
# first.points positioning, with the labels rotated by 45 degrees
angled.firstpoints <- list("first.points", rot = 45, hjust = 0, vjust = -0.7)
direct.label(p, angled.firstpoints)
```

I agree with the author’s conclusion that the directlabels package makes labeling data series in both lattice and ggplot2 simpler and more convenient.

Thanks to Baptiste for bringing this package to my attention.


At the useR! 2006 conference, Jim Porzak gave a presentation on data profiling with R. He showed how to draw summary panels of the data using a combination of grid and base graphics. Unfortunately, the code has not (yet) been released as a package, so when I recently needed to quickly review several datasets at the beginning of an analysis project, I started looking for alternatives. A quick search revealed two options that offer similar functionality: the r2lUniv package and the describe() function in the Hmisc package.
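For the Hmisc option, a minimal sketch using the built-in mtcars dataset, assuming Hmisc is installed: describe() returns an object of class describe with one summary per column, which in recent Hmisc versions includes counts of non-missing, missing and distinct values and, for numeric columns, the mean and a set of quantiles.

```r
# Assumes the Hmisc package is installed: install.packages("Hmisc")
library(Hmisc)

# One summary per column of the built-in mtcars data
d <- describe(mtcars)
print(d)
```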

Hadley Wickham recently shared a nice tip on how to get a faceted scatterplot with all points shown in the background of each panel.

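This technique makes clever use of removing the faceting variable (by setting it to NULL) from the data of the background layer: a layer whose data lacks the faceting variable is repeated in every facet, so all points are plotted in light grey in each panel.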

```r
library(ggplot2)
# Grey layer: gear removed from its data, so it is repeated in every facet
ggplot(mtcars, aes(cyl, mpg)) + geom_point(data = transform(mtcars, gear = NULL), colour = "grey80") +
  geom_point() + facet_grid(~gear) + theme_bw()
```

Update 17 May 2010

bch asked in the comments below how to achieve the same when there are two faceting variables. The method is the same; one just needs to exclude both faceting variables from the dataset used to draw the light grey points.

```r
ggplot(mtcars, aes(cyl, mpg)) +
  geom_point(data = mtcars[, !names(mtcars) %in% c("am", "gear")], colour = "grey80") +
  geom_point() + facet_grid(am ~ gear) + theme_bw()
```