ggplot2: Two Color XY-Area Combo Chart

October 22, 2009

The David@Work blog shows how to fill in the area between two crossing lines in an Excel chart. That post was also published as a guest post on the PTS blog.

TwoColorXYAreaChart.jpg

Let’s try to replicate this graph in ggplot2.

Read more…

Export Data Frames To Multi-worksheet Excel File

October 6, 2009

A few weeks ago I needed to export a number of data frames to separate worksheets in an Excel file. Although one could output csv-files from R and then import them manually or with the help of VBA into Excel, I was after a more streamlined solution, as I would need to repeat this process quite regularly in the future.

CRAN has several packages that can create an Excel file, but several of them provide only very basic functionality. The R-wiki page on exchanging data between R and Windows applications focuses mainly on the data import problem.

My objective was to find an export method that would allow me to easily split a larger dataframe by values of a given variable so that each subset would be exported to its own worksheet in the same Excel file. I tried out the different ways of achieving this and documented my findings below.
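The splitting half of that objective can be sketched in base R. This is only an illustration using the built-in mtcars data and a made-up naming scheme for the worksheets; the step that actually writes the Excel file is left out on purpose, since it depends on which package is chosen.

```r
# Split a data frame by the values of one variable.
# Each element of the resulting list would become its own worksheet.
sheets <- split(mtcars, mtcars$cyl)

# Hypothetical worksheet names derived from the grouping values
names(sheets) <- paste0("cyl_", names(sheets))

# Inspect what would be exported: one data frame per worksheet
sapply(sheets, nrow)
```

A named list of data frames like this is a convenient intermediate form, as it keeps each subset paired with its intended worksheet name.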

Read more…

WordPress Blogging with R in 3 Steps

September 29, 2009

A few people have emailed me and enquired about the use of tools mentioned at the end of this post to make blogposts with embedded R-commands. Below is a small step-by-step walkthrough of how to accomplish this.

Read more…

ggplot2: Back-to-back Bar Charts

September 24, 2009

On the ggplot2 mailing-list the following question was asked:

How to create a back-to-back bar chart with ggplot2?

For anyone who doesn’t know what I am talking about, have a look at a recent paper from the EU. I’d like to create plots like the graphs 5,6,18 in the paper.

An example graph from the above report is below:

export_import_graph5.png

Let’s create the same graph in ggplot2.

Read more…

brew: Creating Repetitive Reports

September 9, 2009

The United Nations report World Population Prospects: The 2008 Revision (highlights available here) provides data about the historical and forecasted population of each country. When exploring past and future population trends, it is relatively easy to subset the dataset by a variable of your choice.

> file <- c("UNdata_Population.csv")
> population <- read.csv(file)
> names(population) <- c("code", "country", "year",
+     "variant", "value")
> df <- subset(population, year <= 2005)

Likewise, it is straightforward to produce a 263-page pdf-file that shows the population trend from 1950 to 2005 for every country in the dataset.

> library(plyr)
> pdf("population_growth.pdf", paper = "a4")
> d_ply(df, .(country), function(x) plot(x$year,
+     x$value, type = "l", main = unique(x$country)))
> dev.off()

However, if one would like to create a report showing some summary tables and graphs along with a textual description for every country, the process becomes more complicated. This is exactly what the brew package, written by Jeffrey Horner, was designed to help with. brew “implements a templating framework for mixing text and R code for report generation” and makes it very easy to generate repetitive reports, which is of great help, for example, when performing exploratory analysis on a large dataset with many variables.
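To give a flavour of the templating idea, here is a minimal sketch that assumes the brew package is installed. The toy data frame, the country names and the template text are all made up for illustration and are not taken from the UN dataset or from the full post. In a brew template, code between <% and %> is run, and <%= %> inserts the value of an expression into the output.

```r
library(brew)

# Hypothetical toy data standing in for the population dataset
df <- data.frame(
  country = rep(c("Aland", "Borduria"), each = 3),
  year    = rep(c(1995, 2000, 2005), times = 2),
  value   = c(10, 12, 15, 40, 38, 35),
  stringsAsFactors = FALSE
)

# One template, repeated for every country in the data
template <- paste(
  "<% for (ctry in unique(df$country)) { %>",
  "<% x <- df[df$country == ctry, ] %>",
  "Country: <%= ctry %>",
  "Population in <%= max(x$year) %>: <%= x$value[which.max(x$year)] %>",
  "<% } %>",
  sep = "\n"
)

# Render the template into a temporary text file and read it back
out <- tempfile(fileext = ".txt")
brew(text = template, output = out)
report <- readLines(out)
```

The same pattern scales to longer templates kept in their own file, in which case the file name is passed to brew() instead of the text argument.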

Read more…

ggplot2 Version of Figures in “Lattice: Multivariate Data Visualization with R” (Final Part)

August 26, 2009

Over the past weeks I have tried to replicate the figures in Lattice: Multivariate Data Visualization with R using Hadley Wickham’s ggplot2.

With the exception of a few graph types (e.g. ggplot2 doesn’t support 3d-graphs, and there were a few other cases), it was possible to create ggplot2 versions of almost all the figures. Sometimes this required data manipulation before plotting in order to get data into a suitable form to feed into ggplot2, but more often than not ggplot2 provided satisfactory out-of-the-box visualisation very closely comparable to that of lattice.

I would like to conclude this series with comments on a few keywords that stuck in my mind while preparing all these graphs.

Speed

Both lattice and ggplot2 run on top of grid graphics, but lattice is a lot faster. A lot. When drawing one or two graphs one might not even notice the difference in speed, but once the number of graphs increases or the datasets get bigger, the relative slowness of ggplot2 becomes more noticeable (have a look at the pdf-s linked to at the end of this post for comparative timings).

Reader Ben Bolker emailed Hadley Wickham about the issue, and the response he got was “So far I have been completely focused on functionality, and not at all on speed. I would really like to spend some time profiling and optimised ggplot2 (I suspect an order of magnitude speed increase would be possible), but unfortunately my summer is filling up rapidly and I am feeling some pressure to write papers rather than (more) code.”

It is good news that speed can be improved. Now let’s hope Hadley finds some time to look into this.
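For readers who want to check this on their own machine, the comparison can be sketched with system.time wrapped around the print call, since printing is where the drawing actually happens in both packages. The mtcars plots below are an arbitrary example of mine, not one of the figures from the series, and the timings will vary by machine and package version.

```r
library(lattice)
library(ggplot2)

# Draw to a pdf device with no file, so that only rendering is timed
pdf(NULL)

# Time the lattice version of a small conditioned scatterplot
t_lattice <- system.time(
  print(xyplot(mpg ~ wt | factor(cyl), data = mtcars))
)

# Time a comparable ggplot2 version
t_ggplot <- system.time(
  print(ggplot(mtcars, aes(wt, mpg)) + geom_point() + facet_wrap(~cyl))
)

dev.off()

# Compare elapsed times in seconds
c(lattice = t_lattice[["elapsed"]], ggplot2 = t_ggplot[["elapsed"]])
```

A single small plot is a noisy measurement, so repeating the calls a few times gives a more reliable picture.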

Output Customisation

Almost every element of the output of both packages is highly customisable. lattice has more options for tinkering with the finest details of the plot, allowing one to make sure that the final graph looks exactly the way one wants. Such fine-tuning, though, requires a very good knowledge of the inner workings of the program, as the available options are not always obvious. I find fine-tuning a graph with the ggplot2 approach a lot easier, as it is clear which element of the plot is being adjusted.

Still, as always, there is room for improvement. The two main items that surfaced during this exercise were the ability to better manipulate the heights, widths and aspect ratios of facets (facet_grid has the space="fixed" argument, but facet_wrap does not), and better control over the size and positioning of legends.
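As a small illustration of the space argument mentioned above, here is a sketch with the built-in mtcars data (my choice of example, not one from the series). With free scales and free space, each panel gets a width proportional to the range of its x axis, instead of all panels being equally wide.

```r
library(ggplot2)

# space = "free_x" only has an effect when the x scales are free as well
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_grid(. ~ cyl, scales = "free_x", space = "free_x")

print(p)
```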

Syntax

As already mentioned, lattice has a jungle of parameters one can manipulate to achieve the best possible output. To me lattice is more cluttered with all of its rich options (panel/prepanel functions), and I personally prefer the ggplot2 approach of building up a graph layer by layer using “human-readable” expressions. Compared to the use of various specialised functions in lattice, I find this more intuitive and easier to follow.

The lattice panel functions in capable hands make it an extremely powerful tool. However, having seen the lattice examples, only now did I come to fully appreciate the power of the ggplot2 equivalents: stat_summary and stat_function.
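For anyone who has not used these two stats, a minimal sketch follows. The data and aesthetics are arbitrary choices of mine, and the fun argument of stat_summary assumes ggplot2 3.3.0 or later (earlier versions used fun.y).

```r
library(ggplot2)

# stat_summary: overlay a per-group summary (here the mean) on the raw points
p_summary <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_point(colour = "grey60") +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)

# stat_function: draw a function (here the standard normal density)
# over the x range supplied by the data
p_function <- ggplot(data.frame(x = c(-3, 3)), aes(x)) +
  stat_function(fun = dnorm)

print(p_summary)
print(p_function)
```

In both cases the computation happens inside the layer, which is roughly the role a custom panel function plays in lattice.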

Again, if there were one thing to add to my wish-list, it would be the ability to use formulas/functions (e.g. reorder) as facetting variables, which would allow one to skip a data pre-processing step.
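The pre-processing step in question looks roughly like the sketch below: the reordered factor has to be stored as a new column before it can be used for facetting. The column name cyl_by_mpg and the use of mtcars are my own illustration.

```r
library(ggplot2)

df <- mtcars

# Pre-processing: order the levels of cyl by the median mpg within each group
df$cyl_by_mpg <- reorder(factor(df$cyl), df$mpg, FUN = median)

# The new column can then be used as the facetting variable
p <- ggplot(df, aes(wt, mpg)) +
  geom_point() +
  facet_wrap(~cyl_by_mpg)

print(p)
```

The panels now appear in order of increasing median mpg rather than in the default order of the cyl values.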

Documentation

ggplot2 has a very good website with many useful examples (the same information without the rendered graphs is included in the help files), as well as a book with good explanations. Using a combination of these, one gets a good overview of the available options and answers to the questions that may arise. I especially like the examples on the website, which often highlight the more intricate features of the program.

The lattice manual explains all the available options in great detail, sometimes requiring a good amount of concentration and will to get through the instructions. Apart from the book website, one can also make use of the R Graphical Manual, which includes “a collection of graphics from all R packages”.

Pdf-version of the posts

Some readers requested a pdf-version of the posts – all the chapters have been compiled into one pdf-file that can be downloaded here (6 MB).

Another version of the same file which also includes the system.time results for most of the print statements used to generate the images is available here (6 MB).

And yet another version with no images can be downloaded here (800 kb).

Tools

I will also list the tools I used to create the blog posts as well as the pdf-files:

  • asciidoc – a text document format for writing short documents, articles, books and UNIX man pages. AsciiDoc files can be translated to HTML and DocBook markups.
  • ascii – an R package that replaces R results in an AsciiDoc document with AsciiDoc markup.
  • blogpost – a command-line weblog client for publishing AsciiDoc documents to WordPress blog hosts. It creates and updates weblog posts and pages directly from AsciiDoc source documents.
  • dblatex – PDF output was generated by passing AsciiDoc generated DocBook through dblatex/pdftex.

ggplot2 Version of Figures in “Lattice: Multivariate Data Visualization with R” (Part 13)

August 20, 2009

This is the 13th post in a series attempting to recreate the figures in Lattice: Multivariate Data Visualization with R (R code available here) with ggplot2.

Previous parts in this series: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9, Part 10, Part 11, Part 12.


Chapter 14 – New Trellis Displays

Topics covered:

  • Examples of S3 and S4 methods
  • Examples of new high level functions

Read more…

ggplot2 Version of Figures in “Lattice: Multivariate Data Visualization with R” (Part 12)

August 18, 2009

This is the 12th post in a series attempting to recreate the figures in Lattice: Multivariate Data Visualization with R (R code available here) with ggplot2.

Previous parts in this series: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9, Part 10, Part 11.


Chapter 13 – Advanced Panel Functions

Topics covered:

  • Built-in panel and accessor functions
  • Examples

Read more…

ggplot2 Version of Figures in “Lattice: Multivariate Data Visualization with R” (Part 11)

August 13, 2009

This is the 11th post in a series attempting to recreate the figures in Lattice: Multivariate Data Visualization with R (R code available here) with ggplot2.

Previous parts in this series: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9, Part 10.


Chapter 11 – Manipulating the “trellis” object

Topics covered:

  • Methods for “trellis” objects
  • Tukey mean-difference plot
  • Other specialized manipulations

Read more…

Block-processing a data frame with plyr

August 12, 2009

David Smith at the REvolutions blog shows how to split a data frame by the values of a variable and perform an operation on each segment, using the isplit function from the iterators package in combination with the foreach package. The example below creates three pdf-files in the working directory.

As it happens, I was doing something similar when I read David’s post, so I present an alternative way to accomplish the same task using plyr. Below are both the isplit and plyr versions for easy comparison; as you can see, the syntax is very similar. When dealing with large datasets I expect isplit to be faster on computers with multiple processors because, paired with foreach, it can make use of the latter’s parallel computing capabilities.
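To give an idea of the plyr side of such block-processing, here is a minimal sketch. It is not the code from either post: the iris dataset, the file naming and the use of a temporary directory instead of the working directory are assumptions of mine. d_ply splits the data frame by Species and calls the function once per piece, discarding the return values.

```r
library(plyr)

# Write into a temporary directory rather than the working directory
out_dir <- tempdir()

# One pdf-file per species; iris has three, so three files are created
d_ply(iris, .(Species), function(x) {
  species <- as.character(unique(x$Species))
  pdf(file.path(out_dir, paste0(species, ".pdf")))
  plot(x$Sepal.Length, x$Sepal.Width, main = species)
  dev.off()
})

list.files(out_dir, pattern = "\\.pdf$")
```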

Read more…