
ggplot2 Version of Figures in “Lattice: Multivariate Data Visualization with R” (Final Part)

August 26, 2009

Over the past weeks I have tried to replicate the figures in Lattice: Multivariate Data Visualization with R using Hadley Wickham’s ggplot2.

With the exception of a few graph types (e.g. 3D graphs, which ggplot2 doesn't support, and a few other cases), it was possible to create ggplot2 versions of almost all the figures. Sometimes this required manipulating the data before plotting to get it into a form suitable for ggplot2, but more often than not ggplot2 provided satisfactory out-of-the-box visualisations very closely comparable to those of lattice.

I would like to conclude this series with comments on a few keywords that stuck in my mind while preparing all these graphs.

Speed

Both lattice and ggplot2 run on top of grid graphics; however, lattice is a lot faster. A lot. When drawing one or two graphs one might not even notice the difference, but once the number of graphs increases or the datasets get bigger, the relative slowness of ggplot2 becomes clearly noticeable (have a look at the PDFs linked to at the end of this post for comparative timings).
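A timing comparison of this kind can be sketched with system.time(). This is a minimal sketch, not the code behind the timings in the PDFs: it assumes the built-in mtcars data as an illustrative dataset and draws to an off-screen pdf(NULL) device. Both packages build a plot object and only draw it when it is printed, so print() has to sit inside system.time() for the drawing cost to be measured.

```r
library(lattice)
library(ggplot2)

# Illustrative only: mtcars stands in for the book's datasets.
# Draw to a null PDF device so timings don't depend on a screen device.
pdf(NULL)

# Both packages return plot objects and render them in print(),
# so print() must be inside system.time() to include the drawing cost.
t_lattice <- system.time(print(xyplot(mpg ~ wt | factor(cyl), data = mtcars)))
t_ggplot2 <- system.time(print(
  ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    facet_wrap(~ cyl)
))

invisible(dev.off())

# Elapsed (wall-clock) seconds for each package
t_lattice["elapsed"]
t_ggplot2["elapsed"]
```

Absolute numbers will vary with the machine, operating system and package versions, so the ratio between the two is more informative than either figure on its own.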

Reader Ben Bolker emailed Hadley Wickham about the issue, and the response he got was “So far I have been completely focused on functionality, and not at all on speed. I would really like to spend some time profiling and optimised ggplot2 (I suspect an order of magnitude speed increase would be possible), but unfortunately my summer is filling up rapidly and I am feeling some pressure to write papers rather than (more) code.”

It is good news that speed can be improved; now let's hope Hadley finds some time to look into this.

Output Customisation

Almost every element of the output of both packages is highly customisable. lattice has more options for tinkering with the finest details of a plot, allowing one to make sure that the final graph looks exactly the way one wants. Such fine-tuning, though, requires a very good knowledge of the inner workings of the package, as the available options are not always obvious. I find fine-tuning a graph a lot easier with ggplot2's approach, as it is clear which element of the plot is being adjusted.

Still, as always, there is room for improvement. The two main items that surfaced during this exercise are the ability to better manipulate the heights/widths/aspect ratios of facets (facet_grid has the space argument, but facet_wrap does not) and better control over the size and positioning of legends.

Syntax

As already mentioned, lattice has a jungle of parameters one can manipulate to achieve the best possible output. To me, lattice feels more cluttered with all of its rich options (panel/prepanel functions), and I personally prefer the ggplot2 approach of building up a graph layer by layer using “human-readable” expressions. Compared to lattice's use of various specialised functions, I find this more intuitive and easier to follow.

In capable hands, the lattice panel functions make it an extremely powerful tool. However, only after seeing the lattice examples did I come to fully appreciate the power of their ggplot2 equivalents: stat_summary and stat_function.
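As a rough illustration of what these two stats do, here is a minimal sketch using the built-in mtcars data (an illustrative choice, not from the book). It assumes a recent ggplot2 (3.3.0 or later), where stat_summary() takes fun =; older versions, including the one current when this series was written, used fun.y = instead.

```r
library(ggplot2)

# stat_summary(): overlay per-group means computed at plot time,
# with no separate aggregation step on the data.
p1 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point() +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)

# stat_function(): draw an arbitrary function over the x range,
# here a normal density using the sample mean and sd of mpg.
p2 <- ggplot(mtcars, aes(x = mpg)) +
  geom_density() +
  stat_function(
    fun = dnorm,
    args = list(mean = mean(mtcars$mpg), sd = sd(mtcars$mpg)),
    colour = "blue"
  )
```

Printing p1 or p2 draws the plot. In lattice, the same overlays would typically be written as custom panel functions.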

Again, if there were one thing to add to my wish-list, it would be the ability to use formulas/functions (e.g. reorder) as facetting variables, allowing one to skip a data pre-processing step.
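The pre-processing step in question can be sketched as follows. This is a minimal sketch that uses the built-in mtcars data as a stand-in and assumes the facetting formula accepts only plain variable names, as described above. The reordered factor is therefore created as a new column (cyl_f, a hypothetical name) before plotting, so the panels appear in order of median mpg.

```r
library(ggplot2)

# Illustrative data: mtcars stands in for the book's datasets.
# Pre-processing step: reorder the cylinder factor by median mpg,
# because the facetting formula takes a variable, not reorder(...).
mtcars$cyl_f <- reorder(factor(mtcars$cyl), mtcars$mpg, FUN = median)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl_f)  # panels follow the reordered factor levels
```

Printing p draws the panels in level order. In lattice, the equivalent is usually written inline in the formula, e.g. mpg ~ wt | reorder(factor(cyl), mpg, median).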

Documentation

ggplot2 has a very good website with many useful examples (the same information, without the rendered graphs, is included in the help files), as well as a book with good explanations. Using a combination of these, one gets a good overview of the available options and answers to the questions that may arise. I especially like the examples on the website, which often highlight the more intricate features of the package.

The lattice manual explains all the available options in great detail, sometimes requiring a good deal of concentration and willpower to work through. Apart from the book website, one can also make use of the R Graphical Manual, which includes “a collection of graphics from all R packages”.

PDF Versions of the Posts

Some readers requested a PDF version of the posts, so all the chapters have been compiled into one PDF file that can be downloaded here (6 MB).

Another version of the same file, which also includes the system.time results for most of the print statements used to generate the images, is available here (6 MB).

And yet another version, with no images, can be downloaded here (800 KB).

Tools

Finally, here are the tools I used to create the blog posts and the PDF files:

  • asciidoc – a text document format for writing short documents, articles, books and UNIX man pages. AsciiDoc files can be translated to HTML and DocBook markup.
  • ascii – an R package that replaces R results in an AsciiDoc document with AsciiDoc markup.
  • blogpost – a command-line weblog client for publishing AsciiDoc documents to WordPress blog hosts. It creates and updates weblog posts and pages directly from AsciiDoc source documents.
  • dblatex – PDF output was generated by passing AsciiDoc generated DocBook through dblatex/pdftex.
31 Comments
  1. August 26, 2009 1:20 pm

    Thanks for this. I believe your blog is very influential and a great learning resource. I even learned about asciidoc thanks to you :-) Didn’t see that coming.

    I’m beginning to get a basic grasp of the ggplot syntax, so I’ll be sticking with that. Besides speed, what bothers me a bit, though, is that with ggplot I can't lay out my graphs with e.g. par(mfrow), and I can't easily add e.g. mtext(). So viewports and grid are on my to-learn list :-), well worth it though.

    • learnr
      August 26, 2009 3:22 pm

      Thank you for the kind words.

      1. There are two options for laying out multiple ggplot2 graphs:
      a) look at this example.

      b) use the arrange() function from ggextra which is basically a wrapper around the method referred to above.

      2. Have you tried geom_text() and annotate() in ggplot2, which in my view offer quite a flexible way of annotating plots?

  2. David
    August 26, 2009 5:28 pm

    I think this blog has been excellent and will prove to be a valuable source of information for many. I certainly agree about the speed issue with ggplot; it is the main reason I have stuck with lattice. For example, I often plot time series of data, e.g. a few years of hourly data, and ggplot2 is just too slow. The same goes for level-type plots.

    I also wonder how useful ggplot is for use in packages. For example, in lattice I can pass arguments for plot details using ... – for example, type = “l”, pch = 16. Where you want user control over some of these parameters, I found ggplot less convenient and had to write quite a lot of additional code to capture these things. The same is true for colours.

    Having said that, ggplot does take care of many of these issues.

    Thanks again for such a useful series of posts.

    David

    • August 26, 2009 6:45 pm

      If no-one tells me which particular cases are slow, I can not fix them. I know ggplot2 is somewhat slower than base and lattice graphics, but it shouldn’t be _that_ much slower (I’d expect 10-20% in most cases). If you have an example where ggplot2 is much much slower than lattice, please send it to me so I can add it to my bug list.

      • learnr
        August 26, 2009 7:34 pm

        Have a look at this file that includes the comparative system.time results for generating almost all of the graphs (still need to figure out how to get the results in a table). The speed difference between lattice and ggplot2 is never close to 10-20%.

        That being said, I am not 100% sure whether system.time() is the right function to measure the performance, and it could well be that the results are different on computers running different operating systems.

        Quickly browsing through the first chapters, for example figures 3.4, 3.8, 5.14, 5.19, 6.9 seem to have quite big differences in terms of the system.time results.

        I am running WinXP with 3 GB RAM, R 2.9.1.

      • David
        August 26, 2009 7:42 pm

        Hadley – have a look at learnr’s timings in the document with no images (the 800 KB one above). Some of the ggplot times are much slower than lattice, typically by a factor of about 7.

      • August 26, 2009 7:58 pm

        Hmmm, ok, that’s much slower than I had thought (thanks so much for the concrete numbers!). It would be really helpful if you could select say the 5 plots with the biggest difference between lattice and ggplot2 and put the code into a single R file. That way I’ll have that much extra time to spend optimising the code.

      • August 27, 2009 6:06 pm

        Hmm. It looks like about 40% of the time is doing data preparation (computing the statistics, scales, etc), about 40% laying out the plot and 20% actually drawing the plot. So the bulk of the time is spent in layout and drawing and it should be possible to do this much faster. Unfortunately it will require quite a big rewrite of the layout system which will take some time.

  3. August 26, 2009 5:42 pm

    Thanks for the great job! I predict the pdf-version will soon become a handy reference for many R users.

  4. Juliet
    August 26, 2009 10:53 pm

    Thanks for the great blog. I’ve found it very helpful. I find the ggplot syntax easy to remember.

  5. richardprichard
    August 27, 2009 12:25 pm

    This is good work, thank you for sharing. I have found the learning curve on ggplot2 to be less savage than I was expecting.

    My problem is with the data-wrangling that has to happen with real datasets prior to getting to graph – all the cleaning, subsetting and summarising. While there are bits & bobs all over the place, I’m still looking for the hand-holding I need to understand which tools are available and when to use them. Something that would take me a few seconds in a pivot table has me (currently) stumped – NAs and datasets of differing lengths.

    Grrr.

    • August 27, 2009 4:07 pm

      Two pieces of advice:

      1. Read Phil Spector’s “Data Manipulation with R”.

      2. Check out my plyr and reshape packages

      • richardprichard
        September 6, 2009 2:22 pm

        Thanks for the tip, Hadley. I’d noticed the book, but had held off as it's part of an expensive series.
        It came through the post yesterday – it’s great. The tone reminds me of the original K&R ‘C’ language book – even the fonts are similar. It’s at the opposite end of the spectrum to the ‘head first’ book I read last. Probably showing my age, but I think I prefer it.

        cheers. Rich

  6. August 27, 2009 5:01 pm

    Regarding speed: I have the impression that there are big differences between computers. On my Acer Aspire One, I did an install.packages("ggplot2", dep=T), and on this computer every chart takes FOREVER!!! On my somewhat older, but maybe equally underpowered, desktop work computer everything goes much faster (still not as fast as base graphics, but still). So maybe it’s the hardware, maybe some dependencies make ggplot slow... I don't know, it’s just an observation.

    Nothing takes away the fact that ggplot2 ROCKS! and that Hadley and Learnr together (I guess most credit should go to Hadley) have made me a better statistician.

    P.S. Running various Ubuntu distros.

  7. August 27, 2009 6:09 pm

    Thanks again for taking the time to do this – you have made a fantastic resource for ggplot2 and lattice users.

    I agree it would be great to be able to use formulas when faceting – unfortunately it’s tricky to get everything working right when you have both formulas in facets and layers that don’t include some of the facetting variables.

  8. Juliet
    September 4, 2009 7:36 pm

    So what is your next blog topic going to be?

    • learnr
      September 7, 2009 10:05 am

      Any recommendations / wishes / ideas?

      • Arun Eamani
        September 9, 2009 7:54 pm

        We want a series on data mining please

      • learnr
        September 9, 2009 11:00 pm

        Could you be a bit more specific?

  9. baptiste
    September 10, 2009 1:34 pm

    Thank you for this great reference!

    The list of tools is also very interesting.

    I guess the next thing to try is converting the entire R graph gallery to ggplot2, clearly ;-)

    [*] http://addictedtor.free.fr/graphiques/

    • learnr
      September 10, 2009 6:24 pm

      Thanks for the interesting suggestion. Hmm, need to think about it. Never say never. :)

  10. Roberto
    September 23, 2009 9:52 pm

    Thanks for preparing these tutorials, they are very useful.

    Roberto

  11. Kevin Wright
    February 11, 2010 12:17 am

    From my perspective, the biggest obstacle holding me back from using ggplot is speed. A plot that takes 1-2 seconds in lattice can take about 10 seconds in ggplot. Doesn’t sound long, until you do it dozens of times per hour…

    One thing I found fascinating about this series of posts is how similar the amounts of lattice and ggplot code needed to produce the same plot are. There were some variations, but all in all the amount of code was similar.

    • learnr
      February 11, 2010 10:09 am

      You are right, speed is an issue – hopefully we may see some changes soon.

    • February 11, 2010 6:18 pm

      I’m pretty surprised that the length of code is similar too – generally when you translate from one system to another, the translation is always longer. If you’d expect lattice code to be shorter anywhere, it’s in the examples shown in the book. What you don’t see is all the plots that are very difficult to do in lattice, but easy to do in ggplot2.

      I agree about the speed, and I have a generous donation to work on speed this summer so I hope we will see some big gains then.

  12. Richard Kittler
    January 22, 2011 6:19 am

    Thank you for the excellent work in making these comparisons. Any chance to obtain a compendium of the scripts for ongoing use in benchmarking?

    • learnr
      February 7, 2011 3:29 am

      Sorry for the late reply.

      I could provide you the code extracted via Stangle() for each chapter. Would this be something you had in mind?

      • Richard Kittler
        February 10, 2011 6:05 am

        Thanks for the offer but I won’t need it after all. I chose to just extract all of the examples from the help topics since my real goal was to compare performance across platforms rather than across packages.

  13. John
    March 19, 2012 9:13 pm

    Thank you very very much.

Trackbacks

  1. Wordpress Blogging with R in 3 Steps « Learning R
  2. R-ohjelmointi.org » Blog Archive » Blogi: learning R
