
ggplot2: Changing the Default Order of Legend Labels and Stacking of Data

March 23, 2010

“How to change the order of legend labels” is a question that gets asked relatively often on the ggplot2 mailing list. A variation of this question is how to change the order of series in stacked bar and line plots.

While these two questions seem to be related, in fact they are separate as the legend is controlled by scales, whereas stacking is controlled by the order of values in the data.

Recently I spent some time getting my head around this, and below is a quick recap.


Changing the Ordering of Legend Labels

The standard stacked barplot looks like this:

> library(ggplot2)
> ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()
order_variable-004.png

You notice that in the legend “Fair” is at the top and “Ideal” at the bottom. But what if I would like to order the labels in the reverse order, so that “Ideal” would be at the top?

The order of legend labels can be manipulated by reordering the factor levels of the cut variable, which is mapped to the fill aesthetic.

> levels(diamonds$cut)
[1] "Fair"      "Good"      "Very Good" "Premium"
[5] "Ideal"
> diamonds$cut <- factor(diamonds$cut, levels = rev(levels(diamonds$cut)))
> levels(diamonds$cut)
[1] "Ideal"     "Premium"   "Very Good" "Good"
[5] "Fair"
> ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()
order_variable-007.png

The legend entries are now in reverse order (and so is the stacking).
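
A quick way to check that reordering the levels like this leaves the data itself intact is to compare the counts before and after. The sketch below is not from the original analysis: it uses a small made-up vector (`cut_demo`) in place of `diamonds$cut`, and contrasts `factor(x, levels = ...)` with assigning to `levels(x)`, which relabels the observations instead of reordering them.

```r
# Base R only; cut_demo is a hypothetical stand-in for diamonds$cut.
cut_levels <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
cut_demo <- factor(c("Ideal", "Fair", "Ideal", "Good", "Premium", "Ideal"),
                   levels = cut_levels)

# Reorder the levels the same way as for diamonds$cut above.
cut_rev <- factor(cut_demo, levels = rev(levels(cut_demo)))

# The level order is reversed ...
levels(cut_rev)

# ... but every observation keeps its label, so the counts per level match.
identical(as.character(cut_demo), as.character(cut_rev))
table(cut_demo)[cut_levels]
table(cut_rev)[cut_levels]

# By contrast, assigning to levels() renames the existing levels in place,
# which does change the data: the first "Ideal" observation becomes "Fair".
relabeled <- cut_demo
levels(relabeled) <- rev(levels(relabeled))
as.character(relabeled)[1]
```

If the counts differ between two plots after a reordering, the second pattern is a likely culprit.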


Changing Data Stacking Order

The order aesthetic changes the order in which the areas are stacked on top of each other.

The following aligns the order of both the labels and the stacking.

> ggplot(diamonds, aes(clarity, fill = cut, order = -as.numeric(cut))) +
+     geom_bar()
order_variable-011.png

Or, alternatively, reordering the factor levels again:

> diamonds$cut <- factor(diamonds$cut, levels = rev(levels(diamonds$cut)))
> ggplot(diamonds, aes(clarity, fill = cut, order = -as.numeric(cut))) +
+     geom_bar()
order_variable-013.png
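
Readers on later ggplot2 releases report in the comments below that `aes(order = ...)` is no longer supported. As a hedged sketch of how the two orderings can be controlled separately without it, the following assumes a ggplot2 version that provides `position_stack(reverse = TRUE)` (to my knowledge, 2.2.0 or later) and `guide_legend(reverse = TRUE)` (0.9.0 or later, per the comments). The object names `p_stack` and `p_legend` are only for illustration.

```r
library(ggplot2)

# Reverse only the stacking order; the legend keeps the factor level order.
p_stack <- ggplot(diamonds, aes(clarity, fill = cut)) +
    geom_bar(position = position_stack(reverse = TRUE))

# Reverse only the legend; the stacking keeps its default order.
p_legend <- ggplot(diamonds, aes(clarity, fill = cut)) +
    geom_bar() +
    guides(fill = guide_legend(reverse = TRUE))

# Print either object to draw it, e.g. print(p_stack).
```

Neither variant modifies `diamonds$cut`, which may be preferable when the same data frame feeds several plots.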
24 Comments
  1. Raivo Kolde permalink
    March 24, 2010 11:30 am

    The things are not so simple. You are messing up the data by changing levels as you do. At first plot, the number of ideal diamonds is the smallest, on the second plot it is the largest. So you would have to make the reordering of the levels a bit more cleverly. However, I’m always in trouble when I have to do this, it would be nice to see your solution.

    • learnr permalink*
      March 24, 2010 12:28 pm

      Strange, on all the plots I can see the Ideal diamond count is the largest.

      Please refresh your browser (image) cache, as initially the wrong image files were uploaded, but I have deleted these since.

  2. Jack Tanner permalink
    March 26, 2010 1:45 am

    What’s the incantation to manually specify the order of the levels? I’d rather just type it in.

    How do you change the order of facets in a faceted plot?

    • learnr permalink*
      March 28, 2010 12:17 pm

      There’s nothing wrong with manually specifying the order of the levels, but it becomes more prone to errors if the level names are long.

      Similarly to changing the legend keys, you can change the order of facets by changing the order of the underlying faceting factor.
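
      A minimal sketch of both answers, assuming the `diamonds` data from the post; the copy `d` and the hand-typed level order are only illustrative:

```r
library(ggplot2)

# Work on a copy so the original diamonds data stays untouched.
d <- diamonds

# Type the level order in by hand; every existing level must appear,
# otherwise the missing ones turn into NA.
d$cut <- factor(d$cut,
                levels = c("Ideal", "Premium", "Very Good", "Good", "Fair"))

# Facets are laid out in the order of the faceting factor's levels.
p_facets <- ggplot(d, aes(clarity)) +
    geom_bar() +
    facet_wrap(~ cut)
```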

  3. andy permalink
    May 10, 2010 8:02 pm

    I would like to argue against using this kind of visualization. Obviously, this isn’t a forum for visualization but I feel compelled to write.

    There is an inherent ambiguity whether the values represented by colored areas are absolute or relative. If absolute, the story the data is telling would be significantly different; if they’re relative, determining values is nearly impossible without relative scales.

    So whether or not you can move labels around is, in my opinion, eclipsed by whether you should be trying to in the first place.

    • learnr permalink*
      May 11, 2010 10:10 am

      I agree that stacked plots should be used with caution and care.

      I also think that they are a good tool in performing exploratory data analysis. Consider the following two charts

      qplot(clarity, data=diamonds)
      vs
      qplot(clarity, data=diamonds, fill=cut)

      In my view the second plot (which is a stacked plot as well) conveys considerably more information about the composition of the dataset compared to the first plot.

  4. John G permalink
    August 23, 2010 11:26 pm

    I’ve been using this method with great pleasure. But in version 0.8.8 the order parameter doesn’t affect the order of the bars so this technique doesn’t work. It has been reported as a bug and presumably will be fixed in a future version of ggplot2.

  5. Yannick Pouliot permalink
    September 13, 2010 10:27 pm

    Regarding reordering of legend labels, I’m not having much luck with the approach described above when using qplot (below). Should I be doing this in ggplot instead? I’ve yet to master that one… 🙂

    Cheers,

    Yannick

    qplot(as.character(PRR), Activity, fill=factor(Assay), data=final5, geom="boxplot", position="dodge") + theme_bw() + scale_x_discrete(name='Binarized PRR',breaks=c(0,1), labels=c("0","1")) + scale_y_continuous(name = 'Activity (Z-score)') + opts(legend.key = theme_blank())

    • learnr permalink*
      September 13, 2010 11:13 pm

      I suggest you have another look at your source data, and see if the factors are ordered properly. What works with qplot works with ggplot as well, I just have decided that setting up ggplot layers is clearer even if it requires more typing.

  6. Monet'sChemist permalink
    March 9, 2011 8:59 pm

    This is a wonderfully useful explanation of how to address the need to order stuff in graphs.

    I wonder if you would consider elaborating further on two points.

    First, in your diamond example “cut” has a kind of natural ranking “Fair”..”Ideal” (or the other way around), so the question I pose will sound a bit odd, but what if I wanted to order cut as “Premium”, “Good”, “Ideal”, “Fair”, “Very Good” (in terms of stacking order and legend order both).

    Second, there is another “ordering surprise” that I have recently encountered, and that is ordering of facets in a multi-facet graph.

    And a closing comment: it seems to me that the majority of stacked bar graphs and facets that I would prepare would have a categorical ordering that I would want to explicitly control. Seldom do I have categories that have any kind of “natural ordering” (like “Fair”..”Ideal”); most often my categories need to be ordered in aid of presentation – in other words, I need to plot the graph and then figure out how to arrange the ordering of stacking (especially) and faceting to present the comparison in its most visually compelling fashion.

    Therefore, boldly assuming the rest of the bar / facet graphers out there are like me 🙂 I humbly suggest that this topic be forever after covered in great and gory detail in all explanatory writings on ggplot 🙂 🙂

    Thanks again for this wonderful set of examples!

    • learnr permalink*
      March 10, 2011 5:03 pm

      All you need to do is to manually change the order of the factor levels.
      For example, something like this:
      diamonds$cut <- factor(diamonds$cut, levels = c("Fair", "Ideal", "Premium", "Very Good", "Good"))

      • Rose permalink
        July 31, 2013 5:33 pm

        Thanks, this helped me a lot! And the distinction between the two cases saved me a lot of trial and error trying to fix what I initially thought was an issue with the legend, but was really an issue with the order of the series I added to the plot.

  7. Alex Brown permalink
    July 15, 2011 1:57 am

    I think we need a way to change the legend order, so that factor level 1 appears at the bottom.

    I understand the solutions given above, but for a graph where I want the most significant item at the bottom, it makes sense for the legend order of colors to follow the graph order of colors (level 1 at the bottom).

    To ensure that the color of the most significant item stays stable, regardless of the relative levels of the others and of changes in the number of factor levels, I need the most significant item to have factor level 1 (not n).

  8. Daniel permalink
    September 10, 2011 12:56 am

    In a plot with a color scheme (tile plot), why does the legend go from low number at top to high number at bottom? How do I reverse that? And who decided that? That goes contrary to every number line I have ever seen! It is contrary to convention. Using order, and negating whatever is used in this example won’t work for me because the scale is set by ggplot automatically for me.

    • learnr permalink*
      September 12, 2011 1:02 pm

      You can change the order of legend labels either manually in ggplot2 or by reordering the underlying factor.

  9. September 13, 2011 6:55 am

    Thank you. Thank you. This was needed.

  10. Marco permalink
    July 29, 2012 1:09 pm

    If you want to change order of the legend only just type

    + scale_fill_hue(guide = guide_legend(reverse=TRUE))

    • learnr permalink*
      July 29, 2012 6:19 pm

      This works for ggplot2 versions 0.9.0 and above.

    • Rasmus permalink
      November 5, 2013 12:38 pm

      @ Marco: Great 🙂 This new addition to ggplot2 should be on the top of this page.

  11. Faidherbard permalink
    December 17, 2014 6:49 pm

    In the cookbook it also says: http://www.cookbook-r.com/Graphs/Legends_%28ggplot2%29/

    # These two methods are equivalent:
    bp + guides(fill = guide_legend(reverse=TRUE))
    bp + scale_fill_discrete(guide = guide_legend(reverse=TRUE))

    # You can also modify the scale directly:
    bp + scale_fill_discrete(breaks = rev(levels(PlantGrowth$group)))

  12. Faidherbard permalink
    December 17, 2014 6:52 pm

    If you want to avoid modifying your data and/or want to modify the order manually (in my case I have a line graph and I want the legend to be ordered in the same way as the last values of the lines), check the cookbook for version 0.9.3: http://www.cookbook-r.com/Graphs/Legends_%28ggplot2%29/

    # These two methods are equivalent:
    bp + guides(fill = guide_legend(reverse=TRUE))
    bp + scale_fill_discrete(guide = guide_legend(reverse=TRUE))

    # You can also modify the scale directly:
    bp + scale_fill_discrete(breaks = rev(levels(PlantGrowth$group)))

    • July 15, 2016 7:45 am

      Thank you, the `aes(order=)` syntax no longer looks to be supported, but this works.

  13. November 9, 2015 3:57 pm

    Great stuff. Just what I needed. Thanks for the R wizardry!
