
ggplot2: Crayola Crayon Colours

January 21, 2010

The Statistical Algorithms blog attempted to recreate, in ggplot2, a graph depicting the growing colour selection of Crayola crayons (original graph below, via FlowingData).

He also asked the following questions: Is there an easier way to do this? How can I make the axes more like the original? What about the white lines between boxes and the gradual change between years? In addition, the sort order differs from the original.

In this post I present my version, which tries to address some of these questions.

[Figure: the original Crayola colour chart, via FlowingData]

Data Import

The list of Crayola crayon colours is available on Wikipedia. It contains one duplicate colour (#FF1DCE), which I excluded to make further processing easier.

> library(XML)
> library(ggplot2)
> library(plyr)
> theurl <- "http://en.wikipedia.org/wiki/List_of_Crayola_crayon_colors"
> html <- htmlParse(theurl)
> crayola <- readHTMLTable(html, stringsAsFactors = FALSE)[[2]]
> crayola <- crayola[, c("Hex Code", "Issued", "Retired")]
> names(crayola) <- c("colour", "issued", "retired")
> crayola <- crayola[!duplicated(crayola$colour), ]
> crayola$retired[crayola$retired == ""] <- 2010
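Because the Wikipedia table may have changed since this post was written, the two cleaning steps can first be tried on a small data frame. The rows below are hypothetical (not the real Wikipedia data), and the final integer conversion is an extra step that the code above does not perform; there, the character years are only coerced later by the `:` operator.

```r
# Hypothetical stand-in for the scraped Wikipedia table (not real data)
crayola <- data.frame(
  colour  = c("#FF1DCE", "#1F75FE", "#FF1DCE", "#EE204D"),
  issued  = c("1958", "1903", "1958", "1903"),
  retired = c("", "", "", "1990"),
  stringsAsFactors = FALSE
)

# Keep only the first occurrence of each hex code
crayola <- crayola[!duplicated(crayola$colour), ]

# An empty "retired" cell means the colour is still in production;
# the numeric 2010 is stored as "2010" because the column is character
crayola$retired[crayola$retired == ""] <- 2010

# Extra step (not in the post): convert the year columns to integers
crayola$issued  <- as.integer(crayola$issued)
crayola$retired <- as.integer(crayola$retired)
str(crayola)
```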

Plotting

Instead of geom_rect(), I show two ways of plotting the same data: geom_bar() and geom_area(). Both need the data to contain one entry per colour for every year the colour was (or still is) in production.

> colours <- ddply(crayola, .(colour), transform,
+     year = issued:retired)
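For readers who prefer not to use plyr, the same expansion (one row per colour per production year) can be sketched in base R with split(), lapply() and rbind(). This is a swapped-in alternative to the ddply() call, shown on hypothetical data; expand_years() is an illustrative helper name, not part of any package.

```r
# Hypothetical input: one row per colour with issue and retirement years
crayola <- data.frame(
  colour  = c("#1F75FE", "#EE204D"),
  issued  = c(2005L, 2007L),
  retired = c(2010L, 2008L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per year the colour was produced
expand_years <- function(row) {
  years <- seq(row$issued, row$retired)
  data.frame(colour = row$colour, year = years, stringsAsFactors = FALSE)
}

# Apply the helper to each colour and stack the results
colours <- do.call(rbind, lapply(split(crayola, crayola$colour), expand_years))
rownames(colours) <- NULL
colours
```

Each colour now appears once per year, which is the shape geom_bar() and geom_area() need to stack the colours within every year.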

scale_fill_identity() uses the hex codes in the colour column directly as fill colours, so each crayon is drawn in its original colour.

> p <- ggplot(colours, aes(year, 1, fill = colour)) +
+     geom_bar(stat = "identity", width = 1, position = "fill") +
+     theme_bw() + scale_fill_identity()
[Figure: geom_bar() version of the chart]

And now the geom_area() version:

> p1 <- ggplot(colours, aes(year, 1, fill = colour)) +
+     geom_area(position = "fill", colour = "white") +
+     theme_bw() + scale_fill_identity()
[Figure: geom_area() version of the chart]

Final Formatting

Next, I manually override the x-axis labels suggested by ggplot2, using a little trick to align the labels properly: each break is placed one year before the year shown in its label (breaks <- labels - 1).

> labels <- c(1903, 1949, 1958, 1972, 1990, 1998, 2010)
> breaks <- labels - 1
> x <- scale_x_continuous("", breaks = breaks, labels = labels,
+     expand = c(0, 0))
> y <- scale_y_continuous("", expand = c(0, 0))
> ops <- theme(axis.text.y = element_blank(), axis.ticks = element_blank())
> p + x + y + ops
[Figure: formatted geom_bar() version]
> p1 + x + y + ops
[Figure: formatted geom_area() version]

The order of colours could be changed by sorting them by some common feature; unfortunately, I did not find an automated way of doing this.
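To see why the raw hex codes make a poor sort key, compare a lexical sort of a few hex strings with an ordering by hue computed with base R's rgb2hsv(). The four colours are arbitrary examples chosen for illustration.

```r
# Four arbitrary example colours: red, blue, magenta, cyan
cols <- c("#FF0000", "#0000FF", "#FF00FF", "#00FFFF")

# Sorting the hex strings is purely lexical
by_hex <- sort(cols)

# rgb2hsv() accepts the 3-row matrix from col2rgb(); row 1 is hue in [0, 1]
hue <- rgb2hsv(col2rgb(cols))[1, ]
by_hue <- cols[order(hue)]

print(by_hex)  # blue, cyan, red, magenta
print(by_hue)  # red, cyan, blue, magenta
```

The lexical order follows the characters of the red/green/blue digits, not how similar the colours look, while the hue order walks around the colour wheel.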

Sorting by Colour

Thanks to Baptiste, who showed a way to sort the colours, the final version of the area plot resembles the original even more closely.

> library(colorspace)
> sort.colours <- function(col) {
+     c.rgb = col2rgb(col)
+     c.RGB = RGB(t(c.rgb) %*% diag(rep(1/255, 3)))
+     c.HSV = as(c.RGB, "HSV")@coords
+     order(c.HSV[, 1], c.HSV[, 2], c.HSV[, 3])
+ }
> colours = ddply(colours, .(year),
+     function(d) d[rev(sort.colours(d$colour)), ])
> last_plot() %+% colours
[Figure: final area plot with colours sorted by hue, saturation and value]
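The same per-year reordering can also be done without colorspace or plyr. The sketch below uses base R's rgb2hsv() (the approach a commenter describes further down) together with split() on a small hypothetical data frame; like the code above, it reverses the hue/saturation/value order within each year.

```r
# Order colours by hue, then saturation, then value (base R only)
sort.colours <- function(col) {
  rgb <- col2rgb(col)  # 3 x n matrix of red, green, blue in 0-255
  hsv <- rgb2hsv(rgb[1, ], rgb[2, ], rgb[3, ], maxColorValue = 255)
  order(hsv[1, ], hsv[2, ], hsv[3, ])
}

# Hypothetical data: one row per colour per year
colours <- data.frame(
  colour = c("#0000FF", "#FF0000", "#00FF00", "#FF0000", "#0000FF"),
  year   = c(1903, 1903, 1903, 1904, 1904),
  stringsAsFactors = FALSE
)

# Reorder the rows within each year, reversed as in the post
by_year <- lapply(split(colours, colours$year),
                  function(d) d[rev(sort.colours(d$colour)), ])
colours <- do.call(rbind, by_year)
rownames(colours) <- NULL
colours
```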
22 Comments
  1. January 21, 2010 5:32 pm

    That looks great! I think the key thing missing (beyond the color sort) is the white lines above each color. Any ideas there?

    • learnr
      January 21, 2010 7:24 pm

      White lines above each colour are possible only for the area plot.
      I have updated the post accordingly.

      • January 21, 2010 8:26 pm

        Fantastic. That looks much closer to the original.

  2. Jay
    January 21, 2010 5:54 pm

    Nice chart and great blog!

    I’m curious if you have any ideas on smoothing the edges out in this chart?

    • learnr
      January 21, 2010 7:31 pm

      Unfortunately, I don’t think it is possible to smooth the edges.
      I would like to be proven wrong, though.

      • Tobias
        January 22, 2010 3:32 pm

        It is smooth using Cairo. Cairo is much slower, of course.

      • Peter S.
        July 20, 2010 8:48 pm

        I think it has to do with whether anti-aliasing is turned on with your build of R. I couldn’t figure out what one of my friends was on about when he said all his graphics in R were jagged looking. It turns out that anti-aliasing is on by default on Mac OS X, but not some other platforms.

        As the other poster mentioned, Cairo seems to fix this (at least on Linux?). I found this website that talks about how to use Cairo: http://www.mailund.dk/index.php/2009/01/25/antialias-plotting-in-r-using-cairo/

        I started working through examples yesterday (only finished the Playmate BMI graph so far), but I can confirm that the PDFs dumped out of my Mac with a stock download of R do look crisp and antialiased.

  3. Jay
    January 21, 2010 7:38 pm

    p1 vs. p certainly cleans up much of the issue of the “rough” edges.

  4. January 21, 2010 8:19 pm

    Color sort should be easy in HSV, but it looks from a quick googling like there might be some trickiness in converting between color spaces in R. I suspect that an actual color sort will make some of the branches seem less gnarled than the original.

    The original colors look better than their uncorrected hex representations because of color profiles (this would be an easy improvement).

    • January 21, 2010 10:52 pm

      Wikipedia also has RGB, if that would help. I just pulled in HEX in the original version because I didn’t realize that sorting on HEX would result in this crazy configuration. That being said, I think that it’s clear that sorting colors is no easy matter.

  5. baptiste
    January 22, 2010 12:23 am

    I believe the colorspace package could help in sorting the colours. A quick test follows,

    library(colorspace)
    library(grid)
    # convert to RGB
    c.rgb = col2rgb(crayola$colour)
    c.RGB = RGB(t(c.rgb) %*% diag(rep(1/255, 3)))
    # convert to HSV
    c.HSV = as(c.RGB, "HSV")@coords

    # sort the colours by hue
    c.HSV.s = c.HSV[order(c.HSV[, 1]), ]

    # utility to draw colours as a strip of adjacent rectangles
    colorStrip = function(fill = 1:3, colour = "white", draw = TRUE) {
        x <- seq(0, 1 - 1/length(fill), length = length(fill))
        y <- rep(0.5, length(fill))
        my.grob <- grid.rect(x = unit(x, "npc"), y = unit(y, "npc"),
            width = unit(1/length(fill), "npc"), height = unit(1, "npc"),
            just = "left", hjust = NULL, vjust = NULL,
            default.units = "npc", name = NULL,
            gp = gpar(fill = fill, col = colour), draw = draw, vp = NULL)
        my.grob
    }

    # original colours
    g1 = colorStrip(crayola$colour, draw = FALSE)

    # the same colours after the round trip through HSV
    g2 = colorStrip(hsv(c.HSV[, 1]/360, c.HSV[, 2], c.HSV[, 3]), draw = FALSE)

    # the colours sorted by hue
    g3 = colorStrip(hsv(c.HSV.s[, 1]/360, c.HSV.s[, 2], c.HSV.s[, 3]),
        draw = FALSE)

    # comparison
    library(gridExtra)
    grid.arrange(g1, g2, g3, ncol = 1)

    • baptiste
      January 22, 2010 2:07 am

      … further to this suggestion, adding the following comes closer to the original,

      library(colorspace)

      sort.colours <- function(col) {
          ## convert to RGB
          c.rgb = col2rgb(col)
          c.RGB = RGB(t(c.rgb) %*% diag(rep(1/255, 3)))
          ## convert to HSV
          c.HSV = as(c.RGB, "HSV")@coords
          ## sorting by h, s, v
          order(c.HSV[, 1], c.HSV[, 2], c.HSV[, 3])
      }

      colours = ddply(colours, .(year), function(d) d[rev(sort.colours(d$colour)), ])

  6. January 22, 2010 1:42 am

    Wow – you never cease to amaze me in your adroit application of ggplot. I didn’t think this chart was possible in R.

    Still it seems that this plot would benefit from post-production. That is – export it as an SVG and do some polishing in Inkscape. That would be a good way to get the anti-aliasing and maybe reposition the labels.

  7. January 23, 2010 9:58 am

    So the last thing, and it’s an important one, is the font. Is there any way to bring proper type to R graphics? Allegedly the tikz device will make LaTeX code that would then be able to use OS fonts via XeTeX. Not at all an optimal solution, though.

    The original just looks like Myriad Semibold. Even so, Myriad would be a welcome addition to R graphics in any grdev.

  8. baptiste
    January 30, 2010 3:44 pm

    The colorspace package is actually not necessary. One can instead use this function,

    sort.colours <- function(col) {
        RGBColors <- col2rgb(col)
        HSVColors <- rgb2hsv(RGBColors[1, ], RGBColors[2, ], RGBColors[3, ],
            maxColorValue = 255)
        HueOrder <- order(HSVColors[1, ], HSVColors[2, ], HSVColors[3, ])
        return(HueOrder)
    }

    to achieve the same result. I just found this code here:

    http://research.stowers-institute.org/efg/R/Color/Chart/

  9. February 2, 2010 1:43 pm

    Very nice bit of programming. I thought that the gantt.chart function might provide an alternative format, and with a tweak or two, I produced the chart at:

    Jim

  10. John Henry
    April 12, 2013 4:39 pm

    Just started looking into R.
    This is an amazing demonstration of R’s ability to generate such output from just a few lines of code.
    Top job
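Following the comments about jagged edges, here is a minimal sketch of the two routes they mention: writing to PDF, which most PDF viewers render with antialiasing, and writing a PNG through the cairo backend when this build of R supports it. It uses base graphics, a hypothetical helper draw_strips() and hypothetical file names in tempdir(); a ggplot object such as p1 would be drawn the same way by calling print(p1) inside the device.

```r
# Hypothetical helper: two adjacent filled areas with a white border,
# the kind of shape where jagged edges become visible
draw_strips <- function() {
  plot(0:10, 0:10, type = "n", xlab = "", ylab = "")
  polygon(c(0, 10, 10, 0), c(0, 3, 6, 0), col = "#1F75FE", border = "white")
  polygon(c(0, 10, 10, 0), c(0, 6, 10, 10), col = "#EE204D", border = "white")
}

out_dir <- tempdir()

# Vector output: smoothing is left to the PDF viewer
pdf_file <- file.path(out_dir, "crayola.pdf")
pdf(pdf_file, width = 6, height = 4)
draw_strips()
invisible(dev.off())

# Bitmap output via cairo, only if this build of R supports it
if (capabilities("cairo")) {
  png_file <- file.path(out_dir, "crayola.png")
  png(png_file, width = 600, height = 400, type = "cairo", antialias = "default")
  draw_strips()
  invisible(dev.off())
}
```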

Trackbacks

  1. Mosaic time series in R » Statistical Algorithms
  2. Tweets that mention ggplot2: Crayola Crayon Colours « Learning R -- Topsy.com
  3. blag » Crayola colours
  4. Japanese Fisheries Scientists on Japanese Media Coverage of the CITES Bluefin Tuna Decision « achikule!
  5. Fargestifter - Lilly Apps
