ggplot2: LOESS smoothing

March 10, 2009

Jon Peltier writes about LOESS smoothing in Excel, and presents a utility that makes it easier to add smoothers to data. He goes on to show how smoothing can help analyze the body mass index (BMI) of Playboy playmates – a topic recently discussed in the FlowingData forums.

He also created the following graph in Excel with the help of a user-defined function (UDF).

https://learnr.wordpress.com/wp-content/uploads/2009/03/playmate_loess.png

Now, let’s try to recreate this chart in ggplot2.

First, download the Excel data file from Wired. I have tried to comment the code so that it is easier to follow.

Load the necessary libraries: ggplot2 for graphing, xlsReadWrite for Excel import (the original data on the wired.com website is in an Excel file).

> library(ggplot2)
> library(xlsReadWrite)

Import the data and keep only the first 662 rows; the rows after that contain summaries we don’t need.

> df <- read.xls("1702_Infoporn_Playmate_Data.xls")
> df <- df[1:662, ]
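Readers who cannot install xlsReadWrite on their R version could try the same import step with the readxl package. The following is only a sketch under that assumption: the helper name import_playmates is hypothetical, and the file is assumed to sit in the working directory.

```r
# Hypothetical helper (not from the original post): import the workbook
# with readxl and keep only the leading data rows.
import_playmates <- function(path, n_rows = 662) {
  if (!requireNamespace("readxl", quietly = TRUE)) {
    stop("the readxl package is needed for this sketch")
  }
  # read_excel() returns a tibble; convert it to a plain data frame
  raw <- as.data.frame(readxl::read_excel(path))
  # drop the summary rows that follow the data
  raw[seq_len(min(n_rows, nrow(raw))), ]
}

path <- "1702_Infoporn_Playmate_Data.xls"
if (file.exists(path)) {
  df <- import_playmates(path)
}
```

The column names produced by readxl may differ from those produced by read.xls, so it is worth checking names(df) before renaming columns by position.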

Revise variable names, convert to lower case.

> names(df)[3] <- "bust"
> names(df)[4] <- "cupsize"
> names(df) <- tolower(names(df))

Combine year and month into a single string (with the day fixed to the 1st) and convert it to date format

> df$date <- as.Date(paste(1, df$month, df$year),
+     "%d %B %Y")
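The conversion above can be tried on a small made-up data frame. This sketch assumes, as the "%B" format implies, that the month column holds full English month names; because "%B" is locale-dependent, the example switches LC_TIME to the "C" locale for the conversion. The toy values are invented for illustration.

```r
# Toy data standing in for the real month and year columns (made-up values)
toy <- data.frame(month = c("December", "January", "February"),
                  year  = c(1953, 1954, 1954))

# %B matches month names in the current locale; the "C" locale uses English
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")

# Prefix day 1 so that every issue maps to the first of its month
toy$date <- as.Date(paste(1, toy$month, toy$year), "%d %B %Y")

# Restore the previous locale setting
Sys.setlocale("LC_TIME", old_locale)
print(toy$date)
```

If the conversion returns NA for every row, a mismatch between the locale and the month names is a likely cause.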

First try

> qplot(data = df, date, bmi)
[Figure: loess1]

Replace zero bmi values with NA (a BMI of zero is not a real measurement, and these points distort the plot)

> df[, "bmi"] <- ifelse(df[, "bmi"] == 0, NA, df[,
+     "bmi"])
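A shorter way to express the same replacement is logical subsetting. The sketch below uses a made-up vector of values; which() is used so that rows that are already NA are simply skipped.

```r
# Made-up values: two zeros and one existing NA
toy <- data.frame(bmi = c(19.4, 0, 18.1, NA, 0))

# which() returns only the positions where the comparison is TRUE,
# so existing NAs are left untouched
toy$bmi[which(toy$bmi == 0)] <- NA
print(toy$bmi)
```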

Second try

> (p <- qplot(data = df, date, bmi, xlab = "", ylab = ""))
[Figure: loess2]

Add LOESS smoother

> (p1 <- p + geom_smooth(method = "loess", size = 1.5))
[Figure: loess3]
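With method = "loess", geom_smooth() fits the curve using R's loess() function. To see what the smoother computes, here is a minimal base R sketch on simulated data (not the Wired data). The span of 0.75 and degree of 2 are the loess() defaults, written out explicitly; all variable names are illustrative.

```r
set.seed(1)
# Simulated data: a slow downward drift plus noise
toy <- data.frame(x = 1:100)
toy$y <- 20 - 0.02 * toy$x + rnorm(100, sd = 1)

# Local quadratic fit; span is the fraction of points used around each x
fit <- loess(y ~ x, data = toy, span = 0.75, degree = 2)

# Smoothed values and their standard errors at the observed x
pred <- predict(fit, se = TRUE)
toy$smooth <- pred$fit
toy$se     <- pred$se.fit
head(toy)
```

A smaller span follows the data more closely, while a larger one gives a smoother curve. Dates would need to be converted with as.numeric() before being passed to loess() in this way.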

Add the black-and-white theme and a title (the axis labels were already removed with xlab and ylab above)

> p1 + theme_bw() + opts(title = "Playmate BMI - Loess Analysis")
[Figure: loess4]
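Later ggplot2 releases dropped opts() and deprecated qplot(), so the code above may not run unchanged on a current installation. A rough equivalent of the final chart, assuming ggplot2 3.4.0 or later (where the line width argument is called linewidth), might look like the sketch below. It uses simulated values in place of the real data frame.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the cleaned data frame (made-up values)
toy <- data.frame(
  date = seq(as.Date("1953-12-01"), by = "month", length.out = 120)
)
toy$bmi <- 20 - 0.01 * seq_len(120) + rnorm(120, sd = 1)

# Points, LOESS smoother, title without axis labels, black-and-white theme
p <- ggplot(toy, aes(x = date, y = bmi)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, linewidth = 1.5) +
  labs(title = "Playmate BMI - Loess Analysis", x = NULL, y = NULL) +
  theme_bw()
p
```

With the real data, toy would be replaced by df, and rows with a missing bmi would be dropped from the plot with a warning.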
6 Comments
  1. March 14, 2009 7:21 pm

    I came to the same conclusion regarding the original linear regression analysis, and got similar results to yours by fitting a kernel regression in MATLAB. It is worth noting, though, the substantial spread about any of these trend curves.

  2. March 16, 2009 8:11 pm

    I think the reason that the theme_bw() looks much more readable on these webpages is that it matches the background of the page (i.e. white)? With thick text theme_gray() works ok. But it’s great to know that one can alter things easily.

  3. Felipe Carrillo
    August 22, 2009 11:41 pm

    Do you by any chance have the dataset to recreate this example? The Wired website is no longer available.

    • learnr
      August 23, 2009 10:47 am

      There was a problem with the link in the post – I have now corrected this.

      The Wired story is available here.
      And the data can be downloaded from here.

  4. Matt Neibaur
    October 17, 2009 5:48 pm

    Nice blog! Thanks for sharing your experience with R.

    I found the playmate stats revealing (pun). Keep in mind that a BMI under 20 is considered unhealthy. The trend has been from around 20 in 1953 to 18 today. This is contrasting the general population’s increasing BMI.

    Another interesting graph from the data is the waist hip ratio. The Greeks considered the golden ratio the standard for beauty at 0.61. As the playmate’s BMI decreases, the ratio is being shifted to 0.7. We were closer to the Greek ideal in the 1950’s. (http://en.wikipedia.org/wiki/Golden_ratio)

    There are a number of interesting insights from this data set. A couple come to mind:
    1. our ideal of beauty and the golden ratio
    2. dropping BMI in the models vs the increasing population’s BMI
    3. health implications of idealizing an unhealthy & unobtainable ideal for women

    Someone should write an article.