brew: Creating Repetitive Reports

September 9, 2009

The United Nations report World Population Prospects: The 2008 Revision (highlights available here) provides data on the historical and forecast population of each country. When exploring past and future population trends, it is relatively easy to subset the dataset by a variable of your choice.

> file <- c("UNdata_Population.csv")
> population <- read.csv(file)
> names(population) <- c("code", "country", "year",
+     "variant", "value")
> df <- subset(population, year <= 2005)

Likewise, it is straightforward to produce a 263-page PDF file that shows the population trend from 1950 to 2005 for every country in the dataset.

> library(plyr)
> pdf("population_growth.pdf", paper = "a4")
> d_ply(df, .(country), function(x) plot(x$year,
+     x$value, type = "l", main = unique(x$country)))
> dev.off()
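
For readers without plyr at hand, the same one-plot-per-country idea can be sketched in base R. The toy data frame below is invented for illustration and merely stands in for the UN dataset; only the column names follow the code above, and the output goes to a temporary file.

```r
# Toy stand-in for the UN data (values are invented for illustration).
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(c(1950, 1955, 1960), times = 2),
  value   = c(10, 12, 15, 100, 90, 85)
)

# One page per country: split the rows by country, then plot each piece.
outfile <- file.path(tempdir(), "toy_population_growth.pdf")
pdf(outfile, paper = "a4")
for (piece in split(toy, toy$country)) {
  plot(piece$year, piece$value, type = "l",
       main = unique(piece$country))
}
dev.off()
```

Each pass through the loop starts a new page, which is why a dataset with many countries yields a PDF with many pages.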

However, if one would like to create a report showing summary tables and graphs along with a textual description for every country, the process becomes more complicated. This is exactly what the brew package, written by Jeffrey Horner, was designed to help with. brew “implements a templating framework for mixing text and R code for report generation” and makes it very easy to generate repetitive reports, which is a great help when, for example, performing exploratory analysis on a large dataset with many variables.

The report-generation can be split into three parts:

  1. Prepare the data.
    1. Add Regional Information to Population Data.
    2. Generate Graphs and Data to be included in the report
  2. Prepare the report template.
    1. Helper Functions
    2. Brew template – population.brew
  3. Produce the report.

Data Preparation

The data preparation script is saved in the popreportdata.R file, which is later sourced into the brew template.

Add Regional Information to Population Data

In fact, I would like to explore the data by continent rather than by country. The first step is therefore to group the countries by continent with the help of the ISOcodes package, which contains the standard country or area codes and geographical regions used by the UN.

> library(ggplot2)
> library(plyr)     # ddply(), dlply()
> library(reshape)  # rename(), sort_df()
> library(ISOcodes)
> data("UN_M.49_Countries")
> data("UN_M.49_Regions")
> Regions <- subset(UN_M.49_Regions, Type == "Region")
> regionsdf <- ddply(Regions, .(Code, Name, Parent),
+     function(x) {
+         df <- data.frame(strsplit(x$Children,
+             ", "))
+         names(df) <- "countrycode"
+         df
+     })
> countries <- merge(regionsdf, UN_M.49_Countries,
+     by.x = "countrycode", by.y = "Code")
> countries <- rename(countries, c(Name.x = "region",
+     Name.y = "country", Code = "regioncode", Parent = "parentcode"))
> countries <- merge(countries, Regions[, 1:2],
+     by.x = "parentcode", by.y = "Code")
> countries <- rename(countries, c(Name = "continent"))
> countries$countrycode <- as.numeric(as.character(countries$countrycode))
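
The ddply() call above turns each region's comma-separated Children string into one row per country code. A base R sketch of that reshaping step follows; the small regions table is an illustrative stand-in for UN_M.49_Regions, and the names regions, codes and expanded are assumptions introduced only for this example.

```r
# Illustrative regions table: one row per region, with the member
# codes packed into a single comma-separated string.
regions <- data.frame(
  Code     = c("014", "017"),
  Name     = c("Eastern Africa", "Middle Africa"),
  Children = c("108, 174, 262", "024, 120"),
  stringsAsFactors = FALSE
)

# Split each Children string and repeat the region columns once per code.
codes <- strsplit(regions$Children, ", ")
expanded <- data.frame(
  Code        = rep(regions$Code, lengths(codes)),
  Name        = rep(regions$Name, lengths(codes)),
  countrycode = unlist(codes),
  stringsAsFactors = FALSE
)
expanded
```

The result has one row per region-country pair, which is the shape needed for merging with a table keyed by country code.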

The countries data frame now contains the regional classification for each country. The next step is to merge a subset of this information with the population data.

> population <- merge(population, countries[, c("countrycode",
+     "continent")], by.x = "code", by.y = "countrycode")
> population$value <- population$value/1000
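
One property of merge() worth keeping in mind here: by default it performs an inner join, so any population row whose code has no match in the countries table is dropped. A small sketch with invented figures (the code 900 is simply a value that is absent from the lookup table):

```r
# Invented population rows; code 900 has no match in `countries`.
population <- data.frame(
  code  = c(4, 8, 900),
  year  = c(2005, 2005, 2005),
  value = c(24000, 3100, 6500000)
)
countries <- data.frame(
  countrycode = c(4, 8),
  continent   = c("Asia", "Europe")
)

# Default merge() is an inner join: only codes present in both survive.
merged <- merge(population, countries,
                by.x = "code", by.y = "countrycode")

# Rescale as in the code above (dividing the reported figures by 1000).
merged$value <- merged$value / 1000
merged
```

Passing all.x = TRUE to merge() would keep the unmatched rows instead, with NA in the continent column.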

Generate Graphs and Data to be included in the report

In this step the graphs are saved to disk using ggsave, and a list is returned that holds, for each continent, a list of four data frames.

> popreportdata <- dlply(population, .(continent),
+     function(df) {
+         continent <- gsub(" ", "_", unique(df$continent))
+         filename <- function(y) {
+             paste("graphs/", continent, y, ".pdf",
+                 sep = "")
+         }
+         forecast <- subset(df, variant != "Estimate variant")
+         forecast$variant <- forecast$variant[,
+             drop = TRUE]
+         historic <- subset(df, variant == "Estimate variant")
+         historic <- ddply(historic, .(continent,
+             year), transform, cont_value = sum(value))
+         current <- subset(df, year == 2005)
+         growthrate <- function(df) {
+             rng <- range(df$year)
+             min_value <- df[df$year == rng[1],
+                 "value"]
+             max_value <- df[df$year == rng[2],
+                 "value"]
+             abs_growth <- max_value/min_value
+             yr5_growth <- abs_growth^(1/length(df$year))
+             growthdf <- data.frame(min_value,
+                 max_value, abs_growth, yr5_growth)
+             names(growthdf)[1:2] <- c(rng[1],
+                 rng[2])
+             growthdf
+         }
+         growth <- ddply(forecast, .(continent,
+             country, variant), growthrate)
+         growth$variant <- factor(growth$variant,
+             levels = c("Constant-fertility scenario",
+                 "High variant", "Medium variant",
+                 "Low variant"))
+         growth <- sort_df(growth, vars = c("continent",
+             "variant", "abs_growth"))
+         blabel <- c(0.01, 0.1, 1, 10, 100)
+         alabel <- "Population (in millions)"
+         phist <- ggplot(current, aes(value)) +
+             geom_histogram(binwidth = 0.5, fill = NA,
+                 colour = "black") + scale_x_log10(breaks = blabel,
+             labels = blabel) + labs(x = alabel)
+         ggsave(filename("_hist"), phist, dpi = 100)
+         prank <- ggplot(current, aes(seq_along(country),
+             rev(sort(value)))) + geom_point() +
+             scale_y_log10(breaks = blabel, labels = blabel) +
+             labs(x = "Rank", y = alabel)
+         ggsave(filename("_rank"), prank, dpi = 100)
+         pbox <- ggplot(historic, aes(factor(year),
+             value)) + geom_boxplot() + labs(x = "",
+             y = alabel)
+         ggsave(filename("_box"), pbox, dpi = 100)
+         ptrend <- ggplot(historic, aes(year, value)) +
+             stat_summary(fun.y = "sum", geom = "line",
+                 colour = "red", size = 1) + stat_summary(data = forecast,
+             aes(y = value, group = variant, colour = variant),
+             fun.y = "sum", geom = "line", size = 1) +
+             labs(y = alabel, x = "") + opts(legend.position = c(0.8,
+             0.3), legend.background = theme_blank(),
+             legend.key = theme_blank()) + scale_colour_hue("Forecast")
+         ggsave(filename("_trend"), ptrend, width = 8,
+             height = 4, dpi = 100)
+         pgrowth <- ggplot(growth, aes(variant,
+             abs_growth, colour = variant)) + geom_boxplot() +
+             xlab("") + opts(legend.position = "none")
+         ggsave(filename("_abs_growth"), pgrowth,
+             dpi = 100)
+         pann_growth <- ggplot(growth, aes(variant,
+             yr5_growth, colour = variant)) + geom_boxplot() +
+             xlab("") + opts(legend.position = "none")
+         ggsave(filename("_ann_growth"), pann_growth,
+             dpi = 100)
+         current <- current[with(current, order(-value)),
+             ]
+         top <- head(current, 5)[, c("country",
+             "value")]
+         bottom <- tail(current, 5)[, c("country",
+             "value")]
+         growth <- growth[growth$variant == "Medium variant",
+             c(2, 4:7)]
+         growth <- growth[order(-growth$abs_growth),
+             ]
+         names(growth)[c(4:5)] <- c("Abs.Growth",
+             "Compound Growth")
+         growthtop <- head(growth, 5)
+         growthbottom <- tail(growth, 5)
+         list(top = top, bottom = bottom, growthtop = growthtop,
+             growthbottom = growthbottom)
+     })
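
The growthrate() helper above compares the last and the first year of a forecast: abs_growth is the ratio of the two populations, and yr5_growth takes a root of that ratio. The standalone sketch below repeats the arithmetic on invented numbers. It also shows an alternative exponent based on the number of five-year intervals rather than the number of data points; that variant is an assumption added for comparison and is not part of the original function.

```r
# Invented forecast for one country: population (millions) every 5 years.
forecast <- data.frame(
  year  = seq(2010, 2050, by = 5),
  value = c(100, 105, 110, 116, 122, 128, 134, 141, 148)
)

rng       <- range(forecast$year)
min_value <- forecast$value[forecast$year == rng[1]]
max_value <- forecast$value[forecast$year == rng[2]]

# Total growth factor over the whole forecast horizon.
abs_growth <- max_value / min_value

# As in the function above: root taken over the number of data points.
yr5_growth <- abs_growth^(1 / length(forecast$year))

# Alternative (an assumption, not the original code): root over the
# number of five-year intervals, one fewer than the data points.
yr5_growth_intervals <- abs_growth^(1 / (length(forecast$year) - 1))

c(abs_growth = abs_growth, yr5_growth = yr5_growth,
  yr5_intervals = yr5_growth_intervals)
```

With nine data points there are eight intervals, so compounding the interval-based rate eight times reproduces abs_growth exactly, while the point-based rate comes out slightly lower.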

Report Template

My report template is essentially a LaTeX document that includes an R loop wrapped in brew syntax. To make it easier to generate the LaTeX statements for including graphs and tables, a few helper functions are defined first.

Helper functions

> # Backslashes are doubled: R turns "\\" into one literal backslash.
> include_graph <- function(width = 1, filename) {
+     paste("\\includegraphics[width=", width, "\\linewidth]{",
+         filename, "}", sep = "")
+ }
> include_tbl <- function(width = 1, filename) {
+     print(xtable(filename), table.placement = "",
+         latex.environments = "", include.rownames = FALSE,
+         floating = FALSE)
+ }
> subfloat_graph <- function(width, filename, caption = "") {
+     paste("\\subfloat[", caption, "]{", "\\begin{minipage}[h]{",
+         width, "\\linewidth}\\centering", include_graph(width = 1,
+             filename), "\\end{minipage}}", sep = "")
+ }
> subfloat_tbl <- function(width, filename, caption) {
+     paste("\\subfloat[", caption, "]{", "\\begin{minipage}[h]{",
+         width, "\\linewidth}\\centering", print(xtable(filename),
+             file = stderr(), table.placement = "",
+             latex.environments = "", include.rownames = FALSE,
+             floating = FALSE), "\\end{minipage}}",
+         sep = "")
+ }
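
Inside an R string literal a backslash has to be escaped, so every LaTeX command in these helpers needs a doubled backslash. A quick way to check what a helper actually returns is to print its result with cat(); the file name Africa_trend.pdf below is only an example.

```r
# Same idea as include_graph() above, with backslashes doubled so that
# R emits a single literal backslash in the LaTeX output.
include_graph <- function(width = 1, filename) {
  paste("\\includegraphics[width=", width, "\\linewidth]{",
        filename, "}", sep = "")
}

latex_line <- include_graph(width = 0.5, "Africa_trend.pdf")
cat(latex_line, "\n")
# prints: \includegraphics[width=0.5\linewidth]{Africa_trend.pdf}
```

If a single backslash is used instead, R stops with an "unrecognized escape" error for sequences such as "\i" or "\l" before the function is even defined.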

Brew template – population.brew

Brew syntax is really very simple to use. From the help file:

  1. All text that falls outside of the delimiters is printed as-is.
  2. R expressions between the <% and %> delimiters are executed in-place.
  3. The value of the R expression between the <%= and %> delimiters is printed.

\documentclass[oneside]{article}

\usepackage[margin=2cm,nohead]{geometry}
\usepackage[pdftex]{graphicx}
\usepackage{subfig}
\usepackage{float}
\usepackage{verbatim}

\usepackage{hyperref}
\hypersetup{
  colorlinks=true,
  pdfauthor={http://learnr.wordpress.com}
  }

\graphicspath{{./graphs/}}

\title{World Population Trends}
\author{\url{http://learnr.wordpress.com}}
\date{\today}
\raggedbottom
\setcounter{tocdepth}{1}

\begin{document}

\maketitle

This report has been compiled based on the United Nations report World Population Prospects: The 2008 Revision (highlights available \href{http://www.un.org/esa/population/publications/wpp2008/wpp2008_highlights.pdf}{here}). The dataset can be accessed \href{http://data.un.org/Data.aspx?d=PopDiv&f=variableID%3a12&c=1,2,4,6,7&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc&v=1}{here}.

\tableofcontents

<% library(xtable); library(ggplot2)%>

<% for (i in seq_along(names(popreportdata))) { -%>

\pagebreak

<% i = names(popreportdata)[i] %>
<% reportlist <- popreportdata[match(i,names(popreportdata))][[1]] %>
<% filename <- function(y){paste(gsub(" ", "_", i) , y, ".pdf", sep="")} %>

<%=cat("\\section{", i, "}", sep="") %>

\begin{figure}[H]
  \centering
  <%= include_graph(width = 1, filename("_trend")) %>
  <%= subfloat_graph(0.33, filename("_hist"), "Histogram") %>
  <%= subfloat_graph(0.33, filename("_rank"), "Rank Curve") %>
  <%= subfloat_graph(0.33, filename("_box"), "Boxplot") %>
  \caption{Distribution plots}
\end{figure}

\begin{table}[h]
  \centering
  <%= subfloat_tbl(0.4, reportlist[[1]], "Top 5 Countries") %>
  <%= subfloat_tbl(0.4, reportlist[[2]], "Bottom 5 Countries") %>
  \caption{Population in 2005}
\end{table}

\begin{figure}
  \centering
  <%= subfloat_graph(0.5, filename("_abs_growth"), "Absolute Growth") %>
  <%= subfloat_graph(0.5, filename("_ann_growth"), "Annual Compound Growth") %>
  \caption{Growth charts 2010 - 2050}
\end{figure}

\begin{table}[H]
  \centering
  <%= subfloat_tbl(1, reportlist[[3]], "Top 5 Growing Countries") %>
  \quad
  <%= subfloat_tbl(1, reportlist[[4]], "Bottom 5 Growing Countries") %>
  \caption{Growth tables 2010 - 2050}
\end{table}

<% } -%>

\end{document}
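
The looping pattern in the template can be tried on a much smaller scale. The sketch below assumes the brew package is installed; the reportdata list and the template string are hypothetical and exist only for this illustration. It passes the template through brew's text argument and writes the result to a temporary file, which is then read back.

```r
library(brew)

# Hypothetical data: one small table per group (names are made up).
reportdata <- list(
  North = data.frame(item = c("a", "b"), value = c(1, 2)),
  South = data.frame(item = c("c", "d"), value = c(3, 4))
)

# A tiny template: plain text is copied as-is, <% %> runs R code,
# and <%= %> prints the value of an expression.
template <- paste(
  "<% for (name in names(reportdata)) { -%>",
  "\\section{<%= name %>}",
  "Rows: <%= nrow(reportdata[[name]]) %>, total: <%= sum(reportdata[[name]]$value) %>",
  "<% } -%>",
  "",
  sep = "\n"
)

# Render to a temporary file and read the generated text back in.
outfile <- tempfile(fileext = ".tex")
brew(text = template, output = outfile)
rendered <- readLines(outfile)
rendered
```

Every pass through the loop emits one section heading and one summary line, which is the same mechanism the full template uses to repeat figures and tables for each continent.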

Produce the report

> library(tools)
> library(brew)
> brew("population.brew", "population.tex")
> texi2dvi("population.tex", pdf = TRUE)

The resulting PDF file can be seen here.

31 Comments
  1. September 8, 2009 5:42 pm

    What are the differences/advantages over Sweave?

    • learnr permalink*
      September 8, 2009 6:09 pm

      If there are no repetitive sections with multiple elements in the report, Sweave does the job as well as brew does.

      The main advantage of brew is the option of looping over code chunks. I was not able to achieve the same functionality in Sweave.

      In the example above I decided to have 2 separate files, instead I could have easily had just one .brew file including both the code to generate and display the report.

    • October 30, 2010 6:41 am

      Another advantage of brew is that it is not married to LaTeX, unlike Sweave. I have used it to good effect with the ascii package to weave R with emacs org-mode files and reStructured Text (pydocs) files. Of course, these paths aren’t quite as powerful as LaTeX, but for quick reports which are reasonably easy to write and edit, it’s a pretty good solution.

      • nanounanue permalink
        October 25, 2011 1:13 am

        Hi!

        Could you provide an example of using brew and emacs org-mode?

        Thanks in advance

      • learnr permalink*
        November 17, 2011 12:42 am

        Unfortunately I am not an emacs user myself, so would struggle to fulfill your request.

      • nanounanue permalink
        November 17, 2011 8:51 pm

        Thank you anyway :)

        Your site is a real gem!

        Keep it that way

  2. r'ish permalink
    September 8, 2009 10:34 pm

    You really learn r fast. Thanks for this great blog!

  3. November 25, 2009 1:15 am

    Good article highlighting writing functions to automate repetitive tasks in R – a bit of planning at the start of an analysis can save a lot of work later if you build in flexibility rather than having to copy and paste and keep changing small parts of the code.

  4. Toby Popenfoose permalink
    December 1, 2009 10:43 pm

    Excellent Site. Thank you for the ggplot2 examples!!!!

    I made the following changes to popreportdata.R (helper functions only) and population.Rnw to get Sweave to work with this population example:

    # To make it easier to generate the latex statements for
    # inclusion of graphs and tables, a few helper functions are defined first.
    include_graph <- function(width = 1, filename) {
    paste("\\includegraphics[width=", width, "\\linewidth]{", filename, "}", sep = "")
    }
    include_tbl <- function(width = 1, filename) {
    print(xtable(filename), table.placement = "", latex.environments = "", include.rownames = FALSE, floating = FALSE)
    }
    subfloat_graph <- function(width, filename, caption = "") {
    cat(paste("\\subfloat[", caption, "]{", "\\begin{minipage}[h]{",
    width, "\\linewidth}\\centering", include_graph(width = 1, filename), "\\end{minipage}}\n", sep = ""))
    }
    subfloat_tbl <- function(width, filename, caption) {
    cat(paste("\\subfloat[", caption, "]{", "\\begin{minipage}[h]{",
    width, "\\linewidth}\\centering", print(xtable(filename),
    file = stderr(), table.placement = "",
    latex.environments = "", include.rownames = FALSE,
    floating = FALSE), "\\end{minipage}}\n", sep = ""))
    }

    \documentclass[oneside]{article}

    \usepackage[margin=2cm,nohead]{geometry}
    \usepackage[pdftex]{graphicx}
    \usepackage{subfig}
    \usepackage{float}
    \usepackage{verbatim}

    \usepackage{hyperref}
    \hypersetup{
    colorlinks=true,
    pdfauthor={http://learnr.wordpress.com}
    }

    \graphicspath{{./graphs/}}

    \title{World Population Trends}
    \author{\url{http://learnr.wordpress.com}}
    \date{\today}
    \raggedbottom
    \setcounter{tocdepth}{1}

    \begin{document}

    \maketitle

    This report has been compiled based on the United Nations report World Population Prospects: The 2008 Revision (highlights available \href{http://www.un.org/esa/population/publications/wpp2008/wpp2008_highlights.pdf}{here}). The dataset can be accessed \href{http://data.un.org/Data.aspx?d=PopDiv&f=variableID%3a12&c=1,2,4,6,7&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc&v=1}{here}.

    \tableofcontents

    <<echo=FALSE,results=tex>>=
    library(xtable); library(ggplot2)
    source('popreportdata.R')

    for (i in seq_along(names(popreportdata))) {

    cat("\\pagebreak\n")

    i = names(popreportdata)[i]
    reportlist <- popreportdata[match(i,names(popreportdata))][[1]]
    filename <- function(y){paste(gsub(" ", "_", i) , y, ".pdf", sep="")}

    cat("\\section{", i, "}\n", sep="")

    cat("\\begin{figure}[H]\n")
    cat("\\centering\n")
    cat(include_graph(width = 1, filename("_trend")), '\n')
    subfloat_graph(0.33, filename("_hist"), "Histogram")
    subfloat_graph(0.33, filename("_rank"), "Rank Curve")
    subfloat_graph(0.33, filename("_box"), "Boxplot")
    cat("\\caption{Distribution plots}\n")
    cat("\\end{figure}\n")

    cat("\\begin{table}[h]\n")
    cat("\\centering\n")
    subfloat_tbl(0.4, reportlist[[1]], "Top 5 Countries")
    subfloat_tbl(0.4, reportlist[[2]], "Bottom 5 Countries")
    cat("\\caption{Population in 2005}\n")
    cat("\\end{table}\n")

    cat("\\begin{figure}\n")
    cat("\\centering\n")
    subfloat_graph(0.5, filename("_abs_growth"), "Absolute Growth")
    subfloat_graph(0.5, filename("_ann_growth"), "Annual Compound Growth")
    cat("\\caption{Growth charts 2010 - 2050}\n")
    cat("\\end{figure}\n")

    cat("\\begin{table}[H]\n")
    cat("\\centering\n")
    subfloat_tbl(1, reportlist[[3]], "Top 5 Growing Countries")
    cat("\\quad\n")
    subfloat_tbl(1, reportlist[[4]], "Bottom 5 Growing Countries")
    cat("\\caption{Growth tables 2010 - 2050}\n")
    cat("\\end{table}\n")

    }
    @

    \end{document}

    and then used:

    library(tools)
    Sweave("population")
    texi2dvi("population.tex", pdf = TRUE)

  5. Toby Popenfoose permalink
    December 1, 2009 10:49 pm

    “echo=FALSE,results=tex” and enclosing was removed from population.Rnw when I submitted

  6. Jay permalink
    February 10, 2010 11:01 pm

    Hi,

    Have you found any way to control the font size in graphics when the graphics are scaled, as with minipage/subfig in this example?

    I posted on Stackoverflow and got one answer about how to set font size in tikz:

    http://stackoverflow.com/questions/2237979/how-to-control-font-sizes-in-pgf-tikz-graphics-in-latex

    but I am still hunting for a solution to the issue with scaling in an environment:

    http://stackoverflow.com/questions/2239328/control-font-size-in-graphics-in-latex-when-scaling-in-minipage-subfig

    If you have any ideas your feedback would be most appreciated.

    Many thanks,

    Jay

    • learnr permalink*
      February 11, 2010 10:16 am

      I haven’t experimented with this.
      The answer to your second question on Stackoverflow gives quite a few examples, and explains the options well.

      • Jay permalink
        February 11, 2010 6:51 pm

        It does, Stack Overflow is a great resource. It was a sticky issue: tikzDevice lets you get the font to match, but the size is the most apparent thing to the audience.

        Thanks for the reply.

  7. June 13, 2011 4:32 pm

    I’m trying to do something similar at the moment. Thanks for this post, it’s invaluable.

  8. September 16, 2011 4:32 am

    Excellent method. Although, I think you should also post a shorter version of the above. As a fairly new user of this batch method, going through the code and understanding it took some time. A simpler example with say, just the .brew file and some R code would have made the learning faster.

    Thanks again for an excellent guide !

  9. richardprichard permalink
    September 17, 2011 1:41 am

    Yes, this is a big dollop of meaty goodness that requires extended digestion.

    A word to fellow noobs – wordpress has eaten one half of the esc characters in the helper functions (perhaps elsewhere, I can’t remember now). Took me half a holiday to work out why I was getting strange latex errors.

    Toby’s post above has the appropriate number so you can use that as a guide.

  10. zach permalink
    October 14, 2011 5:27 am

    hi learnr,

    I was trying to adopt your Brew code but am running into a syntax issue. I posted it on StackExchange at http://stackoverflow.com/questions/7762025/r-and-brewsyntax-issue and I am wondering if you encountered any syntax problems. Do you think these issues may be related to the version of R or the version of Brew?

    thanks!

    • learnr permalink*
      October 18, 2011 11:45 pm

      I see your question was answered on StackOverflow.

  11. kien permalink
    January 4, 2012 11:00 pm

    I try your brew example until
    \begin{figure}[H]
    \centering

    \caption{Distribution plots}
    \end{figure}

    I got the error
    texi2dvi(“test.tex”, pdf = TRUE)
    Warning message:
    running command ‘”C:\PROGRA~1\MIKTEX~1.8\miktex\bin\texi2dvi.exe” –quiet –pdf “test.tex” -I “C:/PROGRA~1/R/R-214~1.1/share/texmf/tex/latex” -I “C:/PROGRA~1/R/R-214~1.1/share/texmf/bibtex/bst”‘ had status 1
    Please help, I use window XP + Miktex

  12. kien permalink
    January 4, 2012 11:21 pm

    Sorry, the code should be

    \begin{figure}[H]
    \centering

    \caption{Distribution plots}
    \end{figure}

    I got the error
    texi2dvi(“test.tex”, pdf = TRUE)
    Warning message:
    running command ‘”C:\PROGRA~1\MIKTEX~1.8\miktex\bin\texi2dvi.exe” –quiet –pdf “test.tex” -I “C:/PROGRA~1/R/R-214~1.1/share/texmf/tex/latex” -I “C:/PROGRA~1/R/R-214~1.1/share/texmf/bibtex/bst”‘ had status 1
    Please help, I use window XP + Miktex

    • learnr permalink*
      January 19, 2012 12:20 pm

      Do you have Latex installed on your machine?

  13. August 3, 2012 11:41 pm

    Hello learnr!

    First – Thank you very much for your blog!

    Do you have an idea why the code of your brew example creates this error message:

    Error in source(“brewtest.r”) : brewtest.r:47:45: unexpected symbol
    46: filename <- function(y) {
    47: paste("graphs\", continent, y, ".pdf
    ^

    ( sign ^ is standing under " before .pdf)

    Tested with R 2.15.0 and R 2.15.1; I fetched your code by copy & paste and page dump.

    Greetings,
    Jo

    • learnr permalink*
      October 3, 2012 2:41 pm

      Hmm, could it be that you need to escape the path?

  14. August 4, 2012 12:11 am

    Hello learnr!

    First – Thank you very much for your blog!

    I need help with an error by testing “brew-creating-repetitive-reports.
    Excuse me, I can’t use emails at the moment.

    Do you have an idea why the code of your brew example creates this error message:
    Error in source(“brewtest.r”) : brewtest.r:47:45: unexpected symbol
    46: filename <- function(y) {
    47: paste("graphs\", continent, y, ".pdf
    ^

    ( sign ^ is standing under " before .pdf)

    Tested with R 2.15.0 and R 2.15.1; I fetched your code by copy & paste and page dump.

    Greetings,
    Jo

    In return for help, I will try to find interesting sources/tricks for you in the future ;)

  15. January 21, 2013 10:37 am

    What a great post. I will definitely try to use this for my upcoming project. This looks awesome. But why have you not updated your blog for a long time ?

    • learnr permalink*
      June 6, 2013 9:55 am

      The primary reason I guess is that I have run out of ideas.

  16. June 5, 2013 1:04 am

    Thank you for this post it has helped me greatly. I have a question. I am using brew to generate a repetitive report and would like to suppress the caption label. I know you can do it using latex, but I have to generate a thousand or so tables and don’t want to have to do it for each one. Is there a way to suppress the caption label using either xtable or brew?
    Thanks for any help you can give me.

    • learnr permalink*
      June 6, 2013 9:46 am

      Xtable allows you to specify whether you want caption or not.
