brew: Creating Repetitive Reports

September 9, 2009

The United Nations report World Population Prospects: The 2008 Revision (highlights available here) provides data on the historical and forecasted population of every country. When exploring past and future population trends, it is easy to subset the dataset by any variable of interest, for example to keep only the historical years:

> file <- c("UNdata_Population.csv")
> population <- read.csv(file)
> names(population) <- c("code", "country", "year",
+     "variant", "value")
> df <- subset(population, year <= 2005)
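If you want to try the code without downloading the UN file, a small synthetic data frame with the same five columns is enough. This is a sketch under assumptions: the country names, codes and values below are made up, and only a "Medium variant" is used for the forecast years, whereas the real dataset contains several variants.

```r
# Hypothetical stand-in for UNdata_Population.csv: same five columns,
# made-up values (in thousands), two countries only.
population <- expand.grid(
  year = seq(1950, 2050, by = 5),
  country = c("Country A", "Country B"),
  stringsAsFactors = FALSE
)
population$code <- ifelse(population$country == "Country A", 1, 2)
# Years up to 2005 are estimates, later years a single forecast variant
population$variant <- ifelse(population$year <= 2005,
                             "Estimate variant", "Medium variant")
# Made-up 1% yearly growth; Country B is twice the size of Country A
population$value <- round(1000 * population$code *
                            1.01^(population$year - 1950))
# Same column order as in the post
population <- population[, c("code", "country", "year", "variant", "value")]

# The subset used in the post: historical estimates only
df <- subset(population, year <= 2005)
nrow(df)
```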

Likewise, it is straightforward to produce a 263-page pdf-file showing the population trend between 1950 and 2005 for every country in the dataset, one country per page.

> library(plyr)
> pdf("population_growth.pdf", paper = "a4")
> d_ply(df, .(country), function(x) plot(x$year,
+     x$value, type = "l", main = unique(x$country)))
> dev.off()
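The same one-page-per-country PDF can also be produced with base R alone, which may help if plyr is not installed. A minimal sketch, assuming a hypothetical two-country, three-year data frame; `split()` plus a `for` loop stands in for `d_ply()` here, and the file goes to a temporary path.

```r
# Hypothetical toy data in the same long format as the UN subset
df <- data.frame(
  country = rep(c("Country A", "Country B"), each = 3),
  year = rep(c(1950, 1955, 1960), times = 2),
  value = c(10, 12, 15, 30, 31, 33)
)

out <- tempfile(fileext = ".pdf")
pdf(out, paper = "a4")
# split() returns one data frame per country; each plot becomes one page
for (x in split(df, df$country)) {
  plot(x$year, x$value, type = "l", main = unique(x$country),
       xlab = "Year", ylab = "Population")
}
invisible(dev.off())
file.exists(out)
```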

However, if you want a report that combines summary tables, graphs and some descriptive text for every country, the process becomes more complicated. This is exactly the problem that the brew package, written by Jeffrey Horner, is designed to solve. brew “implements a templating framework for mixing text and R code for report generation”, which makes it easy to generate repetitive reports, for example when doing exploratory analysis of a large dataset with many variables.

The report generation can be split into three parts:

1. Prepare the data.
   1. Add Regional Information to Population Data.
   2. Generate Graphs and Data to be included in the report.
2. Prepare the report template.
   1. Helper Functions.
   2. Brew template – population.brew.
3. Produce the report.

Data Preparation

The data preparation script is saved in the file popreportdata.R, which is later sourced into the brew template.

Add Regional Information to Population Data

I would actually like to explore the data by continent rather than by country, so the first step is to group the countries by continent with the help of the ISOcodes package, which contains the standard country or area codes and geographical regions used by the UN.

> library(ggplot2)
> library(plyr)
> library(ISOcodes)
> data("UN_M.49_Countries")
> data("UN_M.49_Regions")
> Regions <- subset(UN_M.49_Regions, Type == "Region")
> regionsdf <- ddply(Regions, .(Code, Name, Parent),
+     function(x) {
+         df <- data.frame(strsplit(x$Children,
+             ", "))
+         names(df) <- "countrycode"
+         df
+     })
> countries <- merge(regionsdf, UN_M.49_Countries,
+     by.x = "countrycode", by.y = "Code")
> countries <- rename(countries, c(Name.x = "region",
+     Name.y = "country", Code = "regioncode", Parent = "parentcode"))
> countries <- merge(countries, Regions[, 1:2],
+     by.x = "parentcode", by.y = "Code")
> countries <- rename(countries, c(Name = "continent"))
> countries$countrycode <- as.numeric(as.character(countries$countrycode))
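The key step in the `ddply()` call above is turning each region's comma-separated `Children` string into one row per member country. A base-R sketch of the same reshaping on a hand-made two-region table: the columns mimic `UN_M.49_Regions`, but the rows are typed in here for illustration and list only a few member codes.

```r
# Hypothetical excerpt shaped like UN_M.49_Regions
regions <- data.frame(
  Code = c("155", "039"),
  Name = c("Western Europe", "Southern Europe"),
  Parent = c("150", "150"),
  Children = c("040, 056, 250", "380, 724"),
  stringsAsFactors = FALSE
)

# One character vector of member codes per region
kids <- strsplit(regions$Children, ", ")

# Repeat each region's columns once per member country
regionsdf <- data.frame(
  Code = rep(regions$Code, lengths(kids)),
  Name = rep(regions$Name, lengths(kids)),
  Parent = rep(regions$Parent, lengths(kids)),
  countrycode = as.numeric(unlist(kids)),  # "040" -> 40
  stringsAsFactors = FALSE
)
regionsdf
```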

The countries data frame now contains the regional classification of each country. The next step is to merge a subset of this information with the population data.

> population <- merge(population, countries[, c("countrycode",
+     "continent")], by.x = "code", by.y = "countrycode")
> population$value <- population$value/1000
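`merge()` performs an inner join by default, so any country code without a regional classification silently drops out of `population`. A toy check with hypothetical codes and values (999 stands for a code missing from the lookup table) also shows the division by 1000, which turns thousands into millions, assuming the UN values are reported in thousands as the "Population (in millions)" axis label implies:

```r
# Hypothetical inputs: values in thousands, keyed by different column names
population <- data.frame(code = c(40, 380, 999), year = 2005,
                         value = c(8000, 58000, 1))
countries <- data.frame(countrycode = c(40, 380),
                        continent = c("Europe", "Europe"))

# Inner join: code 999 has no match and is dropped
population <- merge(population, countries,
                    by.x = "code", by.y = "countrycode")
population$value <- population$value / 1000  # thousands -> millions
population
```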

Generate Graphs and Data to be included in the report

In this step the graphs are saved to disk using ggsave(), and for each continent a list of four data frames (top, bottom, growthtop and growthbottom) is returned, so popreportdata ends up as a list of lists, one per continent.

> popreportdata <- dlply(population, .(continent),
+     function(df) {
+         continent <- gsub(" ", "_", unique(df$continent))
+         filename <- function(y) {
+             paste("graphs/", continent, y, ".pdf",
+                 sep = "")
+         }
+         forecast <- subset(df, variant != "Estimate variant")
+         forecast$variant <- factor(forecast$variant)
+         historic <- subset(df, variant == "Estimate variant")
+         historic <- ddply(historic, .(continent,
+             year), transform, cont_value = sum(value))
+         current <- subset(df, year == 2005)
+         growthrate <- function(df) {
+             rng <- range(df$year)
+             min_value <- df[df$year == rng[1],
+                 "value"]
+             max_value <- df[df$year == rng[2],
+                 "value"]
+             abs_growth <- max_value/min_value
+             yr5_growth <- abs_growth^(1/(length(df$year) - 1))
+             growthdf <- data.frame(min_value,
+                 max_value, abs_growth, yr5_growth)
+             names(growthdf)[1:2] <- c(rng[1],
+                 rng[2])
+             growthdf
+         }
+         growth <- ddply(forecast, .(continent,
+             country, variant), growthrate)
+         growth$variant <- factor(growth$variant,
+             levels = c("Constant-fertility scenario",
+                 "High variant", "Medium variant",
+                 "Low variant"))
+         growth <- growth[with(growth, order(continent,
+             variant, abs_growth)), ]
+         blabel <- c(0.01, 0.1, 1, 10, 100)
+         alabel <- "Population (in millions)"
+         phist <- ggplot(current, aes(value)) +
+             geom_histogram(binwidth = 0.5, fill = NA,
+                 colour = "black") + scale_x_log10(breaks = blabel,
+             labels = blabel) + labs(x = alabel)
+         ggsave(filename("_hist"), phist, dpi = 100)
+         prank <- ggplot(current, aes(seq_along(country),
+             rev(sort(value)))) + geom_point() +
+             scale_y_log10(breaks = blabel, labels = blabel) +
+             labs(x = "Rank", y = alabel)
+         ggsave(filename("_rank"), prank, dpi = 100)
+         pbox <- ggplot(historic, aes(factor(year),
+             value)) + geom_boxplot() + labs(x = "",
+             y = alabel)
+         ggsave(filename("_box"), pbox, dpi = 100)
+         ptrend <- ggplot(historic, aes(year, value)) +
+             stat_summary(fun = "sum", geom = "line",
+                 colour = "red", size = 1) + stat_summary(data = forecast,
+             aes(y = value, group = variant, colour = variant),
+             fun = "sum", geom = "line", size = 1) +
+             labs(y = alabel, x = "") + theme(legend.position = c(0.8,
+             0.3), legend.background = element_blank(),
+             legend.key = element_blank()) + scale_colour_hue("Forecast")
+         ggsave(filename("_trend"), ptrend, width = 8,
+             height = 4, dpi = 100)
+         pgrowth <- ggplot(growth, aes(variant,
+             abs_growth, colour = variant)) + geom_boxplot() +
+             xlab("") + theme(legend.position = "none")
+         ggsave(filename("_abs_growth"), pgrowth,
+             dpi = 100)
+         pann_growth <- ggplot(growth, aes(variant,
+             yr5_growth, colour = variant)) + geom_boxplot() +
+             xlab("") + theme(legend.position = "none")
+         ggsave(filename("_ann_growth"), pann_growth,
+             dpi = 100)
+         current <- current[with(current, order(-value)),
+             ]
+         top <- head(current, 5)[, c("country",
+             "value")]
+         bottom <- tail(current, 5)[, c("country",
+             "value")]
+         growth <- growth[growth$variant == "Medium variant",
+             c(2, 4:7)]
+         growth <- growth[order(-growth$abs_growth),
+             ]
+         names(growth)[c(4:5)] <- c("Abs.Growth",
+             "Compound Growth")
+         growthtop <- head(growth, 5)
+         growthbottom <- tail(growth, 5)
+         list(top = top, bottom = bottom, growthtop = growthtop,
+             growthbottom = growthbottom)
+     })
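The growth figures compare the first and last forecast year. A worked example with hypothetical numbers (100 in 2010, 150 in 2050, observations every five years) shows how absolute growth relates to compound growth; compounding runs over the intervals between observations, which is one fewer than the number of observations.

```r
# Hypothetical country: 100 (thousand) in 2010, 150 in 2050
years <- seq(2010, 2050, by = 5)            # 9 observations
first_value <- 100
last_value <- 150

abs_growth <- last_value / first_value      # 1.5
n_periods <- length(years) - 1              # 8 five-year intervals
per_period <- abs_growth^(1 / n_periods)    # compound growth per 5 years
per_year <- abs_growth^(1 / (2050 - 2010))  # compound growth per year

round(c(abs_growth, per_period, per_year), 4)
```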

Report Template

My report template is essentially a LaTeX document that includes an R code loop wrapped in brew syntax. To make it easier to generate the LaTeX statements for including graphs and tables, a few helper functions are defined first.

Helper functions

> include_graph <- function(width = 1, filename) {
+     paste("\\includegraphics[width=", width, "\\linewidth]{",
+         filename, "}", sep = "")
+ }
> include_tbl <- function(width = 1, filename) {
+     print(xtable(filename), table.placement = "",
+         latex.environments = "", include.rownames = FALSE,
+         floating = FALSE)
+ }
> subfloat_graph <- function(width, filename, caption = "") {
+     paste("\\subfloat[", caption, "]{", "\\begin{minipage}[h]{",
+         width, "\\linewidth}\\centering", include_graph(width = 1,
+             filename), "\\end{minipage}}", sep = "")
+ }
> subfloat_tbl <- function(width, filename, caption) {
+     paste("\\subfloat[", caption, "]{", "\\begin{minipage}[h]{",
+         width, "\\linewidth}\\centering", print(xtable(filename),
+             file = stderr(), table.placement = "",
+             latex.environments = "", include.rownames = FALSE,
+             floating = FALSE), "\\end{minipage}}",
+         sep = "")
+ }
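In an R string literal a single backslash starts an escape sequence, so every LaTeX command in the helper functions needs a doubled backslash; with a single one R refuses to parse the code. A small check of both cases:

```r
# Source text containing a single backslash, as in "\includegraphics"
bad_src <- '"\\includegraphics"'
res <- tryCatch(parse(text = bad_src),
                error = function(e) conditionMessage(e))
is.character(res)  # TRUE: '\i' is not a valid escape, so parsing fails

# A doubled backslash in the literal yields one backslash in the string
good <- "\\includegraphics"
nchar(good)        # 16: the backslash counts as a single character
cat(good, "\n")    # prints \includegraphics
```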

`Brew` template – `population.brew`

Brew syntax is very simple. From the help file:

All text that falls outside of the delimiters is printed as-is.
R expressions between the <% and %> delimiters are executed in-place.
The value of the R expression between the <%= and %> delimiters is printed.
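A minimal, self-contained template may make these rules, and the `-%>` form used in the template below, easier to see: a `-` just before `%>` at the end of a line drops the newline after the tag. This sketch assumes brew's `text` argument for passing the template as a string instead of a file, and writes the result to a temporary file; the continent names are only examples.

```r
library(brew)

# Control code goes in <% %>, printed values in <%= %>.
# Ending the loop lines with -%> drops the newline after the tag,
# so they do not leave blank lines in the output.
tpl <- paste(
  "<% for (continent in c('Africa', 'Europe')) { -%>",
  "\\section{<%= continent %>}",
  "<% } -%>",
  sep = "\n"
)

out <- tempfile(fileext = ".tex")
brew(text = tpl, output = out)
lines <- readLines(out)
print(lines)
```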

\documentclass[oneside]{article}

\usepackage[margin=2cm,nohead]{geometry}
\usepackage[pdftex]{graphicx}
\usepackage{subfig}
\usepackage{float}
\usepackage{verbatim}

\usepackage{hyperref}
\hypersetup{
  colorlinks=true,
  pdfauthor={https://learnr.wordpress.com}
  }

\graphicspath{{./graphs/}}

\title{World Population Trends}
\author{\url{https://learnr.wordpress.com}}
\date{\today}
\raggedbottom
\setcounter{tocdepth}{1}

\begin{document}

\maketitle

This report has been compiled based on the United Nations report World Population Prospects: The 2008 Revision (highlights available \href{http://www.un.org/esa/population/publications/wpp2008/wpp2008_highlights.pdf}{here}). The dataset can be accessed \href{http://data.un.org/Data.aspx?d=PopDiv&f=variableID%3a12&c=1,2,4,6,7&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc&v=1}{here}.

\tableofcontents

<% library(xtable); library(ggplot2); source("popreportdata.R") %>

<% for (i in names(popreportdata)) { -%>

\pagebreak

<% reportlist <- popreportdata[[i]] %>
<% filename <- function(y) paste(gsub(" ", "_", i), y, ".pdf", sep = "") %>

<%= cat("\\section{", i, "}", sep = "") %>

\begin{figure}[H]
  \centering
  <%= include_graph(width = 1, filename("_trend")) %>
  <%= subfloat_graph(0.33, filename("_hist"), "Histogram") %>
  <%= subfloat_graph(0.33, filename("_rank"), "Rank Curve") %>
  <%= subfloat_graph(0.33, filename("_box"), "Boxplot") %>
  \caption{Distribution plots}
\end{figure}

\begin{table}[h]
  \centering
  <%= subfloat_tbl(0.4, reportlist[[1]], "Top 5 Countries") %>
  <%= subfloat_tbl(0.4, reportlist[[2]], "Bottom 5 Countries") %>
  \caption{Population in 2005}
\end{table}

\begin{figure}
  \centering
  <%= subfloat_graph(0.5, filename("_abs_growth"), "Absolute Growth") %>
  <%= subfloat_graph(0.5, filename("_ann_growth"), "Annual Compound Growth") %>
  \caption{Growth charts 2010 - 2050}
\end{figure}

\begin{table}[H]
  \centering
  <%= subfloat_tbl(1, reportlist[[3]], "Top 5 Growing Countries") %>
  \quad
  <%= subfloat_tbl(1, reportlist[[4]], "Bottom 5 Growing Countries") %>
  \caption{Growth tables 2010 - 2050}
\end{table}

<% } -%>

\end{document}

Produce the report

> library(tools)
> library(brew)
> dir.create("graphs", showWarnings = FALSE)
> brew("population.brew", "population.tex")
> texi2dvi("population.tex", pdf = TRUE)

The resulting pdf-file can be seen here.

36 Comments

randomjohn

September 8, 2009 5:42 pm

What are the differences/advantages over Sweave?

- learnr
  
  September 8, 2009 6:09 pm
  
  If there are no repetitive sections with multiple elements in the report, Sweave does the job as well as brew does.
  
  The main advantage of brew is the option of looping over code chunks. I was not able to achieve the same functionality in Sweave.
  
  In the example above I decided to have two separate files; I could just as easily have had a single .brew file containing both the code that generates the report data and the code that displays it.
  
- Abhijit Dasgupta
  
  October 30, 2010 6:41 am
  
  Another advantage of brew is that it is not married to LaTeX, unlike Sweave. I have used it to good effect with the ascii package to weave R with emacs org-mode files and reStructured Text (pydocs) files. Of course, these paths aren’t quite as powerful as LaTeX, but for quick reports which are reasonably easy to write and edit, it’s a pretty good solution.
  
  - nanounanue
    
    October 25, 2011 1:13 am
    
    Hi!
    
    Could you provide an example of using brew and emacs org-mode?
    
    Thanks in advance
  - learnr
    
    November 17, 2011 12:42 am
    
    Unfortunately I am not an emacs user myself, so would struggle to fulfill your request.
  - nanounanue
    
    November 17, 2011 8:51 pm
    
    Thank you anyway 🙂
    
    Your site is a real gem!
    
    Keep it that way.
r'ish

September 8, 2009 10:34 pm

You really learn r fast. Thanks for this great blog!

Ralph

November 25, 2009 1:15 am

Good article highlighting writing functions to automate repetitive tasks in R – a bit of planning at the start of an analysis can save a lot of work later if you build in flexibility rather than having to copy and paste and keep changing small parts of the code.

Toby Popenfoose

December 1, 2009 10:43 pm

Excellent Site. Thank you for the ggplot2 examples!!!!

I made the following changes to popreportdata.R (helper functions only) and population.Rnw to get Sweave to work with this population example:

# To make it easier to generate the latex statements for
# inclusion of graphs and tables, a few helper functions are defined first.
include_graph <- function(width = 1, filename) {
  paste("\\includegraphics[width=", width, "\\linewidth]{", filename, "}", sep = "")
}
include_tbl <- function(width = 1, filename) {
  print(xtable(filename), table.placement = "", latex.environments = "",
        include.rownames = FALSE, floating = FALSE)
}
subfloat_graph <- function(width, filename, caption = "") {
  cat(paste("\\subfloat[", caption, "]{", "\\begin{minipage}[h]{", width,
            "\\linewidth}\\centering", include_graph(width = 1, filename),
            "\\end{minipage}}\n", sep = ""))
}
subfloat_tbl <- function(width, filename, caption) {
  cat(paste("\\subfloat[", caption, "]{", "\\begin{minipage}[h]{", width,
            "\\linewidth}\\centering",
            print(xtable(filename), file = stderr(), table.placement = "",
                  latex.environments = "", include.rownames = FALSE,
                  floating = FALSE),
            "\\end{minipage}}\n", sep = ""))
}

\documentclass[oneside]{article}

\usepackage[margin=2cm,nohead]{geometry}
\usepackage[pdftex]{graphicx}
\usepackage{subfig}
\usepackage{float}
\usepackage{verbatim}

\usepackage{hyperref}
\hypersetup{
colorlinks=true,
pdfauthor={https://learnr.wordpress.com}
}

\graphicspath{{./graphs/}}

\title{World Population Trends}
\author{\url{https://learnr.wordpress.com}}
\date{\today}
\raggedbottom
\setcounter{tocdepth}{1}

\begin{document}

\maketitle

This report has been compiled based on the United Nations report World Population Prospects: The 2008 Revision (highlights available \href{http://www.un.org/esa/population/publications/wpp2008/wpp2008_highlights.pdf}{here}). The dataset can be accessed \href{http://data.un.org/Data.aspx?d=PopDiv&f=variableID%3a12&c=1,2,4,6,7&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc&v=1}{here}.

\tableofcontents

<<echo=FALSE, results=tex>>=
library(xtable); library(ggplot2)
source('popreportdata.R')

for (i in seq_along(names(popreportdata))) {

cat("\\pagebreak\n")

i = names(popreportdata)[i]
reportlist <- popreportdata[match(i,names(popreportdata))][[1]]
filename <- function(y){paste(gsub(" ", "_", i) , y, ".pdf", sep="")}

cat("\\section{", i, "}\n", sep="")

cat("\\begin{figure}[H]\n")
cat("\\centering\n")
cat(include_graph(width = 1, filename("_trend")), '\n')
subfloat_graph(0.33, filename("_hist"), "Histogram")
subfloat_graph(0.33, filename("_rank"), "Rank Curve")
subfloat_graph(0.33, filename("_box"), "Boxplot")
cat("\\caption{Distribution plots}\n")
cat("\\end{figure}\n")

cat("\\begin{table}[h]\n")
cat("\\centering\n")
subfloat_tbl(0.4, reportlist[[1]], "Top 5 Countries")
subfloat_tbl(0.4, reportlist[[2]], "Bottom 5 Countries")
cat("\\caption{Population in 2005}\n")
cat("\\end{table}\n")

cat("\\begin{figure}\n")
cat("\\centering\n")
subfloat_graph(0.5, filename("_abs_growth"), "Absolute Growth")
subfloat_graph(0.5, filename("_ann_growth"), "Annual Compound Growth")
cat("\\caption{Growth charts 2010 – 2050}\n")
cat("\\end{figure}\n")

cat("\\begin{table}[H]\n")
cat("\\centering\n")
subfloat_tbl(1, reportlist[[3]], "Top 5 Growing Countries")
cat("\\quad\n")
subfloat_tbl(1, reportlist[[4]], "Bottom 5 Growing Countries")
cat("\\caption{Growth tables 2010 – 2050}\n")
cat("\\end{table}\n")

}
@

\end{document}

and then used:

library(tools)
Sweave("population")
texi2dvi("population.tex", pdf = TRUE)

Toby Popenfoose

December 1, 2009 10:49 pm

The chunk options “echo=FALSE,results=tex” and the enclosing << >>= were removed from population.Rnw when I submitted.

Jay

February 10, 2010 11:01 pm

Hi,

Have you found any way to control the font size in graphics when the graphics are scaled, as with minipage/subfig in this example?

I posted on Stackoverflow and got one answer about how to set font size in tikz:
http://stackoverflow.com/questions/2237979/how-to-control-font-sizes-in-pgf-tikz-graphics-in-latex

but I am still hunting for a solution to the font size issue when scaling inside an environment:
http://stackoverflow.com/questions/2239328/control-font-size-in-graphics-in-latex-when-scaling-in-minipage-subfig

If you have any ideas your feedback would be most appreciated.

Many thanks,

Jay

- learnr
  
  February 11, 2010 10:16 am
  
  I haven’t experimented with this.
  The answer to your second question on Stackoverflow gives quite a few examples, and explains the options well.
  
  - Jay
    
    February 11, 2010 6:51 pm
    
    It does; Stack Overflow is a great resource. It was a sticky issue: tikzDevice lets you get the font to match, but the size is the most apparent thing to the audience.
    
    Thanks for the reply.
Chris Beeley

June 13, 2011 4:32 pm

I’m trying to do something similar at the moment. Thanks for this post, it’s invaluable.

Nataraj Dasgupta

September 16, 2011 4:32 am

Excellent method. However, I think you should also post a shorter version of the above. As a fairly new user of this batch method, it took me some time to go through the code and understand it. A simpler example with, say, just the .brew file and some R code would have made learning faster.

Thanks again for an excellent guide !

richardprichard

September 17, 2011 1:41 am

Yes, this is a big dollop of meaty goodness that requires extended digestion.

A word to fellow noobs: WordPress has eaten half of the escape characters in the helper functions (perhaps elsewhere too, I can’t remember now). It took me half a holiday to work out why I was getting strange LaTeX errors.

Toby’s post above has the appropriate number so you can use that as a guide.

zach

October 14, 2011 5:27 am

hi learnr,

I was trying to adapt your brew code but am running into a syntax issue. I posted it on StackOverflow at http://stackoverflow.com/questions/7762025/r-and-brewsyntax-issue and I am wondering if you encountered any syntax problems. Do you think these issues may be related to the version of R or the version of brew?

thanks!

- learnr
  
  October 18, 2011 11:45 pm
  
  I see your question was answered on StackOverflow.
  
kien

January 4, 2012 11:00 pm

I tried your brew example up to:
\begin{figure}[H]
\centering

\caption{Distribution plots}
\end{figure}

I got the error
texi2dvi("test.tex", pdf = TRUE)
Warning message:
running command '"C:\PROGRA~1\MIKTEX~1.8\miktex\bin\texi2dvi.exe" --quiet --pdf "test.tex" -I "C:/PROGRA~1/R/R-214~1.1/share/texmf/tex/latex" -I "C:/PROGRA~1/R/R-214~1.1/share/texmf/bibtex/bst"' had status 1
Please help; I use Windows XP + MiKTeX.

kien

January 4, 2012 11:21 pm

Sorry, the code should be:

\begin{figure}[H]
\centering

\caption{Distribution plots}
\end{figure}

I got the error
texi2dvi("test.tex", pdf = TRUE)
Warning message:
running command '"C:\PROGRA~1\MIKTEX~1.8\miktex\bin\texi2dvi.exe" --quiet --pdf "test.tex" -I "C:/PROGRA~1/R/R-214~1.1/share/texmf/tex/latex" -I "C:/PROGRA~1/R/R-214~1.1/share/texmf/bibtex/bst"' had status 1
Please help; I use Windows XP + MiKTeX.

- learnr
  
  January 19, 2012 12:20 pm
  
  Do you have LaTeX installed on your machine?
  
a

August 3, 2012 11:41 pm

Hello learnr!

First – Thank you very much for your blog!

Do you have an idea why the code of your brew example creates this error message:

Error in source("brewtest.r") : brewtest.r:47:45: unexpected symbol
46: filename <- function(y) {
47: paste("graphs\", continent, y, ".pdf
^

(the ^ sign stands under the " before .pdf)

Tested with R 2.15.0 and R 2.15.1; I fetched your code by copy & paste and by page dump.

Greetings,
Jo

- learnr
  
  October 3, 2012 2:41 pm
  
  Hmm, could it be that you need to escape the path?
  
jo

August 4, 2012 12:11 am

Hello learnr!

First – Thank you very much for your blog!

I need help with an error I get when testing “brew: Creating Repetitive Reports”.
Excuse me, I can’t use email at the moment.

Do you have an idea why the code of your brew example creates this error message:
Error in source("brewtest.r") : brewtest.r:47:45: unexpected symbol
46: filename <- function(y) {
47: paste("graphs\", continent, y, ".pdf
^

(the ^ sign stands under the " before .pdf)

Tested with R 2.15.0 and R 2.15.1; I fetched your code by copy & paste and by page dump.

Greetings,
Jo

If you can help, I will try to find interesting sources/tricks for you in the future 😉

Jdbaba

January 21, 2013 10:37 am

What a great post. I will definitely try to use this for my upcoming project. This looks awesome. But why have you not updated your blog for a long time ?

- learnr
  
  June 6, 2013 9:55 am
  
  The primary reason I guess is that I have run out of ideas.
  
robertdcarlisle

June 5, 2013 1:04 am

Thank you for this post; it has helped me greatly. I have a question. I am using brew to generate a repetitive report and would like to suppress the caption label. I know you can do it in LaTeX, but I have to generate a thousand or so tables and don’t want to do it for each one. Is there a way to suppress the caption label using either xtable or brew?
Thanks for any help you can give me.

- learnr
  
  June 6, 2013 9:46 am
  
  xtable allows you to specify whether you want a caption or not.
  
xero

December 7, 2015 8:15 pm

Hi, I’m trying to implement your example to learn something about automated report generation. Could you put the .csv file you are using somewhere? I downloaded some files from http://esa.un.org/unpd/wpp/DVD/ but none matches your format, and I receive an error due to incompatible vector dimensions. Thanks

- learnr
  
  November 14, 2016 4:17 am
  
  Try the link in the beginning of the article.
  

Finding my way around R
