brew: Creating Repetitive Reports
The United Nations report World Population Prospects: The 2008 Revision (highlights available here) provides data about the historical and forecasted population of each country. When exploring past and future population trends, it is relatively easy to subset the dataset by a selected variable.
file <- c("UNdata_Population.csv")
population <- read.csv(file)
names(population) <- c("code", "country", "year", "variant", "value")
# keep the historical years only
df <- subset(population, year <= 2005)
Likewise, it is straightforward to produce a 263-page PDF file that shows the population trend between 1950 and 2005 for every country in the dataset.
library(plyr)  # provides d_ply()
pdf("population_growth.pdf", paper = "a4")
# one line plot per country, one page each
d_ply(df, .(country), function(x) plot(x$year, x$value,
    type = "l", main = unique(x$country)))
dev.off()
However, if one would like to create a report showing summary tables and graphs along with some textual description for every country, the process becomes more complicated. This is exactly what the brew package, written by Jeffrey Horner, was designed to help with. brew “implements a templating framework for mixing text and R code for report generation” and makes it very easy to generate repetitive reports, which is of great help, for example, when performing exploratory analysis on a large dataset with many variables.
The report-generation can be split into three parts:
Data Preparation
The data preparation script is saved in the popreportdata.R file, which is later sourced into the brew template.
Add Regional Information to Population Data
In fact, I would like to explore the data by continent rather than by country. The first step is therefore to group the countries by continent with the help of the ISOcodes package, which contains the standard country or area codes and geographical regions used by the UN.
library(ggplot2)
library(plyr)     # ddply(), dlply(), rename()
library(reshape)  # sort_df(), used further below
library(ISOcodes)
data("UN_M.49_Countries")
data("UN_M.49_Regions")
Regions <- subset(UN_M.49_Regions, Type == "Region")
# one row per (region, member country code)
regionsdf <- ddply(Regions, .(Code, Name, Parent),
    function(x) {
        df <- data.frame(strsplit(x$Children, ", "))
        names(df) <- "countrycode"
        df
    })
countries <- merge(regionsdf, UN_M.49_Countries,
    by.x = "countrycode", by.y = "Code")
countries <- rename(countries, c(Name.x = "region",
    Name.y = "country", Code = "regioncode", Parent = "parentcode"))
# look up the name of the parent region, i.e. the continent
countries <- merge(countries, Regions[, 1:2],
    by.x = "parentcode", by.y = "Code")
countries <- rename(countries, c(Name = "continent"))
countries$countrycode <- as.numeric(as.character(countries$countrycode))
The countries data frame now contains a regional classification for each country. The next step is to merge a subset of this information with the population data.
population <- merge(population, countries[, c("countrycode", "continent")],
    by.x = "code", by.y = "countrycode")
# rescale the values (the plots below label them as millions)
population$value <- population$value/1000
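As a minimal sketch of what this merge does, assuming two tiny made-up data frames (the rows and values below are illustrative only, not UN data): by.x and by.y name the key column on each side, and rows without a match are dropped by default.

```r
# Toy stand-ins for the population and countries data frames (made-up values)
population_toy <- data.frame(code = c(4, 8, 999),
                             country = c("Afghanistan", "Albania", "Nowhere"),
                             value = c(25000, 3100, 1))
countries_toy <- data.frame(countrycode = c(4, 8),
                            continent = c("Asia", "Europe"))

# The key column is "code" on the left and "countrycode" on the right
merged <- merge(population_toy, countries_toy,
                by.x = "code", by.y = "countrycode")

# Same rescaling as in the post
merged$value <- merged$value / 1000
print(merged)
```

The row with code 999 has no partner in countries_toy, so it does not appear in the result; passing all.x = TRUE to merge() would keep it with a missing continent.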
Generate Graphs and Data to be included in the report
In this step the graphs are saved to disk using ggsave, and a list of lists, with four data frames for each continent, is returned.
popreportdata <- dlply(population, .(continent),
    function(df) {
        continent <- gsub(" ", "_", unique(df$continent))
        # forward slash, so that the closing quote is not escaped
        filename <- function(y) {
            paste("graphs/", continent, y, ".pdf", sep = "")
        }
        forecast <- subset(df, variant != "Estimate variant")
        forecast$variant <- forecast$variant[, drop = TRUE]
        historic <- subset(df, variant == "Estimate variant")
        historic <- ddply(historic, .(continent, year),
            transform, cont_value = sum(value))
        current <- subset(df, year == 2005)
        # growth between the first and the last year of a projection
        growthrate <- function(df) {
            rng <- range(df$year)
            min_value <- df[df$year == rng[1], "value"]
            max_value <- df[df$year == rng[2], "value"]
            abs_growth <- max_value/min_value
            yr5_growth <- abs_growth^(1/length(df$year))
            growthdf <- data.frame(min_value, max_value,
                abs_growth, yr5_growth)
            names(growthdf)[1:2] <- c(rng[1], rng[2])
            growthdf
        }
        growth <- ddply(forecast, .(continent, country, variant),
            growthrate)
        growth$variant <- factor(growth$variant,
            levels = c("Constant-fertility scenario",
                "High variant", "Medium variant", "Low variant"))
        growth <- sort_df(growth, vars = c("continent",
            "variant", "abs_growth"))
        blabel <- c(0.01, 0.1, 1, 10, 100)
        alabel <- "Population (in millions)"
        # histogram of the 2005 population
        phist <- ggplot(current, aes(value)) +
            geom_histogram(binwidth = 0.5, fill = NA, colour = "black") +
            scale_x_log10(breaks = blabel, labels = blabel) +
            labs(x = alabel)
        ggsave(filename("_hist"), phist, dpi = 100)
        # rank curve of the 2005 population
        prank <- ggplot(current, aes(seq_along(country), rev(sort(value)))) +
            geom_point() +
            scale_y_log10(breaks = blabel, labels = blabel) +
            labs(x = "Rank", y = alabel)
        ggsave(filename("_rank"), prank, dpi = 100)
        # boxplot of the historic values by year
        pbox <- ggplot(historic, aes(factor(year), value)) +
            geom_boxplot() +
            labs(x = "", y = alabel)
        ggsave(filename("_box"), pbox, dpi = 100)
        # historic trend plus the forecast variants
        # opts() and theme_blank() come from the ggplot2 release used at the
        # time of writing; later releases use theme() and element_blank()
        ptrend <- ggplot(historic, aes(year, value)) +
            stat_summary(fun.y = "sum", geom = "line",
                colour = "red", size = 1) +
            stat_summary(data = forecast,
                aes(y = value, group = variant, colour = variant),
                fun.y = "sum", geom = "line", size = 1) +
            labs(y = alabel, x = "") +
            opts(legend.position = c(0.8, 0.3),
                legend.background = theme_blank(),
                legend.key = theme_blank()) +
            scale_colour_hue("Forecast")
        ggsave(filename("_trend"), ptrend, width = 8,
            height = 4, dpi = 100)
        pgrowth <- ggplot(growth, aes(variant, abs_growth, colour = variant)) +
            geom_boxplot() +
            xlab("") +
            opts(legend.position = "none")
        ggsave(filename("_abs_growth"), pgrowth, dpi = 100)
        pann_growth <- ggplot(growth, aes(variant, yr5_growth, colour = variant)) +
            geom_boxplot() +
            xlab("") +
            opts(legend.position = "none")
        ggsave(filename("_ann_growth"), pann_growth, dpi = 100)
        # the four tables returned for each continent
        current <- current[with(current, order(-value)), ]
        top <- head(current, 5)[, c("country", "value")]
        bottom <- tail(current, 5)[, c("country", "value")]
        growth <- growth[growth$variant == "Medium variant", c(2, 4:7)]
        growth <- growth[order(-growth$abs_growth), ]
        names(growth)[c(4:5)] <- c("Abs.Growth", "Compound Growth")
        growthtop <- head(growth, 5)
        growthbottom <- tail(growth, 5)
        list(top = top, bottom = bottom, growthtop = growthtop,
            growthbottom = growthbottom)
    })
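To make the growth figures easier to follow, here is a small worked sketch of what growthrate() computes for a single country and variant. The projection values are made up for illustration. The post takes the root over the number of observations; the last line shows, as an alternative assumption, the root over the number of five-year intervals, which is the more usual definition of a compound rate.

```r
# Hypothetical five-yearly projections for one country (made-up values, in millions)
proj <- data.frame(year = seq(2010, 2050, by = 5),
                   value = c(10, 10.5, 11, 11.6, 12.2, 12.8, 13.4, 14.1, 15))

rng <- range(proj$year)
min_value <- proj$value[proj$year == rng[1]]
max_value <- proj$value[proj$year == rng[2]]

# Ratio of the last value to the first
abs_growth <- max_value / min_value

# As in growthrate(): the root is taken over the number of observations
yr5_growth <- abs_growth^(1 / length(proj$year))

# Alternative assumption: the root over the number of five-year intervals
yr5_growth_intervals <- abs_growth^(1 / (length(proj$year) - 1))

print(c(abs_growth = abs_growth, yr5_growth = yr5_growth,
        yr5_growth_intervals = yr5_growth_intervals))
```

With nine observations there are only eight intervals, so the two compound rates differ slightly; which one to report is a design choice.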
Report Template
My report template is essentially a LaTeX document that includes an R code loop wrapped in brew syntax. To make it easier to generate the LaTeX statements for including graphs and tables, a few helper functions are defined first.
Helper functions
# Backslashes are doubled: "\\" in R source is a single backslash in the string
include_graph <- function(width = 1, filename) {
    paste("\\includegraphics[width=", width, "\\linewidth]{",
        filename, "}", sep = "")
}
include_tbl <- function(width = 1, filename) {
    print(xtable(filename), table.placement = "",
        latex.environments = "", include.rownames = FALSE,
        floating = FALSE)
}
subfloat_graph <- function(width, filename, caption = "") {
    paste("\\subfloat[", caption, "]{", "\\begin{minipage}[h]{",
        width, "\\linewidth}\\centering",
        include_graph(width = 1, filename),
        "\\end{minipage}}", sep = "")
}
subfloat_tbl <- function(width, filename, caption) {
    paste("\\subfloat[", caption, "]{", "\\begin{minipage}[h]{",
        width, "\\linewidth}\\centering",
        print(xtable(filename), file = stderr(),
            table.placement = "", latex.environments = "",
            include.rownames = FALSE, floating = FALSE),
        "\\end{minipage}}", sep = "")
}
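A quick way to check a helper like include_graph is to call it on its own and look at the string it builds. The sketch below repeats the function so that it runs standalone; the file name is only an example. Note that a backslash inside an R string literal has to be written as "\\" for a single backslash to reach the LaTeX file.

```r
# Copy of the helper, with backslashes doubled so that R emits a single one
include_graph <- function(width = 1, filename) {
    paste("\\includegraphics[width=", width, "\\linewidth]{",
          filename, "}", sep = "")
}

# "Africa_trend.pdf" is only an example file name
stmt <- include_graph(width = 0.5, filename = "Africa_trend.pdf")
cat(stmt, "\n")
# prints: \includegraphics[width=0.5\linewidth]{Africa_trend.pdf}
```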
Brew template – population.brew
Brew syntax is really very simple to use. From the help file:
- All text that falls outside of the delimiters is printed as-is.
- R expressions between the <% and %> delimiters are executed in-place.
- The value of the R expression between the <%= and %> delimiters is printed.
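These three rules can be tried out on a template held in a string, before moving on to a full LaTeX document. This is a minimal sketch, assuming the brew package is installed; the continent names are example values, and the -%> closing form (which, as far as I understand it, suppresses the line break after the tag) is the same one used in the template of this post.

```r
library(brew)

# A tiny template held in a string; the continent names are example values
template <- paste(
    "Report sections:",
    "<% for (continent in c('Africa', 'Asia')) { -%>",
    "\\section{<%= continent %>}",
    "<% } -%>",
    sep = "\n")

# brew() writes to stdout by default; capture.output() collects the lines
out <- capture.output(brew(text = template))
print(out)
```

The text outside the delimiters is passed through unchanged, the for loop is executed in place, and the value of continent is printed once per iteration.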
\documentclass[oneside]{article}
\usepackage[margin=2cm,nohead]{geometry}
\usepackage[pdftex]{graphicx}
\usepackage{subfig}
\usepackage{float}
\usepackage{verbatim}
\usepackage{hyperref}
\hypersetup{
colorlinks=true,
pdfauthor={https://learnr.wordpress.com}
}
\graphicspath{{./graphs/}}
\title{World Population Trends}
\author{\url{https://learnr.wordpress.com}}
\date{\today}
\raggedbottom
\setcounter{tocdepth}{1}
\begin{document}
\maketitle
This report has been compiled based on the United Nations report World Population Prospects: The 2008 Revision (highlights available \href{http://www.un.org/esa/population/publications/wpp2008/wpp2008_highlights.pdf}{here}). The dataset can be accessed \href{http://data.un.org/Data.aspx?d=PopDiv&f=variableID%3a12&c=1,2,4,6,7&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc&v=1}{here}.
\tableofcontents
<% library(xtable); library(ggplot2)%>
<% for (i in seq_along(names(popreportdata))) { -%>
\pagebreak
<% i = names(popreportdata)[i] %>
<% reportlist <- popreportdata[match(i,names(popreportdata))][[1]] %>
<% filename <- function(y){paste(gsub(" ", "_", i) , y, ".pdf", sep="")} %>
<%=cat("\\section{", i, "}", sep="") %>
\begin{figure}[H]
\centering
<%= include_graph(width = 1, filename("_trend")) %>
<%= subfloat_graph(0.33, filename("_hist"), "Histogram") %>
<%= subfloat_graph(0.33, filename("_rank"), "Rank Curve") %>
<%= subfloat_graph(0.33, filename("_box"), "Boxplot") %>
\caption{Distribution plots}
\end{figure}
\begin{table}[h]
\centering
<%= subfloat_tbl(0.4, reportlist[[1]], "Top 5 Countries") %>
<%= subfloat_tbl(0.4, reportlist[[2]], "Bottom 5 Countries") %>
\caption{Population in 2005}
\end{table}
\begin{figure}
\centering
<%= subfloat_graph(0.5, filename("_abs_growth"), "Absolute Growth") %>
<%= subfloat_graph(0.5, filename("_ann_growth"), "Annual Compound Growth") %>
\caption{Growth charts 2010 - 2050}
\end{figure}
\begin{table}[H]
\centering
<%= subfloat_tbl(1, reportlist[[3]], "Top 5 Growing Countries") %>
\quad
<%= subfloat_tbl(1, reportlist[[4]], "Bottom 5 Growing Countries") %>
\caption{Growth tables 2010 - 2050}
\end{table}
<% } -%>
\end{document}
Produce the report
library(tools)
library(brew)
# expand the template into a LaTeX file, then compile it to PDF
brew("population.brew", "population.tex")
texi2dvi("population.tex", pdf = TRUE)
The resulting PDF file can be seen here.
What are the differences/advantages over Sweave?
If there are no repetitive sections with multiple elements in the report, Sweave does the job as well as brew does.
The main advantage of brew is the option of looping over code chunks. I was not able to achieve the same functionality in Sweave.
In the example above I decided to have two separate files; I could just as easily have had a single .brew file containing both the code that generates the data and the code that displays the report.
Another advantage of brew is that it is not married to LaTeX, unlike Sweave. I have used it to good effect with the ascii package to weave R with emacs org-mode files and reStructured Text (pydocs) files. Of course, these paths aren’t quite as powerful as LaTeX, but for quick reports which are reasonably easy to write and edit, it’s a pretty good solution.
Hi!
Could you provide an example of using brew and emacs org-mode?
Thanks in advance
Unfortunately I am not an emacs user myself, so would struggle to fulfill your request.
Thank you anyway 🙂
Your site is a real gem!
Keep it that way.
You really learn r fast. Thanks for this great blog!
Good article highlighting writing functions to automate repetitive tasks in R – a bit of planning at the start of an analysis can save a lot of work later if you build in flexibility rather than having to copy and paste and keep changing small parts of the code.
Excellent Site. Thank you for the ggplot2 examples!!!!
I made the following changes to popreportdata.R (helper functions only) and population.Rnw to get Sweave to work with this population example:
# To make it easier to generate the latex statements for
# inclusion of graphs and tables, a few helper functions are defined first.
include_graph <- function(width = 1, filename) {
paste("\\includegraphics[width=", width, "\\linewidth]{", filename, "}", sep = "")
}
include_tbl <- function(width = 1, filename) {
print(xtable(filename), table.placement = "", latex.environments = "", include.rownames = FALSE, floating = FALSE)
}
subfloat_graph <- function(width, filename, caption = "") {
cat(paste("\\subfloat[", caption, "]{", "\\begin{minipage}[h]{",
width, "\\linewidth}\\centering", include_graph(width = 1, filename), "\\end{minipage}}\n", sep = ""))
}
subfloat_tbl <- function(width, filename, caption) {
cat(paste("\\subfloat[", caption, "]{", "\\begin{minipage}[h]{",
width, "\\linewidth}\\centering", print(xtable(filename),
file = stderr(), table.placement = "",
latex.environments = "", include.rownames = FALSE,
floating = FALSE), "\\end{minipage}}\n", sep = ""))
}
\documentclass[oneside]{article}
\usepackage[margin=2cm,nohead]{geometry}
\usepackage[pdftex]{graphicx}
\usepackage{subfig}
\usepackage{float}
\usepackage{verbatim}
\usepackage{hyperref}
\hypersetup{
colorlinks=true,
pdfauthor={https://learnr.wordpress.com}
}
\graphicspath{{./graphs/}}
\title{World Population Trends}
\author{\url{https://learnr.wordpress.com}}
\date{\today}
\raggedbottom
\setcounter{tocdepth}{1}
\begin{document}
\maketitle
This report has been compiled based on the United Nations report World Population Prospects: The 2008 Revision (highlights available \href{http://www.un.org/esa/population/publications/wpp2008/wpp2008_highlights.pdf}{here}). The dataset can be accessed \href{http://data.un.org/Data.aspx?d=PopDiv&f=variableID%3a12&c=1,2,4,6,7&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc&v=1}{here}.
\tableofcontents
<<echo=FALSE,results=tex>>=
library(xtable); library(ggplot2)
source('popreportdata.R')
for (i in seq_along(names(popreportdata))) {
cat("\\pagebreak\n")
i = names(popreportdata)[i]
reportlist <- popreportdata[match(i,names(popreportdata))][[1]]
filename <- function(y){paste(gsub(" ", "_", i) , y, ".pdf", sep="")}
cat("\\section{", i, "}\n", sep="")
cat("\\begin{figure}[H]\n")
cat("\\centering\n")
cat(include_graph(width = 1, filename("_trend")), '\n')
subfloat_graph(0.33, filename("_hist"), "Histogram")
subfloat_graph(0.33, filename("_rank"), "Rank Curve")
subfloat_graph(0.33, filename("_box"), "Boxplot")
cat("\\caption{Distribution plots}\n")
cat("\\end{figure}\n")
cat("\\begin{table}[h]\n")
cat("\\centering\n")
subfloat_tbl(0.4, reportlist[[1]], "Top 5 Countries")
subfloat_tbl(0.4, reportlist[[2]], "Bottom 5 Countries")
cat("\\caption{Population in 2005}\n")
cat("\\end{table}\n")
cat("\\begin{figure}\n")
cat("\\centering\n")
subfloat_graph(0.5, filename("_abs_growth"), "Absolute Growth")
subfloat_graph(0.5, filename("_ann_growth"), "Annual Compound Growth")
cat("\\caption{Growth charts 2010 - 2050}\n")
cat("\\end{figure}\n")
cat("\\begin{table}[H]\n")
cat("\\centering\n")
subfloat_tbl(1, reportlist[[3]], "Top 5 Growing Countries")
cat("\\quad\n")
subfloat_tbl(1, reportlist[[4]], "Bottom 5 Growing Countries")
cat("\\caption{Growth tables 2010 - 2050}\n")
cat("\\end{table}\n")
}
@
\end{document}
and then used:
library(tools)
Sweave("population")
texi2dvi("population.tex", pdf = TRUE)
The chunk header is <<echo=FALSE,results=tex>>=; the options and the enclosing brackets were stripped from population.Rnw when I first submitted this comment.
Hi,
Have you found any way to control the font size in graphics when the graphics are scaled, as with minipage/subfig in this example?
I posted on Stackoverflow and got one answer about how to set font size in tikz:
http://stackoverflow.com/questions/2237979/how-to-control-font-sizes-in-pgf-tikz-graphics-in-latex
but I am still hunting for a solution to the issue with scaling in an environment:
http://stackoverflow.com/questions/2239328/control-font-size-in-graphics-in-latex-when-scaling-in-minipage-subfig
If you have any ideas your feedback would be most appreciated.
Many thanks,
Jay
I haven’t experimented with this.
The answer to your second question on Stackoverflow gives quite a few examples, and explains the options well.
It does; Stack Overflow is a great resource. It was a sticky issue: tikzDevice lets you get the font used to match, but the size is the most apparent thing to the audience.
Thanks for the reply.
I’m trying to do something similar at the moment. Thanks for this post, it’s invaluable.
Excellent method. Although, I think you should also post a shorter version of the above. As a fairly new user of this batch method, going through the code and understanding it took some time. A simpler example with say, just the .brew file and some R code would have made the learning faster.
Thanks again for an excellent guide!
Yes, this is a big dollop of meaty goodness that requires extended digestion.
A word to fellow noobs: WordPress has eaten one half of the escape characters (backslashes) in the helper functions (perhaps elsewhere too, I can’t remember now). It took me half a holiday to work out why I was getting strange LaTeX errors.
Toby’s post above has the appropriate number so you can use that as a guide.
hi learnr,
I was trying to adapt your brew code but am running into a syntax issue. I posted it on StackExchange at http://stackoverflow.com/questions/7762025/r-and-brewsyntax-issue and I am wondering if you encountered any syntax problems. Do you think these issues may be related to the version of R or the version of brew?
thanks!
I see your question was answered on StackOverflow.
I tried your brew example up to
\begin{figure}[H]
\centering
\caption{Distribution plots}
\end{figure}
and got the error
texi2dvi("test.tex", pdf = TRUE)
Warning message:
running command '"C:\PROGRA~1\MIKTEX~1.8\miktex\bin\texi2dvi.exe" --quiet --pdf "test.tex" -I "C:/PROGRA~1/R/R-214~1.1/share/texmf/tex/latex" -I "C:/PROGRA~1/R/R-214~1.1/share/texmf/bibtex/bst"' had status 1
Please help, I use Windows XP + MiKTeX.
Do you have LaTeX installed on your machine?
Hello learnr!
First – Thank you very much for your blog!
Do you have any idea why the code of your brew example creates this error message:
Error in source("brewtest.r") : brewtest.r:47:45: unexpected symbol
46: filename <- function(y) {
47: paste("graphs\", continent, y, ".pdf
^
(the ^ sign is under the " before .pdf)
Tested with R 2.15.0 and R 2.15.1; I fetched your code both by copy & paste and by page dump.
Greetings,
Jo
Hmm, could it be that you need to escape the path?
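To spell out what escaping the path could look like, here is a small sketch with three ways of building the file name. The continent value is only an example. In the quoted line, the backslash before the closing quote escapes that quote, so the string never ends where it was meant to.

```r
continent <- "Northern_America"   # example value

# Option 1: forward slash, which R accepts on Windows as well
p1 <- paste("graphs/", continent, "_hist", ".pdf", sep = "")

# Option 2: escaped backslash ("\\" in source is one backslash in the string)
p2 <- paste("graphs\\", continent, "_hist", ".pdf", sep = "")

# Option 3: let file.path() insert the separator
p3 <- file.path("graphs", paste(continent, "_hist", ".pdf", sep = ""))

print(c(p1, p2, p3))
```

Options 1 and 3 give the same string; option 2 differs only in the separator.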
What a great post. I will definitely try to use this for my upcoming project. This looks awesome. But why have you not updated your blog for such a long time?
The primary reason I guess is that I have run out of ideas.
Thank you for this post; it has helped me greatly. I have a question. I am using brew to generate a repetitive report and would like to suppress the caption label. I know you can do it using LaTeX, but I have to generate a thousand or so tables and don’t want to have to do it for each one. Is there a way to suppress the caption label using either xtable or brew?
Thanks for any help you can give me.
xtable allows you to specify whether you want a caption or not.
Hi, I’m trying to implement your example to learn something about automated report generation. Could you place the .csv file you are using somewhere? I downloaded some files from http://esa.un.org/unpd/wpp/DVD/ but none matches your format, and I receive an error due to incompatible vector dimensions. Thanks
Try the link at the beginning of the article.