
At the useR! 2006 conference, Jim Porzak gave a presentation on data profiling with R. He showed how to draw summary panels of the data using a combination of grid and base graphics.

Unfortunately, the code has not (yet) been released as a package, so when I recently needed to quickly review several datasets at the beginning of an analysis project, I started to look for alternatives. A quick search revealed two options that offer similar functionality: the r2lUniv package and the describe() function in the Hmisc package.

## r2lUniv

The r2lUniv package performs a quick analysis of either a single variable or a whole data frame: it computes several statistics (frequency, centrality, dispersion, graph) for each variable and writes the results in LaTeX format. The output varies depending on the variable type.

 > library(r2lUniv)

One can specify the text to be inserted in front of each section.

 > textBefore <- paste("\\subsection{", names(mtcars),
 +     "}", sep = "")
 > rtlu(mtcars, "fileOut.tex", textBefore = textBefore)

The function rtluMainFile generates a main LaTeX document, which allows further customisation of the report.

 > text <- "\\input{fileOut.tex}"
 > rtluMainFile("r2lUniv_report.tex", text = text)

The resulting tex-file can then be converted into pdf.

 > library(tools)
 > texi2dvi("r2lUniv_report.tex", pdf = TRUE, clean = TRUE)

A sample output for the mpg-variable:

The final pdf-output can be seen here: r2lUniv_report.pdf.

## Hmisc

The describe function in the Hmisc package determines whether a variable is character, factor, category, binary, discrete numeric, or continuous numeric, and prints a concise statistical summary appropriate to that type. The LaTeX report also includes a spike histogram displaying the frequency counts.

 > library(Hmisc)
 > db <- describe(mtcars, size = "normalsize")

The easiest and fastest way to inspect the results is to print them to the console.

 > db$mpg
 mpg
       n missing  unique    Mean     .05     .10     .25     .50
      32       0      25   20.09   12.00   14.34   15.43   19.20
     .75     .90     .95
   22.80   30.09   31.30

 lowest : 10.4 13.3 14.3 14.7 15.0
 highest: 26.0 27.3 30.4 32.4 33.9
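To make it clearer where these numbers come from, here is a minimal base-R sketch that recomputes the same kind of summary for a numeric variable. It is not how Hmisc implements describe; the helper name `profile_numeric` and its `probs` argument are my own assumptions, and it relies on the default method of `quantile()`.

```r
# Base-R sketch of the figures shown in the describe() printout.
# profile_numeric is a hypothetical helper, not part of Hmisc.
profile_numeric <- function(x, probs = c(.05, .10, .25, .50, .75, .90, .95)) {
  stopifnot(is.numeric(x))
  present <- x[!is.na(x)]                 # drop missing values before summarising
  distinct <- sort(unique(present))       # distinct values in ascending order
  list(
    n         = length(present),          # number of non-missing observations
    missing   = sum(is.na(x)),            # number of missing observations
    unique    = length(distinct),         # number of distinct values
    mean      = mean(present),
    quantiles = quantile(present, probs = probs, names = FALSE),
    lowest    = head(distinct, 5),        # five smallest distinct values
    highest   = tail(distinct, 5)         # five largest distinct values
  )
}

mpg_profile <- profile_numeric(mtcars$mpg)
str(mpg_profile)
```

For mpg the counts, mean and quantiles should agree with the printout above up to rounding, which is a handy sanity check when describe is not available.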

Alternatively, one can convert the describe object into a LaTeX file.

 > x <- latex(db, file = "describe.tex")

cat is then used to write a small wrapper document that pulls in describe.tex, and the wrapper is compiled into the pdf-report.

 > text2 <- "\\documentclass{article}\n\\usepackage{relsize,setspace}\n\\begin{document}\n\\input{describe.tex} \n\\end{document}"
 > cat(text2, file = "Hmisc_describe_report.tex")
 > library(tools)
 > texi2dvi("Hmisc_describe_report.tex", pdf = TRUE)
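The one-line string is a bit hard to read, so here is a sketch of the same wrapper step written with `writeLines`. Writing to `tempdir()` and checking for `describe.tex` and `pdflatex` before compiling are my own assumptions, added so the snippet can be run safely even without a LaTeX installation.

```r
# Build the wrapper line by line instead of as one long string.
wrapper <- c(
  "\\documentclass{article}",
  "\\usepackage{relsize,setspace}",
  "\\begin{document}",
  "\\input{describe.tex}",
  "\\end{document}"
)

# Temporary location chosen for illustration only (an assumption).
report <- file.path(tempdir(), "Hmisc_describe_report.tex")
writeLines(wrapper, report)

# Compile only if describe.tex sits next to the wrapper and pdflatex is on the PATH.
has_input <- file.exists(file.path(dirname(report), "describe.tex"))
has_latex <- nzchar(Sys.which("pdflatex"))
if (has_input && has_latex) {
  old_wd <- setwd(dirname(report))   # \input{} is resolved relative to the working directory
  tools::texi2dvi(basename(report), pdf = TRUE)
  setwd(old_wd)
}
```

In a real run, describe.tex would have to be written to the same directory first, for example by passing that path as the `file` argument of `latex()`.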

A sample output for the mpg-variable:

The final pdf-report can be seen here: Hmisc_describe_report.pdf.

## Conclusion

Both functions provide similar snapshots of the data, but I prefer the describe function for its more concise output and for the option to print the analysis to the console. Whilst I like the summary plots generated by r2lUniv, I find them hard to read in the pdf-report because of the small font size of the labels.

January 28, 2010 12:04 am

Hi;

Still learning the whole R and LaTeX thing. I ran the code that you gave to produce the mpg plots. I see the two LaTeX documents that were produced and the subdirectory graphUniv. What I am missing is the insertion of the graphics into the two right boxes.

The latex error is:

Running ‘texi2dvi’ on ‘r2lUniv_report.tex’ failed.
LaTeX errors:

The graphUniv/V1-boxplot is referenced in

\includegraphics[width=3cm]{graphUniv/V1-boxplot} of fileOut.tex.

How do I get includegraphics to find the subdirectory?

Jan

January 28, 2010 1:23 am

Have you checked whether you actually have any png-files in the graphUniv subdirectory?

January 28, 2010 1:24 am

Have you checked whether you actually have any png-files in the graphUniv subdirectory, and that they can be opened?
This error seems to suggest otherwise.

January 28, 2010 2:05 am

Ooops! Should have mentioned that as well. The directory contains 22 png files named variously V(#) box, hist or bar.

They open in a graphics view just fine.

Jan

November 11, 2010 4:00 pm

Hi, it looks like the r2lUniv package has been removed from CRAN, does anyone know the reasons for this?