Skip to content

Export Data Frames To Multi-worksheet Excel File

October 6, 2009

A few weeks ago I needed to export a number of data frames to separate worksheets in an Excel file. Although one could output csv-files from R and then import them manually or with the help of VBA into Excel, I was after a more streamlined solution, as I would need to repeat this process quite regularly in the future.

CRAN has several packages that offer the functionality of creating an Excel file, however several of them provide only the very basic functionality. The R-wiki page on exchanging data between R and Windows applications focuses mainly on the data import problem.

My objective was to find an export method that would allow me to easily split a larger dataframe by values of a given variable so that each subset would be exported to its own worksheet in the same Excel file. I tried out the different ways of achieving this and documented my findings below.



Data Preparation

The goal is to split the iris dataset by the unique values of iris$Species, and export it to a new Excel file with worksheets: setosa, versicolor and virginica.

Two ways of storing the information were used for the purposes of this exercise:

List of dataframes

> library(plyr)
> testlist <- dlply(iris, .(Species))

A character vector of containing names of data frames

First create the data frames, and then the character vector with the object names – now we will have three separate data frames called setosa, versicolor and virginica.

> d_ply(iris, .(Species), function(df) {
     assign(as.character(df$Species[1]), df,
         envir = .GlobalEnv)
 })
> testchar <- as.character(unique(iris$Species))
> testchar1 <- paste(testchar, collapse = ",")

Storing data in a list of dataframes is obviously much more convenient, however, as you see later many (or most) functions don’t accept lists as input.


dataframes2xls

Note

Requires Python (>= 2.4)

dataframes2xls saves dataframes to an xls file. Its main function write.xls, is a wrapper around a utility called xls2csv. xls2csv makes use of the Python module pyExcelerator and the afm submodule of the Python module matplotlib, both of which are included in dataframes2xls.

> library(dataframes2xls)
> dataframes2xls::write.xls(c(setosa, versicolor,
     virginica), "dataframes2xls.xls")

Appending to an existing file not possible, this negates the use of list of dataframes as input. One major shortcoming is that one needs to specify the names of exported dataframes manually. Also, I was not able to pass sheet names to the function if there was more than one sheet.

There is a way to work around the manual specification of data frames, but as you can see it is not very intuitive:

> eval(parse(text = paste("dataframes2xls::write.xls(c(",
     testchar1, "),", "'dataframes2xls.xls')")))

WriteXLS

Note

Requires Perl with module Text::CSV_XS

WriteXLS is a “Perl based R function to create Excel (XLS) files from one or more data frames. Each data frame will be written to a separate named worksheet in the Excel spreadsheet. The worksheet name will be the name of the data frame it contains or can be specified by the user”.

> library(WriteXLS)
> WriteXLS(testchar, "WriteXLS.xls", perl = perl)

Like dataframes2xls WriteXLS does not take lists as input, and therefore each of the data frames needs to be generated beforehand before calling the function(s). Also appending to a file is not possible as the Excel file, if it exists, is overwritten.


xlsReadWrite(Pro)

Note

This package currently works only on Windows machines.

xlsReadWrite saves a data frame, matrix or vector as an Excel file in Excel 97-2003 file format.

> library(xlsReadWrite)
> xlsReadWrite::write.xls(iris, "xlsReadWrite.xls")

The dataframe can be written to one sheet only. It is not possible to split the data between separate sheets, as append data to existing files feature is available in the Pro-version.

The Pro-version has a 30-day trial, so I tried it out.

> library(xlsReadWritePro)
> rfile <- "xlsReadWritePro.xls"
> exc <- xls.new(rfile)
> l_ply(testlist, function(x) {
     sheet <- as.character(unique(x$Species))
     xlsReadWritePro::write.xls(x, file = exc,
         sheet = sheet, colNames = TRUE)
 })
> xls.close(exc)

Task accomplished.

The Pro-version has several other nice features, such as the ability to save images to an Excel file, or to write Excel formulas.

For reference, the licence costs are as follows.

  • Single user license: 75 euros
  • Non-commercial single user: 19 euros
  • Company/university wide: 570 euros

RODBC

Note

There are ODBC Excel drivers for Windows only.

RODBC sqlSave function saves the data frame in the specified worksheet via ODBC after initiating the connection using a convenience wrapper odbcConnectExcel.

> library(RODBC)
> save2excel <- function(x) sqlSave(xlsFile,
     x, tablename = x$Species[1], rownames = FALSE)
> xlsFile <- odbcConnectExcel("RODBC.xls", readOnly = FALSE)
> l_ply(testlist, save2excel)
> odbcCloseAll()

This worked well. Another good thing to note is that you can append to already existing files.


RDCOMClient

Note

It requires a Windows machine that has Excel running on it.

RDCOMClient allows to access and control applications such as Excel, Word, PowerPoint, Web browsers from within R session. As an alternative, rcom package provides similar functionality.

> library(RDCOMClient)

Developer’s website provides some useful functions for exporting/importing dataframes to/from Excel.

> source("http://www.omegahat.org/RDCOMClient/examples/excelUtils3.R")
> xls <- COMCreate("Excel.Application")
> xls[["Visible"]] <- TRUE
> wb = xls[["Workbooks"]]$Add(1)
> rdcomexport <- function(x) {
     sh = wb[["Worksheets"]]$Add()
     sh[["Name"]] <- as.character(x$Species[1])
     exportDataFrame(x, at = sh$Range("A1"))
 }
> d_ply(iris, .(Species), rdcomexport)

Now I have an Excel file open with an empty “Sheet1″. I delete it and, after specifying the save-directory (it defaults to My Documents) & filename , save the file. I have Office 2007 installed, so this file format is used by default.

> xls$Sheets("Sheet1")$Delete()
> filename <- paste(getwd(), "RDCOMClient.xlsx",
     sep = "/")
> filename <- gsub('/', '\\', filename)
> wb$SaveAs(filename)
> wb$Close(filename)

Another example of RDCOMClient in action can be seen in this R-help post.


Conclusion

The Windows-only solutions were (expectedly) the most flexible with RDCOMClient-approach providing the greatest control over output. However, if formatting was not that important the simplest way to export data to multiple Excel worksheets would be via RODBC (if on Windows machine).

About these ads
29 Comments leave one →
  1. keijo permalink
    October 6, 2009 6:30 pm

    Great post again! The shortcoming with RDCOMClient is that it opens a new Excel process and you can’t e.g. run macros for that worksheet from your personal macro workbook. See e.g.

    https://stat.ethz.ch/pipermail/r-help/2009-July/203836.html

    One minor typo, I think that this

    filename <- gsub("/", "\", filename)

    should be like

    filename <- gsub("/", "\\", filename)

    • learnr permalink*
      October 6, 2009 6:50 pm

      Thanks for spotting.

      Actually it should be like this:
      gsub(“/”, “\\\\”, filename)

      It is like this in the original script, but WordPress XML-RPC seems to be eating backslashes.
      Will fix the post.

  2. kavu permalink
    October 7, 2009 1:50 am

    Many thanks for this article which has just saved me a lot of time in evaluating the various Excel export options. Really enjoy and benefit from your blog.

  3. November 11, 2009 11:31 am

    The best thing to do is write new code that exports to xlsx. It’s easy-ish because it’s a zipped XML format. I wrote an xlsx to tsv converter that can handle multiple worksheets pretty easily. A writer would be harder of course, but definitely doable.

    http://github.com/brendano/tsvutils/blob/master/xlsx2tsv

  4. Werner permalink
    December 6, 2009 4:02 pm

    Thanks for showing new alternatives, that’s a very helpful post.
    But striving after a free and portable applications only setup, it would be great to have such functions which use OpenOffice (format) instead. Anything known in that direction?

    • learnr permalink*
      December 6, 2009 8:00 pm

      There is odfWeave, but unfortunately it does not support the Calc spreadsheets yet meaning you would have to resort to text documents which might not be the most convenient solution.

  5. January 18, 2010 2:07 pm

    Thanks a lot for this post.
    I ended up using ODBC with minor changes to save2excel.
    One thing I didn’t understand is way some of my columns got the data with an added:

    to each value.
    So something like ” var ” would be ” ‘var ”

    Either way – thanks a lot!
    Tal

  6. zpfietsch permalink
    March 16, 2010 9:29 pm

    I have an end user who needs to have the r function called from a VBA macro. I am using the RExcel package and am much more familiar with R than VBA. I am starting to feel retarded because I simply need to fetch 2 columns of data for one sheet in a very large workbook with a ton of macros. Replacing the system with R is not an option, but a career choice.

    So here is the R- code

    ##load library for ‘getSymbols function
    library(quantmod)

    ##download data
    EU <- getSymbols(c('EUR/USD'), src = "oanda")

    ##separate data from data frame
    eurmat <- EURUSD

    ##assure that data is an array
    eurmat <- as.array(eurmat)

    AND here is the VBA subroutine:

    Sub rData()
    'Selects the sheet I want
    Sheets("EU").Activate
    Dim ourRange As Range

    ourRange = Range("A:B")
    'start R
    RInterface.StartRServer

    RInterface.RRun ("library(quantmod)")
    RInterface.RRun ("EU <- getSymbols(c('EUR/USD'), src = 'oanda')")
    RInterface.RRun ("eurmat < -EURUSD")
    RInterface.RRun ("eurmat <- as.array(eurmat)")

    Range(ourRange).Value = RInterface.GetArray("eurmat", Range("A:B"))

    Another version I tried used ourRange in the last line… neither works. Both receive the same error of "expected function or variable" in reference to the ".GetArray" portion. Have also attempted the above iterations without using "Range(ourRange).Value ="; which seems logical given that the output range is specified in the formula. As I'm sure you can guess… fail.

    I also tried using the "RInterface.RunRFile " command. It seems that windows is not reading my .R script files because they "are not valid win32 application[s]". How do I fix this issue?

    I realize it is probably a no brainer, but I just about used my laptop to make that new window I always wanted in my kitchen…

    Thanks,
    Zach

    • learnr permalink*
      March 22, 2010 2:51 pm

      Zach – sorry, cannot help, as I know very little about VBA.

    • August 24, 2010 2:47 am

      RInterface.GetArray is a Sub, not a function.
      From the docs:

      RInterface.GetDataframe(varname,range)
      Puts the value of R variable var (which needs to be a dataframe) into Excel range range, putting variable names in the first row of the range

      RInterface.GetArray(Rexpr,range)
      Puts the value of R expression Rexpr into Excel range range
      Rexpression has to have a value of scalar, vector, or matrix.
      Values of type lists or dataframe will produce errors.

      Since the range is given as argument, using it on the lhs of an assignment does not make sense.

      And if you use GetDataframe, you do not need to convert the dataframe to an array.

  7. Gene permalink
    May 19, 2010 12:07 am

    Have you looked at the WriteXLS package lately? You can write data frames to separate sheets.

    Requires Perl
    Example from the WriteXls help file:

    library(WriteXLS)

    # Only run the examples if Perl and all modules are present
    if (testPerl(verbose = FALSE))
    {
    # Examples using built-in data frames
    WriteXLS(“iris”, “iris.xls”)

    WriteXLS(c(“iris”, “infert”, “esoph”), “Example.xls”)

    # Clean up and delete XLS files
    unlink(“iris.xls”)
    unlink(“Example.xls”)
    }

  8. Gene permalink
    May 19, 2010 1:07 am

    Also, here’s a function to convert the objects to data frames if possible. There is no error checking to see that the conversion is possible, but it works for matrices

    writexls=function(x, wb, debug=FALSE){
    if(debug) browser()
    if(!require(‘WriteXLS’)) stop()
    dfconv=function(xx)
    eval(substitute(
    as.data.frame(variable),
    list(variable = as.name(xx))
    ))
    for(i in 1:length(x))
    assign(as.character(x[i]), dfconv(x[i]))
    WriteXLS(x, wb)
    }

  9. Mat permalink
    August 27, 2010 4:25 pm

    One thing you could add for the comparison of the packages is that writexls unfortunately is not able to export the rownames, which for me limits its use quite a lot!

  10. Marc Schwartz permalink
    September 21, 2010 3:53 am

    A quick heads up to those still following this thread.

    WriteXLS version 2.1.0 (released on Sept 18, 2010) now supports a named list as an argument, as an alternative to the names of one or more data frames. An example of this use is:

    require(WriteXLS)
    iris.split <- split(iris, iris$Species)
    WriteXLS("iris.split", "irissplit.xls")

    In addition, the new version supports the exporting of row names for the data frames to the Excel worksheets. The new argument 'row.names' was added to support this and defaults to FALSE for compatibility with existing code.

    HTH.

    • Tom Evans permalink
      October 4, 2010 2:37 am

      I don’t think it was mentioned here or in the comments. I’ve just been using the package ‘xlsx’, this seems to work pretty well:

      http://cran.r-project.org/web/packages/xlsx

      A simple command will write a dataframe to an xlsx file (suitable for Excel 07/10):
      write.xlsx(iris, “iris.xlsx”)
      I don’t think it’s as flexible as some of the others mentioned above, but it is easy to use if you just want to write some data to an excel file – and xlsx files are smaller than xls files too, which can be an advantage if a lot of data is involved.

  11. Pankaj permalink
    November 18, 2010 8:09 pm

    Regardingdataframes2xls:

    I have a dataframe containing 104 columns and 30,000 rows. I want to extract 104 columns to a single excel file as separate spreadsheet . Again each of the 104 columns are named or grouped as say A1, A2, A3 ; B1,B2,B3; C1,C2,C3 etc. So, instead of exporting to 104 spreadsheets I want to export all the A , B, and C columns to a single spreadsheet.

    How do I do that? I am a learner in R” and not an expert.

    Thanks,
    Pankaj Barah

    • learnr permalink*
      November 22, 2010 4:25 pm

      You need a script which loops through all the columns, extracts the relevant data and then appends to a new sheet in an Excel file.
      Almost all the write.xls functions have the option to specify the sheetname.

  12. March 29, 2011 5:16 pm

    Thank you so very much for this post! Very useful, indeed, after all this struggle with numeric data format conversion when saving as .csv. The library dataframes2xls made my day :-).

  13. Tej permalink
    June 18, 2011 8:37 pm

    Hi, How to export data from one excel file to other multiple files…i.e., lets say my excel contains data in column B as Monday, Tue, Wed, Thursday and every row has data with respect to that month (Total 1 excel file). Now I need to create different excel files for each month i.e., Mon, Tue, Wed, Thu (Total 4 files). Can I know how do I create this. Thanks in advance

    • learnr permalink*
      June 21, 2011 2:40 am

      There are a few ways shown in the blogpost. But why don’t you post what you have got already, and I could hopefully point you in the right direction.

  14. Sally permalink
    January 6, 2012 4:04 am

    Hi folks,

    I’m trying to export the results of a predict(glm), i.e., the predictions for a glm model. The output is 1 to 1000 and the corresponding values, but is it not set up in standardized rows and columns. I’m seeking a way to export the results to Excel for further analysis. I haven’t been able to find a way to use the dataframe export methods listed above to get for example the xlsx function to work. For example, in the output, the first row of numbers is say 1 to 10 and the second row of numbers is the predicted values for 1 to 10, then 10 to 20, then the predicted values for 10 to 20, so it’s not in a format that’s easy to separate into OIDs and corresponding values in rows and columns. Any suggestions on how I can export these results to an Excel spreadsheet?

    Thank you,

    Sally

  15. Claire Rioualen permalink
    April 27, 2012 3:59 pm

    Thanks a lot, very useful post!

  16. NEO permalink
    October 16, 2012 1:22 pm

    “odbcConnectExcel is only usable with 32-bit Windows”
    just for ur information!

  17. NEO permalink
    October 16, 2012 1:24 pm

    “package ‘RDCOMClient’ is not available (for R version 2.15.1)” hmmm

    • MSMortensen permalink
      February 6, 2013 3:34 pm

      Most likely a problem caused by you running 64-bit and the package only supporting 32-bit.
      I’m not sure, but that was my problem with a couple of the packages.
      Change to 32-bit and it should solve your problem

  18. February 19, 2013 1:48 pm

    Entering multiple sheetnames in dataframes2xls is not very R-ish but works though. Use sh.names =”MyFirstSheet:::MySecondSheet” if you specified two dataframes for example.

    • MSMortensen permalink
      June 6, 2013 11:09 am

      I use a script that outpots several dataframes to one xls regularly using the “RODBC” library. This is very easy and names the sheets after the dataframes. This is done as follows (rownames is optional and there are some extra parameters):
      library(RODBC)
      filename <- "THE_FILENAME.xls")
      xlsFile <- odbcConnectExcel(filename, readOnly = FALSE)
      sqlSave(xlsFile, Name_of_data.frame.A, rownames = FALSE)
      sqlSave(xlsFile, Name_of_data.frame.B, rownames = FALSE)
      sqlSave(xlsFile, Name_of_data.frame.C, rownames = FALSE)
      sqlSave(xlsFile, Name_of_data.frame.D, rownames = TRUE)
      odbcCloseAll()

Trackbacks

  1. Some useful R links (State of the R) « Mai!
  2. Clustergram: A graph for visualizing cluster analyses (R code) | R-statistics blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 180 other followers

%d bloggers like this: