gtools: Useful Data Manipulation Functions

May 31, 2009

I recently came across the gtools package, which contains various R programming tools, several of them aimed at data manipulation. This post highlights the functions in the package that I found most useful for that purpose. Some of the examples are from the gtools help pages.

mixedsort() / mixedorder()

These functions sort (mixedsort()) or order (mixedorder()) character strings containing embedded numbers so that the numbers are sorted numerically rather than by character value.

> library(gtools)
> desc <- paste("A", 1:12, sep = "")
> sort(desc)
 [1] "A1"  "A10" "A11" "A12" "A2"  "A3"  "A4"  "A5"
 [9] "A6"  "A7"  "A8"  "A9"
> mixedsort(desc)
 [1] "A1"  "A2"  "A3"  "A4"  "A5"  "A6"  "A7"  "A8"
 [9] "A9"  "A10" "A11" "A12"
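
The example above only shows mixedsort(). As a small sketch of where mixedorder() might be useful, the snippet below reorders the rows of a data frame by a mixed letter-and-number label. The data frame samples and its columns are made up for illustration; only mixedorder() comes from gtools.

```r
library(gtools)

# Hypothetical data frame whose labels mix letters and numbers
samples <- data.frame(id = paste("A", c(10, 2, 1, 12), sep = ""),
                      value = c(4.2, 3.1, 5.0, 2.7),
                      stringsAsFactors = FALSE)

# mixedorder() returns the row permutation rather than the sorted values,
# so it can be used to index the whole data frame
idx <- mixedorder(samples$id)
sorted <- samples[idx, ]
sorted$id
```

This mirrors the relationship between sort() and order() in base R: mixedsort() returns the sorted values, mixedorder() the positions.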

smartbind()

Combines data frames by row, much like rbind(), even if the column names don’t match.

> df1 <- data.frame(A = 1:10, B = LETTERS[1:10],
     C = rnorm(10))
> df2 <- data.frame(A = 11:20, D = rnorm(10),
     E = letters[1:10])

smartbind() stacks them and fills any column that is missing from one of the data frames with NA.

> smartbind(df1, df2)
      A    B          C          D    E
1.1   1    A -0.2783511         NA <NA>
1.2   2    B  0.1885082         NA <NA>
1.3   3    C  0.2403823         NA <NA>
1.4   4    D  0.9866348         NA <NA>
1.5   5    E -0.2204937         NA <NA>
1.6   6    F -2.2177263         NA <NA>
1.7   7    G  1.5494367         NA <NA>
1.8   8    H  1.2159574         NA <NA>
1.9   9    I  0.9482660         NA <NA>
1.10 10    J -1.5123528         NA <NA>
2.1  11 <NA>         NA -1.9943360    a
2.2  12 <NA>         NA -1.3132422    b
2.3  13 <NA>         NA  0.6771895    c
2.4  14 <NA>         NA -0.8470251    d
2.5  15 <NA>         NA  0.7926469    e
2.6  16 <NA>         NA -1.5133757    f
2.7  17 <NA>         NA -1.7590515    g
2.8  18 <NA>         NA -2.6861958    h
2.9  19 <NA>         NA -0.5772023    i
2.10 20 <NA>         NA  0.8225223    j
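
To make the NA-filling behaviour concrete, here is a rough base-R sketch of the same idea for two data frames. It is not how gtools implements smartbind(); the helper name bind_fill and the small data frames d1 and d2 are hypothetical.

```r
# Illustrative base-R helper (hypothetical name; not part of gtools)
bind_fill <- function(x, y) {
  all_cols <- union(names(x), names(y))
  # Add each missing column, filled with NA
  for (col in setdiff(all_cols, names(x))) x[[col]] <- NA
  for (col in setdiff(all_cols, names(y))) y[[col]] <- NA
  # Put the columns in the same order before stacking the rows
  rbind(x[all_cols], y[all_cols])
}

d1 <- data.frame(A = 1:2, B = c("x", "y"), stringsAsFactors = FALSE)
d2 <- data.frame(A = 3:4, D = c(0.5, 1.5))
combined <- bind_fill(d1, d2)
combined
```

smartbind() itself accepts any number of data frames and takes more care over column types, so the sketch is only meant to show where the NA entries come from.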

running()

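This function applies a function over a sliding window of the vector(s): each window consists of the current point and a fixed number of previous points, with the window size set by width. It is very handy, for example, for calculating moving averages. With pad = TRUE, the result is padded with NA so that it has the same length as the input.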

> df <- data.frame(a = sample(10))
> df$b <- running(df$a, width = 2, pad = TRUE,
     fun = mean)
> df
    a   b
1   2  NA
2   8 5.0
3   5 6.5
4   3 4.0
5   9 6.0
6   1 5.0
7   6 3.5
8  10 8.0
9   7 8.5
10  4 5.5
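
As a cross-check on what the window is doing, the following base-R sketch recomputes column b from the values of a shown above. The helper trailing_mean is a hypothetical name and only covers the mean-with-padding case used here, not everything running() can do.

```r
# Values of column `a` copied from the output above
a <- c(2, 8, 5, 3, 9, 1, 6, 10, 7, 4)
width <- 2

# Hypothetical helper: mean of each point and the (width - 1) points before it
trailing_mean <- function(x, width) {
  n <- length(x)
  out <- rep(NA_real_, n)       # leading positions stay NA, like pad = TRUE
  if (n < width) return(out)    # not enough points for a single window
  for (i in width:n) {
    out[i] <- mean(x[(i - width + 1):i])
  }
  out
}

b <- trailing_mean(a, width)
b
```

The first element is NA because there is no previous point to pair it with, which matches the NA in the first row of the output above.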