ggplot2: Sales Dashboard

April 9, 2009

tags: business, chart, data, ggplot2, manipulation, plot, R, reshape, RODBC, xlsReadWrite, zoo

In 2005 dmreview.com conducted a Data Visualization competition for the best depiction of the data presented in each of the four prescribed scenarios. Robert Allison won Scenario 3 that involved the creation of a sales dashboard with priority given for measures indicating poor performance. He explains his approach here.

https://learnr.wordpress.com/wp-content/uploads/2009/04/sales_dashboard_scen3b.gif

Creating a similar dashboard in ggplot2 is perfectly feasible, and I will next show how to accomplish this.

Data Import

The first step in the process is importing the input data and converting it to a format most suitable for plotting. Data is provided in an Excel file, and unfortunately as it is laid out on only one sheet the import process requires several manual manipulations. xlsReadWrite library imports all the sheet contents, which are then manually broken down into smaller dataframes. Also, as the column names don’t contain years, this information needs to be added manually with the help of as.yearmon() function from the zoo package. In addition, two longer headings are replaced with shorter ones.

> library(xlsReadWrite)
> library(zoo)
> library(ggplot2)
> library(RODBC)

Import Data:

> file <- "scenario3.xls"
> df <- sqlFetch(odbcConnectExcel(file), sqtable = "Scenario3",
     as.is = T, na.strings = "NA")
> odbcCloseAll()
> names(df)[1] <- make.names(names(df)[1])
> names(df)[2] <- as.character(as.yearmon(paste(names(df)[2],
     2003), "%b %Y"))
> names(df)[-(1:2)] <- as.character(as.yearmon(paste(names(df)[-(1:2)],
     2004), "%b %Y"))
> df <- rename(df, c(`Dec 0001` = "Dec 2004"))

Fix df:

> df$Major.Metrics <- replace(df$Major.Metrics,
     df$Major.Metrics == "Customer Satisfaction (1-5 scale)",
     "Customer Satisfaction")
> df$Major.Metrics <- replace(df$Major.Metrics,
     df$Major.Metrics == "Avg Order Size", "Order Size")

The Excel worksheet was imported into one big datafame df, hence after extracting the specific rows/columns, the variables are renamed, and converted to a long format using a custom function.

Import targets:

> metricstargets <- df[50:56, 1:5]
> producttargets <- df[59:63, 1:5]
> regiontargets <- df[66:70, 1:5]
> calc_target <- function(targetdf) {
     names(targetdf) <- c("Major.Metrics", paste("Q",
         1:4, sep = ""))
     targetdf <- melt(targetdf, variable_name = "yearqtr")
     targetdf$yearqtr <- as.yearqtr(paste(2004,
         targetdf$yearqtr))
     targetdf <- rename(targetdf, c(value = "target"))
 }
> metricstargets <- calc_target(metricstargets)
> producttargets <- calc_target(producttargets)
> regiontargets <- calc_target(regiontargets)

Extraction of dataframes containing actual performance information follows the same principle. Target indicators are for quarters only, thus monthly results need to be aggregated into quarterly results. For most of the indicators summarising the three monthly results is sufficient, whereas for others (“Order Size”, “Customer Satisfaction”, “Market Share”, “On-Time Delivery”) the 3-month average is used. After merging the targets and actual results into one data-frame, the comparison of results is also performed in this step.

> metricsactual <- df[1:7, ]
> productactual <- df[11:15, ]
> regionactual <- df[19:23, ]

> calc_actual <- function(actualdf, targetdf) {
     actualdf <- melt(actualdf, variable_name = "yearmon")
     actualdf$yearqtr <- as.yearqtr(actualdf$yearmon,
         "%b %Y")
     actualdf <- subset(actualdf, yearqtr >
         "2003 Q4")
     actual_qtr <- cast(actualdf, yearqtr +
         Major.Metrics ~ ., c(sum, mean))
     actual_qtr$Major.Metrics <- factor(actual_qtr$Major.Metrics)
     actual_qtr$Major.Metrics <- reorder(actual_qtr$Major.Metrics,
         actual_qtr$sum, sum)
     tmp <- merge(actual_qtr, targetdf)
     tmp[with(tmp, order(Major.Metrics, yearqtr)),
         ]
 }
> metrics <- calc_actual(metricsactual, metricstargets)
> product <- calc_actual(productactual, producttargets)
> region <- calc_actual(regionactual, regiontargets)
> avg <- c("Order Size", "Customer Satisfaction",
     "Market Share", "On-Time Delivery")
> metrics$actual <- ifelse(metrics$Major.Metrics %in%
     avg, metrics$mean, metrics$sum)
> metrics$eval <- with(metrics, actual/target)
> product$actual <- product$sum
> product$eval <- with(product, actual/target)
> region$actual <- region$sum
> region$eval <- with(region, actual/target)

Performance Evaluation

The performance evaluation criteria are too cumbersome to import from the Excel file, and therefore the dataframe containing relevant information is created manually. The results in dataframes metrics, product and region are then compared against the predefined criteria and performance compared to targets classified as Good, Satisfactory or Poor.

> evaluation <- read.csv(textConnection("
 class,Major.Metrics,poor,good
 metric,Revenue,0.60,0.90
 metric,Profit,0.60,0.80
 metric,Order Size,0.50,0.75
 metric,Market Share,0.65,0.90
 metric,Customer Satisfaction,0.60,0.90
 metric,On-Time Delivery,0.60,0.90
 metric,New Customers,0.50,0.85
 product,Cabernet,0.60,0.90
 product,Zinfandel,0.60,0.90
 product,Merlot,0.60,0.90
 product,Chardonnay,0.60,0.90
 product,Sauvignan Blanc,0.60,0.90
 region,North America,0.60,0.90
 region,Europe,0.60,0.90
 region,Asia,0.60,0.90
 region,South America,0.60,0.90
 region,Middle East,0.60,0.90
 "),stringsAsFactors = FALSE)
> closeAllConnections()

> calc_evaluation <- function(df) {
     df <- merge(df, evaluation)
     df$assess <- with(df, ifelse(eval < poor,
         "Poor", ifelse(eval < good, "Satisfactory",
             "Good")))
     df
 }
> metrics <- calc_evaluation(metrics)
> product <- calc_evaluation(product)
> region <- calc_evaluation(region)

Finally the import of pipeline information

> customerpipeline <- df[26:35, 1:4]
> regionpipeline <- df[41:45, 1:5]
> names(customerpipeline) <- c("customer", "pipeline",
     "qtd", "ytd")
> names(regionpipeline) <- c("Region", "90%",
     "75%", "50%", "25%")
> regionpipe <- melt(regionpipeline, variable_name = "probability")
> regionpipe$probability <- factor(regionpipe$probability,
     levels = c("25%", "50%", "75%", "90%"))
> regionpipe$Region <- factor(regionpipe$Region)
> regionpipe$Region <- with(regionpipe, reorder(Region,
     -value, sum))
> regionpipe <- regionpipe[order(regionpipe$probability),
     ]
> regionpipe$title <- "Sales Pipeline"
> customerpipeline$title <- "Top 10 Customers"

Helper Functions

The data has been prepared and it is ready for plotting, which will be set up in the following section. Several helper functions are defined to:

a) set up grid layout for plotting.

A 10×10 grid allowing to evenly position the two plots in the last row is used for plotting. Note also that the last row is higher compared to the rows above it.

> Layout <- grid.layout(nrow = 5, ncol = 10,
     widths = unit(c(2, 2, 2), c("null", "null",
         "null")), heights = unit(c(1, 1, 1,
         1, 1.5), c("null", "null", "null")))
> grid.show.layout(Layout)

https://learnr.wordpress.com/wp-content/uploads/2009/04/dashboard_layout.png

> vplayout <- function(...) {
     grid.newpage()
     pushViewport(viewport(layout = Layout))
 }
> subplot <- function(x, y) viewport(layout.pos.row = x,
     layout.pos.col = y)

b) formatting the axis labels

> formatter <- function(x, prefix = "$") {
     ifelse(x > 999999999, paste(prefix, comma(x/1e+09,
         nsmall = 1), "bn", sep = ""), ifelse(x >
         999999, paste(prefix, comma(x/1e+06),
         "m", sep = ""), ifelse(x > 999, paste(prefix,
         comma(x/1000), "k", sep = ""), paste(prefix,
         comma(x), sep = ""))))
 }
> rng <- range(product$actual)

c) plotting & formatting.

Inside the function the following plots are drawn: – p – p2: plot drawing. Barplot used to depict actuals, and errorbars the corresponding target values. – p3: formatting of x & y axes labels and setting axes limits. – p4: mapping of fill colours to evaluation results – p5: remove axes titles, minor gridlines and legends. Finally remove empty space around the plots.

> fplot <- function(df) {
     p <- ggplot(df, aes(factor(yearqtr), actual)) +
         facet_wrap(~Major.Metrics, scales = "free_y",
             nrow = 1)
     p1 <- p + geom_bar(stat = "identity", aes(fill = assess))
     p2 <- p1 + geom_errorbar(aes(ymin = target,
         ymax = target), size = 1)
     p3 <- p2 + scale_x_discrete(labels = unique(format.yearqtr(df$yearqtr,
         format = "Q%q")))
     p3 <- p3 + scale_y_continuous(formatter = formatter,
         limits = c(0, rng[2]))
     p4 <- p3 + scale_fill_manual(values = c(Poor = "red",
         Satisfactory = "lightpink", Good = "lightgreen"))
     p5 <- p4 + opts(legend.position = "none",
         axis.title.x = theme_blank(), axis.title.y = theme_blank())
     p5 <- p5 + opts(panel.grid.minor = theme_blank())
     p5 + opts(plot.margin = unit(c(0, 0, 0,
         0), "lines"))
 }

Dashboard Plotting

Using the above function the plots in the first four rows can be prepared by passing the relevant dataframe to the function and reformatting the axes labels where necessary.

> theme_set(theme_grey(base_size = 10))
> p1_1 <- fplot(metrics[metrics$Major.Metrics ==
     "Revenue", ]) + scale_y_continuous(formatter = formatter)
> p1_2 <- p1_1 %+% metrics[metrics$Major.Metrics ==
     "Profit", ]
> p2_1 <- fplot(metrics[metrics$Major.Metrics ==
     "Order Size", ]) + scale_y_continuous(formatter = "comma")
> p2_2 <- p2_1 %+% metrics[metrics$Major.Metrics ==
     "New Customers", ]
> p2_3 <- p2_1 %+% metrics[metrics$Major.Metrics ==
     "Customer Satisfaction", ]
> p2_4 <- fplot(metrics[metrics$Major.Metrics ==
     "Market Share", ]) + scale_y_continuous(formatter = "percent")
> p2_5 <- p2_4 %+% metrics[metrics$Major.Metrics ==
     "On-Time Delivery", ]
> p3_1 <- fplot(product[product$Major.Metrics ==
     levels(product$Major.Metrics)[5], ])
> p3_2 <- fplot(product[product$Major.Metrics ==
     levels(product$Major.Metrics)[4], ])
> p3_3 <- fplot(product[product$Major.Metrics ==
     levels(product$Major.Metrics)[3], ])
> p3_4 <- fplot(product[product$Major.Metrics ==
     levels(product$Major.Metrics)[2], ])
> p3_5 <- fplot(product[product$Major.Metrics ==
     levels(product$Major.Metrics)[1], ])
> p4_1 <- fplot(region[region$Major.Metrics ==
     levels(region$Major.Metrics)[5], ])
> p4_2 <- fplot(region[region$Major.Metrics ==
     levels(region$Major.Metrics)[4], ])
> p4_3 <- fplot(region[region$Major.Metrics ==
     levels(region$Major.Metrics)[3], ])
> p4_4 <- fplot(region[region$Major.Metrics ==
     levels(region$Major.Metrics)[2], ])
> p4_5 <- fplot(region[region$Major.Metrics ==
     levels(region$Major.Metrics)[1], ])

The pipeline plots in the fifth row need to be drawn separately.

> p5a <- ggplot(regionpipe, aes(Region, value)) +
     geom_bar(stat = "identity", aes(fill = probability)) +
     labs(fill = "Probability \nof Sale") +
     opts(axis.title.x = theme_blank(), axis.title.y = theme_blank()) +
     opts(plot.margin = unit(c(0, 0, 0, 0),
         "lines")) + opts(plot.title = theme_text(size = 10)) +
     scale_y_continuous(formatter = formatter) +
     scale_fill_brewer(palette = "Blues") +
     coord_flip() + facet_wrap(~title)
> p5b <- ggplot(customerpipeline, aes(customer),
     colour = "red") + geom_bar(aes(y = ytd +
     pipeline), stat = "identity", fill = "white") +
     geom_bar(aes(y = ytd), stat = "identity",
         fill = "steelblue") + geom_errorbar(aes(ymin = ytd -
     qtd, ymax = ytd - qtd), colour = "black") +
     opts(axis.title.x = theme_blank(), axis.title.y = theme_blank()) +
     opts(plot.margin = unit(c(0, 0.5, 0, 0),
         "lines")) + scale_y_continuous(formatter = formatter) +
     facet_wrap(~title) + coord_flip()

After all the preparations we are ready to plot the dashboard.

> mmplot2 <- function() {
     vplayout()
     print(p1_1, vp = subplot(1, 1:2))
     print(p1_2, vp = subplot(1, 3:4))
     print(p2_1, vp = subplot(2, 1:2))
     print(p2_2, vp = subplot(2, 3:4))
     print(p2_3, vp = subplot(2, 5:6))
     print(p2_4, vp = subplot(2, 7:8))
     print(p2_5, vp = subplot(2, 9:10))
     print(p3_1, vp = subplot(3, 1:2))
     print(p3_2, vp = subplot(3, 3:4))
     print(p3_3, vp = subplot(3, 5:6))
     print(p3_4, vp = subplot(3, 7:8))
     print(p3_5, vp = subplot(3, 9:10))
     print(p4_1, vp = subplot(4, 1:2))
     print(p4_2, vp = subplot(4, 3:4))
     print(p4_3, vp = subplot(4, 5:6))
     print(p4_4, vp = subplot(4, 7:8))
     print(p4_5, vp = subplot(4, 9:10))
     print(p5a, vp = subplot(5, 1:5))
     print(p5b, vp = subplot(5, 6:10))
     grid.text("Sales Dashboard", x = unit(0.75,
         "npc"), y = unit(0.97, "npc"), just = "center",
         gp = gpar(col = "black", fontsize = 14))
     grid.text("All Currency in USD", x = unit(0.75,
         "npc"), y = unit(0.95, "npc"), just = "center",
         gp = gpar(col = "black", fontsize = 10))
 }
> mmplot2()

https://learnr.wordpress.com/wp-content/uploads/2009/04/sales_dashboard.png

Unfortunately, the png quality on my WinXP is not great, there is also a pdf-version available. At the moment each plot was created manually, it would be easier to use ggplot2’s facetting capabilities, however this approach would make the custom axis formatting impossible, also there would be a more visible misalignment of plots.

12 Comments leave one →

joe permalink

April 16, 2009 5:00 am

I don’t understand why this is now being posted. This was a contest from 2005.

Reply
- learnr permalink
  
  April 16, 2009 11:03 am
  
  Yes, and I only came across this contest recently. I liked the dashboard, and the underlying explanations and decided to try and replicate the design.
  
  Reply
  - joe permalink
    
    April 18, 2009 8:18 pm
    
    Oh, didn’t mean to suggest that it wasn’t useful. Thanks very much for the code. I personally would have liked to see some a more diverse panels (line charts and scatter plots for instance) and I imagine these are the kinds of things that might have been added in updated versions. I am having trouble seeing where I would have to modify the supplied code to include a scatter plot, for instance.
  - learnr permalink
    
    April 19, 2009 12:54 am
    
    As you rightly point out, all the small panel plots are of the same type. The actual plotting is handled by a helper function fplot() that allows just to specify data to be used, and the panel plot is drawn. The plots are then arranged in a grid using another helper function mmplot() that just allocates small panels into grid cells.
    
    In order for to you use a different plot type, you can first modify the commands in fplot() to draw the plot you like, and then modify mmplot() to indicate into which grid cell the plot is to be drawn. Or, alternatively you can draw one from scratch like plots p5a & p5b (i.e. the plots in the last row).
    
    If you have any specific dashboard/image in mind you have seen on the web, leave a link, and I will try to recreate it.
dkodpe permalink

April 17, 2009 5:35 pm

learnr

Great post. R does dashboards. I really enjoy your site, keep it up. I’m taking Hadley’s ggplot2 course in late July.

Kelly O’Day
http://chartsgraphs.wordpress.com
http://processtrends.com

Reply
Alexander Matrunich permalink

September 7, 2011 8:42 am

Learnr!

Do you think it’s possible to add charttips/flyover-text and [html href] drilldowns to your ggplot-variant dashboard like in original one (http://www.datavis.ca/gallery/allison/scen3b.htm)? I’ve found one thread in the support list (https://stat.ethz.ch/pipermail/r-help/2007-April/130713.html). Firstly, we’ve to get coords of every charttip area in the graph. Is it possible in ggplot2?

Reply
- learnr permalink*
  
  September 12, 2011 1:00 pm
  
  I am sure it is, although I am not aware of how to achieve this.
  
  Reply
Arun Krishnan permalink

April 11, 2012 9:02 pm

Hi,
Came across your “tutorial” on dashboards. Would it be possible to share the excel file you used? That would really help to understand some of what you have done.

Thanks,

Arun Krishnan

Reply
- learnr permalink*
  
  April 12, 2012 1:45 am
  
  The Excel file can be downloaded from (here). I hope this is the original file. As a side note, I think that the code in this blog-post could be improved significantly and expressed more succinctly.
  
  Reply

	S.Stender on ggplot2: Labelling Data Series…
	ggplot séparer la lé… on ggplot2: Two Or More Plots Sha…
	9 Useful R Data Visu… on ggplot2 Version of Figures in…
	Mandar on Data Manipulation in R to Crea…
	Mandar on Data Manipulation in R to Crea…

Finding my way around R

ggplot2: Sales Dashboard

Data Import

Performance Evaluation

Helper Functions

Dashboard Plotting

Trackbacks

Leave a comment Cancel reply

Subscribe

Recent Posts

Recent Comments

Archives

Finding my way around R

ggplot2: Sales Dashboard

Data Import

Performance Evaluation

Helper Functions

Dashboard Plotting

Share this:

Related

Trackbacks

Leave a comment Cancel reply

Subscribe

Recent Posts

Recent Comments

Archives