Analytics for Business : Waterfall Charts with R
The waterfall chart or the flying brick chart or the Mario Chart or as how finance folks call it, the bridge chart helps in visualising and understanding the cumulative effect of sequential positive and negative values. From visualizing basic reconciliations, to Cash Flow, to Balance sheets, to The Income and expenditure statements, its use in the financial world is overwhelming. This type of charting though popularized by the consulting firm McKinsey & Company, is widely used globally to sum up pictorially the cumulative effect of the values at hand.
Here, I try introducing this charting technique using data from an Income and Expenditure Statement and using the R Statistics Software to generate the chart.
Note: This post presumes that the reader has working knowledge in R Programming.The charts as appearing in this post might be small. Click on them for a full sized view
Tata Motors Standalone Financials – Income and Expenditure Statement as per Schedule III of The Companies Act, 2013 as adapted to a line by line tabular form
|Particulars||Nature||March -2017 (INR in Cr.)|
|Revenue From Operations [Gross]||Income||48,388.49|
|Less: Excise/Service Tax/Other Levies||Ex.D./Ser. Tax||– 4,736.41|
|Other Operating Revenues||Income||711.92|
|Cost Of Materials Consumed||Expenses||– 27,654.40|
|Purchase Of Stock-In Trade||Expenses||– 3,945.97|
|Operating And Direct Expenses||Expenses||– 454.48|
|Changes In Inventories Of FG,WIP And Stock-In Trade||Expenses||251.43|
|Employee Benefit Expenses||Expenses||– 3,558.52|
|Finance Costs||Expenses||– 1,590.15|
|Depreciation And Amortisation Expenses||Dep./Amort.||– 2,969.39|
|Other Expenses||Expenses||– 8,697.42|
|Less: Amounts Transfer To Capital Accounts||Tr. To Capital||941.55|
|Exceptional Items||Excep. Items||252.45|
|Total Tax Expenses||Direct Tax||– 59.22|
|Extraordinary Items||Ext. Items||– 338.71|
Download .csv dataset of above : Tata Motors SPandL Mar-17
The above table is a direct extract of the Standalone Income and Expenditure Statement of Tata Motors Limited for the year ended March 31, 2017. We’ll be using this data set to develop a waterfall chart in the R.
In the above table, the line items have been classified into 8 categories based on its nature. These classifications have been given in the Nature column. The respective amounts to each line item for the year ended March 31, 2017 have been given in the last column. For the above table, all amounts that increases the income have been marked as positive and what that reduces as negative. (Credit amounts +ve and Debit amounts –ve)
For this chart, we’ll be using the ggplot2 library. If the copy of R does not have the package, please consider installing it before proceeding forward.R being a community contributed statistical package, there could be better, easier packages out there that could do the job with lesser chunk of code, though as on the day this post was written, ggplot2 seemed to be the best package.
The ggplot2 by default does not sport a one stop function for drawing the waterfall chart and so the function geom_rect, which allows us to draw rectangles on graphs, has been used to draw the same. The first objective of ours is to note the co-ordinates to draw the rectangles on the graph and pass it on to geom_rect. From our understanding, we need a rectangle for each line item (from the column Particulars) which should have the height of the amount given in the last column. For the positive values, the rectangle should grow from bottom to the top and for the negative values , the other way round.
Say, for line items as shown in the above diagram, item x should start at 0, and grow towards 100, then by line item y it should shrink by 25 to 75 and by line item z it should grow to 105. And all of these rectangles should be placed on top of each of the line item labels x, y and z. Geom_rect should also be taught to colour each line item rectangle based on its nature i.e. Income, Expense, depreciation excetra .
Drawing the Waterfall Chart in R
Opening the data-set
Tata_Motors_SPandL_Mar_17 <- read_csv("Tata Motors SPandL Mar-17.csv") assign("TMPLM17",Tata_Motors_SPandL_Mar_17)
The above command line should vary based on where and which data file you intend to open in R. It’s easier if the R-Studio, Import Dataset option is used. The second command line is used to give a different reference name inside R as the file name is relatively lengthy.
Opening the libraries
The library ggplot2 for the geom_rect command and scales for certain text formatting inside R. Scales library is optional if you do not intend to format the chart labels.
Converting the Particulars Column Character Values to Factors
TMPLM17$Part <- factor(TMPLM17$Part, levels = TMPLM17$Part) TMPLM17$Sl_No <- seq_along(TMPLM17$Part)
The x-axis coordinates to be supplied to the geom_rect function only supports numeric variables and not characters. Therefore, the same is converted to factors so that every label will be linked to the column using a vector number. The second command line is to refer every line item by a number identification. This will be used to define the x axis dimension to the rectangle that we are about to draw.
Inserting new variables – Opening Cumulative Value and Closing Cumulative Value
TMPLM17$cum_values <- cumsum(TMPLM17$M_17_INR_Cr) TMPLM17$cum_val_p <- TMPLM17$cum_values - TMPLM17$M_17_INR_Cr
These command lines insert variables cum_val_p and cum_values which represent the opening Cumulative Values and Closing Cumulative Values. These enable us to define the starting and ending y-axis coordinates, Opening for the starting y coordinate and Closing for the ending y coordinate.
Drawing the Waterfall Chart
ggplot(data = TMPLM17, aes(x = Part, fill = Nat))+geom_rect(aes(x = Part, xmin = Sl_No-0.5, xmax = Sl_No + 0.5, ymin = cum_val_p, ymax = cum_values))
Above command line asks the ggplot function to draw the waterfall chart. Data is passed to the function through the data variable, aes function has the Particulars mapped to the x axis, geom_rect function gets the x-axis coordinates through the xmin and xmax variables and the y-axis coordinates through the ymin and ymax variables. Though we got a waterfall chart, it’s hardly presentable as it lacks proper formatting. Next we shall add some extra ggplot beautification commands and make the plot more intuitive.
Formatted Waterfall Chart
For_fun <- function(str_a) gsub(" ", "\n", str_a)
ggplot(data = TMPLM17, aes(x = Part, fill = Nat))+geom_rect(aes(x = Part, xmin = Sl_No-0.5, xmax = Sl_No + 0.5, ymin = cum_val_p, ymax = cum_values))+scale_y_continuous("INR in Crores", labels = comma)+scale_x_discrete("",breaks = TMPLM17$Part, labels = For_fun)+theme(text = element_text(size = 7))+geom_text(aes(Sl_No,cum_values, label = comma(M_17_INR_Cr)), size =2, vjust = 1)
The above code chunk is just the earlier ggplot code with some extra code chunk for formatting. For any clarifications on the extra code bits, try searching the R-Studio using “?<code>” command and if it still bothers you, feel free to reach out to me.