Mon 07 May 2018

Building Bullet Graphs and Waterfall Charts with Bokeh

Introduction

In my last article, I presented a flowchart that can be useful for those trying to select the appropriate python library for a visualization task. Based on some comments from that article, I decided to use Bokeh to create waterfall charts and bullet graphs. The rest of this article shows how to use Bokeh to create these unique and useful visualizations.

Bullet Graphs

In the comments section of the last article, Bryan Van de Ven provided example code for creating a bullet graph in Bokeh. For those of you that do not know Bryan, he is a Senior Software Engineer at Anaconda and is one of the creators of Bokeh. It is safe to say he knows that library well so I figured I should listen to his comments!

I took his example, expanded it a bit, and included it here in order to compare it to the matplotlib process. In the process of building these examples, I learned a lot more about how to use Bokeh and hope this article will show others how to use it as well. I’ll be honest, I do think the resulting code is simpler to understand than the matplotlib bullet graph example.

I have posted this notebook on GitHub so feel free to download it and use it to follow along. Unfortunately the Bokeh charts do not render on GitHub, but if you wish to use this example, it should run on your system as long as the dependencies are installed.

First, let’s do the imports and enable Bokeh’s output to display in our notebook:

from bokeh.io import show, output_notebook
from bokeh.palettes import PuBu4
from bokeh.plotting import figure
from bokeh.models import Label

output_notebook()

For this example, we’ll populate the data with python lists. We could modify this to fit in a pandas dataframe but we will stick with simple python data types for this example:

data = [("John Smith", 105, 120),
       ("Jane Jones", 99, 110),
       ("Fred Flintstone", 109, 125),
       ("Barney Rubble", 135, 123),
       ("Mr T", 45, 105)]

limits = [0, 20, 60, 100, 160]
labels = ["Poor", "OK", "Good", "Excellent"]
cats = [x[0] for x in data]

The code is a pretty straightforward definition of the data. The one tricky code portion is building out a list of categories in the cats variable that will go on the y-axis.
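As an aside, all three columns can be pulled out of the list of tuples in one step by transposing it with zip(*data). This is only a sketch of an alternative to the list comprehensions used in this article; the names perf and comp match the variables defined in the later steps.

```python
# Same sample data as above: (name, performance, target)
data = [("John Smith", 105, 120),
        ("Jane Jones", 99, 110),
        ("Fred Flintstone", 109, 125),
        ("Barney Rubble", 135, 123),
        ("Mr T", 45, 105)]

# zip(*data) transposes rows into columns; convert each column to a list
cats, perf, comp = (list(column) for column in zip(*data))

print(cats)
print(perf)
print(comp)
```

Each column is converted to a list so that it can be passed to Bokeh in the same form as the comprehension-based versions.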

The next step is to create the Bokeh figure and set a couple of options related to the way the x-axis and grid lines are displayed. As mentioned above, we use the cats variable to define all the categories in the y_range:

# Note: Bokeh 3.x renamed plot_height/plot_width to height/width
p = figure(title="Sales Rep Performance", plot_height=350, plot_width=800, y_range=cats)
p.x_range.range_padding = 0
p.grid.grid_line_color = None
p.xaxis[0].ticker.num_minor_ticks = 0

The next section will create the colored range bars using Bokeh’s hbar. To make this work, we need to define the left and right range of each bar along with the color. We can use Python’s zip function to create the data structure we need:

# In Python 3, zip returns an iterator, so wrap it in list() to see the values
list(zip(limits[:-1], limits[1:], PuBu4[::-1]))

[(0, 20, '#f1eef6'),
 (20, 60, '#bdc9e1'),
 (60, 100, '#74a9cf'),
 (100, 160, '#0570b0')]
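The slicing is the key idea: limits[:-1] holds every left edge and limits[1:] holds every right edge, so pairing them yields one band per adjacent pair of limits. Here is a minimal sketch of that pairing as a function. The helper name make_bands is hypothetical, and the hex colors are copied from the output above so the snippet runs without Bokeh installed.

```python
def make_bands(limits, colors):
    """Pair adjacent limits into (left, right, color) tuples.

    make_bands is a hypothetical helper for illustration, not a Bokeh API.
    """
    if len(colors) != len(limits) - 1:
        raise ValueError("need exactly one color per band")
    return list(zip(limits[:-1], limits[1:], colors))


limits = [0, 20, 60, 100, 160]
# Colors copied from the reversed PuBu4 output shown above
colors = ['#f1eef6', '#bdc9e1', '#74a9cf', '#0570b0']

bands = make_bands(limits, colors)
for band in bands:
    print(band)
```

Because zip stops at the shortest input, a mismatch between the number of limits and colors would silently drop bands, which is why the sketch checks the lengths first.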

Here’s how to pull it all together to create the color ranges:

for left, right, color in zip(limits[:-1], limits[1:], PuBu4[::-1]):
    p.hbar(y=cats, left=left, right=right, height=0.8, color=color)

We use a similar process to add a black bar for each performance measure:

perf = [x[1] for x in data]
p.hbar(y=cats, left=0, right=perf, height=0.3, color="black")

The final marker we need to add is a segment that shows the target value:

comp = [x[2] for x in data]
p.segment(x0=comp, y0=[(x, -0.5) for x in cats], x1=comp,
          y1=[(x, 0.5) for x in cats], color="white", line_width=2)

The final step is to add the labels for each range. We can use zip to create the label structures we need and then add each label to the layout:

for start, label in zip(limits[:-1], labels):
    p.add_layout(Label(x=start, y=0, text=label, text_font_size="10pt",
                       text_color='black', y_offset=5, x_offset=15))
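The same limits and labels can also tell us which range each rep falls into, which is a handy sanity check against the chart. This is a rough sketch using the standard library's bisect module. The function name range_label is hypothetical, and I am assuming that a value landing exactly on a limit belongs to the higher range.

```python
from bisect import bisect_right

limits = [0, 20, 60, 100, 160]
labels = ["Poor", "OK", "Good", "Excellent"]


def range_label(value):
    """Return the label of the range containing value (hypothetical helper)."""
    # bisect_right counts how many limits are <= value; subtract 1 for the band index
    idx = bisect_right(limits, value) - 1
    # Clamp so values below 0 or at/above the top limit still get a label
    idx = max(0, min(idx, len(labels) - 1))
    return labels[idx]


for name, perf in [("John Smith", 105), ("Jane Jones", 99), ("Mr T", 45)]:
    print(name, range_label(perf))
```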

I think this solution is simpler to follow than the matplotlib example. Let’s see if the same is true for the waterfall chart.

Waterfall Chart

I decided to take Bryan’s comments as an opportunity to create a waterfall chart in Bokeh and see how hard (or easy) it is to do. He recommended that the candlestick chart would be a good place to start and I did use that as the basis for this solution. All of the code is in a notebook that is available here.

Let’s start with the Bokeh and pandas imports and enabling the notebook output:

from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.models.formatters import NumeralTickFormatter
import pandas as pd

output_notebook()

For this solution, I’m going to create a pandas dataframe and use Bokeh’s ColumnDataSource to make the code a little simpler. This has the added benefit of making this code easy to convert to take an Excel input instead of the manually created dataframe.

Feel free to refer to this cheatsheet if you need some help understanding how to create the dataframe as shown below:

# Create the initial dataframe
index = ['sales','returns','credit fees','rebates','late charges','shipping']
data = {'amount': [350000,-30000,-7500,-25000,95000,-7000]}
df = pd.DataFrame(data=data,index=index)

# Determine the total net value by adding the start and all additional transactions
net = df['amount'].sum()

	amount
sales	350000
returns	-30000
credit fees	-7500
rebates	-25000
late charges	95000
shipping	-7000

The final waterfall code is going to require us to define several additional attributes for each segment including:

starting position
bar color
label position
label text

By adding this to a single dataframe, we can use Bokeh’s built in capabilities to simplify the final code.

For the next step, we’ll add the running total, segment start location and the position of the label:

df['running_total'] = df['amount'].cumsum()
df['y_start'] = df['running_total'] - df['amount']

# Where do we want to place the label?
df['label_pos'] = df['running_total']
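To see the arithmetic behind these columns without pandas, here is a small sketch using itertools.accumulate. Each bar starts where the previous running total ended, so y_start is simply the running total minus the current amount.

```python
from itertools import accumulate

amounts = [350000, -30000, -7500, -25000, 95000, -7000]

# Cumulative sum, the equivalent of df['amount'].cumsum()
running_total = list(accumulate(amounts))

# Each bar starts where the previous one ended
y_start = [total - amount for total, amount in zip(running_total, amounts)]

print(running_total)
print(y_start)
```

Apart from the first bar, which starts at 0, every y_start value equals the running total of the row before it.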

Next, we add a row at the bottom of the dataframe that contains the net value:

df_net = pd.DataFrame.from_records([(net, net, 0, net)],
                                   columns=['amount', 'running_total', 'y_start', 'label_pos'],
                                   index=["net"])
# DataFrame.append was removed in pandas 2.0; pd.concat works in old and new versions
df = pd.concat([df, df_net])

For this particular waterfall, I would like the negative values to be a different color, with their labels positioned below the bars. I also want the label text formatted with thousands separators. Let’s add columns to the dataframe with these values:

df['color'] = 'grey'
df.loc[df.amount < 0, 'color'] = 'red'
df.loc[df.amount < 0, 'label_pos'] = df.label_pos - 10000
df["bar_label"] = df["amount"].map('{:,.0f}'.format)
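The '{:,.0f}' format spec does two things: the comma adds thousands separators and .0f rounds to zero decimal places. Here is a quick sketch of the color and label logic for a single bar. The function name describe_bar and its offset argument are my own names for illustration.

```python
def describe_bar(amount, running_total, offset=10000):
    """Return (color, label_pos, bar_label) for one bar.

    Hypothetical helper mirroring the dataframe steps above.
    """
    negative = amount < 0
    color = 'red' if negative else 'grey'
    # Shift the label down for negative bars so it sits below the bar end
    label_pos = running_total - offset if negative else running_total
    bar_label = '{:,.0f}'.format(amount)
    return color, label_pos, bar_label


print(describe_bar(350000, 350000))
print(describe_bar(-30000, 320000))
```

The net row is positive in this data set, so it keeps the default grey color and its label stays at the top of the bar.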

Here’s the final dataframe containing all the data we need. It did take some manipulation of the data to get to this state but it is fairly standard pandas code and is easy to debug if something goes awry.

	amount	running_total	y_start	label_pos	color	bar_label
sales	350000	350000	0	350000	grey	350,000
returns	-30000	320000	350000	310000	red	-30,000
credit fees	-7500	312500	320000	302500	red	-7,500
rebates	-25000	287500	312500	277500	red	-25,000
late charges	95000	382500	287500	382500	grey	95,000
shipping	-7000	375500	382500	365500	red	-7,000
net	375500	375500	0	375500	grey	375,500

Creating the actual plot is fairly standard Bokeh code since the dataframe has all the values we need:

TOOLS = "box_zoom,reset,save"
source = ColumnDataSource(df)
# Note: Bokeh 3.x renamed plot_width to width
p = figure(tools=TOOLS, x_range=list(df.index), y_range=(0, net + 40000),
           plot_width=800, title="Sales Waterfall")

By defining the ColumnDataSource as our dataframe, Bokeh takes care of creating all segments and labels without doing any looping.

p.segment(x0='index', y0='y_start', x1="index", y1='running_total',
          source=source, color="color", line_width=55)

We will do some minor formatting to add labels and format the y-axis nicely:

p.grid.grid_line_alpha = 0.3
p.yaxis[0].formatter = NumeralTickFormatter(format="($ 0 a)")
p.xaxis.axis_label = "Transactions"

The final step is to add all the labels onto the bars using the LabelSet:

labels = LabelSet(x='index', y='label_pos', text='bar_label',
                  text_font_size="8pt", level='glyph',
                  x_offset=-20, y_offset=0, source=source)
p.add_layout(labels)

Here’s the final chart:

Once again, I think the final solution is simpler than the matplotlib code and the resulting output looks pleasing. You also have the added bonus that the charts are interactive and could be enhanced even more by using the Bokeh server (see my Australian Wine Ratings article for an example). The code should also be straightforward to modify for your specific datasets.

Summary

I appreciate that Bryan took the time to offer the suggestion to build these plots in Bokeh. This exercise highlighted for me that Bokeh is very capable of building custom charts that I would normally create with matplotlib. I will continue to evaluate options and post updates here as I learn more. Feel free to comment below if you find this useful.

Practical Business Python
