Mon 28 August 2017

Building a Bullet Graph in Python

Introduction

Lately I have been spending time reading about various visualization techniques with the goal of learning unique ways to display complex data. One of the interesting chart ideas I have seen is the bullet graph. Naturally, I wanted to see if I could create one in python but I could not find any existing implementations. This article will walk through why a bullet graph (aka bullet chart) is useful and how to build one using python and matplotlib.

Visualization Resources

Over the past few weeks, I have been reading two very good books about data visualization. The first is Cole Nussbaumer Knaflic’s book Storytelling with Data and the second is The Big Book of Dashboards by Steve Wexler, Jeffrey Shaffer and Andy Cotgreave. I found both of these books very enjoyable to read and picked up a lot of useful ideas for developing my own visualizations. This topic is extremely fascinating to me and I think these are nice resources to have in your library.

Storytelling with Data is a guide to presenting data in an effective manner and covers several topics related to choosing effective visuals, telling compelling stories and thinking like a designer. This book does not specifically describe the bullet graph but does introduce some of the concepts and ideas as to why this graph is effective. Because I enjoyed this book so much, I checked out the Storytelling with Data website, which recommends The Big Book of Dashboards; naturally I had to add it to my library.

The Big Book of Dashboards is an extremely valuable resource for anyone who finds themselves trying to build a dashboard for displaying complex information. In Wexler, Shaffer and Cotgreave’s book, the authors go through an in-depth analysis of 28 different dashboards and explain why they were developed, how they are used and ideas to improve them. The book is very visually appealing and densely packed with great ideas. It is a resource that can be read straight through or quickly browsed through for inspiration.

I have really enjoyed each of these books. I am convinced that there would be far better data visualizations in the world if all the Excel and PowerPoint jockeys had both of these books on their desks!

What is a bullet graph?

The Big Book of Dashboards introduced me to the concept of a bullet graph (aka bullet chart) and I found the concept very interesting. I immediately thought of several cases where I could use it.

So, what is a bullet graph? From the book:

“The Bullet Graph encodes data using length/height, position, and color to show actual compared to target and performance bands.”

The example from Wikipedia is fairly easy to understand:

Stephen Few developed the bullet graph to overcome some of the challenges with traditional gauges and meters. Wikipedia describes the bullet graph as follows:

The bullet graph features a single, primary measure (for example, current year-to-date revenue), compares that measure to one or more other measures to enrich its meaning (for example, compared to a target), and displays it in the context of qualitative ranges of performance, such as poor, satisfactory, and good. The qualitative ranges are displayed as varying intensities of a single hue to make them discernible by those who are color blind and to restrict the use of colors on the dashboard to a minimum.

Both of these books are tool agnostic so there is not any significant discussion related to how to create these visualizations. I could find examples in Excel but I wanted to see if I could create one in python. I figured if I had existing code that worked, I would be more likely to use it when the time was right. I also like the idea of making this easy to do in python instead of struggling with Excel.

I did some searching but could not find any python examples so I set out to create a reusable function to build these charts using base matplotlib functionality. I am including the code here and on github in the hope it is useful to others. Feel free to send me pull requests if you have ideas on how to improve it.

Building the chart

The idea for the bullet chart is that we can use a stacked bar chart to represent the various ranges and another smaller bar chart to represent the value. Finally, a vertical line marks the target. Sounds simple enough, right?

Since this is a somewhat complicated layering of components, I think the simplest way to construct this is using matplotlib. In the sections below, I will walk through the basic concepts, then present the final code section which is a little more scalable for multiple charts. I am hoping the community will chime in with better ways to simplify the code or make it more generically useful.

Start the Process

I recommend that you run this code in your jupyter notebook environment. You can access an example notebook here.

To get started, import all the modules we need:

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import FuncFormatter

%matplotlib inline

Astute readers may be wondering why we are including seaborn in the imports. Seaborn has some really useful tools for managing color palettes so I think it is easier to leverage this capability than trying to replicate it in some other manner.

The main reason we need to generate a palette is that we will most likely want to generate a visually appealing color scheme for the various qualitative ranges. Instead of trying to code values by hand, let’s use seaborn to do it.

In this example, we can use the palplot convenience function to display a palette of 5 shades of green:

sns.palplot(sns.light_palette("green", 5))

Making 8 different shades of purple in reverse order is as easy as:

sns.palplot(sns.light_palette("purple",8, reverse=True))

This functionality makes it convenient to create a consistent color scale for as many categories as you need.
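For readers who would rather not add seaborn just for colors, a roughly similar light-to-dark ramp can be sampled from a built-in matplotlib colormap. This is only a sketch of an alternative and is not what the rest of this article uses. The helper name shade_ramp and the 0.15 to 0.85 sampling window are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex


def shade_ramp(cmap_name, n_colors, low=0.15, high=0.85):
    """Sample n_colors evenly spaced shades from a matplotlib colormap.

    Hypothetical helper: low/high trim the extreme ends of the colormap.
    """
    cmap = plt.get_cmap(cmap_name)
    # Avoid dividing by zero when only one color is requested
    steps = max(n_colors - 1, 1)
    return [to_hex(cmap(low + (high - low) * i / steps))
            for i in range(n_colors)]


greens = shade_ramp("Greens", 5)
print(greens)  # five hex color strings, lightest first
```

The result is a plain list of hex strings, so it can be indexed the same way as a seaborn palette.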

Now that we know how to set the palette, let’s try to create a simple bullet graph using the principles laid out in the Effectively Using Matplotlib article.

First, define the values we want to plot:

limits = [80, 100, 150]
data_to_plot = ("Example 1", 105, 120)

This will be used to create 3 ranges: 0-80, 81-100, 101-150 and an “Example 1” line with a value of 105 and target line of 120. Next, build out a blues color palette:

palette = sns.color_palette("Blues_r", len(limits))

The first step is to build the stacked bar chart of the ranges:

fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.set_yticks([1])
ax.set_yticklabels([data_to_plot[0]])

prev_limit = 0
for idx, lim in enumerate(limits):
    ax.barh([1], lim-prev_limit, left=prev_limit, height=15, color=palette[idx])
    prev_limit = lim

Which yields a nice bar chart:

Then we can add a smaller bar chart representing the value of 105:

# Draw the value we're measuring
ax.barh([1], data_to_plot[1], color='black', height=5)

Closer….

The final step is to add the target marker using axvline:

ax.axvline(data_to_plot[2], color="gray", ymin=0.10, ymax=0.9)

This actually works pretty well but is not very scalable. Ideally we should be able to show multiple bullet graphs on the same scale. Also, the bar heights of 15 and 5 are hard-coded, so the proportions stop working as the values in the ranges change.
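One small step toward reuse is to separate the range arithmetic from the drawing. The sketch below mirrors the prev_limit loop from the walk-through: it turns a list of ascending upper limits into the (left, width) pairs that barh needs. The function name range_segments is hypothetical and is not part of the final code.

```python
def range_segments(limits):
    """Convert ascending upper limits into (left, width) pairs for barh."""
    segments = []
    prev_limit = 0
    for lim in limits:
        # Each segment starts where the previous one ended
        segments.append((prev_limit, lim - prev_limit))
        prev_limit = lim
    return segments


print(range_segments([80, 100, 150]))
# [(0, 80), (80, 20), (100, 50)]
```

Each pair can then be drawn with ax.barh([1], width, left=left, ...), exactly as in the loop above.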

The Final Code

After much trial and error and playing around with matplotlib, I developed a function that is more generally useful:

def bulletgraph(data=None, limits=None, labels=None, axis_label=None, title=None,
                size=(5, 3), palette=None, formatter=None, target_color="gray",
                bar_color="black", label_color="gray"):
    """ Build out a bullet graph image
        Args:
            data = List of labels, measures and targets
            limits = list of range values
            labels = list of descriptions of the limit ranges
            axis_label = string describing x axis
            title = string title of plot
            size = tuple for plot size
            palette = a seaborn palette
            formatter = matplotlib formatter object for x axis
            target_color = color string for the target line
            bar_color = color string for the small bar
            label_color = color string for the limit label text
        Returns:
            a matplotlib figure
    """
    # Determine the max value for adjusting the bar height
    # Dividing by 10 seems to work pretty well
    h = limits[-1] / 10

    # Use the green palette as a sensible default
    if palette is None:
        palette = sns.light_palette("green", len(limits), reverse=False)

    # Must be able to handle one or many data sets via multiple subplots
    if len(data) == 1:
        fig, ax = plt.subplots(figsize=size, sharex=True)
    else:
        fig, axarr = plt.subplots(len(data), figsize=size, sharex=True)

    # Add each bullet graph bar to a subplot
    for idx, item in enumerate(data):

        # Get the axis from the array of axes returned when the plot is created
        if len(data) > 1:
            ax = axarr[idx]

        # Formatting to get rid of extra marking clutter
        ax.set_aspect('equal')
        ax.set_yticks([1])
        ax.set_yticklabels([item[0]])
        ax.spines['bottom'].set_visible(False)
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        ax.spines['left'].set_visible(False)

        prev_limit = 0
        for idx2, lim in enumerate(limits):
            # Draw the bar
            ax.barh([1], lim - prev_limit, left=prev_limit, height=h,
                    color=palette[idx2])
            prev_limit = lim
        rects = ax.patches
        # Draw the value we're measuring as a thinner bar over the ranges
        ax.barh([1], item[1], height=(h / 3), color=bar_color)

        # Need the ymin and max in order to make sure the target marker
        # fits
        ymin, ymax = ax.get_ylim()
        ax.vlines(
            item[2], ymin * .9, ymax * .9, linewidth=1.5, color=target_color)

    # Now make some labels
    if labels is not None:
        for rect, label in zip(rects, labels):
            height = rect.get_height()
            ax.text(
                rect.get_x() + rect.get_width() / 2,
                -height * .4,
                label,
                ha='center',
                va='bottom',
                color=label_color)
    if formatter:
        ax.xaxis.set_major_formatter(formatter)
    if axis_label:
        ax.set_xlabel(axis_label)
    if title:
        fig.suptitle(title, fontsize=14)
    fig.subplots_adjust(hspace=0)
    return fig

I am not going to go through the code in detail but the basic idea is to create a subplot for each chart and stack them on top of each other. I remove all the spines so that it is relatively clean and simple.
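To make the stacking idea concrete, here is a stripped-down sketch that only builds the bare, stacked axes and draws nothing on them. It differs from the function above in one respect: it passes squeeze=False to plt.subplots so that a single row and multiple rows are handled the same way. The name stacked_axes and the Agg backend line are assumptions for running this outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; drop this in a notebook
import matplotlib.pyplot as plt


def stacked_axes(n_rows, size=(8, 5)):
    """Return (fig, axes) with n_rows spine-free axes stacked vertically."""
    # squeeze=False always returns a 2-D array, even for a single row
    fig, axarr = plt.subplots(n_rows, 1, figsize=size, sharex=True,
                              squeeze=False)
    axes = list(axarr[:, 0])
    for ax in axes:
        # Hide all four spines to remove the box around each chart
        for spine in ax.spines.values():
            spine.set_visible(False)
    # Remove the vertical gap so the charts sit directly on top of each other
    fig.subplots_adjust(hspace=0)
    return fig, axes


fig, axes = stacked_axes(3)
print(len(axes))  # 3
```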

Here is how to use the function to display a “Sales Rep Performance” bullet chart:

data_to_plot2 = [("John Smith", 105, 120),
                 ("Jane Jones", 99, 110),
                 ("Fred Flintstone", 109, 125),
                 ("Barney Rubble", 135, 123),
                 ("Mr T", 45, 105)]

bulletgraph(data_to_plot2, limits=[20, 60, 100, 160],
            labels=["Poor", "OK", "Good", "Excellent"], size=(8,5),
            axis_label="Performance Measure", label_color="black",
            bar_color="#252525", target_color='#f7f7f7',
            title="Sales Rep Performance")

I think this is a nice way to compare results across multiple individuals and have a good sense for how they compare to each other. It also shows how values compare to the other quantitative standards we have set. It illustrates how much information you can quickly glean from this chart; trying to do this with other chart types would probably not be as effective.
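The same comparison can be expressed as a quick lookup, which is handy for double-checking what the chart shows. This sketch uses the standard library bisect module to find the range each value falls in and whether the target was met. The helper name band_for is hypothetical, and treating each upper limit as inclusive is an assumption on my part.

```python
from bisect import bisect_left

limits = [20, 60, 100, 160]
labels = ["Poor", "OK", "Good", "Excellent"]
data = [("John Smith", 105, 120),
        ("Jane Jones", 99, 110),
        ("Fred Flintstone", 109, 125),
        ("Barney Rubble", 135, 123),
        ("Mr T", 45, 105)]


def band_for(value, limits, labels):
    """Return the label of the range containing value (upper limits inclusive)."""
    idx = bisect_left(limits, value)
    # Values beyond the last limit are reported in the top range
    return labels[min(idx, len(labels) - 1)]


for name, value, target in data:
    status = "met" if value >= target else "missed"
    print(f"{name}: {band_for(value, limits, labels)}, target {status}")
```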

One other nice thing we can easily do is format the x axis to more consistently display information. In the next case, we can measure marketing budget performance for a hypothetical company. I also chose to keep this in shades of gray and slightly changed the size variable in order to make it look more consistent.

def money(x, pos):
    'The two args are the value and tick position'
    return "${:,.0f}".format(x)

Then create a new set of data to plot:

money_fmt = FuncFormatter(money)
data_to_plot3 = [("HR", 50000, 60000),
                 ("Marketing", 75000, 65000),
                 ("Sales", 125000, 80000),
                 ("R&D", 195000, 115000)]
palette = sns.light_palette("grey", 3, reverse=False)
bulletgraph(data_to_plot3, limits=[50000, 125000, 200000],
            labels=["Below", "On Target", "Above"], size=(10,5),
            axis_label="Annual Budget", label_color="black",
            bar_color="#252525", target_color='#f7f7f7', palette=palette,
            title="Marketing Channel Budget Performance",
            formatter=money_fmt)
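The formatter function can be checked on its own before it is attached to a chart. FuncFormatter calls the function with a tick value and a tick position, so calling it directly with a few of the limit values shows the labels that should appear on the x axis. Passing None for pos is just a convenience here because the function ignores it.

```python
def money(x, pos):
    """Format a tick value as whole dollars; pos is required but unused."""
    return "${:,.0f}".format(x)


for tick in (50000, 125000, 200000):
    print(money(tick, None))
# $50,000
# $125,000
# $200,000
```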

Summary

The proliferation of data and data analysis tools has made visualization a very important topic and a critical skill for anyone who does any level of data analysis. The old world of Excel pie charts and 3D graphs is not going to cut it going forward. Fortunately there are many resources to help build that skill. The Big Book of Dashboards and Storytelling with Data are two useful resources that are worth adding to your library if you do any level of data visualization.

The Big Book of Dashboards introduced me to the bullet graph which is a useful format for displaying actual results vs various targets and ranges. Unfortunately there was not an existing python implementation I could find. The fairly compact function described in this article is a good starting point and should be a helpful function to use when creating your own bullet graphs.

Feel free to send github pull requests if you have ideas to make this code more useful.

Updates

7-May-2018: An example via Bokeh is now available in this post.

Practical Business Python
