Practical Business Python

Taking care of business, one python script at a time

Tue 02 January 2018

Interactive Visualization of Australian Wine Ratings

Posted by Chris Moffitt in articles   



Over on Kaggle, there is an interesting data set of over 130K wine reviews that have been scraped and pulled together into a single file. I thought this data set would be really useful for showing how to build an interactive visualization using Bokeh. This article will walk through how to build a Bokeh application that has good examples of many of its features. The app itself is really helpful and I had a lot of fun exploring this data set using the visuals. Additionally, this application shows the power of Bokeh and it should give you some ideas as to how you could use it in your own projects. Let’s get started by exploring the “rich, smokey flavors with a hint of oak, tea and maple” that are embedded in this data set.

Data Overview

I will not spend much time walking through the data but if you are interested in learning more about the data, what it contains and how it could be a useful tool for further building out your skills, please check out the Kaggle page.

For this analysis, I chose to focus on only Australian wines. The decision to filter the data was somewhat arbitrary but I found that it ended up being a large enough dataset to make it interesting but not so large that performance was a problem on my middle-of-the-road laptop.

I made some minor cleanups and edits of the data which I won’t go through here but all the changes are available in this notebook.

Here is a snapshot of the data we will explore in the rest of the article:

|     | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | variety_color |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 77 | Australia | This medium-bodied Chardonnay features aromas … | Made With Organic Grapes | 86 | 18.0 | South Australia | South Australia | NaN | Joe Czerwinski | @JoeCz | Yalumba 2016 Made With Organic Grapes Chardonn… | Chardonnay | Yalumba | #440154 |
| 83 | Australia | Pale copper in hue, this wine exudes passion f… | Jester Sangiovese | 86 | 20.0 | South Australia | McLaren Vale | NaN | Joe Czerwinski | @JoeCz | Mitolo 2016 Jester Sangiovese Rosé (McLaren Vale) | Rosé | Mitolo | #450558 |
| 123 | Australia | The blend is roughly two-thirds Shiraz and one… | Parson’s Flat | 92 | 40.0 | South Australia | Padthaway | NaN | Joe Czerwinski | @JoeCz | Henry’s Drive Vignerons 2006 Parson’s Flat Shi… | Shiraz-Cabernet Sauvignon | Henry’s Drive Vignerons | #460B5E |
| 191 | Australia | From the little-known region of Padthaway, thi… | The Trial of John Montford | 87 | 30.0 | South Australia | Padthaway | NaN | Joe Czerwinski | @JoeCz | Henry’s Drive Vignerons 2006 The Trial of John… | Cabernet Sauvignon | Henry’s Drive Vignerons | #471163 |
| 232 | Australia | Lifted cedar and pine notes interspersed with … | Red Belly Black | 85 | 12.0 | South Australia | South Australia | NaN | NaN | NaN | Angove’s 2006 Red Belly Black Shiraz (South Au… | Shiraz | Angove’s | #471669 |

For this specific dataset, I approached the problem as an interested consumer, not as a data scientist trying to build a predictive model. Basically, I want to have a simple way to explore the data and find wines that might be interesting to purchase. As a wine consumer, I’m mostly interested in price vs. ratings (aka points). An interactive scatter plot should be a useful way to explore the data in more detail and Bokeh is well suited for this kind of application.

To get your palate ready, here’s a small tasting of the app we’ll be building:

Wine Analysis

As a pun, it’s a bit on the dry side but I think it has a strong finish.


From the Bokeh site:

Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.

Bokeh has two methods for creating visualizations. The first approach is to generate HTML documents that can be used standalone or embedded in a Jupyter notebook. The process for creating a plot is very similar to what you would do with matplotlib or some other Python visualization library. The key bonus with Bokeh is that you get basic interactivity for free.

The second method for creating visualizations is to build a Bokeh app that provides more flexibility and customization options. The downside is that you do need to run a separate application to serve the data. This works really well for individual or small group analysis. Deploying to the world at large takes a little more effort.

I based this example on an application I am developing at work to interactively explore price and volume relationships. I have found that the learning curve is a little steep with the Bokeh app approach but the results have been fantastic. The gallery examples are another rich source for understanding Bokeh’s capabilities. By the end of this article, I hope you feel the same way I do about the possibilities of using Bokeh for building powerful, complex, interactive visualization tools.

Building the App

If you are using Anaconda, then install bokeh with conda:

conda install bokeh

For this app, I am going to use the single file approach as described here.

The final file is stored in the github repo and I will keep that updated if people identify changes or improvements in this script. In addition, here is the processed csv file.

The first step is to import several modules we will need to build the app:

import pandas as pd
from bokeh.plotting import figure
from bokeh.layouts import layout, widgetbox
from bokeh.models import ColumnDataSource, HoverTool, BoxZoomTool, ResetTool, PanTool
from bokeh.models.widgets import Slider, Select, TextInput, Div
from bokeh.models import WheelZoomTool, SaveTool, LassoSelectTool
from bokeh.io import curdoc
from functools import lru_cache

The next step is to create a function to load data from the csv file and return a pandas DataFrame. I have wrapped this function with the lru_cache() decorator in order to cache the result. This is not strictly required but is useful to minimize those extra IO calls for loading the data from disk.

@lru_cache()
def load_data():
    df = pd.read_csv("Aussie_Wines_Plotting.csv", index_col=0)
    return df
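
To see what the decorator buys us, here is a minimal sketch that uses only the standard library. The load_rows function, the in-memory CSV text and the read_count counter are hypothetical stand-ins for load_data() and the real file; they are only there to show how often the function body actually runs.

```python
import csv
import io
from functools import lru_cache

# Hypothetical stand-in for the CSV file on disk
CSV_TEXT = "title,price,points\nWine A,18.0,86\nWine B,40.0,92\n"
read_count = 0


@lru_cache()
def load_rows():
    """Parse the CSV once; later calls return the cached tuple."""
    global read_count
    read_count += 1
    return tuple(csv.DictReader(io.StringIO(CSV_TEXT)))


first = load_rows()
second = load_rows()
print(read_count)       # how many times the body ran
print(first is second)  # whether both calls returned the same object
```

One thing to keep in mind is that the cached object is shared, so any in-place change to it would be visible to every later caller. The filtering used later in this article builds new DataFrames, so the cached one is left alone.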

In order to format the details, I am defining the ordering of the columns as well as the list of all the provinces we may want to filter by. For this example, I hard coded the list but in other situations you could dynamically build the list off the data.

# Column order for displaying the details of a specific review
col_order = ["price", "points", "variety", "province", "description"]

all_provinces = [
    "All", "South Australia", "Victoria", "Western Australia",
    "Australia Other", "New South Wales", "Tasmania"
]

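If you would rather build the province list from the data, something along these lines should work. This is only a sketch: the small DataFrame is made up for illustration and the variable name provinces is my own.

```python
import pandas as pd

# Tiny made-up sample standing in for the wine data
df = pd.DataFrame({
    "province": ["Victoria", "South Australia", "Tasmania",
                 "Victoria", None],
})

# Unique, non-missing provinces in alphabetical order, with "All" first
provinces = ["All"] + sorted(df["province"].dropna().unique())
print(provinces)
```

The trade-off is that the order is now alphabetical instead of hand picked, which may or may not be what you want in the dropdown.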
Now that some of the prep work is out of the way, I will get all of the Bokeh widgets set up. The Select, Slider and TextInput widgets capture input from the user. The Div widget will be used to display output based on the data being selected.

desc = Div(text="All Provinces", width=800)
province = Select(title="Province", options=all_provinces, value="All")
price_max = Slider(start=0, end=900, step=5, value=200, title="Maximum Price")
title = TextInput(title="Title Contains")
details = Div(text="Selection Details:", width=800)

Here’s what the widgets look like in the final form:


The “secret sauce” for Bokeh is the ColumnDataSource. This object stores the data the rest of the script will visualize. For the initial run through of the code, I will load with all the data. In subsequent code, we can update the source with selected or filtered data.

source = ColumnDataSource(data=load_data())

Every Bokeh plot supports interactive tools. Here’s what the tools look like for this specific app:

Tool bar

The actual building of the tools is straightforward. You have the option of defining tools as a list of strings but it is not possible to customize the tools when you use this approach. In this application, it is useful to define the hover tool to show the title of the wine as well as its variety. We can use any column of data that is available to us in our DataFrame and reference it using the @ prefix (for example, @title).

hover = HoverTool(tooltips=[
    ("title", "@title"),
    ("variety", "@variety"),
])
TOOLS = [
    hover, BoxZoomTool(), LassoSelectTool(), WheelZoomTool(), PanTool(),
    ResetTool(), SaveTool()
]

Bokeh uses figures as the base object for creating a visualization. Once the figure is created, items can be placed on the figure. For this use case, I decided to place circles on the figure based on the price and points assigned to each wine.

p = figure(
    title="Australian Wine Analysis",
    tools=TOOLS,
    x_axis_label="points",
    y_axis_label="price (USD)")

# Plot one circle per wine, colored by variety
p.circle(x="points", y="price", source=source, color="variety_color")

Now that the basic plot is structured, we need to handle changes to the data and make sure the appropriate updates are made to the visualization. With the addition of a few functions, Bokeh does most of the heavy lifting to keep the visualization updated.

The first function is select_reviews. The basic purpose of this function is to load the full dataset, apply any filtering based on user input and return the filtered dataset as a pandas DataFrame.

In this particular example, we can filter data based on the maximum price, province and string value in the title. The function uses standard pandas operations to filter the data and get it down to a subset of data in the selected DataFrame. Finally, the function updates the description text to show what is being filtered.

def select_reviews():
    """ Use the current selections to determine which filters to apply to the
    data. Return a dataframe of the selected data
    """
    df = load_data()

    # Determine what has been selected for each widget
    max_price = price_max.value
    province_val = province.value
    title_val = title.value

    # Filter by price and province
    if province_val == "All":
        selected = df[df.price <= max_price]
    else:
        selected = df[(df.province == province_val) & (df.price <= max_price)]

    # Further filter by string in title if it is provided
    if title_val != "":
        selected = selected[selected.title.str.contains(title_val, case=False) == True]

    # Example showing how to update the description
    desc.text = "Province: {} and Price < {}".format(province_val, max_price)
    return selected
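
The filtering steps can also be tried out away from the widgets. The sketch below is a hypothetical, widget-free variant (the name filter_reviews, its parameters and the sample rows are my own, not part of the app). It also assumes you want the title text matched literally, so it passes regex=False and na=False, which the app code above does not do.

```python
import pandas as pd


def filter_reviews(df, max_price, province="All", title_text=""):
    """Hypothetical widget-free version of the filtering steps."""
    mask = df["price"] <= max_price
    if province != "All":
        mask &= df["province"] == province
    if title_text:
        # regex=False treats the input as plain text; na=False skips
        # rows with a missing title
        mask &= df["title"].str.contains(
            title_text, case=False, regex=False, na=False)
    return df[mask]


sample = pd.DataFrame({
    "title": ["Yalumba 2016 Chardonnay", "Mitolo 2016 Rosé", None],
    "province": ["South Australia", "South Australia", "Victoria"],
    "price": [18.0, 20.0, 12.0],
})

result = filter_reviews(sample, max_price=19, province="South Australia",
                        title_text="yalumba")
print(result["title"].tolist())
```

Keeping the filter logic in a plain function like this makes it easy to check with a handful of rows before wiring it up to the Select, Slider and TextInput values.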

The next helper function is used to update the ColumnDataSource we set up earlier. This is straightforward, with one exception: we specifically update source.data rather than just assigning a new source.

def update():
    """ Get the selected data and update the data in the source
    """
    df_active = select_reviews()
    source.data = ColumnDataSource(data=df_active).data
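
As a rough illustration of why the assignment targets the data rather than the source itself, the sketch below uses made-up values and a plain dict for the new data. It assumes a Bokeh install where figure.scatter is available; the variable name renderer is my own.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

source = ColumnDataSource(data={"points": [86, 92], "price": [18.0, 40.0]})
p = figure(title="Sketch")
renderer = p.scatter(x="points", y="price", source=source)

# Swap the contents in place; the glyph still points at the same source
source.data = {"points": [87], "price": [30.0]}

print(renderer.data_source is source)
print(list(source.data["price"]))
```

If we had created a brand new ColumnDataSource and rebound the name source to it, the glyph would still be attached to the old object and the plot would not change.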

Up until now, we have focused on updating data when the user interacts with the custom defined widgets. The other interaction we need to handle is when the user selects a group of points via the LassoSelect tool. If a set of points is selected, we need to get those details and display them below the graph. In my opinion this is a really useful feature that enables some very intuitive exploration of the data.

I will go through this function in smaller sections since there are some unique Bokeh concepts here.

Bokeh keeps track of what has been selected as a 1d or 2d array depending on the type of selection tool. We need to pull out the indices of all selected items and use that to get a subset of data.

def selection_change(attrname, old, new):
    """ Function will be called when the poly select (or other selection tool)
    is used. Determine which items are selected and show the details below
    the graph
    """
    selected = source.selected["1d"]["indices"]

Now that we know what was selected, let’s get the latest dataset based on any filtering that the user has done. If we do not do this, the indices will not match up. Trust me, it took me a while to figure this out!

df_active = select_reviews()
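
Here is a small, made-up example of the mismatch. The DataFrame contents and the selected positions are hypothetical; the point is only that the positions reported by the selection refer to the rows currently plotted, not to the full file.

```python
import pandas as pd

full = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C", "Wine D"],
    "price": [250.0, 18.0, 300.0, 40.0],
})

# What the plot is showing after a "Maximum Price" filter of 200
filtered = full[full["price"] <= 200]

# Positions reported by the selection tool refer to the plotted rows
selected = [1]

picked = filtered.iloc[selected]["title"].tolist()
wrong = full.iloc[selected]["title"].tolist()
print(picked)  # the wine the user actually selected
print(wrong)   # what we would get by indexing the unfiltered data
```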

Now, if data is selected, let’s get that subset of data and transform it so that it is easy to compare side by side. I used the style.render() function to make the HTML more styled and consistent with the rest of the app. As an aside, this new API in pandas allows for a lot more customization of the HTML output of a DataFrame. I’m keeping it simple in this case, but you can explore more in the pandas style docs.

if selected:
    data = df_active.iloc[selected, :]
    temp = data.set_index("title").T.reindex(index=col_order)
    details.text = temp.style.render()
else:
    details.text = "Selection Details"
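
To make the reshaping step more concrete, here is a sketch with two made-up wines. It uses the plain DataFrame.to_html() method instead of the Styler so that it runs without any extra dependencies; the sample values and the variable name html are my own.

```python
import pandas as pd

col_order = ["price", "points", "variety", "province", "description"]

data = pd.DataFrame({
    "title": ["Wine A", "Wine B"],
    "variety": ["Shiraz", "Chardonnay"],
    "points": [92, 86],
    "price": [40.0, 18.0],
    "province": ["South Australia", "Victoria"],
    "description": ["Made-up note A", "Made-up note B"],
})

# One column per wine, one row per attribute, in the order we want
temp = data.set_index("title").T.reindex(index=col_order)
print(temp)

# Plain HTML table; an unstyled alternative to the Styler output
html = temp.to_html()
```

Note that on recent pandas releases style.render() is no longer available and style.to_html() is the replacement, so you may need to adjust that call depending on your installed version.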

Here is what the selection looks like.

Tool bar

Now that the widgets and other interactive components are built and the process for retrieving and filtering data is in place, they all need to be tied together.

For each control, make sure updates call the update function and include the old and new values.

controls = [province, price_max, title]

for control in controls:
    control.on_change("value", lambda attr, old, new: update())

If there is a selection, call the selection_change function.

source.on_change("selected", selection_change)

The next section controls the layout. We set up the widgetbox as well as the layout.

inputs = widgetbox(*controls, sizing_mode="fixed")
l = layout([[desc], [inputs, p], [details]], sizing_mode="fixed")

We need to do an initial update of the data, then attach this model and its layout to the current document. The last line adds a title for the browser window.

update()
curdoc().add_root(l)
curdoc().title = "Australian Wine Analysis"

If we want to execute the app, run this from the command line:

bokeh serve winepicker.py

Open up the browser and go to http://localhost:5006/winepicker and explore the data.


I have created a video that walks through the interactive nature of the application. I think this brief video does a good job of showing all the interactive options available with this approach. If you have been interested enough to read this far, it is worth your time to watch the video and see the app in action.


There are many options for visualizing data within the Python ecosystem. Bokeh specializes in making visualizations that have a high degree of interactive capability out of the box as well as the ability to customize even further with some additional coding. In my experience, there is a bit of a learning curve to get these apps working but they can be very useful tools for visualizing data.

I hope this article will be a useful guide for others that are interested in building their own custom visualizations for their unique business problems. Feel free to leave a comment if this post is helpful.


29-Jan-2018: Fixed single vs double quotes for consistency. Also made sure title search was not case sensitive.