Over on Kaggle, there is an interesting data set of over 130K wine reviews that have been scraped and pulled together into a single file. I thought this data set would be really useful for showing how to build an interactive visualization using Bokeh. This article will walk through how to build a Bokeh application that has good examples of many of its features. The app itself is really helpful and I had a lot of fun exploring this data set using the visuals. Additionally, this application shows the power of Bokeh and it should give you some ideas as to how you could use it in your own projects. Let’s get started by exploring the “rich, smokey flavors with a hint of oak, tea and maple” that are embedded in this data set.
It is difficult to write a Python script that does not interact with the file system in some way. The activity could be as simple as reading a data file into a pandas DataFrame or as complex as parsing thousands of files in a deeply nested directory structure. Python’s standard library has several helpful functions for these tasks, including the pathlib module.
The pathlib module was first included in Python 3.4 and has been enhanced in each subsequent release. It provides an object-oriented interface to the filesystem, which makes interacting with files and directories more intuitive, platform agnostic and pythonic.
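As a quick illustration of that object-oriented style, here is a minimal sketch that builds a small directory tree in a temporary folder and then searches it recursively. The `find_files` helper and the file names are hypothetical, chosen only for illustration. `Path`, the `/` operator, `mkdir`, `write_text` and `rglob` are standard pathlib features.

```python
import tempfile
from pathlib import Path


def find_files(root, pattern="*.csv"):
    """Hypothetical helper: return all files below root (searched recursively) matching pattern, sorted."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    nested = root / "reports" / "2017"  # "/" joins path parts on any platform
    nested.mkdir(parents=True, exist_ok=True)
    (nested / "sales.csv").write_text("month,total\nJan,100\n")
    (root / "readme.txt").write_text("not a csv")

    for path in find_files(root):
        # Path objects expose the pieces of a filename as attributes
        print(path.name, path.stem, path.suffix, path.parent.name)
        # prints: sales.csv sales .csv 2017
```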
Python’s visualization landscape is quite complex with many available libraries for various types of data visualization. In previous articles, I have covered several approaches for visualizing data in python. These options are great for static data but oftentimes there is a need to create interactive visualizations to more easily explore data. Trying to cobble interactive charts together by hand is possible but certainly not desirable when deployment speed is critical. That’s where Dash comes in.
Dash is an open source framework created by the Plotly team that leverages Flask, plotly.js and React.js to build custom data visualization apps. This article is a high-level overview of how to get started with Dash to build a simple, yet powerful interactive dashboard.
Lately I have been spending time reading about various visualization techniques with the goal of learning unique ways to display complex data. One of the interesting chart ideas I have seen is the bullet graph. Naturally, I wanted to see if I could create one in Python, but I could not find any existing implementations. This article will walk through why a bullet graph (aka bullet chart) is useful and how to build one using Python and matplotlib.
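A bullet graph boils down to three layers: shaded qualitative bands, a bar for the actual value and a marker for the target. The sketch below draws those layers with plain matplotlib. The `bullet_graph` function, its parameters, the grey palette and the example numbers are illustrative assumptions, not the implementation from the article itself.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt


def bullet_graph(ax, value, target, limits,
                 colors=("#d9d9d9", "#bdbdbd", "#969696"), label=""):
    """Hypothetical helper: draw one horizontal bullet graph on an existing Axes.

    limits: ascending upper bounds of the qualitative bands (e.g. poor, ok, good).
    colors: one fill color per band, so it should be at least as long as limits.
    """
    # Qualitative background bands: each spans from the previous limit to its own
    start = 0
    for limit, color in zip(limits, colors):
        ax.barh(0, limit - start, left=start, height=1.0, color=color)
        start = limit
    # The actual measure as a narrow dark bar on top of the bands
    ax.barh(0, value, height=0.3, color="black")
    # The target as a short vertical marker
    ax.axvline(target, ymin=0.25, ymax=0.75, color="red", linewidth=3)
    ax.set_yticks([0])
    ax.set_yticklabels([label])
    ax.set_xlim(0, limits[-1])
    return ax


fig, ax = plt.subplots(figsize=(6, 1.5))
bullet_graph(ax, value=270, target=250, limits=[200, 250, 300], label="Revenue")
```

Drawing each band with `left=` keeps the bands side by side rather than stacked on top of each other, so every band stays visible regardless of drawing order.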
Every once in a while it is useful to take a step back and look at pandas’ functions and see if there is a new or better way to do things. I was recently working on a problem and noticed that pandas had a Grouper function that I had never used before. I looked into how it can be used and it turns out it is useful for the type of summary analysis I tend to do on a frequent basis.
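Here is a small sketch of the idea using made-up invoice data. `pd.Grouper` lets a datetime column be bucketed by a frequency directly inside `groupby`, and it can be mixed with regular column keys. The column names and values are hypothetical. `freq="MS"` groups by calendar month, with each group labeled by the first day of the month.

```python
import pandas as pd

# Hypothetical invoice data: one row per sale
sales = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-05", "2018-01-20", "2018-02-03",
                            "2018-02-28", "2018-03-15"]),
    "account": ["A", "B", "A", "A", "B"],
    "amount": [100, 200, 50, 25, 300],
})

# Total per calendar month: pd.Grouper buckets the "date" column by frequency
monthly = sales.groupby(pd.Grouper(key="date", freq="MS"))["amount"].sum()
print(monthly)

# Grouper can sit alongside ordinary column keys in the same groupby
by_account = sales.groupby(["account", pd.Grouper(key="date", freq="MS")])["amount"].sum()
print(by_account)
```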
In addition to functions that have been around a while, pandas continues to provide new and improved capabilities with every release. The updated agg function is another very useful and intuitive tool for summarizing data.
This article will walk through how and why you may want to use the
agg functions on your own data. Along the way, I will include a few tips
and tricks on how to use them most effectively.
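A minimal sketch of `agg` on similar hypothetical invoice data. The named-aggregation form (`new_column=(source_column, function)`) requires pandas 0.25 or later. Passing a list of function names also works directly on a single column, without a groupby.

```python
import pandas as pd

# Hypothetical invoice data
sales = pd.DataFrame({
    "account": ["A", "B", "A", "A", "B"],
    "amount": [100, 200, 50, 25, 300],
})

# Named aggregation (pandas 0.25+): output_column=(input_column, function)
summary = sales.groupby("account").agg(
    total=("amount", "sum"),
    average=("amount", "mean"),
    invoices=("amount", "count"),
)
print(summary)

# agg also works without a groupby, applying several functions to one column
overall = sales["amount"].agg(["sum", "mean", "max"])
print(overall)
```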
There are many data analysis tools available to the Python analyst, and it can be challenging to know which ones to use in a particular situation. A useful (but somewhat overlooked) technique is association analysis, which attempts to find common patterns of items in large data sets. One specific application is often called market basket analysis. The most commonly cited example of market basket analysis is the so-called “beer and diapers” case. The basic story is that a large retailer mined its transaction data and found an unexpected purchase pattern: individuals were buying beer and baby diapers at the same time.
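The sketch below computes the three standard rule metrics for every pair of items in a tiny, made-up set of baskets:

* **Support** measures how often the pair appears.
* **Confidence** measures how often the right-hand item appears given the left-hand one.
* **Lift** is the confidence divided by the right-hand item's overall frequency. Values above 1 suggest the items are bought together more often than chance would predict.

The sketch only considers pairs. A full association analysis, such as the apriori algorithm, also mines larger itemsets. The `pair_rules` helper and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Made-up baskets, one set of items per transaction
transactions = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"milk", "bread"},
    {"milk", "chips"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
]


def pair_rules(transactions, min_support=0.3):
    """Hypothetical helper: (lhs, rhs, support, confidence, lift) for frequent pairs, highest lift first."""
    n = len(transactions)
    item_counts = Counter(item for basket in transactions for item in basket)
    # sorted() gives each pair a canonical order so {a, b} and {b, a} count together
    pair_counts = Counter(
        pair for basket in transactions for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # skip pairs that appear too rarely to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


for rule in pair_rules(transactions):
    print(rule)
```

In this toy data every basket containing beer also contains diapers, so both directions of the rule come out with a confidence of 1.0 and a lift of 2.0.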
In early March, I published an article introducing prophet, an open source library released by Facebook to automate the time series forecasting process. As I promised in that article, I’m going to see how well those predictions held up against the real world after 2.5 months of traffic on this site.
The Python visualization world can be a frustrating place for a new user. There are many different options, and choosing the right one is a challenge. For example, even after 2 years, my earlier article on this topic is one of the top posts that lead people to this site. In that article, I threw some shade at matplotlib and dismissed it during the analysis. However, after using tools such as pandas, scikit-learn, seaborn and the rest of the Python data science stack, I think I was a little premature in dismissing matplotlib. To be honest, I did not quite understand it or how to use it effectively in my workflow.
Now that I have taken the time to learn some of these tools and how to use them with matplotlib, I have started to see matplotlib as an indispensable tool. This post will show how I use matplotlib and provide some recommendations for users getting started or users who have not taken the time to learn matplotlib. I do firmly believe matplotlib is an essential part of the python data science stack and hope this article will help people understand how to use it for their own visualizations.
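Here is one workflow along these lines, sketched with hypothetical data and labels:

1. Create the Figure and Axes explicitly with `plt.subplots()`.
2. Let pandas draw onto that Axes through its `ax` argument.
3. Use the Axes methods to customize titles, labels and reference lines.

The `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales totals per customer
df = pd.DataFrame({"customer": ["A", "B", "C"], "sales": [120, 340, 210]})
df = df.sort_values("sales")

# Create the Figure and Axes explicitly instead of relying on pyplot's implicit state
fig, ax = plt.subplots(figsize=(6, 3))

# Let pandas do the quick plot, but draw it onto our Axes
df.plot(kind="barh", x="customer", y="sales", ax=ax, legend=False)

# Fine-tune with the Axes methods
ax.set(title="Sales by Customer", xlabel="Total Sales", ylabel="Customer")
ax.axvline(df["sales"].mean(), color="grey", linestyle="--")  # average reference line

fig.tight_layout()
```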
One of the compelling features of pandas is that it has a rich library of methods for manipulating
data. However, there are times when it is not clear what the various functions
do and how to use them. If you are approaching a problem from an Excel mindset,
it can be difficult to translate the planned solution into the unfamiliar pandas command.
One of those “unknown” functions is the transform method.
Even after using pandas for a while, I had never had the chance to use this function,
so I recently took some time to figure out what it is and how it could be helpful
for real world analysis. This article will walk through an example where
transform can be used to efficiently summarize data.
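Here is a minimal sketch of what makes `transform` useful, using hypothetical order data. A regular groupby aggregation returns one row per group. `transform`, by contrast, broadcasts the group result back to every original row. The result can then be used directly in row-level calculations, such as each line's share of its order total.

```python
import pandas as pd

# Hypothetical order lines: an order can span several rows
df = pd.DataFrame({
    "order": [1, 1, 1, 2, 2, 3],
    "sku": ["A", "B", "C", "A", "D", "B"],
    "ext_price": [100, 50, 50, 300, 100, 200],
})

# A plain aggregation collapses to one row per order (3 rows here) ...
order_sums = df.groupby("order")["ext_price"].sum()

# ... while transform returns a result aligned with the original 6 rows
df["order_total"] = df.groupby("order")["ext_price"].transform("sum")
df["pct_of_order"] = df["ext_price"] / df["order_total"]
print(df)
```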
A common business analytics task is trying to forecast the future based on known historical data. Forecasting is a complicated topic and relies on an analyst knowing the ins and outs of the domain as well as knowledge of relatively complex mathematical theories. Because the mathematical concepts can be complex, a lot of business forecasting approaches are “solved” with a little linear regression and “intuition.” More complex models could yield better results but are often too difficult to implement.
Given that background, I was very interested to see that Facebook recently open sourced a Python and R library called prophet, which seeks to automate the forecasting process with a more sophisticated but easily tunable model. In this article, I’ll introduce prophet and show how to use it to predict the volume of traffic in the next year for Practical Business Python. To make this a little more interesting, I will post the prediction through the end of March so we can take a look at how accurate the forecast is.
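Here is a hedged sketch of the basic workflow. Prophet expects a DataFrame with a `ds` (date) column and a `y` (value) column. The `date` and `sessions` column names, the sample rows and both helper functions are hypothetical. The package is imported inside the forecasting function so the data-preparation step runs even where prophet is not installed. Current releases install as `prophet`, while older ones used the name `fbprophet`.

```python
import pandas as pd


def to_prophet_frame(traffic, date_col="date", value_col="sessions"):
    """Hypothetical helper: rename the date/value columns to the ds/y names prophet expects."""
    return (
        traffic.rename(columns={date_col: "ds", value_col: "y"})[["ds", "y"]]
        .assign(ds=lambda d: pd.to_datetime(d["ds"]))
    )


def forecast_next_year(prophet_df):
    """Hypothetical helper: fit a default prophet model and forecast 365 days ahead."""
    # Imported here so the helper above works even when prophet is not installed.
    # Current releases install as `prophet`; older ones were named `fbprophet`.
    from prophet import Prophet

    model = Prophet()
    model.fit(prophet_df)
    future = model.make_future_dataframe(periods=365)  # daily rows, history included
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# A few hypothetical daily traffic rows
traffic = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-02", "2017-01-03"],
    "sessions": [1200, 1350, 1280],
})
prepared = to_prophet_frame(traffic)
print(prepared.dtypes)
# With prophet installed and a realistic amount of history:
# forecast = forecast_next_year(prepared)
```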