Practical Business Python

Taking care of business, one python script at a time

Mon 26 March 2018

Overview of Pandas Data Types

Posted by Chris Moffitt in articles   

When doing data analysis, it is important to make sure you are using the correct data types; otherwise you may get unexpected results or errors. Pandas correctly infers data types in many cases, so you can often move on with your analysis without any further thought on the topic.

Despite how well pandas works, at some point in your data analysis processes, you will likely need to explicitly convert data from one type to another. This article will discuss the basic pandas data types (aka dtypes), how they map to python and numpy data types and the options for converting from one pandas type to another.
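As a rough illustration of the kind of explicit conversion described above, here is a minimal sketch. The column names and values are made up for this example and are not taken from the article; it only uses astype() and pd.to_numeric(), which are two of several conversion options.

```python
import pandas as pd

# Hypothetical sample data: every column is read in as text
df = pd.DataFrame({
    "customer": ["10002", "552278", "23477"],
    "sales_2016": ["125000.00", "920000.00", "50000.00"],
    "active": ["Y", "N", "Y"],
})

# Inspect what pandas inferred before converting anything
print(df.dtypes)

# astype() works when every value is clean
df["customer"] = df["customer"].astype("int64")

# pd.to_numeric() with errors="coerce" turns unparseable values into NaN
df["sales_2016"] = pd.to_numeric(df["sales_2016"], errors="coerce")

# A simple comparison converts a Y/N flag into a boolean column
df["active"] = df["active"] == "Y"

print(df.dtypes)
```

Checking df.dtypes before and after a conversion is a cheap way to confirm that each column ended up with the type you intended.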

Read more...


Tue 20 February 2018

Intro to pdvega - Plotting for Pandas using Vega-Lite

Posted by Chris Moffitt in articles   

Jake VanderPlas covered this topic in his PyCon 2017 talk and the landscape has probably gotten even more confusing in the year since this talk was presented.

Jake is also one of the creators of Altair (discussed in this post) and is back with another plotting library called pdvega. This library leverages some of the concepts introduced in Altair but seeks to tackle a smaller subset of visualization problems. This article will go through a couple of examples of using pdvega and compare it to the basic capabilities present in pandas today.

Read more...


Tue 02 January 2018

Interactive Visualization of Australian Wine Ratings

Posted by Chris Moffitt in articles   

Over on Kaggle, there is an interesting data set of over 130K wine reviews that have been scraped and pulled together into a single file. I thought this data set would be really useful for showing how to build an interactive visualization using Bokeh. This article will walk through how to build a Bokeh application that has good examples of many of its features. The app itself is really helpful and I had a lot of fun exploring this data set using the visuals. Additionally, this application shows the power of Bokeh and it should give you some ideas as to how you could use it in your own projects. Let’s get started by exploring the “rich, smokey flavors with a hint of oak, tea and maple” that are embedded in this data set.

Read more...


Mon 27 November 2017

Using Python’s Pathlib Module

Posted by Chris Moffitt in articles   

It is difficult to write a python script that does not have some interaction with the file system. The activity could be as simple as reading a data file into a pandas DataFrame or as complex as parsing thousands of files in a deeply nested directory structure. Python’s standard library has several helpful functions for these tasks - including the pathlib module.

The pathlib module was first included in python 3.4 and has been enhanced in each of the subsequent releases. Pathlib is an object-oriented interface to the filesystem and provides a more intuitive way to interact with the filesystem in a platform-agnostic and pythonic manner.
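To give a feel for the object-oriented style, here is a small sketch. The directory layout and file names are invented for the example, and it works in a temporary directory so it does not touch any real files.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # The / operator joins path segments in a platform-independent way
    data_dir = base / "data" / "2017"
    data_dir.mkdir(parents=True)

    # Create two small files to search through
    (data_dir / "sales.csv").write_text("a,b\n1,2\n")
    (data_dir / "notes.txt").write_text("hello")

    # rglob() walks the tree recursively looking for matching names
    csv_files = sorted(p.name for p in base.rglob("*.csv"))

    # Path objects expose the pieces of a file name as attributes
    report = data_dir / "sales.csv"
    parts = (report.stem, report.suffix, report.parent.name)

print(csv_files)
print(parts)
```

The same code runs unchanged on Windows, macOS and Linux because Path takes care of the separator for you.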

Read more...


Mon 31 July 2017

Pandas Grouper and Agg Functions Explained

Posted by Chris Moffitt in articles   

Every once in a while it is useful to take a step back and look at pandas’ functions and see if there is a new or better way to do things. I was recently working on a problem and noticed that pandas had a Grouper function that I had never used before. I looked into how it can be used and it turns out it is useful for the type of summary analysis I tend to do on a frequent basis.

In addition to functions that have been around a while, pandas continues to provide new and improved capabilities with every release. The updated agg function is another very useful and intuitive tool for summarizing data.

This article will walk through how and why you may want to use the Grouper and agg functions on your own data. Along the way, I will include a few tips and tricks on how to use them most effectively.
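As a quick taste of both tools, here is a minimal sketch on a tiny made-up data set. The column names (date, account, ext_price) are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical transaction data
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-01-05", "2017-01-20", "2017-02-03", "2017-02-17"]
    ),
    "account": ["A", "B", "A", "A"],
    "ext_price": [100.0, 250.0, 75.0, 25.0],
})

# Grouper buckets the date column by month ("MS" = month start)
# without having to set the date as the index first
monthly = df.groupby(pd.Grouper(key="date", freq="MS"))["ext_price"].sum()

# agg accepts a list of functions and returns one column per function
summary = df.groupby("account")["ext_price"].agg(["sum", "mean", "count"])

print(monthly)
print(summary)
```

Changing the freq string is all it takes to switch the Grouper from monthly to weekly or quarterly buckets.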

Read more...


Mon 03 July 2017

Introduction to Market Basket Analysis in Python

Posted by Chris Moffitt in articles   

There are many data analysis tools available to the python analyst and it can be challenging to know which ones to use in a particular situation. A useful (but somewhat overlooked) technique is called association analysis which attempts to find common patterns of items in large data sets. One specific application is often called market basket analysis. The most commonly cited example of market basket analysis is the so-called “beer and diapers” case. The basic story is that a large retailer was able to mine their transaction data and find an unexpected purchase pattern of individuals that were buying beer and baby diapers at the same time.
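To make the idea concrete, here is a hand-rolled sketch of the three measures usually reported for an association rule: support, confidence and lift. The transactions are invented, and this plain-python version is only meant to show the arithmetic; it is not the library-based approach a real analysis would likely use.

```python
# Hypothetical shopping baskets, one set of items per transaction
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


# Evaluate the rule "beer -> diapers"
antecedent, consequent = {"beer"}, {"diapers"}

# How often the two items show up together
rule_support = support(antecedent | consequent)

# Of the baskets with beer, the share that also contain diapers
confidence = rule_support / support(antecedent)

# Confidence relative to how common diapers are overall;
# a value above 1 suggests the items appear together more than chance
lift = confidence / support(consequent)

print(round(rule_support, 3), round(confidence, 3), round(lift, 3))
```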

Read more...



Tue 25 April 2017

Effectively Using Matplotlib

Posted by Chris Moffitt in articles   

The python visualization world can be a frustrating place for a new user. There are many different options and choosing the right one is a challenge. For example, even after 2 years, this article is one of the top posts that lead people to this site. In that article, I threw some shade at matplotlib and dismissed it during the analysis. However, after using tools such as pandas, scikit-learn, seaborn and the rest of the data science stack in python - I think I was a little premature in dismissing matplotlib. To be honest, I did not quite understand it and how to use it effectively in my workflow.

Now that I have taken the time to learn some of these tools and how to use them with matplotlib, I have started to see matplotlib as an indispensable tool. This post will show how I use matplotlib and provide some recommendations for users getting started or users who have not taken the time to learn matplotlib. I do firmly believe matplotlib is an essential part of the python data science stack and hope this article will help people understand how to use it for their own visualizations.

Read more...


Tue 04 April 2017

Understanding the Transform Function in Pandas

Posted by Chris Moffitt in articles   

One of the compelling features of pandas is that it has a rich library of methods for manipulating data. However, there are times when it is not clear what the various functions do and how to use them. If you are approaching a problem from an Excel mindset, it can be difficult to translate the planned solution into the unfamiliar pandas command. One of those “unknown” functions is the transform method. Even after using pandas for a while, I had never had the chance to use this function, so I recently took some time to figure out what it is and how it could be helpful for real world analysis. This article will walk through an example where transform can be used to efficiently summarize data.
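Here is a minimal sketch of what transform does, using a small made-up order table (the column names and values are assumptions for illustration). Unlike a regular groupby aggregation, transform returns a result with the same number of rows as the original data, so it can be assigned straight back as a new column.

```python
import pandas as pd

# Hypothetical order lines: several items per order
df = pd.DataFrame({
    "order": [10001, 10001, 10001, 10005, 10005],
    "sku": ["B1-20000", "S1-27722", "B1-86481", "S1-06532", "S1-82801"],
    "ext_price": [100.0, 300.0, 600.0, 150.0, 50.0],
})

# transform("sum") computes each order's total and repeats it on every row
df["order_total"] = df.groupby("order")["ext_price"].transform("sum")

# With the total on each row, the share of the order is simple division
df["pct_of_order"] = df["ext_price"] / df["order_total"]

print(df)
```

Without transform, the same result would need a groupby, a separate summary table and a merge back onto the original rows.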

Read more...


Mon 06 March 2017

Forecasting Website Traffic Using Facebook’s Prophet Library

Posted by Chris Moffitt in articles   

A common business analytics task is trying to forecast the future based on known historical data. Forecasting is a complicated topic and relies on an analyst knowing the ins and outs of the domain as well as knowledge of relatively complex mathematical theories. Because the mathematical concepts can be complex, a lot of business forecasting approaches are “solved” with a little linear regression and “intuition.” More complex models would yield better results but are too difficult to implement.

Given that background, I was very interested to see that Facebook recently open-sourced a python and R library called prophet, which seeks to automate the forecasting process with a more sophisticated but easily tunable model. In this article, I’ll introduce prophet and show how to use it to predict the volume of traffic in the next year for Practical Business Python. To make this a little more interesting, I will post the prediction through the end of March so we can take a look at how accurate the forecast is.

Read more...