Practical Business Python

Taking care of business, one python script at a time

Mon 31 July 2017

Pandas Grouper and Agg Functions Explained

Posted by Chris Moffitt in articles   

Every once in a while it is useful to take a step back, look at pandas’ functions, and see if there is a new or better way to do things. I was recently working on a problem and noticed that pandas had a Grouper function that I had never used before. I looked into how it can be used, and it turns out to be useful for the type of summary analysis I do frequently.

In addition to functions that have been around a while, pandas continues to provide new and improved capabilities with every release. The updated agg function is another very useful and intuitive tool for summarizing data.

This article will walk through how and why you may want to use the Grouper and agg functions on your own data. Along the way, I will include a few tips and tricks on how to use them most effectively.
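As a rough illustration of the two tools, here is a minimal sketch. The sales data and the column names (date, name, ext_price) are made up for this example and are not taken from the full article. It groups by customer and by month with pd.Grouper, then uses agg to compute several summary statistics at once.

```python
import pandas as pd

# Hypothetical sales data; the column names are assumptions for illustration.
sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-01-05", "2017-01-20", "2017-02-03", "2017-02-17"]
    ),
    "name": ["Acme", "Acme", "Acme", "Bolt"],
    "ext_price": [100.0, 50.0, 25.0, 10.0],
})

# Group by customer and by calendar month. freq="MS" labels each bin with
# the first day of the month.
monthly = sales.groupby(
    ["name", pd.Grouper(key="date", freq="MS")]
)["ext_price"].sum()

# agg accepts a list of function names and returns one column per function.
summary = sales.groupby("name")["ext_price"].agg(["sum", "mean"])

print(monthly)
print(summary)
```

Changing the freq string (for example to "W" for weekly bins) is usually all it takes to summarize the same data over a different time period.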

Read more...


Mon 03 July 2017

Introduction to Market Basket Analysis in Python

Posted by Chris Moffitt in articles   

There are many data analysis tools available to the python analyst and it can be challenging to know which ones to use in a particular situation. A useful (but somewhat overlooked) technique is association analysis, which attempts to find common patterns of items in large data sets. One specific application is often called market basket analysis. The most commonly cited example of market basket analysis is the so-called “beer and diapers” case. The basic story is that a large retailer was able to mine its transaction data and find an unexpected purchase pattern: individuals who were buying beer and baby diapers at the same time.
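To make the idea concrete, here is a small sketch in plain python that computes the three measures commonly used in association analysis: support, confidence and lift. The transactions are invented for illustration, and the helper functions are hypothetical names for this sketch. This is not necessarily the approach or the library used in the full article.

```python
# Made-up transactions, each one a set of purchased items.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions with the antecedent, the share that also has the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


print(support({"beer", "diapers"}, transactions))
print(confidence({"beer"}, {"diapers"}, transactions))
print(lift({"beer"}, {"diapers"}, transactions))
```

A lift above 1 suggests the two items appear together more often than would be expected if they were purchased independently.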

Read more...



Tue 25 April 2017

Effectively Using Matplotlib

Posted by Chris Moffitt in articles   

The python visualization world can be a frustrating place for a new user. There are many different options and choosing the right one is a challenge. For example, even after 2 years, this article is one of the top posts that lead people to this site. In that article, I threw some shade at matplotlib and dismissed it during the analysis. However, after using tools such as pandas, scikit-learn, seaborn and the rest of the data science stack in python, I think I was a little premature in dismissing matplotlib. To be honest, I did not quite understand it or how to use it effectively in my workflow.

Now that I have taken the time to learn some of these tools and how to use them with matplotlib, I have started to see matplotlib as an indispensable tool. This post will show how I use matplotlib and provide some recommendations for users getting started or users who have not taken the time to learn matplotlib. I do firmly believe matplotlib is an essential part of the python data science stack and hope this article will help people understand how to use it for their own visualizations.

Read more...


Tue 04 April 2017

Understanding the Transform Function in Pandas

Posted by Chris Moffitt in articles   

One of the compelling features of pandas is that it has a rich library of methods for manipulating data. However, there are times when it is not clear what the various functions do and how to use them. If you are approaching a problem from an Excel mindset, it can be difficult to translate the planned solution into unfamiliar pandas commands. One of those “unknown” functions is the transform method. Even after using pandas for a while, I had never had the chance to use this function, so I recently took some time to figure out what it is and how it could be helpful for real world analysis. This article will walk through an example where transform can be used to efficiently summarize data.
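As a quick sketch of what transform does, the example below uses an invented orders table (the order and ext_price columns are assumptions for illustration, not the article's data). Unlike a plain groupby sum, which returns one row per group, transform returns a result aligned with the original rows, so the group total can be added as a new column without a merge.

```python
import pandas as pd

# Hypothetical order lines; the column names are assumptions for illustration.
orders = pd.DataFrame({
    "order": [1, 1, 2, 2, 2],
    "ext_price": [10.0, 30.0, 5.0, 5.0, 10.0],
})

# transform("sum") computes each order's total and repeats it on every
# row of that order, keeping the original index and length.
orders["order_total"] = orders.groupby("order")["ext_price"].transform("sum")

# With the total on each row, the share of each line is a simple division.
orders["pct_of_order"] = orders["ext_price"] / orders["order_total"]

print(orders)
```

This is roughly the pandas counterpart of adding a SUMIF column in Excel and then dividing by it.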

Read more...