Practical Business Python

Taking care of business, one python script at a time

Mon 14 October 2019

Binning Data with Pandas qcut and cut

Posted by Chris Moffitt in articles   

When dealing with continuous numeric data, it is often helpful to bin the data into multiple buckets for further analysis. There are several different terms for binning including bucketing, discrete binning, discretization or quantization. Pandas supports these approaches using the cut and qcut functions. This article will briefly describe why you may want to bin your data and how to use the pandas functions to convert continuous data to a set of discrete buckets. Like many pandas functions, cut and qcut may seem simple but there is a lot of capability packed into those functions. Even for more experienced users, I think you will learn a couple of tricks that will be useful for your own analysis.
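As a quick taste of the difference between the two functions, here is a minimal sketch. The sales figures, bin edges and labels are made up for illustration and are not taken from the article. cut places bin edges at values you choose, while qcut places them at quantiles so each bucket holds roughly the same number of rows.

```python
import pandas as pd

# Hypothetical sales figures, invented for illustration
sales = pd.Series([5, 12, 18, 25, 40, 55, 70, 95], name="sales")

# cut: bin edges are chosen by value, so buckets can hold different counts
by_value = pd.cut(sales, bins=[0, 25, 50, 100], labels=["low", "mid", "high"])

# qcut: bin edges are chosen by quantile, so buckets hold roughly equal counts
by_quantile = pd.qcut(sales, q=4, labels=["q1", "q2", "q3", "q4"])

print(by_value.value_counts(sort=False))
print(by_quantile.value_counts(sort=False))
```

With this sample data the value-based buckets end up uneven while the quantile-based buckets each get two rows, which is usually the deciding factor when choosing between the two.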

Read more...


Tue 17 September 2019

Happy Birthday Practical Business Python!

Posted by Chris Moffitt in articles   

On September 17th, 2014, I published my first article which means that today is the 5th birthday of Practical Business Python. Thank you to all my readers and all those that have supported me through this process! It has been a great journey and I look forward to seeing what the future holds.

This 5 year anniversary gives me the opportunity to reflect on the blog and what will be coming next. I figured I would use this milestone to walk through a few of the stats and costs associated with running this blog for the past 5 years. This post will not be technical but I am hopeful that my readers as well as current and aspiring bloggers going down this path will find it helpful. Finally, please use the comments to let me know what content you would like to see in the future.

Read more...


Mon 26 August 2019

Combine Multiple Excel Worksheets Into a Single Pandas Dataframe

Posted by Chris Moffitt in articles   

One of the most commonly used pandas functions is read_excel. This short article shows how you can read in all the tabs in an Excel workbook and combine them into a single pandas dataframe using one command.

For those of you that want the TLDR, here is the command:

df = pd.concat(pd.read_excel('2018_Sales_Total.xlsx', sheet_name=None), ignore_index=True)

Read on for an explanation of when to use this and how it works.
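The short version of why this works: passing sheet_name=None makes read_excel return a dictionary of DataFrames keyed by sheet name, and concat accepts that dictionary directly. The sketch below uses a hand-built dictionary as a stand-in so it runs without an Excel file. The sheet names and numbers are invented for illustration.

```python
import pandas as pd

# Stand-in for what read_excel returns when sheet_name=None:
# keys are sheet names, values are DataFrames (names and data are made up)
sheets = {
    "Jan": pd.DataFrame({"account": [101, 102], "total": [250.0, 80.0]}),
    "Feb": pd.DataFrame({"account": [101, 103], "total": [125.0, 310.0]}),
}

# concat accepts a mapping and stacks the frames row-wise;
# ignore_index=True replaces the per-sheet indexes with a fresh 0..n-1 index
df = pd.concat(sheets, ignore_index=True)
print(df)
```

Note that ignore_index=True also discards the sheet names. If you need to know which tab each row came from, leave it off and the dictionary keys become the outer level of the index.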

Read more...


Mon 08 July 2019

Build a Celebrity Look-Alike Detector with Azure’s Face Detect and Python

Posted by Chris Moffitt in articles   

This article describes how to use Microsoft Azure’s Cognitive Services Face API and python to identify, count and classify people in a picture. In addition, it will show how to use the service to compare two face images and tell if they are the same person. We will try it out with several celebrity look-alikes to see if the algorithm can tell the difference between two similar Hollywood actors. By the end of the article, you should be able to use these examples to further explore Azure’s Cognitive Services with python and incorporate them in your own projects.

Read more...


Mon 03 June 2019

Evangelizing Python for Business

Posted by Chris Moffitt in articles   

On May 30th, I had the pleasure of presenting at the MinneAnalytics Data Tech Conference with @KatieKodes. Our talk was on “Evangelizing Python for Business”. Here is the summary of the talk:

Python’s simple structure has been vital to the democratization of data science. But as the field rushes forward, making splashy headlines about specialized new jobs, everyday Excel users remain unaware of the value that elementary building blocks of Python for data science can bring them at the office.

Join us for a conversation about bringing Python out of IT and into the business. We’ll share challenges and successes from writing tutorials, teaching classes, and advocating adoption among new users.

I really enjoyed the presentation and received a lot of positive feedback. As a result, I wanted to capture some of the ideas in a post so that the broader community could see it and generate some dialog on tips and techniques that have worked for you. The actual content in this blog is closely tied to our presentation but contains some additional ideas and thoughts that I may want to expand on in future posts.

Read more...


Mon 13 May 2019

Stylin’ with Pandas

Posted by Chris Moffitt in articles   

I have been working on a side project so I have not had as much time to blog. Hopefully I will be able to share more about that project soon.

In the meantime, I wanted to write an article about styling output in pandas. The API for styling is somewhat new and has been under very active development. It contains a useful set of tools for styling the output of your pandas DataFrames and Series. In my own usage, I tend to only use a small subset of the available options but I always seem to forget the details. This article will show examples of how to format numbers in a pandas DataFrame and use some of the more advanced pandas styling visualization options to improve your ability to analyze data with pandas.

Read more...


Mon 18 February 2019

Monte Carlo Simulation with Python

Posted by Chris Moffitt in articles   

There are many sophisticated models people can build for solving a forecasting problem. However, they frequently stick to simple Excel models based on average historical values, intuition and some high level domain-specific heuristics. This approach may be precise enough for the problem at hand but there are alternatives that can add more information to the prediction with a reasonable amount of additional effort.

One approach that can produce a better understanding of the range of potential outcomes and help avoid the “flaw of averages” is a Monte Carlo simulation. The rest of this article will describe how to use python with pandas and numpy to build a Monte Carlo simulation to predict the range of potential values for a sales compensation budget. This approach is meant to be simple enough that it can be used for other problems you might encounter but also powerful enough to provide insights that a basic “gut-feel” model cannot provide on its own.
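To give a feel for the idea, here is a rough sketch of a commission simulation. Every number in it is an assumption made up for illustration: the number of reps, the single shared sales target, the normal distribution of percent-to-target and the tiered commission_rate helper are all hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so runs are repeatable

NUM_REPS = 500          # hypothetical number of sales reps
NUM_SIMULATIONS = 1000  # hypothetical number of simulated years
SALES_TARGET = 100_000  # hypothetical target, identical for every rep


def commission_rate(pct_to_target):
    """Hypothetical tiers: 2% below 90% of target, 3% from 90% up to target, 4% at or above."""
    return np.where(pct_to_target < 0.90, 0.02,
                    np.where(pct_to_target < 1.00, 0.03, 0.04))


# Each row is one simulated year, each column is one rep
pct_to_target = rng.normal(loc=1.0, scale=0.1, size=(NUM_SIMULATIONS, NUM_REPS))
sales = pct_to_target * SALES_TARGET
commissions = sales * commission_rate(pct_to_target)

# One total commission figure per simulated year
total_commissions = pd.Series(commissions.sum(axis=1), name="total_commission")
print(total_commissions.describe().round(0))
```

The useful output is the spread rather than the mean: describe() shows the minimum, maximum and quartiles of the yearly totals, which is exactly the range a single average-based Excel model hides.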

Read more...


Mon 28 January 2019

Updated: Using Pandas To Create an Excel Diff

Posted by Chris Moffitt in articles   

Several years ago, I wrote an article about using pandas to create a diff of two Excel files. Over the years, the pandas API has changed and the diff script no longer works with the latest pandas releases. Through the magic of search engines, people are still discovering the article and are asking for help in getting it to work with more recent versions of pandas. Since pandas is closing in on a 1.0 release, I think this is a good time to get an updated version out there.
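For readers who just want the general shape of a diff, here is one possible sketch. It is not the script from the article. It assumes both files share a unique key column (a hypothetical account column here) and uses an outer merge with indicator=True to sort rows into removed, added and changed. The two DataFrames are built by hand so the example runs without any Excel files.

```python
import pandas as pd

# Hypothetical "old" and "new" versions of the same sheet, keyed by account number
old = pd.DataFrame({"account": [1, 2, 3], "name": ["Acme", "Bolt", "Crest"]})
new = pd.DataFrame({"account": [1, 2, 4], "name": ["Acme", "Bolt Inc", "Dune"]})

# An outer merge on the key with indicator=True records where each row came from
merged = old.merge(new, on="account", how="outer",
                   suffixes=("_old", "_new"), indicator=True)

removed = merged.loc[merged["_merge"] == "left_only", "account"].tolist()
added = merged.loc[merged["_merge"] == "right_only", "account"].tolist()

# Rows present in both versions whose values differ
both = merged[merged["_merge"] == "both"]
changed = both.loc[both["name_old"] != both["name_new"], "account"].tolist()

print("removed:", removed)
print("added:", added)
print("changed:", changed)
```

With real workbooks, old and new would come from read_excel instead, and the changed check would need to cover every non-key column rather than a single name column.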

Read more...



Tue 20 November 2018

Building a Repeatable Data Analysis Process with Jupyter Notebooks

Posted by Chris Moffitt in articles   

Over the past couple of months, there has been an ongoing discussion about Jupyter Notebooks affectionately called the “Notebook Wars”. The discussion started with Joel Grus’ presentation I Don’t Like Notebooks, which was followed by Tim Hopper’s response, aptly titled I Like Notebooks. There have been several follow-on posts on this topic including thoughtful analysis from Yihui Xie.

The purpose of this post is to use some of the points brought up in these discussions as a background for describing my personal best practices for the analysis I frequently perform with notebooks. In addition, this approach can be tailored for your unique situation. I think many new python users do not take the time to think through some of these items I discuss. My hope is that this article will spark some discussion and provide a framework that others can build off for making repeatable and easy to understand data analysis pipelines that fit their needs.

Read more...