Practical Business Python

Taking care of business, one python script at a time

Tue 02 June 2020

sidetable - Create Simple Summary Tables in Pandas

Posted by Chris Moffitt in articles   

Today I am happy to announce the release of a new pandas utility library called sidetable. This library makes it easy to build a frequency table and simple summary of missing values in a DataFrame. I have found it to be a useful tool when starting data exploration on a new data set and I hope others find it useful as well.
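sidetable's own API is not reproduced here. As a rough illustration of the missing-value summary idea, here is a minimal sketch built only from standard pandas calls. The `missing_summary` helper and the sample data are hypothetical:

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count missing values per column, with the most incomplete columns first."""
    missing = df.isna().sum()
    summary = pd.DataFrame({
        "missing": missing,           # number of NaN/None values per column
        "total": len(df),             # total rows, repeated for each column
        "percent": missing / len(df), # share of rows that are missing
    })
    return summary.sort_values("missing", ascending=False)


# Hypothetical sample data with a few gaps
df = pd.DataFrame({
    "region": ["East", "West", None, "East"],
    "sales": [100.0, None, None, 250.0],
    "rep": ["Ann", "Bob", "Cy", "Dee"],
})
print(missing_summary(df))
```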

This project is also an opportunity to illustrate how to use pandas' new API for registering custom DataFrame accessors. This API lets you build custom functions for working with pandas DataFrames and Series, which can be very useful for building out your own library of custom pandas accessor functions.
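For a sense of how the accessor API works, here is a minimal sketch using `pd.api.extensions.register_dataframe_accessor`. The accessor name `freqtab` and its `freq` method are hypothetical and are not sidetable's actual interface. Once registered, the accessor is available on every DataFrame as `df.freqtab`:

```python
import pandas as pd


# Hypothetical accessor name; sidetable registers its own accessor instead
@pd.api.extensions.register_dataframe_accessor("freqtab")
class FreqTabAccessor:
    def __init__(self, pandas_obj: pd.DataFrame):
        # pandas passes the DataFrame the accessor was called on
        self._obj = pandas_obj

    def freq(self, column: str) -> pd.DataFrame:
        """Frequency table with counts, percentages and cumulative totals."""
        counts = self._obj[column].value_counts()  # sorted by count, descending
        table = counts.rename("count").to_frame()
        table["percent"] = table["count"] / table["count"].sum()
        table["cumulative_count"] = table["count"].cumsum()
        table["cumulative_percent"] = table["percent"].cumsum()
        return table


sales = pd.DataFrame({"state": ["TX", "CA", "TX", "NY", "TX", "CA"]})
print(sales.freqtab.freq("state"))
```

Keeping the logic in an accessor class means the function travels with any DataFrame once the module is imported, rather than being a loose helper you have to pass data into.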



Mon 04 May 2020

Exploring an Alternative to Jupyter Notebooks for Python Development

Posted by Chris Moffitt in articles   

Jupyter notebooks are an amazing tool for evaluating and exploring data. I have been using them as an integral part of my day-to-day analysis for several years and reach for them almost any time I need to do data analysis or exploration. Despite how much I like using python in Jupyter notebooks, I do wish for the editor capabilities you can find in VS Code. I would also like my files to work better when versioning them with git.

Recently, I have started using a solution that combines the interactivity of the Jupyter notebook with the developer-friendliness of plain .py text files. Visual Studio Code enables this approach through Jupyter code cells and the Python Interactive Window. Using this combination, you can visualize and explore your data in real time with a plain python file that includes some lightweight markup. The resulting file works seamlessly with all VS Code editing features and supports clean git check-ins.

The rest of this article will discuss how to use this python development workflow within VS Code and some of the primary reasons why you may or may not want to do so.
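A minimal sketch of what such a file can look like, assuming the `# %%` cell marker that the VS Code Python tooling recognizes; the sales numbers are made up for illustration:

```python
# %%
# Everything from one "# %%" marker to the next is a cell. VS Code shows a
# "Run Cell" option above each marker and sends that cell to the
# Python Interactive Window.
import statistics

sales = [120, 95, 143, 88, 160]

# %%
# Re-run just this cell while exploring. The file is still plain Python,
# so it runs normally from the command line and diffs cleanly in git.
avg_sales = statistics.mean(sales)
print(f"Average sales: {avg_sales}")
```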



Mon 30 March 2020

Using WSL to Build a Python Development Environment on Windows

Posted by Chris Moffitt in articles   

In 2016, Microsoft launched the Windows Subsystem for Linux (WSL), which brought robust Unix functionality to Windows. In May 2019, Microsoft announced WSL 2, which includes an updated architecture that improves many aspects of WSL, especially file system performance. I have been following WSL for a while, but now that WSL 2 is nearing general release, I decided to install it and try it out. In the few days I have been using it, I have really enjoyed the experience. The combination of Windows 10 and a full Linux distro like Ubuntu is a powerful development solution that works surprisingly well.



Tue 18 February 2020

Python Tools for Record Linking and Fuzzy Matching

Posted by Chris Moffitt in articles   

Record linking and fuzzy matching are terms used to describe the process of joining two data sets that do not have a common unique identifier. Examples include trying to join files based on people’s names or merging data that only contains organizations’ names and addresses.

This problem is a common business challenge and is difficult to solve in a systematic way, especially when the data sets are large. A naive approach using Excel and VLOOKUP statements can work but requires a lot of human intervention. Fortunately, python has two libraries that are useful for these types of problems and can support complex matching algorithms with a relatively simple API.
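To show the basic idea of scoring near-matches, here is a small sketch that uses the standard library's `difflib.SequenceMatcher`. This is a swapped-in technique, not either of the two libraries the full article covers. The `normalize` and `best_match` helpers and the 0.6 threshold are assumptions for illustration:

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and collapse repeated whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def best_match(name, candidates, threshold=0.6):
    """Return (candidate, score) for the closest candidate, or (None, score)
    when even the best similarity ratio falls below the threshold."""
    target = normalize(name)
    scored = [
        (cand, SequenceMatcher(None, target, normalize(cand)).ratio())
        for cand in candidates
    ]
    cand, score = max(scored, key=lambda pair: pair[1])
    return (cand, score) if score >= threshold else (None, score)


print(best_match("Acme Corp.", ["ACME Corporation", "Apex Inc"]))
```

Comparing every record against every candidate like this grows quadratically, which is one reason dedicated libraries become attractive once the data sets get large.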



Mon 20 January 2020

Using Markdown to Create Responsive HTML Emails

Posted by Chris Moffitt in articles   

As part of managing the PB Python newsletter, I wanted to develop a simple way to write emails once using plain text and turn them into responsive HTML emails for the newsletter. In addition, I needed to maintain a static archive page on the blog that links to the content of each newsletter. This article shows how to use python tools to transform a markdown file into a responsive HTML email suitable for a newsletter as well as a standalone page integrated into a Pelican blog.
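The core "write once in markdown, render to HTML" step can be sketched with the third-party `markdown` package and a `string.Template` wrapper. The template below is a hypothetical, bare-bones placeholder; a production responsive email needs considerably more inline CSS and layout work than this:

```python
from string import Template

import markdown  # third-party: pip install markdown

# Hypothetical minimal wrapper: a viewport tag plus a centered,
# width-limited container so the content scales down on phones
EMAIL_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><meta name="viewport" content="width=device-width, initial-scale=1"></head>
<body style="margin:0; padding:0;">
  <div style="max-width:600px; margin:0 auto; font-family:Arial, sans-serif;">
$body
  </div>
</body>
</html>""")

source = """# Newsletter Issue 1

Here is a link to [an example site](https://example.com).
"""

body_html = markdown.markdown(source)  # markdown text -> HTML fragment
email_html = EMAIL_TEMPLATE.substitute(body=body_html)
print(email_html)
```

The same `body_html` fragment could be dropped into a second template for the blog archive page, which is what makes the single-source approach appealing.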



Mon 23 December 2019

Creating Interactive Dashboards from Jupyter Notebooks

Posted by Duarte O.Carmo in articles   

I am pleased to have another guest post from Duarte O.Carmo. He wrote a series of posts in July on report generation with Papermill that were very well received. In this article, he will explore how to use Voilà and Plotly Express to convert a Jupyter notebook into a standalone interactive web site. In addition, this article will show examples of collecting data through an API endpoint, performing sentiment analysis on that data, and deploying the dashboard in several different ways.



Mon 16 December 2019

Finding Natural Breaks in Data with the Fisher-Jenks Algorithm

Posted by Chris Moffitt in articles   

This article is inspired by a tweet from Peter Baumgartner. In the tweet he mentioned the Fisher-Jenks algorithm and showed a simple example of ranking data into natural breaks using the algorithm. Since I had never heard about it before, I did some research.

After learning more about it, I realized that it is very complementary to my previous article on Binning Data and it is intuitive and easy to use in standard pandas analysis. It is definitely an approach I would have used in the past if I had known it existed.

I suspect many people are like me and have never heard of the concept of natural breaks before but have probably done something similar on their own data. I hope this article will expose this simple and useful approach to others so that they can add it to their python toolbox.

The rest of this article will discuss what the Jenks optimization method (or Fisher-Jenks algorithm) is and how it can be used as a simple tool to cluster data using “natural breaks”.
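To make the "natural breaks" idea concrete, here is a small pure-Python sketch of the exact one-dimensional optimization behind Fisher-Jenks: sort the values, then use dynamic programming to choose class boundaries that minimize the total within-class sum of squared deviations. The `natural_breaks` name and return format are assumptions for illustration; dedicated libraries use faster implementations:

```python
def natural_breaks(values, n_classes):
    """Return [min, upper_1, ..., upper_k] for k = n_classes groups of sorted
    values that minimize the total within-class sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums of x and x^2 give the squared deviation of any slice in O(1)
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j] = lowest cost of splitting data[:j] into k non-empty classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            # the last class is data[i:j]; earlier classes cover data[:i]
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk back through the split table to recover each class's upper bound
    uppers = []
    j = n
    for k in range(n_classes, 0, -1):
        uppers.append(data[j - 1])
        j = split[k][j]
    return [data[0]] + uppers[::-1]


breaks = natural_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3)
print(breaks)  # [1, 3, 12, 22]
```

A break list in this form can be passed as `bins=breaks` to `pd.cut` (with `include_lowest=True` so the minimum value lands in the first bin) to label each row with its natural-break group. The triple loop is O(k·n²), which is fine for exploring modest data sets.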



Mon 02 December 2019

Building a Windows Shortcut with Python

Posted by Chris Moffitt in articles   

I prefer to use miniconda for installing a lightweight python environment on Windows. I also like to create and customize Windows shortcuts for launching different conda environments in specific working directories. This is an especially useful tip for new users who are less familiar with the command line on Windows.

After spending way too much time trying to get the shortcuts set up properly on multiple Windows machines, I decided to automate the link creation process. This article will discuss how to use python to create custom Windows shortcuts that launch conda environments.
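The pieces of such a shortcut can be assembled with the standard library alone. This sketch assumes the common pattern of pointing a shortcut at `cmd.exe` with `/K` and the conda install's `Scripts\activate.bat`. The `conda_shortcut_spec` helper and the example paths are hypothetical. Actually writing the `.lnk` file requires a Windows-only library, which is not shown here:

```python
from pathlib import PureWindowsPath


def conda_shortcut_spec(conda_root, env_name, work_dir):
    """Describe a shortcut that opens cmd.exe with a conda env activated.

    Hypothetical helper: it only builds the target, arguments and working
    directory strings; it does not create a .lnk file.
    """
    root = PureWindowsPath(conda_root)
    activate = root / "Scripts" / "activate.bat"
    # "base" lives in the install root; named envs live under envs\<name>
    env_path = root if env_name == "base" else root / "envs" / env_name
    return {
        "target": r"%windir%\System32\cmd.exe",
        # /K runs activate.bat and then leaves the console open
        "arguments": f'/K "{activate}" "{env_path}"',
        "working_directory": str(PureWindowsPath(work_dir)),
    }


spec = conda_shortcut_spec(r"C:\Users\user\miniconda3", "work", r"C:\Users\user\projects")
print(spec["arguments"])
```

Using `PureWindowsPath` keeps the backslash handling correct even if the script that generates the shortcut definitions runs somewhere other than Windows.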



Tue 26 November 2019

Tips for Selecting Columns in a DataFrame

Posted by Chris Moffitt in articles   

This article will discuss several tips and shortcuts for using iloc to work with a data set that has a large number of columns. Even if you have some experience with iloc, you should learn a couple of helpful tricks to speed up your own analysis and avoid typing lots of column names in your code.
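One example of this kind of trick, not necessarily identical to the full article's list, is NumPy's `np.r_`, which concatenates several slices into a single integer array that `iloc` accepts. The 30-column DataFrame below is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical wide data set: 30 columns named col0 ... col29
df = pd.DataFrame(np.arange(60).reshape(2, 30),
                  columns=[f"col{i}" for i in range(30)])

# np.r_ turns the slices 0:3, 15:19 and 24:25 into one array of positions,
# so a single iloc call selects non-contiguous column ranges
subset = df.iloc[:, np.r_[0:3, 15:19, 24:25]]
print(subset.columns.tolist())
```

As with ordinary Python slices, the end of each range is excluded, so `24:25` selects only the column at position 24.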



Mon 11 November 2019

Book Review: Machine Learning Pocket Reference

Posted by Chris Moffitt in articles   

This article is a review of O’Reilly’s Machine Learning Pocket Reference by Matt Harrison. Since Machine Learning can cover a lot of content, I was very interested to see what content a “Pocket Reference” would contain. Overall, I really enjoyed this book and think it deserves a place on many data science practitioners’ bookshelves. Read on for more details about what is included in this reference and who should consider purchasing it.
