Practical Business Python

Taking care of business, one python script at a time

Mon 15 November 2021

16 Reasons to Use VS Code for Developing Jupyter Notebooks

Posted by Chris Moffitt in articles   

Visual Studio Code is one of the most popular text editors, with a track record of continual improvement. One area where VS Code has recently been innovating is its Jupyter Notebook support. Early releases of VS Code sought to replicate existing Jupyter Notebook features within the editor. Recent releases have continued to develop notebook features that, in many cases, provide a better experience than the traditional Jupyter Notebook.

I am a big fan of using Jupyter Notebooks for Python analysis, even though they have limitations. For the type of ad hoc analysis I do, the notebook's combination of code and visualizations is superior to working with ad hoc Excel files. That being said, there are times when I wish I had a more full-featured editor for my notebook code.

In this article I will cover 16 reasons why you should consider using VS Code as your editor of choice when working with Python in Jupyter Notebooks. They are in no particular order, but number 11 is one of my favorites.


Tue 16 February 2021

Efficiently Cleaning Text with Pandas

Posted by Chris Moffitt in articles   

It’s no secret that data cleaning is a large portion of the data analysis process. When using pandas, there are multiple techniques for cleaning text fields to prepare them for further analysis. As data sets grow large, it is important to find efficient methods that run in a reasonable time and remain maintainable, since text cleaning is a process that evolves over time.

This article will show examples of cleaning text fields in a large data file and illustrate tips for efficiently cleaning unstructured text fields.
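As a rough illustration of chained pandas text cleaning, the sketch below normalizes a free-form text column with `.str` methods. The sample data and the `clean_text` helper are hypothetical and not taken from the article; the regular expressions are one reasonable set of rules among many.

```python
import pandas as pd

# Hypothetical sample data; the article's actual data file is not reproduced here.
df = pd.DataFrame({
    "description": ["  Widget-A  ", "widget a", "WIDGET_B", None, "Gadget (blue)"],
})


def clean_text(series: pd.Series) -> pd.Series:
    """Hypothetical helper: normalize free-form text with pandas string methods."""
    return (
        series.fillna("")                                # treat missing values as empty strings
        .str.strip()                                     # drop leading/trailing whitespace
        .str.lower()                                     # case-fold for consistent matching
        .str.replace(r"[^a-z0-9 ]", " ", regex=True)     # replace punctuation with a space
        .str.replace(r"\s+", " ", regex=True)            # collapse repeated whitespace
        .str.strip()                                     # trim spaces left by the replacements
    )


df["clean"] = clean_text(df["description"])
print(df)
```

Keeping every rule in one chained helper makes the cleaning logic easy to adjust as it evolves; whether this is faster than a row-wise `apply` on your own data is worth measuring.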


Mon 18 January 2021

Case Study: Automating Excel File Creation and Distribution with Pandas and Outlook

Posted by Chris Moffitt in articles   

I enjoy hearing from readers who have used concepts from this blog to solve their own problems. It always amazes me when only a few lines of Python code can solve a real business problem and save an organization a lot of time and money. I am also impressed when people figure out how to do this with no formal training, relying only on hard work and a willingness to persevere through the learning curve.


Mon 11 January 2021

Pandas DataFrame Visualization Tools

Posted by Chris Moffitt in articles   

I have talked quite a bit about how pandas is a great alternative to Excel for many tasks. One of Excel’s benefits is that it offers an intuitive and powerful graphical interface for viewing your data. In contrast, pandas plus a Jupyter Notebook offers a lot of programmatic power but only limited ability to graphically display and manipulate a DataFrame view.

There are several tools in the Python ecosystem that are designed to fill this gap. They range in complexity from simple JavaScript libraries to complex, full-featured data analysis engines. The one common denominator is that they all provide a way to view and selectively filter your data in a graphical format. From this point of commonality they diverge quite a bit in design and functionality.

This article will review several of these options to give you an idea of the landscape and help you evaluate which ones might be useful for your analysis process.


Mon 09 November 2020

Comprehensive Guide to Grouping and Aggregating with Pandas

Posted by Chris Moffitt in articles   

One of the most basic analysis functions is grouping and aggregating data. In some cases, this level of analysis may be sufficient to answer business questions. In other instances, this activity might be the first step in a more complex data science analysis. In pandas, the groupby function can be combined with one or more aggregation functions to quickly and easily summarize data. This concept is deceptively simple, and most new pandas users will grasp it quickly. However, they might be surprised at how useful complex aggregation functions can be for supporting sophisticated analysis.

This article will quickly summarize the basic pandas aggregation functions and show examples of more complex custom aggregations. Whether you are a new or more experienced pandas user, I think you will learn a few things from this article.
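A minimal sketch of combining `groupby` with both built-in and custom aggregations, assuming a small hypothetical sales table (the column names and values are illustrative only). It uses pandas named aggregation, where each keyword maps an output column to an `(input column, function)` pair, and a lambda as an example of a custom aggregation.

```python
import pandas as pd

# Hypothetical sales data used only for illustration.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "units": [10, 5, 8, 12, 3],
    "price": [2.5, 4.0, 2.5, 2.0, 4.0],
})

# Revenue per row, so it can be aggregated alongside the raw columns.
sales["revenue"] = sales["units"] * sales["price"]

# Named aggregation: output_column=(input_column, aggregation_function).
summary = sales.groupby("region").agg(
    total_units=("units", "sum"),
    avg_price=("price", "mean"),
    orders=("product", "count"),
    total_revenue=("revenue", "sum"),
    # Custom aggregation: share of rows in each group that sold more than 5 units.
    big_order_share=("units", lambda s: (s > 5).mean()),
)
print(summary)
```

Named aggregation keeps the output column names explicit, which helps when mixing built-in functions such as `"sum"` with custom ones.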


Mon 19 October 2020

Reading Poorly Structured Excel Files with Pandas

Posted by Chris Moffitt in articles   

With pandas it is easy to read Excel files and convert the data into a DataFrame. Unfortunately, Excel files in the real world are often poorly constructed. In those cases where the data is scattered across the worksheet, you may need to customize the way you read the data. This article will discuss how to use pandas and openpyxl to read these types of Excel files and cleanly convert the data to a DataFrame suitable for further analysis.


Mon 12 October 2020

Case Study: Processing Historical Weather Pattern Data

Posted by Chris Moffitt in articles   

The main purpose of this blog is to show people how to use Python to solve real world problems. Over the years, I have been fortunate enough to hear from readers about how they have used tips and tricks from this site to solve their own problems. In this post, I am extremely delighted to present a real world case study. I hope it will give you some ideas about how you can apply these concepts to your own problems.

This example comes from Michael Biermann of Germany. He had the challenging task of gathering detailed historical weather data in order to analyze the relationship between air temperature and power consumption. This article will show how he used a pipeline of Python programs to automate collecting, cleaning and processing gigabytes of weather data for his analysis.


Mon 14 September 2020

Reading HTML tables with Pandas

Posted by Chris Moffitt in articles   

The pandas read_html() function is a quick and convenient way to turn an HTML table into a pandas DataFrame. This function can be useful for quickly incorporating tables from various websites without figuring out how to scrape the site’s HTML. However, there can be some challenges in cleaning and formatting the data before analyzing it. In this article, I will discuss how to use pandas read_html() to read and clean several Wikipedia HTML tables so that you can use them for further numeric analysis.


Mon 17 August 2020

Taking Another Look at Plotly

Posted by Chris Moffitt in articles   

I’ve written quite a bit about visualization in Python, partly because the landscape is always evolving. Plotly stands out as one of the tools that has undergone a significant amount of change since my first post in 2015. If you have not looked at Plotly for Python data visualization lately, you might want to take it for a spin. This article will discuss some of the most recent changes to Plotly, their benefits, and why Plotly is worth considering for your data visualization needs.
