Mon 03 June 2019

Evangelizing Python for Business

On May 30th, I had the pleasure of presenting at the MinneAnalytics Data Tech Conference with @KatieKodes. Our talk was on “Evangelizing Python for Business”. Here is the summary of the talk:

Python’s simple structure has been vital to the democratization of data science. But as the field rushes forward, making splashy headlines about specialized new jobs, everyday Excel users remain unaware of the value that elementary building blocks of Python for data science can bring them at the office.

Join us for a conversation about bringing Python out of IT and into the business. We’ll share challenges and successes from writing tutorials, teaching classes, and advocating adoption among new users.

I really enjoyed the presentation and received a lot of positive feedback. As a result, I wanted to capture some of the ideas in a post so that the broader community could see them and start a dialog about tips and techniques that have worked for you. The content of this post is closely tied to our presentation but contains some additional ideas and thoughts that I may expand on in future posts.

Mon 13 May 2019

Stylin’ with Pandas

Posted by Chris Moffitt in articles

I have been working on a side project so I have not had as much time to blog. Hopefully I will be able to share more about that project soon.

In the meantime, I wanted to write an article about styling output in pandas. The API for styling is somewhat new and has been under very active development. It contains a useful set of tools for styling the output of your pandas DataFrames and Series. In my own usage, I tend to only use a small subset of the available options but I always seem to forget the details. This article will show examples of how to format numbers in a pandas DataFrame and use some of the more advanced pandas styling visualization options to improve your ability to analyze data with pandas.

Mon 18 February 2019

Monte Carlo Simulation with Python

Posted by Chris Moffitt in articles

There are many sophisticated models people can build for solving a forecasting problem. However, they frequently stick to simple Excel models based on average historical values, intuition and some high level domain-specific heuristics. This approach may be precise enough for the problem at hand but there are alternatives that can add more information to the prediction with a reasonable amount of additional effort.

One approach that can produce a better understanding of the range of potential outcomes and help avoid the “flaw of averages” is a Monte Carlo simulation. The rest of this article will describe how to use python with pandas and numpy to build a Monte Carlo simulation to predict the range of potential values for a sales compensation budget. This approach is meant to be simple enough that it can be used for other problems you might encounter but also powerful enough to provide insights that a basic “gut-feel” model cannot provide on its own.
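To make the idea concrete, here is a minimal sketch of a Monte Carlo run for a commission budget. It is not the model from the full article: the number of reps, the sales target, the normal distribution of percent-to-target, and the tiered commission rates are all hypothetical values chosen for illustration.

```python
import numpy as np

# All figures below are assumptions for illustration only.
NUM_REPS = 500            # hypothetical number of sales reps
NUM_SIMULATIONS = 1000    # how many times to replay the year
SALES_TARGET = 100_000    # hypothetical per-rep sales target

rng = np.random.default_rng(seed=42)


def simulate_total_commission(rng):
    """Simulate one year and return the total commission paid."""
    # Assumed: each rep lands around 100% of target, with a 10% standard deviation.
    pct_to_target = rng.normal(loc=1.0, scale=0.1, size=NUM_REPS)
    sales = pct_to_target * SALES_TARGET
    # Hypothetical tiers: 2% below 90% of target, 3% from 90% up to 100%, 4% at or above 100%.
    rate = np.where(pct_to_target < 0.9, 0.02,
                    np.where(pct_to_target < 1.0, 0.03, 0.04))
    return float((sales * rate).sum())


# Repeat the simulation many times to build a distribution of outcomes.
totals = np.array([simulate_total_commission(rng) for _ in range(NUM_SIMULATIONS)])

mean_total = totals.mean()
low, high = np.percentile(totals, [5, 95])

print(f"Mean commission budget: {mean_total:,.0f}")
print(f"5th to 95th percentile: {low:,.0f} to {high:,.0f}")
```

The point of the exercise is the last two lines: instead of a single average, the simulation reports a range that most outcomes fall within, which is the extra information a simple average-based spreadsheet model leaves out.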

Mon 28 January 2019

Updated: Using Pandas To Create an Excel Diff

Posted by Chris Moffitt in articles

Several years ago, I wrote an article about using pandas to create a diff of two Excel files. Over the years, the pandas API has changed and the diff script no longer works with the latest pandas releases. Through the magic of search engines, people are still discovering the article and are asking for help in getting it to work with more recent versions of pandas. Since pandas is closing in on a 1.0 release, I think this is a good time to get an updated version out there.

Mon 07 January 2019

Using The Pandas Category Data Type

Posted by Chris Moffitt in articles

In my previous article, I wrote about pandas data types: what they are and how to convert data to the appropriate type. This article will focus on the pandas categorical data type and some of the benefits and drawbacks of using it.
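As a rough illustration of two commonly cited benefits, the sketch below compares memory usage before and after converting a repetitive text column to a category, and then uses an ordered categorical for a comparison. The data and the `size_type` name are made up for this example, and it does not cover the drawbacks the full article discusses.

```python
import pandas as pd

# Hypothetical data: a text column with only three distinct values, repeated many times.
sizes = pd.Series(["small", "medium", "large"] * 10_000)

# Memory used when each value is stored as a string versus as a category code.
as_strings = sizes.memory_usage(deep=True)
as_category = sizes.astype("category").memory_usage(deep=True)
print(f"strings: {as_strings:,} bytes, category: {as_category:,} bytes")

# An ordered categorical makes comparisons follow a logical order
# rather than alphabetical order.
size_type = pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
ordered = sizes.astype(size_type)
num_at_least_medium = int((ordered >= "medium").sum())
print(num_at_least_medium)
```

The savings depend heavily on how many distinct values the column holds; a column where nearly every value is unique is unlikely to benefit.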

Tue 20 November 2018

Building a Repeatable Data Analysis Process with Jupyter Notebooks

Posted by Chris Moffitt in articles

Over the past couple of months, there has been an ongoing discussion about Jupyter Notebooks affectionately called the “Notebook Wars”. The discussion began with Joel Grus’ presentation I Don’t Like Notebooks, which was followed by Tim Hopper’s response, aptly titled I Like Notebooks. There have been several follow-on posts on this topic including thoughtful analysis from Yihui Xie.

The purpose of this post is to use some of the points brought up in these discussions as background for describing my personal best practices for the analysis I frequently perform with notebooks. This approach can also be tailored to your unique situation. I think many new python users do not take the time to think through some of the items I discuss. My hope is that this article will spark some discussion and provide a framework that others can build on for making repeatable and easy to understand data analysis pipelines that fit their needs.

Mon 08 October 2018

Pandas Crosstab Explained

Posted by Chris Moffitt in articles

Pandas offers several options for grouping and summarizing data but this variety of options can be a blessing and a curse. These approaches are all powerful data analysis tools but it can be confusing to know whether to use a groupby, pivot_table or crosstab to build a summary table. Since I have previously covered pivot_tables, this article will discuss the pandas crosstab function, explain its usage and illustrate how it can be used to quickly summarize data. My goal is to have this article be a resource that you can bookmark and refer to when you need to remind yourself what you can do with the crosstab function.
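For a quick sense of what the function produces, here is a small sketch using made-up sales data (the column names and values are assumptions for illustration). The first call counts how often each combination occurs; the second passes `values` and `aggfunc` to sum a numeric column instead.

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West", "East"],
    "product": ["Widget", "Gadget", "Widget", "Widget", "Gadget", "Widget"],
    "units": [10, 5, 7, 3, 8, 2],
})

# Default behavior: count the rows for each region/product combination.
counts = pd.crosstab(df["region"], df["product"])
print(counts)

# With values and aggfunc: total units for each combination.
units = pd.crosstab(df["region"], df["product"], values=df["units"], aggfunc="sum")
print(units)
```

Note that crosstab takes the columns themselves rather than a DataFrame plus column names, which is one of the practical differences from groupby and pivot_table.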

Mon 06 August 2018

New Plot Types in Seaborn’s Latest Release

Posted by Chris Moffitt in articles

Seaborn is one of the go-to tools for statistical data visualization in python. It has been actively developed since 2012 and in July 2018, the author released version 0.9. This version of Seaborn has several new plotting features, API changes and documentation updates which combine to enhance an already great library. This article will walk through a few of the highlights and show how to use the new scatter and line plot functions for quickly creating very useful visualizations of data.

Mon 02 July 2018

Automating Windows Applications Using COM

Posted by Chris Moffitt in articles

Python has many options for natively creating common Microsoft Office file types including Excel, Word and PowerPoint. In some cases, however, it may be too difficult to use the pure python approach to solve a problem. Fortunately, python has the “Python for Windows Extensions” package known as pywin32 that allows us to easily access the Windows Component Object Model (COM) and control Microsoft applications via python. This article will cover some basic use cases for this type of automation and how to get up and running with some useful scripts.

Tue 29 May 2018

Book Review: Machine Learning with Python Cookbook

Posted by Chris Moffitt in articles

This article is a review of Chris Albon’s book, Machine Learning with Python Cookbook. This book is in the tradition of the other O’Reilly “cookbook” titles in that it contains short “recipes” for dealing with common machine learning scenarios in python. It covers the full spectrum of tasks from simple data wrangling and pre-processing to more complex machine learning model development and deep learning implementations. Since this is such a fast-moving and broad topic, it is nice to get a new book that covers the latest topics and presents them in a compact but very useful format. Bottom line, I enjoyed reading this book and think it will be a useful resource to have on my python bookshelf. Read on for some more details about the book and who will benefit most from reading it.

Practical Business Python
