Practical Business Python

Taking care of business, one python script at a time

Newsletter Number 3

Sent on Tue 18 December 2018

Welcome to the third issue of the newsletter. The number of subscribers continues to grow, so I hope all new and existing subscribers find this useful.

PB Python Articles

Since the last mailing, I’ve published one article on Building a Repeatable Data Analysis Process with Jupyter Notebooks. I have continued to use this format in my own personal analysis projects and find that it works pretty well. If you check out the comments, you’ll see some approaches that others have taken as well. Regardless of which approach you use, figuring out your own repeatable process and using it is important.

My article on Populating MS Word Templates with Python is pretty popular, but I have seen feedback that some users struggle to handle more complex templating situations with python and Word. I recently found an alternative approach that uses Jinja template syntax for Word documents. If you’re interested, you might want to check out python-docx-template.

Other Python Notes

Here are a few other updates in the python ecosystem.

  • Altair 2.3.0 was released this week. It’s hard to believe that I first talked about Altair in April 2016! I continue to use it quite a bit and find that it is becoming my go-to visualization tool. I can produce just about any visualization I need by using Altair. In addition, there is an ecosystem of Altair packages popping up. For example, I recently used Altair catplot for some of my analysis and it worked pretty well.
  • While we are talking about the python visualization landscape, I should mention chartify. This is a new plotting library on top of Bokeh. I think it is interesting from two perspectives. First, it is created by Spotify so it is designed for solving some of their challenges. Second, it is an overlay on Bokeh in a similar way that Seaborn leverages matplotlib.
  • Speaking of Bokeh, there is a brand new project called Pandas-Bokeh that provides a simple way to create Bokeh-based plots with pandas. It will be interesting to see if this gains some traction.
  • I recently learned about a project called pyjanitor that seeks to replicate some of the data cleaning functions available in the R package called janitor. I have not used it yet, and I find some of the methods a little redundant, but I do think it is an interesting project that might be helpful to people who struggle with the pandas API at times.
  • The python GUI space continues to evolve. I have played around with PySimpleGUI a bit and it is relatively simple to get GUIs up and running. The default is to use Tkinter but there is ongoing work to make it compatible with Qt.
  • One of the big issues with Jupyter notebooks is using them with version control like you would a normal text file. The jupytext project looks like a possible solution. I have not used it yet but I hope to in the future.

Useful Resources

  • I have been working on a project to do fuzzy matching on addresses. I have played around with fuzzywuzzy but it does not scale well to matching 100,000+ addresses. In looking for more sophisticated approaches to fuzzy pattern matching, I found RobinL’s fuzzymatcher program. I think this is a really powerful and intuitive library for doing fuzzy lookups. If you are looking for a more scalable approach, you might want to check it out. I ran into one bug and submitted a pull request for it, so it may take some extra work to get it going, but I do think it’s worth investigating if you are dealing with this type of problem.
  • I recently had the opportunity to listen to Max Humber give a talk about budgeting with python. You can see his notebooks on his GitHub repo. He had some pretty cool tricks for working with Timestamps and turning natural English phrases like “every week until July 10th” into Timestamps for a pandas DataFrame. He introduced me to the recurrent library, which seems really handy.
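
To make the “every week until July 10th” idea concrete, here is a minimal sketch of what a phrase like that expands to. It does not use the recurrent library and does no natural-language parsing; it only uses the standard library, and the function name weekly_until and the start date are assumptions picked for illustration.

```python
from datetime import date, timedelta


def weekly_until(start, end):
    """Return every date from start to end (inclusive) in 7-day steps."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(weeks=1)
    return dates


# Hypothetical dates: start on June 12th, 2018 and repeat weekly until July 10th.
paydays = weekly_until(date(2018, 6, 12), date(2018, 7, 10))
for d in paydays:
    print(d.isoformat())
```

A list like this can be handed to pandas (for example through pd.to_datetime) to get Timestamps for a DataFrame. The appeal of a library like recurrent is that it works out the recurrence rule from the English phrase for you instead of having it hard-coded as it is here.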

A Quick Code Tip

I recently learned about using python -m site as a quick way to see how sys.path is set up in your environment. Give it a shot in your own environment next time you get a chance.
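
If you want the same information from inside a script or notebook, a rough equivalent can be put together with the standard library sys and site modules. This is only a sketch of the kind of detail python -m site reports, not a copy of its exact output, and the helper name describe_path is made up for this example.

```python
import site
import sys


def describe_path():
    """Collect roughly the same details that `python -m site` reports."""
    return {
        "sys.path": list(sys.path),
        "USER_BASE": site.getuserbase(),
        "USER_SITE": site.getusersitepackages(),
        "ENABLE_USER_SITE": site.ENABLE_USER_SITE,
    }


info = describe_path()

# Print each sys.path entry on its own line so the search order is easy to read.
print("sys.path = [")
for entry in info["sys.path"]:
    print(f"    {entry!r},")
print("]")

for key in ("USER_BASE", "USER_SITE", "ENABLE_USER_SITE"):
    print(f"{key}: {info[key]!r}")
```

The order of the entries matters: python searches sys.path from top to bottom, so this is a handy first check when an import picks up the wrong copy of a package.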

Wrapping It Up

I hope you enjoyed the latest newsletter. Feel free to forward it to anyone who might find it interesting. You can sign up here if you are not a subscriber.