As part of managing the PB Python newsletter, I wanted to develop a simple way to write emails once using plain text and turn them into responsive HTML emails for the newsletter. In addition, I needed to maintain a static archive page on the blog that links to the content of each newsletter. This article shows how to use python tools to transform a markdown file into a responsive HTML email suitable for a newsletter as well as a standalone page integrated into a Pelican blog.
I am pleased to have another guest post from Duarte O. Carmo. He wrote a series of posts in July on report generation with Papermill that were very well received. In this article, he will explore how to use Voilà and Plotly Express to convert a Jupyter notebook into a standalone interactive website. In addition, this article will show examples of collecting data through an API endpoint, performing sentiment analysis on that data and multiple approaches to deploying the dashboard.
This article is inspired by a tweet from Peter Baumgartner. In the tweet he mentioned the Fisher-Jenks algorithm and showed a simple example of ranking data into natural breaks using the algorithm. Since I had never heard about it before, I did some research.
After learning more about it, I realized that it is very complementary to my previous article on Binning Data and it is intuitive and easy to use in standard pandas analysis. It is definitely an approach I would have used in the past if I had known it existed.
I suspect many people are like me and have never heard of the concept of natural breaks before but have probably done something similar on their own data. I hope this article will expose this simple and useful approach to others so that they can add it to their python toolbox.
The rest of this article will discuss what the Jenks optimization method (or Fisher-Jenks algorithm) is and how it can be used as a simple tool to cluster data using “natural breaks”.
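To make the idea concrete, here is a minimal brute-force sketch of natural breaks classification using made-up sales figures. It is not the actual Fisher-Jenks dynamic program (which finds the same answer far more efficiently); it simply tries every way to split the sorted data into contiguous groups and keeps the split that minimizes within-class variance, which is the goal Fisher-Jenks optimizes.

```python
from itertools import combinations


def goodness(groups):
    """Sum of squared deviations from each group's mean (lower is better)."""
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        total += sum((x - mean) ** 2 for x in g)
    return total


def natural_breaks(values, n_classes):
    """Brute-force natural breaks: try every split of the sorted data into
    n_classes contiguous groups and keep the one with the smallest
    within-class variance. Only practical for small data sets."""
    data = sorted(values)
    best_edges = None
    best_score = float("inf")
    # Choose n_classes - 1 interior cut positions between sorted values
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        bounds = (0,) + cuts + (len(data),)
        groups = [data[bounds[i]:bounds[i + 1]] for i in range(n_classes)]
        score = goodness(groups)
        if score < best_score:
            best_score = score
            best_edges = [g[-1] for g in groups]  # upper edge of each class
    return [data[0]] + best_edges  # minimum value plus each class maximum


# Three obvious clusters of hypothetical sales values
sales = [20, 22, 24, 100, 102, 104, 500, 510, 520]
print(natural_breaks(sales, 3))  # → [20, 24, 104, 520]
```

The returned list is the set of break points: values up to 24 fall in the first class, up to 104 in the second, and so on, mirroring the output style of libraries such as jenkspy.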
I prefer to use miniconda for installing a lightweight python environment on Windows. I also like to create and customize Windows shortcuts for launching different conda environments in specific working directories. This is an especially useful tip for new users that are not as familiar with the command line on Windows.
After spending way too much time trying to get the shortcuts set up properly on multiple Windows machines, I spent some time automating the link creation process. This article will discuss how to use python to create custom Windows shortcuts to launch conda environments.
This article will discuss several tips and shortcuts for using iloc to work with a data set that has a large number of columns. Even if you have some experience with using iloc, you should learn a couple of helpful tricks to speed up your own analysis and avoid typing lots of column names in your code.
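As a taste of the kind of shortcut involved, here is a small sketch on a hypothetical wide DataFrame (the column names and shapes are made up for illustration). It shows plain positional selection with iloc, plus using numpy's np.r_ to combine non-adjacent column ranges without typing any column names:

```python
import numpy as np
import pandas as pd

# Hypothetical wide DataFrame with 25 columns
df = pd.DataFrame(np.arange(100).reshape(4, 25),
                  columns=[f"col_{i}" for i in range(25)])

# iloc selects purely by position: first 3 rows, first 5 columns
subset = df.iloc[:3, :5]

# np.r_ concatenates slice ranges into one index array, so you can grab
# non-adjacent groups of columns (here columns 0-2 and 20-24) in one call
wide_slice = df.iloc[:, np.r_[0:3, 20:25]]
print(wide_slice.columns.tolist())
```

The np.r_ trick is especially handy when a spreadsheet export has hundreds of columns and you only care about a few scattered ranges.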
This article is a review of O’Reilly’s Machine Learning Pocket Reference by Matt Harrison. Since Machine Learning can cover a lot of content, I was very interested to see what content a “Pocket Reference” would contain. Overall, I really enjoyed this book and think it deserves a place on many data science practitioners’ bookshelves. Read on for more details about what is included in this reference and who should consider purchasing it.
The other day, I was using pandas to clean some messy Excel data that included several thousand rows of inconsistently formatted currency values. When I tried to clean it up, I realized that it was a little more complicated than I first thought. Coincidentally, a couple of days later, I followed a Twitter thread which shed some light on the issue I was experiencing. This article summarizes my experience and describes how to clean up messy currency fields and convert them into a numeric value for further analysis. The concepts illustrated here can also apply to other types of pandas data cleanup tasks.
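To illustrate the kind of cleanup involved, here is a minimal sketch with made-up values. The sample data mixes currency strings and real numbers, which is exactly what makes a naive string replace fail; converting everything to a string first keeps the numeric entries from raising an error:

```python
import pandas as pd

# Hypothetical messy column: currency strings mixed with actual numbers
raw = pd.Series(["$1,000.00", "$400", "1,234.56", 300.1, "$24.99"])

# Strip dollar signs and thousands separators, then convert to float.
# .astype(str) first so the entries that are already numbers survive
# the string-based replace step.
clean = (raw.astype(str)
            .str.replace(r"[\$,]", "", regex=True)
            .astype(float))
print(clean.tolist())  # → [1000.0, 400.0, 1234.56, 300.1, 24.99]
```

The same replace-then-convert pattern extends to other junk characters such as percent signs or stray whitespace.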
When dealing with continuous numeric data, it is often helpful to bin the data into multiple buckets for further analysis. There are several different terms for binning including bucketing, discrete binning, discretization or quantization. Pandas supports these approaches using the cut and qcut functions. This article will briefly describe why you may want to bin your data and how to use the pandas functions to convert continuous data to a set of discrete buckets. Like many pandas functions, cut and qcut may seem simple but there is a lot of capability packed into those functions. Even for more experienced users, I think you will learn a couple of tricks that will be useful for your own analysis.
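The core distinction between the two functions can be sketched in a few lines; the sales values and tier labels below are made up for illustration:

```python
import pandas as pd

# Hypothetical sales figures
sales = pd.Series([10, 25, 40, 55, 70, 85, 100, 115])

# pd.cut bins by value: every bin spans an equal range of the data
equal_width = pd.cut(sales, bins=4)

# pd.qcut bins by rank: every bin holds roughly the same number of rows
equal_size = pd.qcut(sales, q=4,
                     labels=["bronze", "silver", "gold", "platinum"])

print(equal_size.value_counts().sort_index())
```

With eight evenly spread values, qcut puts exactly two rows in each tier, while cut would happily put most rows in one bin if the data were skewed; that difference is usually the first thing to check when choosing between them.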
On September 17th, 2014, I published my first article which means that today is the 5th birthday of Practical Business Python. Thank you to all my readers and all those that have supported me through this process! It has been a great journey and I look forward to seeing what the future holds.
This 5 year anniversary gives me the opportunity to reflect on the blog and what will be coming next. I figured I would use this milestone to walk through a few of the stats and costs associated with running this blog for the past 5 years. This post will not be technical but I am hopeful that my readers as well as current and aspiring bloggers going down this path will find it helpful. Finally, please use the comments to let me know what content you would like to see in the future.
One of the most commonly used pandas functions is read_excel. This short article shows how you can read in all the tabs in an Excel workbook and combine them into a single pandas dataframe using a combination of read_excel and concat.
For those of you that want the TLDR, here is the command:
import pandas as pd

df = pd.concat(pd.read_excel('2018_Sales_Total.xlsx', sheet_name=None), ignore_index=True)
Read on for an explanation of when to use this and how it works.