Several years ago, I developed a very simple program called barnum to generate fake data that could be used to test applications. Over the years, I had forgotten about it. With the recent closing of Google Code, I decided to take the opportunity to move the code to GitHub and see if it might be useful to people.
Pandas is excellent at manipulating large amounts of data and summarizing it in multiple text and visual representations. Without much effort, pandas supports output to CSV, Excel, HTML, JSON and more. Things get more difficult when you want to combine multiple pieces of data into one document. For example, if you want to put two DataFrames on one Excel sheet, you need to use the Excel libraries to manually construct your output. It is certainly possible but not simple. This article will describe one method to combine multiple pieces of information into an HTML template and convert it to a standalone PDF document using Jinja templates and WeasyPrint.
The previous pivot table article described how to use the pandas pivot_table function to combine and present data in an easy-to-view manner. This concept is probably familiar to anyone who has used pivot tables in Excel. Pandas can also easily take a cross section of the data and manipulate it. This cross section capability makes a pandas pivot table really useful for generating custom reports. This article will give a short example of how to manipulate the data in a pivot table to create a custom Excel report with a subset of pivot table data.
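As a rough illustration of the cross section idea, here is a minimal sketch that builds a small pivot table and pulls out one slice with `xs`. The data, column names and values are made up for this example and are not taken from the article.

```python
import pandas as pd

# Hypothetical deal data; names and numbers are illustrative only.
deals = pd.DataFrame({
    "manager": ["Debra", "Debra", "Fred", "Fred"],
    "rep": ["Craig", "Daniel", "Cedric", "Cedric"],
    "price": [30000, 10000, 5000, 65000],
})

# Pivot table indexed by manager and rep, summing the price column.
table = pd.pivot_table(
    deals, index=["manager", "rep"], values="price", aggfunc="sum"
)

# Cross section: keep only the rows for one manager.
# The "manager" level is dropped, leaving a frame indexed by rep.
debra = table.xs("Debra", level="manager")
print(debra)

# The slice could then be written to a report, for example with
# debra.to_excel("debra.xlsx"), assuming an Excel writer is installed.
```

Each cross section is an ordinary DataFrame, so it can be sorted, summarized or written out on its own.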
In the Python world, there are multiple options for visualizing your data. Because of this variety, it can be really challenging to figure out which one to use when. This article contains a sample of some of the more popular ones and illustrates how to use them to create a simple bar chart. I will create examples of plotting data with: Pandas, Seaborn, ggplot, Bokeh, pygal and Plotly.
More and more information from local, state and federal governments is being placed on the web. However, a lot of the data is not presented in a way that is easy to download and manipulate. I think it is an important civic duty for us all to be aware of how government money is spent. Having the data in a more accessible format is a first step in that process.
In this article, I’ll use BeautifulSoup to scrape some data from the Minnesota 2014 Capital Budget. Then I’ll load the data into a pandas DataFrame and create a simple plot showing where the money is going.
Most people likely have experience with pivot tables in Excel. Pandas provides a similar function called (appropriately enough) pivot_table. While it is exceedingly useful, I frequently find myself struggling to remember how to use the syntax to format the output for my needs. This article will focus on explaining the pandas pivot_table function and how to use it for your data analysis.
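For readers who want a quick reminder of the syntax, here is a minimal sketch of a `pivot_table` call. The sales records and column names are hypothetical and chosen only to keep the example short.

```python
import pandas as pd

# Hypothetical sales records; the column names are illustrative only.
sales = pd.DataFrame({
    "rep": ["Ann", "Ann", "Bob", "Bob", "Bob"],
    "product": ["CPU", "Monitor", "CPU", "CPU", "Monitor"],
    "price": [100, 50, 100, 100, 50],
})

# Total price for each rep (rows) and product (columns).
# fill_value replaces missing rep/product combinations with 0.
summary = pd.pivot_table(
    sales,
    index="rep",
    columns="product",
    values="price",
    aggfunc="sum",
    fill_value=0,
)
print(summary)
```

The `index`, `columns`, `values` and `aggfunc` arguments play roughly the same roles as the row, column, value and summarize-by settings of an Excel pivot table.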
A very common task for Python and pandas is to automate the process of aggregating data from multiple files and spreadsheets.
This article will walk through the basic flow required to parse multiple Excel files, combine the data, clean it up and analyze it. The combination of python + pandas can be extremely powerful for these activities and can be a very useful alternative to the manual processes or painful VBA scripts frequently used in business settings today.
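The combine, clean and analyze flow might look something like the sketch below. To keep it runnable without any files on disk, two small in-memory DataFrames stand in for the frames that `pd.read_excel` would return for each workbook. The column names and values are assumptions for illustration only.

```python
import pandas as pd

# Stand-ins for the frames pd.read_excel would return, one per file.
# Column names and values are hypothetical.
jan = pd.DataFrame({"account": [1, 2], "sale": [100.0, 250.0]})
feb = pd.DataFrame({"account": [1, 3], "sale": [75.0, None]})

# Combine: stack the monthly frames into one, renumbering the rows.
combined = pd.concat([jan, feb], ignore_index=True)

# Clean: drop rows that have no sale amount.
cleaned = combined.dropna(subset=["sale"])

# Analyze: total sales per account.
totals = cleaned.groupby("account")["sale"].sum()
print(totals)
```

With real files, the list passed to `pd.concat` would typically be built by looping over the file names and calling `pd.read_excel` on each one.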
I have been very excited by the response to the first post in this series. Thank you to all for the positive feedback. I want to keep the series going by highlighting some other tasks that you commonly execute in Excel and show how you can perform similar functions in pandas.
In the first article, I focused on common math tasks in Excel and their pandas counterparts. In this article, I’ll focus on some common selection and filtering tasks and illustrate how to do the same thing in pandas.
The purpose of this article is to show some common Excel tasks and how you would do them in pandas.
Python, pandas and matplotlib can be used to create a waterfall chart.