Several years ago, I wrote an article about using pandas to creating a diff of two excel files. Ovet the years, the pandas API has changed and the diff script no longer works with the latest pandas releases. Through the magic of search engines, people are still discovering the article and are asking for help in getting it to work with more recent versions of pandas. Since pandas is closing in on a 1.0 release, I think this is a good time to get an updated version out there.
Python has many options for natively creating common Microsoft Office file types including Excel, Word and PowerPoint. In some cases, however, it may be too difficult to use the pure python approach to solve a problem. Fortunately, python has the “Python for Windows Extensions” package known as pywin32 that allows us to easily access Window’s Component Object Model (COM) and control Microsoft applications via python. This article will cover some basic use cases for this type of automation and how to get up and running with some useful scripts.
Whenever I am doing analysis with pandas my first goal is to get data into
a panda’s DataFrame using one of the many available options. For the vast
majority of instances, I use
However, there are instances when I just have a few lines of data or some calculations
that I want to include in my analysis. In these cases it is helpful to
know how to create DataFrames from standard python data structures such as lists or
dictionaries. The basic process is not difficult but because there are several different
options it is helpful to understand how each works. I can never remember whether
I should use
from_items or the
DataFrame constructor. Normally, through some trial and error,
I figure it out. Since it is still confusing to me, I thought I would walk through
several examples below to clarify the different approaches. At the end of the article,
I briefly show how this can be useful when generating Excel reports.
I have heard from various people that my previous articles on common Excel tasks in pandas were useful in helping new pandas users translate Excel processes into equivalent pandas code. This article will continue that tradition by illustrating various pandas indexing examples using Excel’s Filter function as a model for understanding the process.
I have written several times about the usefulness of pandas as a data manipulation/wrangling tool and how it can be used to efficiently move data to and from Excel. There are cases, however, where you need an interactive environment for data analysis and trying to pull that together in pure python, in a user-friendly manner would be difficult. This article will discuss how to use xlwings to tie Excel, Python and pandas together to build a data analysis tool that pulls information from an external database, manipulates it and presents it to the user in a familiar spreadsheet format.