Practical Business Python

Taking care of business, one python script at a time

Mon 02 July 2018

Automating Windows Applications Using COM

Posted by Chris Moffitt in articles   

article header image

Introduction

Python has many options for natively creating common Microsoft Office file types including Excel, Word and PowerPoint. In some cases, however, it may be too difficult to use the pure python approach to solve a problem. Fortunately, python has the “Python for Windows Extensions” package known as pywin32 that allows us to easily access Window’s Component Object Model (COM) and control Microsoft applications via python. This article will cover some basic use cases for this type of automation and how to get up and running with some useful scripts.

What is COM?

From the Microsoft Website, the Component Object Model (COM) is:

a Platform-independent, distributed, object-oriented system for creating binary software components that can interact. COM is the foundation technology for Microsoft’s OLE (compound documents) and ActiveX (Internet-enabled components) technologies. COM objects can be created with a variety of programming languages.

This technology allows us to control Windows applications from another program. Many of the readers of this blog have probably seen or used VBA for some level of automation of an Excel task. COM is the foundational technology that supports VBA.

pywin32

The pywin32 package has been around for a very long time. In fact, the book that covers this topic was published in 2000 by Mark Hammond and Andy Robinson. Despite being 18 years old (which make me feel really old :), the underlying technology and concepts still work today. Pywin32 is basically a very thin wrapper of python that allows us to interact with COM objects and automate Windows applications with python. The power of this approach is that you can pretty much do anything that a Microsoft Application can do through python. The downside is that you have to run this on a Windows system with Microsoft Office installed. Before we go through some examples, make sure you have pywin32 installed on your system using pip or conda

One other recommendation I would make is that you keep a link to Tim Golden’s page handy. This resource has many more details on how to use python on Windows for automation and other administration tasks.

Getting Started

All of these applications start with similar imports and process for activating an application. Here is a very short example of opening up Excel:

import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')

excel.Visible = True
_ = input("Press ENTER to quit:")

excel.Application.Quit()

Once you run this from the command line, you should see Excel open up. When you press ENTER, the application will close. There are a few key concepts to go through before we actually make this a more useful application.

The first step is to import the win32 client. I’ve used the convention of importing it as win32 to make the actual dispatch code a little shorter.

The magic of this code is using EnsureDispatch to launch Excel. In this example, I use gencache.EnsureDispatch to create a static proxy. I recommend reading this article if you want to know more details about static vs. dynamic proxies. I have had good luck using this approach for the types of examples included in this article but will be honest - I have not widely experimented with the various dispatch approaches.

Now that the excel object is launched, we need to explicitly make it visible by setting excel.Visible = True

The win32 code is pretty smart and will close down excel once the program is done running. This means that if we just leave the code to run on its own, you probably won’t see Excel. I include the dummy prompt to keep Excel visible on the screen until the user presses ENTER.

I include the final line of excel.Application.Quit() as a bit of a belt and suspenders approach. Strictly speaking win32 should close out Excel when the program is done but I decided to include excel.Application.Quit() to show how to force the application to close.

This is the most basic approach to using COM. We can extend this in a number of more useful ways. The rest of this article will go through some examples that might be useful for your own needs.

Open a File in Excel

In my day-to-day work, I frequently use pandas to analyze and manipulate data, then output the results in Excel. The next step in the process is to open up the Excel and review the results. In this example, we can automate the file opening process which can make it simpler than trying to navigate to the right directory and open a file.

Here’s the full example:

import win32com.client as win32
import pandas as pd
from pathlib import Path

# Read in the remote data file
df = pd.read_csv("https://github.com/chris1610/pbpython/blob/master/data/sample-sales-tax.csv?raw=True")

# Define the full path for the output file
out_file = Path.cwd() / "tax_summary.xlsx"

# Do some summary calcs
# In the real world, this would likely be much more involved
df_summary = df.groupby('category')['ext price', 'Tax amount'].sum()

# Save the file as Excel
df_summary.to_excel(out_file)

# Open up Excel and make it visible
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = True

# Open up the file
excel.Workbooks.Open(out_file)

# Wait before closing it
_ = input("Press enter to close Excel")
excel.Application.Quit()

Here’s the resulting Excel output:

Excel Image

This simple example expands on the earlier one by showing how to use the Workbooks object to open up a file.

Attach an Excel file to Outlook

Another simple scenario where COM is helpful is when you want to attach a file to an email and send to a distribution list. This example shows how to do some data manipulation, open up a Outlook email, attach a file and leave it open for additional text before sending.

Here’s the full example:

import win32com.client as win32
import pandas as pd
from pathlib import Path
from datetime import date


to_email = """
Lincoln, Abraham <honest_abe@example.com>; chris@example.com
"""

cc_email = """
Franklin, Benjamin <benny@example.com>
"""

# Read in the remote data file
df = pd.read_csv("https://github.com/chris1610/pbpython/blob/master/data/sample-sales-tax.csv?raw=True")

# Define the full path for the output file
out_file = Path.cwd() / "tax_summary.xlsx"

# Do some summary calcs
# In the real world, this would likely be much more involved
df_summary = df.groupby('category')['ext price', 'Tax amount'].sum()

# Save the file as Excel
df_summary.to_excel(out_file)

# Open up an outlook email
outlook = win32.gencache.EnsureDispatch('Outlook.Application')
new_mail = outlook.CreateItem(0)

# Label the subject
new_mail.Subject = "{:%m/%d} Report Update".format(date.today())

# Add the to and cc list
new_mail.To = to_email
new_mail.CC = cc_email

# Attach the file
attachment1 = out_file

# The file needs to be a string not a path object
new_mail.Attachments.Add(Source=str(attachment1))

# Display the email
new_mail.Display(True)

This example gets a little more involved but the basic concepts are the same. We need to create our object (Outlook in this case) and create a new email. One of the challenging aspects of working with COM is that there is not a very consistent API. It is not intuitive that you create an email like this: new_mail = outlook.CreateItem(0) It generally takes a little searching to figure out the exact API for the specific problem. Google and stackoverflow are your friends.

Once the email object is created, you can add the recipient and CC list as well as attach the file. When it is all said and done, it looks like this:

Outlook email image

The email is open and you can add additional information and send it. In this example, I chose not to close out Outlook and let python handle those details.

Copying Data into Excel

The final example is the most involved but illustrates a powerful approach for blending the data analysis of python with the user interface of Excel.

It is possible to build complex excel with pandas but that approach can be very laborious. An alternative approach would be to build up the complex file in Excel, then do the data manipulation and copy the data tab to the final Excel output.

Here is an example of the Excel dashboard we want to create:

Dashboard example

Yes, I know that pie charts are awful but I can almost guarantee that someone is going to ask you to put one in the dashboard at some point in time! Also, this template had a pie chart and I decided to keep it in the final output instead of trying to figure out another chart.

It might be helpful to take a step back and look at the basic process the code will follow:

Process flow

Let’s get started with the code.

import win32com.client as win32
import pandas as pd
from pathlib import Path

# Read in the remote data file
df = pd.read_csv("https://github.com/chris1610/pbpython/blob/master/data/sample-sales-tax.csv?raw=True")

# Define the full path for the data file file
data_file = Path.cwd() / "sales_summary.xlsx"

# Define the full path for the final output file
save_file = Path.cwd() / "sales_dashboard.xlsx"

# Define the template file
template_file = Path.cwd() / "sample_dashboard_template.xlsx"

In the section we performed our imports, read in the data and defined all three files. Of note is that this process includes the step of summarizing the data with pandas and saving the data in an Excel file. We then re-open that file and copy the data into the template. It is a bit convoluted but this is the best approach I could figure out for this scenario.

Next we perform the analysis and save the temp Excel file:

# Do some summary calcs
# In the real world, this would likely be much more involved
df_summary = df.groupby('category')['quantity', 'ext price', 'Tax amount'].sum()

# Save the file as Excel
df_summary.to_excel(data_file, sheet_name="Data")

Now we use COM to merge the temp output file into our Excel dashboard tab and save a new copy:

# Use com to copy the files around
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = False
excel.DisplayAlerts = False

# Template file
wb_template = excel.Workbooks.Open(template_file)

# Open up the data file
wb_data = excel.Workbooks.Open(data_file)

# Copy from the data file (select all data in A:D columns)
wb_data.Worksheets("Data").Range("A:D").Select()

# Paste into the template file
excel.Selection.Copy(Destination=wb_template.Worksheets("Data").Range("A1"))

# Must convert the path file object to a string for the save to work
wb_template.SaveAs(str(save_file))

The code opens up Excel and makes sure it is not visible. Then it opens up the dashboard template and data files. It uses the Range("A:D").Select() to select all the data and then copies it into the template file.

The final step is to save the template as a new file.

This approach can be a very convenient shortcut when you have a situation where you want to use python for data manipulation but need a complex Excel output. You may not have an apparent need for it now but if you ever build up a complex Excel report, this approach is much simpler than trying to code the spreadsheet by hand with python.

Conclusion

My preference is to try to stick with python as much as possible for my day-to-day data analysis. However, it is important to know when other technologies can streamline the process or make the results have a bigger impact. Microsoft’s COM technology is a mature technology and can be used effectively through python to do tasks that might be too difficult to do otherwise. Hopefully this article has given you some ideas on how to incorporate this technique into your own workflow. If you have any tasks you like to use pywin32 for, let us know in the comments.

Changes

  • 29-Nov-2020: Updated code to capitalize Quit and Visible .

Comments