Practical Business Python

Taking care of business, one python script at a time

Mon 17 August 2015

Creating Powerpoint Presentations with Python

Posted by Chris Moffitt in articles   

Introduction

Love it or loathe it, PowerPoint is widely used in most business settings. This article will not debate the merits of PowerPoint but will show you how to use python to remove some of the drudgery of PowerPoint by automating the creation of PowerPoint slides using python.

Fortunately for us, there is an excellent python library for creating and updating PowerPoint files: python-pptx. The API is very well documented so it is pretty easy to use. The only tricky part is understanding the PowerPoint document structure including the various master layouts and elements. Once you understand the basics, it is relatively simple to automate the creation of your own PowerPoint slides. This article will walk through an example of reading in and analyzing some Excel data with pandas, creating tables and building a graph that can be embedded in a PowerPoint file.

PowerPoint File Basics

Python-pptx can create blank PowerPoint files but most people are going to prefer working with a predefined template that you can customize with your own content. Python-pptx’s API supports this process quite simply as long as you know a few things about your template.

Before diving into some code samples, there are two key components you need to understand: Slide Layouts and Placeholders. In the images below you can see an example of two different layouts as well as the template’s placeholders where you can populate your content.

In the image below, you can see that we are using Layout 0 and there is one placeholder on the slide at index 1.

PowerPoint Layout 0

In this image, we use Layout 1 for a completely different look.

PowerPoint Layout 1

In order to make your life easier with your own templates, I created a simple standalone script that takes a template and marks it up with the various elements.

I won’t explain all the code line by line but you can see analyze_ppt.py on github. Here is the function that does the bulk of the work:

def analyze_ppt(input, output):
    """ Take the input file and analyze the structure.
    The output file contains marked up information to make it easier
    for generating future powerpoint templates.
    """
    prs = Presentation(input)
    # Each powerpoint file has multiple layouts
    # Loop through them all and  see where the various elements are
    for index, _ in enumerate(prs.slide_layouts):
        slide = prs.slides.add_slide(prs.slide_layouts[index])
        # Not every slide has to have a title
        try:
            title = slide.shapes.title
            title.text = 'Title for Layout {}'.format(index)
        except AttributeError:
            print("No Title for Layout {}".format(index))
        # Go through all the placeholders and identify them by index and type
        for shape in slide.placeholders:
            if shape.is_placeholder:
                phf = shape.placeholder_format
                # Do not overwrite the title which is just a special placeholder
                try:
                    if 'Title' not in shape.text:
                        shape.text = 'Placeholder index:{} type:{}'.format(phf.idx, shape.name)
                except AttributeError:
                    print("{} has no text attribute".format(phf.type))
                print('{} {}'.format(phf.idx, shape.name))
    prs.save(output)

The basic flow of this function is to loop through and create an example of every layout included in the source PowerPoint file. Then on each slide, it will populate the title (if it exists). Finally, it will iterate through all of the placeholders included in the template and show the index of the placeholder as well as the type.

If you want to try it yourself:

python analyze_ppt.py simple-template.ppt simple-template-markup.ppt

Refer to the input and output files to see what you get.

Creating your own PowerPoint

For the dataset and analysis, I will be replicating the analysis in Generating Excel Reports from a Pandas Pivot Table. The article explains the pandas data manipulation in more detail so it will be helpful to make sure you are comfortable with it before going too much deeper into the code.

Let’s get things started with the inputs and basic shell of the program:

from __future__ import print_function
from pptx import Presentation
from pptx.util import Inches
import argparse
import pandas as pd
import numpy as np
from datetime import date
import matplotlib.pyplot as plt
import seaborn as sns

# Functions go here

if __name__ == "__main__":
    args = parse_args()
    df = pd.read_excel(args.report.name)
    report_data = create_pivot(df)
    create_chart(df, "report-image.png")
    create_ppt(args.infile.name, args.outfile.name, report_data, "report-image.png")

After we create our command line args, we read the source Excel file into a pandas DataFrame. Next, we use that DataFrame as an input to create the Pivot_table summary of the data:

def create_pivot(df, index_list=["Manager", "Rep", "Product"],
                 value_list=["Price", "Quantity"]):
    """
    Take a DataFrame and create a pivot table
    Return it as a DataFrame pivot table
    """
    table = pd.pivot_table(df, index=index_list,
                           values=value_list,
                           aggfunc=[np.sum, np.mean], fill_value=0)
    return table

Consult the Generating Excel Reports from a Pandas Pivot Table if this does not make sense to you.

The next piece of the analysis is creating a simple bar chart of sales performance by account:

def create_chart(df, filename):
    """ Create a simple bar chart saved to the filename based on the dataframe
    passed to the function
    """
    df['total'] = df['Quantity'] * df['Price']
    final_plot = df.groupby('Name')['total'].sum().order().plot(kind='barh')
    fig = final_plot.get_figure()
    # Size is the same as the PowerPoint placeholder
    fig.set_size_inches(6, 4.5)
    fig.savefig(filename, bbox_inches='tight', dpi=600)

Here is a scaled down version of the image:

PowerPoint Graph

We have a chart and a pivot table completed. Now we are going to embed that information into a new PowerPoint file based on a given PowerPoint template file.

Before I go any farther, there are a couple of things to note. You need to know what layout you would like to use as well as where you want to populate your content. In looking at the output of analyze_ppt.py we know that the title slide is layout 0 and that it has a title attribute and a subtitle at placeholder 1.

Here is the start of the function that we use to create our output PowerPoint:

def create_ppt(input, output, report_data, chart):
    """ Take the input powerpoint file and use it as the template for the output
    file.
    """
    prs = Presentation(input)
    # Use the output from analyze_ppt to understand which layouts and placeholders
    # to use
    # Create a title slide first
    title_slide_layout = prs.slide_layouts[0]
    slide = prs.slides.add_slide(title_slide_layout)
    title = slide.shapes.title
    subtitle = slide.placeholders[1]
    title.text = "Quarterly Report"
    subtitle.text = "Generated on {:%m-%d-%Y}".format(date.today())

This code creates a new presentation based on our input file, adds a single slide and populates the title and subtitle on the slide. It looks like this:

PowerPoint Title Slide

Pretty cool huh?

The next step is to embed our picture into a slide.

From our previous analysis, we know that the graph slide we want to use is layout index 8, so we create a new slide, add a title then add a picture into placeholder 1. The final step adds a subtitle at placeholder 2.

# Create the summary graph
graph_slide_layout = prs.slide_layouts[8]
slide = prs.slides.add_slide(graph_slide_layout)
title = slide.shapes.title
title.text = "Sales by account"
placeholder = slide.placeholders[1]
pic = placeholder.insert_picture(chart)
subtitle = slide.placeholders[2]
subtitle.text = "Results consistent with last quarter"

Here is our masterpiece:

PowerPoint Chart

For the final portion of the presentation, we will create a table for each manager with their sales performance.

Here is an image of what we’re going to achieve:

PowerPoint Table

Creating tables in PowerPoint is a good news / bad news story. The good news is that there is an API to create one. The bad news is that you can’t easily convert a pandas DataFrame to a table using the built in API. However, we are very fortunate that someone has already done all the hard work for us and created PandasToPowerPoint.

This excellent piece of code takes a DataFrame and converts it to a PowerPoint compatible table. I have taken the liberty of including a portion of it in my script. The original has more functionality that I am not using so I encourage you to check out the repo and use it in your own code.

# Create a slide for each manager
for manager in report_data.index.get_level_values(0).unique():
    slide = prs.slides.add_slide(prs.slide_layouts[2])
    title = slide.shapes.title
    title.text = "Report for {}".format(manager)
    top = Inches(1.5)
    left = Inches(0.25)
    width = Inches(9.25)
    height = Inches(5.0)
    # Flatten the pivot table by resetting the index
    # Create a table on the slide
    df_to_table(slide, report_data.xs(manager, level=0).reset_index(),
                left, top, width, height)
prs.save(output)

The code takes each manager out of the pivot table and builds a simple DataFrame that contains the summary data. Then uses the df_to_table to convert the DataFrame into a PowerPoint compatible table.

If you want to run this on your own, the full code would look something like this:

python create_ppt.py simple-template.pptx sales-funnel.xlsx myreport.pptx

All of the relevant files are available in the github repository.

Conclusion

One of the things I really enjoy about using python to solve real world business problems is that I am frequently pleasantly surprised at the rich ecosystem of very well thought out python tools already available to help with my problems. In this specific case, PowerPoint is rarely a joy to use but it is a necessity in many environments.

After reading this article, you should know that there is some hope for you next time you are asked to create a bunch of reports in PowerPoint. Keep this article in mind and see if you can find a way to automate away some of the tedium!


 
       Vote on Hacker News          

Comments