Practical Business Python

Taking care of business, one python script at a time

Mon 14 September 2015

Adding a Simple GUI to Your Pandas Script

Posted by Chris Moffitt in articles   

Introduction

Using python and pandas in the business world can be a very useful alternative to the pain of manipulating Excel files. While this combination of technologies is powerful, it can be challenging to convince others to use a python script - especially when many may be intimidated by using the command line. In this article I will show an example of how to easily create an end-user-friendly GUI using the Gooey library. This interface is based on wxWindows so it looks like a “native” application on Windows, Mac and Linux. Ultimately, I believe that presenting a simple user interface to your scripts can greatly increase the adoption of python in your place of business.

The Problem

I will be basing the example in this article on my prior post - Combining Data From Multiple Excel Files. The basic concept is that there is a periodic need to combine data from multiple excel files into a “master file” and perform some additional manipulations.

Unfortunately this process is error prone and time consuming when someone tries to do a lot of manual copying and pasting in Excel. However, it is relatively straightforward to create python + pandas scripts to perform the same manipulations in a more repeatable and robust format. However, as soon as you ask the user to type something like the line below, you will lose them:

python pandas_gui.py c:\Users\cmoffitt\Documents\src c:\users\cmoffitt\Documents\dest c:\users\cmoffitt\Documents\customer-status.xlsx -d 9/6/2015

Instead, you could present them with a simple GUI that looks something like this:

Example UI

The nice thing about this example is that you have standard windows directory and file chooser dialogs along with a standard date picker widget. It will be a much smoother transition for your users to use this UI than to try to remember how to use the command line with all the various arguments shown above.

The rest of this article will explain how to create this UI with very minor changes to the base code you would build using argparse . If you are not familiar with argparse then this article might be helpful to reference before you go much further. As shown in the article, argparse (and friends) are very sophisticated libraries but I have found that you can create very meaningful and useful tools with the very basic options I’ll show in this post.

Install

Gooey requires wxPython for its interface. wxPython can be a little tricky to install but if you are using the Anaconda or Miniconda distribution the install is as simple as:

conda install wxpython

I highly recommend using conda for the install process - especially on Windows.

Gooey can then be installed using pip:

pip install gooey

Building The Script

This notebook shows the basic idea for this program. What I will do next is build a simple version of this using argparse to pass in the source and destination directories as well as a location for the customer-status.xlsx file.

I am going to create a parse_args function to set up the following required inputs:

  • data_directory
  • output_directory
  • Customer account status file

I will add an example of an optional date argument as well but for the purposes of this example, I do not actually use the value. As they say, that is an exercise left to the reader.

The simplest example of argparse would look something like this:

from argparse import ArgumentParser
parser = ArgumentParser(description='Create Quarterly Marketing Report')
parser.add_argument('data_directory',
                        action='store',
                        help="Source directory that contains Excel files")
parser.add_argument('output_directory',
                        action='store',
                        help="Output directory to save summary report")
parser.add_argument('cust_file',
                        action='store',
                        help='Customer Account Status File')
parser.add_argument('-d', help='Start date to include')
args = parser.parse_args()

When you are ready to access your arguments, you can get them like this:

source = args.data_directory
dest = args.output_directory

One other unique aspect to this code is that I added a simple json dump of the arguments and restore them as the default next time the script is executed. I originally did this to streamline the testing process but realize that this would be helpful in the real world too. Here is the full code sample with the default values filled in based on the previous execution of the script.

from argparse import ArgumentParser

def parse_args():
    """ Use ArgParser to build up the arguments we will use in our script
    Save the arguments in a default json file so that we can retrieve them
    every time we run the script.
    """
    stored_args = {}
    # get the script name without the extension & use it to build up
    # the json filename
    script_name = os.path.splitext(os.path.basename(__file__))[0]
    args_file = "{}-args.json".format(script_name)
    # Read in the prior arguments as a dictionary
    if os.path.isfile(args_file):
        with open(args_file) as data_file:
            stored_args = json.load(data_file)
    parser = ArgumentParser(description='Create Quarterly Marketing Report')
    parser.add_argument('data_directory',
                        action='store',
                        default=stored_args.get('data_directory'),
                        help="Source directory that contains Excel files")
    parser.add_argument('output_directory',
                        action='store',
                        default=stored_args.get('output_directory'),
                        help="Output directory to save summary report")
    parser.add_argument('cust_file',
                        action='store',
                        default=stored_args.get('cust_file'),
                        help='Customer Account Status File')
    parser.add_argument('-d', help='Start date to include',
                        default=stored_args.get('d'))
    args = parser.parse_args()
    # Store the values of the arguments so we have them next time we run
    with open(args_file, 'w') as data_file:
        # Using vars(args) returns the data as a dictionary
        json.dump(vars(args), data_file)
    return args

This code allows us to do a basic command line interface that looks like this:

python pandas_gui_args.py --help

usage: pandas_gui_args.py [-h] [-d D]
                          data_directory output_directory cust_file

Create Quarterly Marketing Report

positional arguments:
  data_directory    Source directory that contains Excel files
  output_directory  Output directory to save summary report
  cust_file         Customer Account Status File

optional arguments:
  -h, --help        show this help message and exit
  -d D              Start date to include

The main section of the code would look like the section below. The basic flow is:

  • Get the command line inputs
  • Pass the appropriate ones to the input and processing functions
  • Save the data to the desired location
if __name__ == '__main__':
    conf = parse_args()
    print("Reading sales files")
    sales_df = combine_files(conf.data_directory)
    print("Reading customer data and combining with sales")
    customer_status_sales = add_customer_status(sales_df, conf.cust_file)
    print("Saving sales and customer summary data")
    save_results(customer_status_sales, conf.output_directory)
    print("Done")

Add a Gooey GUI

The command line solution shown above is very familiar to many but I imagine there are people in your organization that would instantly turn away from a solution that looks something like what I have shown above. However, Gooey makes it as easy as two lines of code to make a UI for this script.

The most basic steps are to import Gooey and add the decorator in front of the function that processes your arguments. Here is what it would look for our example:

from gooey import Gooey


@Gooey(program_name="Create Quarterly Marketing Report")
def parse_args():
    """ Rest of program below
    """

When you run this, you would see a simple UI like this:

Example UI v2

I think we all agree that this is fairly intuitive and would be something you could easily explain to your most non-technical users. The other nice thing is that it runs the same on Windows, Mac or Linux (as illustrated above).

The one challenge would be that users would probably expect to have some nice widgets to allow them to select directories and dates. If you would like to do that then you can substitute the GooeyParser for your ArgParser and add the widget information to the parser code.

Change

parser = ArgParser(description='Create Quarterly Marketing Report')

to

parser = GooeyParser(description='Create Quarterly Marketing Report')

And add your widget :

parser.add_argument('data_directory',
                    action='store',
                    default=stored_args.get('data_directory'),
                    widget='DirChooser',
                    help="Source directory that contains Excel files")

Here’s what it looks like to use the DirChooser1 , FileChooser and DateChooser widgets:

from gooey import Gooey, GooeyParser

@Gooey(program_name="Create Quarterly Marketing Report")
def parse_args():
    """ Use GooeyParser to build up the arguments we will use in our script
    Save the arguments in a default json file so that we can retrieve them
    every time we run the script.
    """
    stored_args = {}
    # get the script name without the extension & use it to build up
    # the json filename
    script_name = os.path.splitext(os.path.basename(__file__))[0]
    args_file = "{}-args.json".format(script_name)
    # Read in the prior arguments as a dictionary
    if os.path.isfile(args_file):
        with open(args_file) as data_file:
            stored_args = json.load(data_file)
    parser = GooeyParser(description='Create Quarterly Marketing Report')
    parser.add_argument('data_directory',
                        action='store',
                        default=stored_args.get('data_directory'),
                        widget='DirChooser',
                        help="Source directory that contains Excel files")
    parser.add_argument('output_directory',
                        action='store',
                        widget='DirChooser',
                        default=stored_args.get('output_directory'),
                        help="Output directory to save summary report")
    parser.add_argument('cust_file',
                        action='store',
                        default=stored_args.get('cust_file'),
                        widget='FileChooser',
                        help='Customer Account Status File')
    parser.add_argument('-d', help='Start date to include',
                        default=stored_args.get('d'),
                        widget='DateChooser')
    args = parser.parse_args()

Now you have some native widgets the look very customary for the host OS:

Example UI v3

The other nice feature is that when you execute the program you have a simple wrapper around the display and reasonable error windows if there is an underlying error in your program.

Example UI v4

One other handy component is that there is a “Restart” button at the bottom of the screen. If you select that button, you can go back to your input screen and adjust any variables and re-execute the program. This is really nice if you need to run the program multiple times with different inputs.

Part of what I really like about this solution is that there is very little additional overhead in your code. A traditional GUI (tkinter, QT, wxWindows etc) would require a lot of code to show this UI. This example shows how unobtrusive the solution can be.

The Final Program

I have not gone into the example of the actual pandas code but you can see in the full sample that it is relatively straightforward to use the arguments as inputs into your various python function calls. If you would like to see the code on github, here it is.

"""
Simple Pandas program to combine Excel files and summarize data.
This demonstrates the use of Gooey to add a simple UI on top of the script
"""
from __future__ import print_function
import pandas as pd
import numpy as np
import glob
import os
import json
from argparse import ArgumentParser
from gooey import Gooey, GooeyParser


@Gooey(program_name="Create Quarterly Marketing Report")
def parse_args():
    """ Use GooeyParser to build up the arguments we will use in our script
    Save the arguments in a default json file so that we can retrieve them
    every time we run the script.
    """
    stored_args = {}
    # get the script name without the extension & use it to build up
    # the json filename
    script_name = os.path.splitext(os.path.basename(__file__))[0]
    args_file = "{}-args.json".format(script_name)
    # Read in the prior arguments as a dictionary
    if os.path.isfile(args_file):
        with open(args_file) as data_file:
            stored_args = json.load(data_file)
    parser = GooeyParser(description='Create Quarterly Marketing Report')
    parser.add_argument('data_directory',
                        action='store',
                        default=stored_args.get('data_directory'),
                        widget='DirChooser',
                        help="Source directory that contains Excel files")
    parser.add_argument('output_directory',
                        action='store',
                        widget='DirChooser',
                        default=stored_args.get('output_directory'),
                        help="Output directory to save summary report")
    parser.add_argument('cust_file',
                        action='store',
                        default=stored_args.get('cust_file'),
                        widget='FileChooser',
                        help='Customer Account Status File')
    parser.add_argument('-d', help='Start date to include',
                        default=stored_args.get('d'),
                        widget='DateChooser')
    args = parser.parse_args()
    # Store the values of the arguments so we have them next time we run
    with open(args_file, 'w') as data_file:
        # Using vars(args) returns the data as a dictionary
        json.dump(vars(args), data_file)
    return args


def combine_files(src_directory):
    """ Read in all of the sales xlsx files and combine into 1
    combined DataFrame
    """
    all_data = pd.DataFrame()
    for f in glob.glob(os.path.join(src_directory, "sales-*.xlsx")):
        df = pd.read_excel(f)
        all_data = all_data.append(df, ignore_index=True)
    all_data['date'] = pd.to_datetime(all_data['date'])
    return all_data


def add_customer_status(sales_data, customer_file):
    """ Read in the customer file and combine with the sales data
    Return the customer with their status as an ordered category
    """
    df = pd.read_excel(customer_file)
    all_data = pd.merge(sales_data, df, how='left')
    # Default everyone to bronze if no data included
    all_data['status'].fillna('bronze', inplace=True)
    # Convert the status to a category and order it
    all_data["status"] = all_data["status"].astype("category")
    all_data["status"].cat.set_categories(["gold", "silver", "bronze"], inplace=True)
    return all_data


def save_results(sales_data, output):
    """ Perform a summary of the data and save the data as an excel file
    """
    summarized_sales = sales_data.groupby(["status"])["unit price"].agg([np.mean])
    output_file = os.path.join(output, "sales-report.xlsx")
    writer = pd.ExcelWriter(output_file, engine='xlsxwriter')
    summarized_sales = summarized_sales.reset_index()
    summarized_sales.to_excel(writer)


if __name__ == '__main__':
    conf = parse_args()
    print("Reading sales files")
    sales_df = combine_files(conf.data_directory)
    print("Reading customer data and combining with sales")
    customer_status_sales = add_customer_status(sales_df, conf.cust_file)
    print("Saving sales and customer summary data")
    save_results(customer_status_sales, conf.output_directory)
    print("Done")

I think you will agree that this can be a really useful solution for distributing small standalone programs to users that are not comfortable running python from the command line. I have built this example around pandas but it would work for pretty much any python script using argparse. Obviously if you need a more robust solution then you’ll need to evaluate other options but I would argue that there is a lot of power in this potential solution.

Edit History

  • 9-14-205: Corrected typos
  • 9-17-2015: Updated example code imports to include GooeyParser

 
       Vote on Hacker News          

Comments