Practical Business Python

Taking care of business, one python script at a time

Mon 19 December 2016

Building a Financial Model with Pandas - Version 2

Posted by Chris Moffitt in articles   

article header image


In my last article, I discussed building a financial model in pandas that could be used for multiple amortization scenarios. Unfortunately, I realized that I made a mistake in that approach so I had to rethink how to solve the problem. Thanks to the help of several individuals, I have a new solution that resolves the issues and produces the correct results.

In addition to posting the updated solution, I have taken this article as an opportunity to take a step back and examine what I should have done differently in approaching the original problem. While it is never fun to make a mistake in front of thousands of people, I’ll try to swallow my pride and learn from it.

What was the problem?

If you have not read the last article, it would be helpful to review it in order to get familiar with the problem I was trying to solve. As you can see in the comments, my solution was not correct because I failed to re-calculate the schedule based on the principal reduction from additional principal payments. Basically, I need to recalculate the values each period - which looks like a looping process. Unfortunately, this was what I was explicitly trying to avoid.

What is the solution?

Based on comments from several knowledgeable readers, I think the best solution is to use a generator to build up the table and return it in a format that can be easily converted to a pandas dataframe. The benefit of the generator is that it gracefully handles the situation where the loan is paid off early due to additional principal payments. If you are not familiar with python generators and their usage, this post is helpful.

The first step in the updated solution is to build the amortize function which effectively loops through each period and returns an OrderedDict which can be easily converted into a pandas dataframe.

import pandas as pd
from datetime import date
import numpy as np
from collections import OrderedDict
from dateutil.relativedelta import *

def amortize(principal, interest_rate, years, addl_principal=0, annual_payments=12,

    pmt = -round(np.pmt(interest_rate/annual_payments, years*annual_payments, principal), 2)
    # initialize the variables to keep track of the periods and running balances
    p = 1
    beg_balance = principal
    end_balance = principal

    while end_balance > 0:

        # Recalculate the interest based on the current balance
        interest = round(((interest_rate/annual_payments) * beg_balance), 2)

        # Determine payment based on whether or not this period will pay off the loan
        pmt = min(pmt, beg_balance + interest)
        principal = pmt - interest

        # Ensure additional payment gets adjusted if the loan is being paid off
        addl_principal = min(addl_principal, beg_balance - principal)
        end_balance = beg_balance - (principal + addl_principal)

        yield OrderedDict([('Month',start_date),
                           ('Period', p),
                           ('Begin Balance', beg_balance),
                           ('Payment', pmt),
                           ('Principal', principal),
                           ('Interest', interest),
                           ('Additional_Payment', addl_principal),
                           ('End Balance', end_balance)])

        # Increment the counter, balance and date
        p += 1
        start_date += relativedelta(months=1)
        beg_balance = end_balance

Once this function is defined, building out a dataframe containing the full schedule for the results is straightforward:

schedule = pd.DataFrame(amortize(700000, .04, 30, addl_principal=200, start_date=date(2016, 1,1)))
Period Month Begin Balance Payment Interest Principal Additional_Payment End Balance
0 1 2016-01-01 700000.00 3341.91 2333.33 1008.58 200.0 698791.42
1 2 2016-02-01 698791.42 3341.91 2329.30 1012.61 200.0 697578.81
2 3 2016-03-01 697578.81 3341.91 2325.26 1016.65 200.0 696362.16
3 4 2016-04-01 696362.16 3341.91 2321.21 1020.70 200.0 695141.46
4 5 2016-05-01 695141.46 3341.91 2317.14 1024.77 200.0 693916.69
Period Month Begin Balance Payment Interest Principal Additional_Payment End Balance
319 320 2042-08-01 14413.65 3341.91 48.05 3293.86 200.0 10919.79
320 321 2042-09-01 10919.79 3341.91 36.40 3305.51 200.0 7414.28
321 322 2042-10-01 7414.28 3341.91 24.71 3317.20 200.0 3897.08
322 323 2042-11-01 3897.08 3341.91 12.99 3328.92 200.0 368.16
323 324 2042-12-01 368.16 369.39 1.23 368.16 0.0 0.00

The nice aspect of this solution is that the generator approach builds up the results in an incremental manner so that you do not have to try to determine how many iterations you need in advance. Essentially, the code keeps calculating the end_balance each period until it gets to 0 and the generator is complete.

Example Analysis

I have built out a variation on this solution that also includes summary statistics on the scenarios so that you can easily see things like:

  • How many payments will you make?
  • When will the balance be paid off?
  • How much in interest do you pay over the life of the loan?

This notebook contains the full working code. Here are a few example to show you how it works and can be a handy solution for modeling various scenarios:

schedule1, stats1 = amortization_table(100000, .04, 30, addl_principal=50, start_date=date(2016,1,1))
schedule2, stats2 = amortization_table(100000, .05, 30, addl_principal=200, start_date=date(2016,1,1))
schedule3, stats3 = amortization_table(100000, .04, 15, addl_principal=0, start_date=date(2016,1,1))

pd.DataFrame([stats1, stats2, stats3])
Payoff Date Num Payments Interest Rate Years Principal Payment Additional Payment Total Interest
0 2041-01-01 301 0.04 30 100000 477.42 50 58441.08
1 2032-09-01 201 0.05 30 100000 536.82 200 47708.38
2 2030-12-01 180 0.04 15 100000 739.69 0 33143.79

You could also build out some simple scenarios and visualize the alternative outcomes:

additional_payments = [0, 50, 200, 500]
fig, ax = plt.subplots(1, 1)

for pmt in additional_payments:
    result, _ = amortization_table(100000, .04, 30, addl_principal=pmt, start_date=date(2016,1,1))
    ax.plot(result['Month'], result['End Balance'], label='Addl Payment = ${}'.format(str(pmt)))
plt.title("Pay Off Timelines")
Alternative scenarios

Lessons Learned

I will admit that it is embarrassing to put out a “solution” to a problem and realize pretty quickly (due to feedback) that it was wrong. In the interest of continuous improvement, here are some lessons I learned:

  • Understand the problem

    I made the mistake of thinking I knew how the pre-payment process worked but I was obviously mistaken. If I spent a little more time building up a prototype in Excel and validating the results ahead of time, I would have caught my errors much earlier in the process.

  • Don’t fixate on a predefined solution approach

    I decided that I wanted to make the solution in pure-pandas without any looping. In reality, I should have thought about the entire problem and all of the options available in the python ecosystem - including the standard lib.

  • Look at the standard lib

    While pandas has a lot of great tools, the python standard library is really rich and provides many capabilities that can solve a wide variety of problems.

  • The python community is great

    So many places on the Internet can be hostile. However, I am very impressed with how many people publicly and privately offered their support to help me fix the issue. Everyone that reached out to me was doing it in the spirit of trying to help me understand the problem and build a better solution. I appreciate their patience and willingness to work with me on finding a better approach. Several people spent much of their own time looking at my proposed fix and offering their ideas on how to improve.

  • Sometimes the best way to learn is to try and fail

    I went into this article series trying to learn more about using pandas. However, I actually had a chance to learn and use generators for a real life problem. As a result I do understand python generators much more and understand why they are a good solution to this type of problem. I also spent some time pondering how to use python’s min and max functions to simplify some of my code.

Even with a bit of a stumble in this process, it has been a good learning experience and I hope it will be for many of you as well.

       Vote on Hacker News