# Building a Financial Model with Pandas - Version 2

Posted by Chris Moffitt in articles

## Introduction

In my last article, I discussed building a financial model in pandas that could be used for multiple amortization scenarios. Unfortunately, I realized that I made a mistake in that approach so I had to rethink how to solve the problem. Thanks to the help of several individuals, I have a new solution that resolves the issues and produces the correct results.

In addition to posting the updated solution, I have taken this article as an opportunity to take a step back and examine what I should have done differently in approaching the original problem. While it is never fun to make a mistake in front of thousands of people, I’ll try to swallow my pride and learn from it.

## What was the problem?

If you have not read the last article, it would be helpful to review it in order to get familiar with the problem I was trying to solve. As you can see in the comments, my solution was not correct because I failed to re-calculate the schedule based on the principal reduction from additional principal payments. Basically, I need to recalculate the values each period - which looks like a looping process. Unfortunately, this was what I was explicitly trying to avoid.

## What is the solution?

Based on comments from several knowledgeable readers, I think the best solution is to use a generator to build up the table and return it in a format that can be easily converted to a pandas dataframe. The benefit of the generator is that it gracefully handles the situation where the loan is paid off early due to additional principal payments. If you are not familiar with python generators and their usage, this post is helpful.

The first step in the updated solution is to build the ```
amortize
```

function
which effectively loops through each period and returns an ```
OrderedDict
```

which can be easily converted into a pandas dataframe.

```
import pandas as pd
from datetime import date
import numpy as np
from collections import OrderedDict
from dateutil.relativedelta import *
def amortize(principal, interest_rate, years, addl_principal=0, annual_payments=12, start_date=date.today()):
pmt = -round(np.pmt(interest_rate/annual_payments, years*annual_payments, principal), 2)
# initialize the variables to keep track of the periods and running balances
p = 1
beg_balance = principal
end_balance = principal
while end_balance > 0:
# Recalculate the interest based on the current balance
interest = round(((interest_rate/annual_payments) * beg_balance), 2)
# Determine payment based on whether or not this period will pay off the loan
pmt = min(pmt, beg_balance + interest)
principal = pmt - interest
# Ensure additional payment gets adjusted if the loan is being paid off
addl_principal = min(addl_principal, beg_balance - principal)
end_balance = beg_balance - (principal + addl_principal)
yield OrderedDict([('Month',start_date),
('Period', p),
('Begin Balance', beg_balance),
('Payment', pmt),
('Principal', principal),
('Interest', interest),
('Additional_Payment', addl_principal),
('End Balance', end_balance)])
# Increment the counter, balance and date
p += 1
start_date += relativedelta(months=1)
beg_balance = end_balance
```

Once this function is defined, building out a dataframe containing the full schedule for the results is straightforward:

```
schedule = pd.DataFrame(amortize(700000, .04, 30, addl_principal=200, start_date=date(2016, 1,1)))
schedule.head()
```

Period | Month | Begin Balance | Payment | Interest | Principal | Additional_Payment | End Balance | |
---|---|---|---|---|---|---|---|---|

0 | 1 | 2016-01-01 | 700000.00 | 3341.91 | 2333.33 | 1008.58 | 200.0 | 698791.42 |

1 | 2 | 2016-02-01 | 698791.42 | 3341.91 | 2329.30 | 1012.61 | 200.0 | 697578.81 |

2 | 3 | 2016-03-01 | 697578.81 | 3341.91 | 2325.26 | 1016.65 | 200.0 | 696362.16 |

3 | 4 | 2016-04-01 | 696362.16 | 3341.91 | 2321.21 | 1020.70 | 200.0 | 695141.46 |

4 | 5 | 2016-05-01 | 695141.46 | 3341.91 | 2317.14 | 1024.77 | 200.0 | 693916.69 |

```
schedule.tail()
```

Period | Month | Begin Balance | Payment | Interest | Principal | Additional_Payment | End Balance | |
---|---|---|---|---|---|---|---|---|

319 | 320 | 2042-08-01 | 14413.65 | 3341.91 | 48.05 | 3293.86 | 200.0 | 10919.79 |

320 | 321 | 2042-09-01 | 10919.79 | 3341.91 | 36.40 | 3305.51 | 200.0 | 7414.28 |

321 | 322 | 2042-10-01 | 7414.28 | 3341.91 | 24.71 | 3317.20 | 200.0 | 3897.08 |

322 | 323 | 2042-11-01 | 3897.08 | 3341.91 | 12.99 | 3328.92 | 200.0 | 368.16 |

323 | 324 | 2042-12-01 | 368.16 | 369.39 | 1.23 | 368.16 | 0.0 | 0.00 |

The nice aspect of this solution is that the generator approach builds up the results
in an incremental manner so that you do not have to try to determine how many iterations
you need in advance. Essentially, the code keeps calculating the ```
end_balance
```

each
period until it gets to 0 and the generator is complete.

## Example Analysis

I have built out a variation on this solution that also includes summary statistics on the scenarios so that you can easily see things like:

- How many payments will you make?
- When will the balance be paid off?
- How much in interest do you pay over the life of the loan?

This notebook contains the full working code. Here are a few example to show you how it works and can be a handy solution for modeling various scenarios:

```
schedule1, stats1 = amortization_table(100000, .04, 30, addl_principal=50, start_date=date(2016,1,1))
schedule2, stats2 = amortization_table(100000, .05, 30, addl_principal=200, start_date=date(2016,1,1))
schedule3, stats3 = amortization_table(100000, .04, 15, addl_principal=0, start_date=date(2016,1,1))
pd.DataFrame([stats1, stats2, stats3])
```

Payoff Date | Num Payments | Interest Rate | Years | Principal | Payment | Additional Payment | Total Interest | |
---|---|---|---|---|---|---|---|---|

0 | 2041-01-01 | 301 | 0.04 | 30 | 100000 | 477.42 | 50 | 58441.08 |

1 | 2032-09-01 | 201 | 0.05 | 30 | 100000 | 536.82 | 200 | 47708.38 |

2 | 2030-12-01 | 180 | 0.04 | 15 | 100000 | 739.69 | 0 | 33143.79 |

You could also build out some simple scenarios and visualize the alternative outcomes:

```
additional_payments = [0, 50, 200, 500]
fig, ax = plt.subplots(1, 1)
for pmt in additional_payments:
result, _ = amortization_table(100000, .04, 30, addl_principal=pmt, start_date=date(2016,1,1))
ax.plot(result['Month'], result['End Balance'], label='Addl Payment = ${}'.format(str(pmt)))
plt.title("Pay Off Timelines")
plt.ylabel("Balance")
ax.legend();
```

## Lessons Learned

I will admit that it is embarrassing to put out a “solution” to a problem and realize pretty quickly (due to feedback) that it was wrong. In the interest of continuous improvement, here are some lessons I learned:

- Understand the problem
- I made the mistake of thinking I knew how the pre-payment process worked but I was obviously mistaken. If I spent a little more time building up a prototype in Excel and validating the results ahead of time, I would have caught my errors much earlier in the process.

- Don’t fixate on a predefined solution approach
- I decided that I wanted to make the solution in pure-pandas without any looping. In reality, I should have thought about the entire problem and all of the options available in the python ecosystem - including the standard lib.

- Look at the standard lib
- While pandas has a lot of great tools, the python standard library is really rich and provides many capabilities that can solve a wide variety of problems.

- The python community is great
- So many places on the Internet can be hostile. However, I am very impressed with how many people publicly and privately offered their support to help me fix the issue. Everyone that reached out to me was doing it in the spirit of trying to help me understand the problem and build a better solution. I appreciate their patience and willingness to work with me on finding a better approach. Several people spent much of their own time looking at my proposed fix and offering their ideas on how to improve.

- Sometimes the best way to learn is to try and fail
- I went into this article series trying to learn more about using pandas.
However, I actually had a chance to learn and use generators for a real life
problem. As a result I do understand python generators much more and understand
why they are a good solution to this type of problem. I also spent some time
pondering how to use python’s
`min`

and`max`

functions to simplify some of my code.

Even with a bit of a stumble in this process, it has been a good learning experience and I hope it will be for many of you as well.

- ← Building a Financial Model with Pandas
- Data Science Challenge - Predicting Baseball Fanduel Points →

## Comments