Managing spending with Python & Pandas

Recently, my bank announced that they will shut down their online budget book functionality. During the last years, it gave me a nice overview of my spending and allowed me to plan for the next months. A while ago, I already did some analysis of exported transactions using Python. But now I had much more motivation to create a usable solution. In the last weeks I did just that. In this post, I give an overview of my workflow using an example dataset. The full project can be found on github.

Usually it is possible to export bank transactions in the csv format, which is just a plain text file. This may look something like this:

date       ;  amount ; category
2022-06-20 ;  -36.35 ; mobility/gas
2022-06-20 ;   -7.0  ; mobility/parking
2022-06-16 ;  -52.15 ; leisure/food
2022-06-16 ;   -5.63 ; living expenses/groceries
2022-06-15 ;  -32.23 ; leisure/hobby
2022-06-15 ;   -6.99 ; living expenses/groceries
2022-06-15 ;   -3.99 ; living expenses/phone
2022-06-10 ; -203.84 ; vacation/transport
2022-06-10 ;  -10.44 ; leisure/food

Here I already replaced the descriptions with categories. Without those, it is almost impossible to get meaningful insights from the data. Luckily I was able to export my transactions from the online budget book, so I already had a lot of data to work with.

In Python, tabular data like transactions can easily be processed with the Pandas package. The first step is to import the transactions as the Pandas data structure, a dataframe:

import pandas as pd

transactions = pd.read_csv(
'transactions.csv', header=0,
encoding='ISO-8859-1', sep=';',
decimal='.', thousands=',' )

Specifying the decimal and thousands separators allows Pandas to correctly identify the numbers. Another step is to make the dates understandable by Python:

transactions['date'] = pd.to_datetime( transactions['date'], format='%Y-%m-%d' )

Now that the data is correctly imported, the daily account balance can be calculated (the final balance has to be specified):

transactions['balance'] = transactions['amount'].cumsum()

transactions['balance'] = transactions['balance'] -
transactions.iloc[0]['balance'] + final_balance

The data can be quickly filtered by different criteria, for example a specific year:

transactions = transactions.query(
 'date.dt.year >= 2022 and 
  date.dt.year <  2023 ')

As a first result, the account balance during the year can be plotted (using an example data set):

This is an interesting result, but we still don’t know, for what the money was spent. The next step is to group the transactions by categories and months:

sum_cat_month = ( transactions
.groupby('category')
.resample('1M', on='date')
.sum()['amount']
.unstack(0) )

This creates a new dataframe, which gives a very nice overview (use landscape view on small screens):

           Jan ...   Dec   SUM  /MONTH
income    2700 ...  2700  32400   2700
flat     -1348 ... -1093 -13567  -1131
leisure   -133 ...  -180  -3027   -275
living    -519 ...  -526  -5839   -487
mobility  -566 ...  -225  -4544   -379
vacation     0 ...     0  -2620   -437
SUM        135 ...   677   2803    234

Numbers are nice, but a graphical representation is much easier to understand:

The sum of each column in the above plot is the balance per month:

We can also have a more detailed look at specific categories:

In the above plots, also a forecast of the next three months is included, based on the average of the past months. For costs which occur not every month, such as insurances, it makes more sense to use last year’s values as a prediction. All the plots are collected in a Jupyter Notebook, which can directly be exported as a report.

In the next post, I show how I automatically categorize new transcations using machine learning.

2 responses to “Managing spending with Python & Pandas”

Leave a comment

Design a site like this with WordPress.com
Get started