Have you ever heard about Mint?
It's a pretty popular budget tracker developed by Intuit. It's able to fetch transactions directly from your bank account. But to do this, it needs — you guessed it — your bank account's login credentials1. Thanks but no thanks!
Back in 2014, I was searching for the perfect budget tracker — one that I could trust more than those on the market. Besides Mint and similar apps, the alternatives were not great. Most required that you manually record each one of your purchases into an app, which in my view was neither practical nor sustainable in the long run.
So I wondered, could I build my own?
Coming up with version 1.0
Before writing any code, I figured that I could easily test out my idea using Excel. Because what's better than a spreadsheet to quickly make a budget?
Technically, I would log into my bank account each month, copy the list of transactions displayed on the screen into a spreadsheet and compute whatever stats I needed. For example, I could compute the total amount in dollars for grocery purchases in the past month. To do so, I'd manually tag or classify transactions. Simple enough, right?
This looked something like this:
I did this for a few months, but soon enough, I realized that it was rather time-consuming and error-prone. For example, the data that I needed to copy into my spreadsheet required some cleaning and formatting every single time to adapt it to my spreadsheet's format.
So there had to be a better way.
Coming up with version 2.0
My earlier tests with Excel had given me some insights into what my application needed:
- It needed to automatically connect to my bank account and download the list of transactions for the past month (for all of my accounts and credit cards).
- It needed to automatically classify transactions and compute stats about them, like the total amount in dollars for grocery purchases for the past month.
- It needed to store results somewhere, like in a file or a database, for later consultation.
Sure, things needed to be automated in some capacity, but not at 100% — I expected to run my tool manually each month.
I initially came up with the following design:
- Downloader would be responsible for downloading transactions from my bank's website.
- Parser would parse the downloaded transactions to create their internal representation.
- Classifier would go through all transactions and classify them depending on which category they belonged to (e.g. grocery, leisure, etc.).
- Formatter would perform calculations on transactions based on how they had been classified and it would also format them in a way that was compatible with the underlying storage.
- Writer would write the data to a file or a database, depending on what was better.
That was basically it!
1. Downloading transactions
Until then, I hadn't realized that it was possible to download monthly transaction reports as TSV files directly from my bank's web app. So I built a Selenium-based web scraper in C# that prompted me for my bank account's credentials and automatically downloaded the necessary TSV files.
2. Parsing transactions
The TSV files that my scraper downloaded contained all kinds of information; transactions for the past month and extra information that was unnecessary for my purpose, such as headers and footers, the account balance, credit card fees, interest rates, account numbers, etc. So I had to filter them out.
3. Classifying transactions
Whenever my program ran into an unknown transaction — one whose description wasn't recognized — it prompted me to define a rule that matched the description's amount. Then it asked me which category (and optional sub-category) it should assign to it:
The transaction 'HOME DEPOT #9999' (302.21$) is unknown. Please define a condition to identify it (press [Esc] for no condition). Formula: (amount >= 300.00) Category: R Sub category: (None)
In the above example, a transaction with the description "HOME DEPOT #9999" had never been seen before, so I specified that such a transaction with an amount greater than or equal to 300$ was associated with home renovations (hence why I specified "R" for its category). Then it was automatically stored in a config file as follows:
<?xml version="1.0" encoding="utf-8"?> <transactions> <transaction name="HOME DEPOT #9999"> <rules> <rule formula="(amount >= 300.0)" category="R" subcategory="" /> </rules> </transaction> ... <transactions>
So the config file was progressively filled with new rules as new (unknown) transactions were encountered. And every time I launched my program, it loaded the config file in memory so that any transaction matching one of the rules was automatically assigned the corresponding category (and sub-category). I had to implement my own arithmetic expression parser and evaluator to do this2.
4. Formatting and storing results
Initially, I wasn't sure whether a spreadsheet or a database would be most appropriate for my needs. But then I figured that an Excel spreadsheet would be the wiser choice, mostly because I'd be able to easily consult the results again and again. So I used the .NET Excel Interop to do this. This implied that my program didn't need to compute totals and other stats by itself — these things were delegated directly to Excel via regular cell formulas.
I intended to run my program manually each month, so I'd run it from my desktop. No hosting needed!
Coming up with version 3.0
Version 2.0 was a pretty major improvement over version 1.0 — it automated many things that I no longer needed to do manually. However, it still had these flaws:
- I had to manually run my program each month, which became pretty annoying after a while.
- I had to manually categorize transactions, which was unnecessarily repetitive.
- I was missing some kind of visual representation of transactions, so it was tedious to perform, let's say, month-to-month comparisons.
- I could only view my budget from my desktop (it was stored in a spreadsheet). What if I wanted to view it from another computer or a smartphone?
- I couldn't view my transactions in real-time; the generated Excel spreadsheet was only for the past month.
So I had an idea — why not build my own web app to specifically address these issues? Because I knew how to build a website, so building a web app probably wasn't too complicated, right?
Upgrading my skills
C# had been my go-to language up until then, both in my day job and in my personal projects. But quite honestly, I couldn't picture myself building my app in ASP.NET and having to deal with IIS — this would have forced me to host the whole app on Windows. Remember that back in 2014, .NET (and even less so ASP.NET) was NOT supported on Linux. So I had to look elsewhere.
At the time, Single-Page Applications (SPAs) were all the rage. So there was no better opportunity than my project to learn how to build one. I went through several tutorials, including that of React.js to learn how to build a SPA. It was surprisingly easier to do than I initially thought.
So I went ahead and taught myself Python, which was a concise, mature and cross-platform programming language. I read the two editions of Dive Into Python from cover to cover to learn the basics. And since I had heard great things about the Flask web framework, I followed Miguel Grinberg's excellent tutorial to learn how to build REST APIs in Python.
Having refreshed my skills, I was then ready to update my requirements and my design.
My application definitely deserved more, especially on the usability side:
- It needed to automatically synchronize itself with my latest financial transactions (for all of my bank accounts and credit cards) — ideally in near real-time — without requiring any intervention on my part whatsoever.
- It needed to automatically classify transactions based on easily customizable rules.
- It needed to allow browsing through transactions and filtering them out by account and by month.
- It needed to display two charts — one showing the cumulative amount of money I had among all of my accounts over time, and another one showing the total monthly amount in dollars for each purchase category (e.g. 400$ of groceries in March, 375$ in April, and so on).
- It needed to allow me to manually trigger a synchronization or a classification run of transactions. This would be useful for debugging purposes.
- It needed to be accessible from anywhere, be it from a laptop or a smartphone.
- It needed to handle everything securely, and data had to be encrypted so that no one besides me could access it.
I wasn't exactly sure how I'd implement all of these things, but I thought that they'd make an awesome app.
Here's the new design that I came up with for my app:
- Task scheduler would be responsible for triggering and scheduling new tasks. Tasks would be pushed to a queue and persisted to a database. A config would define a schedule for the different kinds of tasks.
- Task executer would be responsible for executing the various enqueued tasks.
- A synchronization task would scrape all the latest transactions from a bank account and/or credit cards and store them in a database. The web scraper would use the authentication details provided by a config to log into the bank account.
- A categorization task would go through all unclassified tasks and assign them a category based on user-defined rules.
- A cleanup task would go through the task history in the database and delete all the ones older than X days.
- Web service would provide several REST endpoints for interacting with the database. For example, they would allow to list, update or delete transactions, rules and tasks.
- Web app would be a SPA securely connecting to the web service for all of its needs.
Although I initially planned to be the sole user, I designed things so that I could eventually support multiple users, like who knows — perhaps friends, family or even paying users.
Overall, my app was divided into two parts:
- The frontend, which consisted of a responsive Single-Page Application (SPA) made with HTML5, React.js, React-Bootstrap, Chart.js, Moment.js, Underscore.js and jQuery.
- The backend, which was a mixture of Python, Flask, Selenium and SQLite.
Let's dig a little bit more into the technical details.
1. Cloud hosting
I used a single DigitalOcean VM with 1GB of RAM running CentOS. This was plenty enough for my app's needs.
I had already developed a minimal web scraper in C# in version 2.0, but it was no longer enough. My app needed to download daily transactions, not just a monthly report!
So I developed a Selenium-based web scraper in Python capable of downloading whatever information I needed from my bank account, including account/card numbers and transactions. It browsed through the various screens of my bank's web app to do this. This required careful validations at each step to get reliable results. For example, navigating to a screen required validating whether the scraper had ended up on the right screen. Otherwise, unexpected behavior might have occurred, and let's face it — you don't want anything to mess with your bank account.
Besides, the login credentials that the scraper used to log into my bank's web app were stored securely in an encrypted file (I used the great cryptography.io library for this).
My app would essentially store transactions, categorization rules, a task history and user information, which could all be stored using a predefined schema. So using a good ol' relational database was the way to go.
I initially wasn't sure whether I'd use a more beefed-up database like PostgreSQL, or an embedded one such as SQLite. But since I planned to be the sole user initially, I figured that using PostgreSQL would be overkill and also quite heavy on my VM's resources. So I went ahead and chose SQLite. Fortunately, it supported some level of concurrency, so my main server and my web service could live in separate processes and still share the same database.
4. Web server
For serving my web app as well as the web service, I chose Nginx, coupled with uWSGI. Not only had I heard great things about Nginx's performance and ease of use, but it was also cross-platform, so I could easily test it out on my Windows-based development machine3.
First of all, I enabled HTTPS (and disabled HTTP) on my Nginx server and I configured an SSL certificate.
Also, my web app used JWS/JWT to handle authentication, which seemed like the standard way to go. The user's password was salted and hashed using the excellent Passlib library and the result was ultimately stored in the local SQLite database (along with the username).
Also, I configured strict firewall rules on the backend using iptables. I blocked every port by default, except those that were strictly necessary. For an extra layer of security, I configured SSH Key-Based Authentication (and I disabled password authentication) and Fail2ban.
Finally, I implemented various validators and sanitizers to ensure that my web app wasn't susceptible to SQL injection attacks.
6. Single-Page Application (SPA)
To create charts, I briefly considered D3.js, but the learning curve seemed quite steeper than that of Chart.js, so I settled on the latter. To create the "stacked area" effect, I came up with a couple of algorithms of my own. For example, to create the "Total $" stacked area chart, I retrieved the "account balance" data provided by my bank for each one of my accounts. But since the balance was only available for the dates at which transactions had occurred (and not for every single day of the month), I had to interpolate data. Also, I had to superpose several layers of data to create the "stacked area" effect4.
And ultimately, my SPA just called my web service's endpoints to either update the backend's database or retrieve information that it needed to display.
7. Rule parsing and evaluation
For transactions to be automatically classified (i.e. assigned a category, such as "G" for "Groceries"), my app needed to use user-defined rules. When a user entered a new rule in the UI, it was as a plain text string. So it needed to be parsed, validated, and evaluated by the backend.
I had already developed my own arithmetic expression parser and evaluator in C# in version 2.0, but it was pretty limited. So I built an entirely new version in Python5, which supported more operators (boolean, binary, comparison, and parentheses) — all the while respecting their precedence — and more database fields. For example, "description == 'HOME DEPOT' && amount >= 50.0" matched any transaction made at Home Depot whose amount was greater than or equal to 50.00$CAD.
8. Task scheduling
Every action performed by the backend was a task, which came in two flavors — scheduled or manually triggered. Scheduled tasks were configured using a separate YAML file as follows:
!Root task_schedules: - !TaskSchedule task_type: synchronize utc_time: '12:00:00' - !TaskSchedule task_type: categorize utc_time: '13:00:00' - !TaskSchedule task_type: cleanup utc_time: '14:00:00'
These were automatically triggered by the Task Scheduler (e.g. at 2 pm each day, any task older than 7 days was deleted from the task history). But the user was also free to manually trigger tasks on demand via the UI — this was useful when debugging or when configuring and testing out new categorization rules. Ultimately, both kinds of task were handled by the Task Executer, which was responsible for executing them. Internally, it used a queue6 from which a background thread dequeued tasks to execute.
9. Logging and monitoring
Since I was initially the sole user, my app didn't use any fancy logging or monitoring.
I used Python's logging module, coupled with my own "debugging" screen (made with React). So every time I logged into my app, I knew whether the last task had completed successfully or not — if it hadn't, then I just looked at the generated log files and I determined pretty quickly what the issue was.
Let's get a tour
Now, let's go through the various screens of my app to see what it looked like. It was aesthetically pretty basic, but it was doing its job perfectly.
First, an authentication screen provided a secure way to log into my web app.
Under "Transactions", I had an overview of all of my transactions. I could filter them out by account number (including credit card number) and by month.
Under "Statistics", I was able to get charts7 representing:
- the cumulative amount of money in all of my bank accounts over time (i.e. "Total $")8.
- the amount of money spent monthly for each category of item that I had previously configured (i.e. "$/category").
Under "Conditions", I was able to configure the categorization rules for transactions using simple arithmetic formulas.
Under "Tasks", I was able to see the latest tasks performed by my app (a synchronization run, a categorization run, or a cleanup run9). This was extremely useful for debugging purposes.
Some buttons also allowed me to manually trigger either one of the tasks. For example, I could manually add a new categorization rule under "Conditions", then hit "Categorize" under "Tasks" to trigger a categorization run and then validate that the rule had worked as intended by going under "Transactions".
Why not make it a startup?
I was quite satisfied with how things had turned out on a technical level. But more than anything else, it was super satisfying to use a web app that I had created and that solved a problem that I personally had.
So this led me to wonder — what if aspiring entrepreneurs like me would like to track their own budget? I had never considered that! Fortunately, I knew how to quickly validate an idea thanks to my previous attempts at startups. But the problem was, how could I find a significant number of aspiring entrepreneurs? Well, the planets were surely aligned, because there was an upcoming entrepreneurship fair in my city, so I had to go there.
At the fair, I cold introduced myself to as many random people as possible10. Most of them were aspiring entrepreneurs, and they pretty much all told me the same thing. By and large, they didn't really care about tracking their budget. But even worse than that — they didn't want to know how much money they spent and how much there was left in their bank account because it scared them. What a totally unexpected finding, to say the least!
On site, I also spoke with teachers from a business school and a startup bootcamp. They told me that they definitely saw value in my app and that their students would surely be interested to use it. Sure, this was some sweet validation to my ears, and I didn't doubt that some of their students would find my app valuable. But who were those students, exactly? Well, they were aspiring entrepreneurs! Given what the 50 or so people had already told me, I felt that they were probably not the right market segment for my app. Anyway, would they be willing to pay for a problem that they didn't really care about?
Just in case, I also tested the waters with a couple of other segments in the following days, like freelancers and actual entrepreneurs. But again, their responses were similar to those of aspiring entrepreneurs. At least, I learned that some of them used QuickBooks.
In retrospect, those findings weren't that surprising. Very few people closely track their budget, and for those who do, few are willing to pay for a tool that enables them to to that. After all, Mint has always been free. And for those who are willing to pay something, then they're already using the tool that they need.
So be it — I realized that I'd stay the sole and only user of my web app.
RIP, budget tracker
Throughout the year that followed, I used my web app extensively. I found some bugs here and there, but nothing serious.
But I became warier of one thing. With all the data breaches that I had heard about, was there a risk of somebody hacking or compromising my server? Sure, my server was virtually unknown on the internet and it was properly secured. But it still contained very sensitive information — the kind that, once decrypted, could allow someone to take over my bank account.
I felt uneasy about the whole thing, even if I had taken all the necessary security measures. So after about a year of using my web app in production, I decided to shut it down for good and to change my bank account's password.
Fortunately, nothing bad happened.
This project was the first web app that I built completely from scratch to solve a problem of my own.
For the first time ever, I used Python, Flask, React.js, and many other technologies to build a web app that I deployed in production on Linux. I played with all the things that mainstream web apps need — UI, web services, storage, cryptography, security, automation, logging, etc. So from a technical perspective, it was a relenting success.
But businesswise, it was more like a failure. On one hand, the market for budget trackers was (and still is) highly saturated. Making my app distinctive enough would have been hard. On the other hand, I should have realized that since I wasn't willing to use Mint due to my security concerns, then I probably couldn't come up with a more secure alternative myself. I certainly was a bit naive here.
So in the end, it sure wasn't a business success, but hey — at least I had a lot of fun and I gained valuable technical skills in the process.
Paypal does the same thing when you connect your Paypal account to your bank account. To validate that you really own your bank account, it gives you two choices. Either you use the "manual" method, which performs two small deposits to your bank account and later asks you to confirm what the exact amounts were. Or you use the "instant" method, which asks for your bank account's login details and automatically validates things for you. In the latter case, it's pretty obvious that your credentials end up stored somewhere on Paypal servers. By the way, Plaid does the same thing! ↩
The C# version of my arithmetic expression parser and evaluator was quite limited in that it only supported basic operators (comparison and boolean ones) and a predetermined list of transaction fields ('amount' and 'date'). But it was good enough for my purpose. ↩
In 2014-2015, I remember that none of the existing solutions were satisfying. They were either incomplete or buggy. So since this was a personal project, and since I had already developed my own JSON/JSONPath parser and evaluator for my Universal Connector Framework, I figured that I could try to roll my own. In retrospect, it was a highly educational effort! ↩
The queue was backed by a separate SQLite database, thus providing persistence to tasks. ↩
This was a nice improvement over what my bank offered. With their online app, I had to use several different screens to do the same thing, and I could only see transactions by "half-month". ↩
I could switch from one chart to the other with ease using the provided dropdown. Also, I could select/unselect some of my bank accounts using the top-level multi-select dropdown, and charts were automatically updated accordingly. ↩
A "cleanup" run was essentially used to clean up the task history so that any task performed more than 1 week earlier was deleted. The idea was just to keep the database clean. ↩
Speaking with that many people was a lot of fun, seriously, and quite a challenge! ↩