This blog post is brought to you by the developer of BitBudget. BitBudget is an automated budgeting app for Android and iOS which syncs with your bank account and helps you avoid overspending. If you’d like to quit living paycheck-to-paycheck and get a better handle on your finances, download it today! https://bitbudget.io
At the moment I happen to be working on a personal finance app called BitBudget, where I need to be able to figure out if two purchases were made with the same merchant, or different merchants. For example, does a financial transaction labeled “VZWRLSS*PRPAY AUTOPAY 02/19 PURCHASE” represent a transaction with the same company as “VZWRLSS*PRPAY AUTOPAY 03/19 PURCHASE”? What about “CALM 03/26 PURCHASE” and “CALM 02/26 PURCHASE INTERNET”? Many transactions are easy to compare, “McDonald’s” vs “Amazon”, or “Chipotle Mexican Grill” vs “Starbucks”, but once banks and payment processors start throwing in all this extra information things get tricky.
While I personally am not really a computer science / algorithms kind of guy, sometimes you really need an algorithm to get the job done! So if you find yourself in the same boat I have your answer right here. After a quick google search on this topic I discovered that there appear to be three different types of string comparison algorithms:
- Edit Distance Based
- Token Based
- Sequence Based
If you’d like to read more about the three types of algorithm’s I suggest reading: String similarity — the basic know your algorithms guide!
For my use-case, I decided that an Edit Distance Based algorithm would be the quickest strategy to implement myself, so that’s what I’m going with and would like to present below. The actual code that I will end up using in my app may feature a few modifications for speed, such as shortening the number of iterations before returning a result, but what I have so far should be a good start if you need something quick and easy: