How I Moved a Plaid App to a Pentadata App
In this post, you’ll see how to take an existing app that uses the Plaid API and integrate Pentadata’s APIs into it in just a few steps. The two suites of APIs can coexist; however, if your objective is to retrieve transactions from bank accounts and cards, you’ll likely want to migrate over time and keep just one integration.
Throughout this post, I’ll assume your main goal is to retrieve payment data (more often called “transactions”) for each user’s card and bank accounts. If you are working on a personal-finance management (PFM) app, for example, you might want to give your users actionable insights based on their spending. More on this in a minute.
That’s not the only use case, of course, and you can get a good idea of the permitted use cases from both Plaid’s website and Pentadata’s. Generally speaking, we’ve found that loyalty offers and merchant geo-localization are two other common use cases.
Both APIs discussed in this article are web services and they have several common elements, as well as important differences.
- Plaid’s APIs blend resource access and procedure calls and, as such, don’t strictly follow CRUD conventions; they also come with their own terminology (Items, Link, etc.) that I’ll review in a second.
- Pentadata’s APIs use more developer-friendly terms (Account, Person), and the specific API calls are pretty much what you’d expect from a RESTful service.
Here’s the plan for the remainder of the article:
First, I will review with you a prototype implementation of an app that uses Plaid’s API. The content for this first part is largely taken from a tutorial on their website. If you have already integrated with their system, this section can help you review some of your architectural choices. If you haven’t, it will probably clarify a few things that aren’t straightforward. In fact, I was able to figure out a few details only because Pentadata’s systems are built to integrate with many different data sources at the same time, and the knowledge we accumulated over so many data integrations made Plaid’s concepts more familiar to me.
Then, you will see the differences between the two systems’ integrations. I will discuss several pros and cons of each in general terms, though which ones matter most will ultimately depend on the specific details of your own system.
With these differences in mind, in the final technical section, we will see what a typical Pentadata app looks like. I will draw a parallel between each pair of steps in the two integrations but, as you are going to see, one of the two needs far fewer steps.
So, you are on the Plaid website and trying to understand the steps needed at a minimum to access payment transactions of the users of your app. Here are a few useful tips:
- First, you need a user registered in your app who wants to link one of their financial accounts to it. At this point, you have a user and their profile in memory.
- As usual, your user will click a button in your front-end application, which signals to your back-end that they trust you and want to add their financial account.
- To let the user do that, you have to ask the Plaid API for a “link token.” There are several objects called “token” in Plaid’s system, so make a note: at this step, you want the link token.
- To generate the link token, you need to call the /link/token/create endpoint. This must be done from your back-end, because the call requires your Plaid API credentials (client ID and secret), which must never be exposed in your front-end.
- Your back-end then passes the link token to your front-end, which uses it to initialize Plaid Link. When the Link object is active, it displays a UI that your user interacts with. There, they can select their bank and enter their credentials (username and password). In some cases, Plaid may store those credentials in its database; in others, it will use a safer OAuth flow with the bank.
- As a reminder, so far the user has completed the bank authentication, but they are still waiting on your UI for some sort of confirmation. Meanwhile, Link calls your onSuccess callback with a “public token,” the second type of token.
- The next step for you is to send the public token from your front-end to your back-end. You will want to do this in the same onSuccess function where you received the token, wrapped in some error handling.
- Moving on, the back-end receives the public token from your front-end and uses it to ask for a third (and last) type of token: the access token. To do that, call /item/public_token/exchange, and be sure to call it from the back-end only because, again, this call needs your Plaid credentials.
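The two back-end calls in the list above can be sketched in Python with only the standard library. This is a minimal sketch, not Plaid’s official client (Plaid publishes its own client libraries): it builds the JSON POST requests without sending them, and the app name and country code are hypothetical values. The request fields follow Plaid’s API reference as I understand it, so check them against the current docs.

```python
import json
import urllib.request

PLAID_BASE_URL = "https://sandbox.plaid.com"  # Plaid's sandbox host


def build_plaid_request(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a Plaid endpoint."""
    return urllib.request.Request(
        PLAID_BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def link_token_request(client_id: str, secret: str, user_id: str) -> urllib.request.Request:
    # Back-end step 1: ask Plaid for a link token for this user.
    return build_plaid_request("/link/token/create", {
        "client_id": client_id,
        "secret": secret,
        "client_name": "My PFM App",  # hypothetical app name
        "user": {"client_user_id": user_id},
        "products": ["transactions"],
        "country_codes": ["US"],  # hypothetical choice
        "language": "en",
    })


def exchange_request(client_id: str, secret: str, public_token: str) -> urllib.request.Request:
    # Back-end step 2: trade the public token from onSuccess for an access token.
    return build_plaid_request("/item/public_token/exchange", {
        "client_id": client_id,
        "secret": secret,
        "public_token": public_token,
    })
```

Sending a request is then `urllib.request.urlopen(req)`; the exchange response carries the `access_token` (and an `item_id`) that you need to store.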
In total, it takes three API calls to Plaid, alternating between front-end and back-end, plus a whole new terminology, just to retrieve the access token, the piece of information that will give you access to the payment data.
As you are going to see, with Pentadata’s API, it will take just one API call to achieve the same goal.
If everything goes well, you will get the access token. By the way, you will often read about the access token together with an “Item” in Plaid’s docs (a very counterintuitive name, in my opinion; you might as well call it a thing, or stuff). An Item is Plaid’s term for a user’s login at a financial institution, and each access token is associated with one Item, so don’t get confused when you read Item somewhere.
Now, this is a pretty sensitive piece of data, in that it represents access to a financial account. On top of that, getting this token required a lot of back-and-forth between the front-end and back-end, so you will likely want to store it. And you should store it now.
How should you do that? There are several options and strategies you can use.
One option is to tie each access token to the user it was generated for (the person who’s still waiting, staring at your UI). If you are using a relational data model, this means a one-to-many relation (one user, possibly many access tokens), stored as a table with two columns: your user ID and the access token.
That brings up the question of the uniqueness of entries in the relation: without it, there’s no way your schema will meet the first normal form. Now, you might want to put a unique constraint on the pair of columns in that table, but that feels odd to me. After all, the access token is not generated by your system. What if duplicates are possible (now, or in the future)?
A safer option is to add a serial integer key, which is often used as a catch-all solution. Whether that works for you depends on how you are going to use the access token. One possible usage is, at scheduled intervals, to look up all access tokens for the same user (which is easy with the table above) and use each access token to perform an operation (e.g., retrieve payment transactions). This approach will require resolving duplicates, as briefly discussed above.
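Here is a minimal sketch of this first option, assuming SQLite and hypothetical table and column names. A serial key replaces the questionable uniqueness constraint, and duplicates are collapsed at read time with `DISTINCT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_access_tokens (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,  -- serial surrogate key
        user_id      TEXT NOT NULL,                      -- your app's user ID
        access_token TEXT NOT NULL                       -- Plaid access token
    )
    """
)


def save_access_token(user_id: str, access_token: str) -> None:
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO user_access_tokens (user_id, access_token) VALUES (?, ?)",
            (user_id, access_token),
        )


def tokens_for_user(user_id: str) -> list[str]:
    # DISTINCT collapses duplicate tokens saved for the same user.
    rows = conn.execute(
        "SELECT DISTINCT access_token FROM user_access_tokens "
        "WHERE user_id = ? ORDER BY access_token",
        (user_id,),
    )
    return [row[0] for row in rows]
```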
Another possibility is to execute some additional operations before saving the access token. You could use the /accounts/get endpoint to immediately retrieve all accounts connected to this access token. Each account has a unique ID (as clearly stated in Plaid’s documentation), so you could instead store one row per account, mapping each account ID to the access token that grants access to it.
Because the account IDs are unique, they can also serve as keys into a separate table where you save information about each account (e.g., the bank’s name). And, in the account-to-token table, you could enforce the uniqueness of the “account_id” column to prevent duplicates.
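A sketch of this second option under the same assumptions (SQLite, hypothetical names). Making `account_id` the primary key enforces its uniqueness, and `INSERT OR IGNORE` skips accounts that are already stored. The `accounts`/`account_id` response fields follow Plaid’s /accounts/get documentation; the `user_id` column is my own addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account_access_tokens (
        account_id   TEXT PRIMARY KEY,  -- unique per Plaid's docs, so duplicates are rejected
        user_id      TEXT NOT NULL,     -- your app's user (hypothetical column)
        access_token TEXT NOT NULL
    )
    """
)


def account_ids_from_response(accounts_get_response: dict) -> list[str]:
    # /accounts/get returns an "accounts" array whose items carry an "account_id".
    return [account["account_id"] for account in accounts_get_response["accounts"]]


def save_accounts(user_id: str, access_token: str, account_ids: list[str]) -> None:
    with conn:  # commit on success, roll back on error
        conn.executemany(
            # OR IGNORE silently skips account IDs that are already stored.
            "INSERT OR IGNORE INTO account_access_tokens (account_id, user_id, access_token) "
            "VALUES (?, ?, ?)",
            [(account_id, user_id, access_token) for account_id in account_ids],
        )
```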
The downside of this approach is that these additional operations take more time, and the user is still waiting.
Finally, a mixed solution would be to save the access token immediately (as in the first option), notify and thank the user, and then start an offline process to clean up the data and bring it into the format of the second option. If you think about it, you will need an offline process to retrieve the transactions anyway, so the burden of this additional infrastructure doesn’t seem too heavy (and even if it were, this really seems to be the only solution). A simple yet effective approach is a distributed cron system that schedules these processes. All things considered, this mixed solution would be my choice among the ones discussed so far.
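The offline clean-up step of the mixed solution could look like the following sketch. `fetch_account_ids` is a hypothetical stand-in for a call to /accounts/get; injecting it keeps the logic testable without network access. The resulting mapping is in the per-account format of the second option.

```python
from typing import Callable, Iterable

Pending = tuple[str, str]  # (user_id, access_token), saved right after the exchange


def normalize_pending(
    pending: Iterable[Pending],
    fetch_account_ids: Callable[[str], list[str]],
) -> dict[str, Pending]:
    """Offline job: expand each saved access token into one entry per account ID."""
    by_account: dict[str, Pending] = {}
    for user_id, access_token in pending:
        # fetch_account_ids is a hypothetical wrapper around /accounts/get.
        for account_id in fetch_account_ids(access_token):
            # setdefault keeps the first mapping seen, so duplicates are dropped.
            by_account.setdefault(account_id, (user_id, access_token))
    return by_account
```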
Is your storage safe?
The following sentence (copied from Plaid’s docs on April 7th, 2021) caught my attention: “By default, the access_token associated with an Item does not expire and should be stored in a persistent, secure manner.”
In other words, you are required to store that string. And that string is a precious one, as it gives you access to a user’s financial account. Now, if it’s stolen, the attacker would still need your Plaid credentials, which you should not keep in the same storage, and preferably not in any storage at all. (At Pentadata, we don’t store credentials in any database; they are environment variables and are never part of the git history.)
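Reading credentials from environment variables, as described above, can be as simple as this sketch. The variable names `PLAID_CLIENT_ID` and `PLAID_SECRET` are hypothetical; use whatever your deployment defines.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaidCredentials:
    client_id: str
    secret: str


def load_plaid_credentials() -> PlaidCredentials:
    """Read credentials from the environment and fail loudly if one is missing."""
    try:
        return PlaidCredentials(
            client_id=os.environ["PLAID_CLIENT_ID"],  # hypothetical variable name
            secret=os.environ["PLAID_SECRET"],        # hypothetical variable name
        )
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable: {missing.args[0]}") from None
```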
Safety also means reliability. What happens if you lose an access token? From my understanding of Plaid’s system, you would lose access to the user’s account and have to ask them to reauthenticate.
As you’re going to see in a second, because Pentadata’s systems expose CRUD operations to their users (you) and are connected directly to the banks, we don’t require you to store anything, which increases safety and reliability.
Lastly, once the access token is saved (in whatever format you choose), you can start getting transaction data using the /transactions/get endpoint. Depending on your architectural choice, you can do that per user (as in the first option discussed above) or per account (as in the second and third options). Either way, this has to be an asynchronous process; two options are cron (my preference) and Airflow. Keep in mind that once you have a lot of users, and hence a lot of financial accounts, you shouldn’t fetch all transactions for all accounts in a single run of the cron job. If you do, you run the risk that one execution of the job hasn’t finished yet when the next one is about to start! In my opinion, a queue-based approach makes more sense: you cyclically put every (account ID, access token) pair in a queue, and a worker process dequeues one pair at a time and processes it before dequeuing the next one.
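The queue-based approach can be sketched with Python’s in-process `queue.Queue`. This is only an illustration: in production you would more likely use a persistent queue or message broker shared across machines, and `fetch_transactions` is a hypothetical wrapper around /transactions/get.

```python
import queue
from typing import Callable, Iterable

WorkItem = tuple[str, str]  # (account_id, access_token)


def enqueue_all(work: queue.Queue, pairs: Iterable[WorkItem]) -> None:
    """Producer: run on a schedule (e.g. by cron) to enqueue every pair."""
    for pair in pairs:
        work.put(pair)


def drain(work: queue.Queue, fetch_transactions: Callable[[str, str], None]) -> int:
    """Consumer: process one pair at a time until the queue is empty."""
    processed = 0
    while True:
        try:
            account_id, access_token = work.get_nowait()
        except queue.Empty:
            return processed
        try:
            fetch_transactions(account_id, access_token)
            processed += 1
        finally:
            work.task_done()  # mark the item handled even if the fetch raised
```

Scheduling and processing are decoupled: the cron job only enqueues pairs, and workers drain the queue at their own pace.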
Summary so far
You’ve seen so far what a prototypical app that uses Plaid to access users’ financial transactions looks like. In summary:
- Call Plaid’s servers from your back-end to get a link token.
- Pass the link token to your front-end.
- Use the link token to initialize Plaid Link in your front-end.
- Get the public token from the Link object.
- Send the public token to your back-end.
- From the back-end, exchange the public token for an access token.
- Store the access token.
- Notify the user on the front-end that everything worked well.
- Start fetching transactions (in your back-end) using the access token.
Switching to Pentadata
At this point, let’s look together at an illustrative app that uses Pentadata’s API for the same purpose, that is, to access the user’s financial transactions. A picture is worth a thousand words, so let me start with that, and then I will comment on every step.
Yes, it’s really that much simpler and quicker. The main reason is that this part of our API suite is architected around (and very close to) REST and uses the standard CRUD operations. This simplifies the procedure and brings developers some benefits, such as not having to remember a lot of endpoint names. In REST, a POST creates a resource (the C in CRUD), so that’s what you’ll do with Pentadata; there’s no need to name the endpoint with a trailing /create. A GET, well, gets resources; no need to append /get to the endpoint.
The first resource you create is a Person. That’s really just a person, a user of your app. We are not big on mysterious names.
How do you create a Person? As in any other RESTful system, you call POST /persons. Notice that the endpoint name is pluralized, following the most widely accepted convention.
Why do you need to create a Person? First of all, you are not required to create such a resource for all your users; you only have to do it for those you want to add to your Pentadata profile. Now, to your question: accessing financial transactions is only one part of our APIs. The Person resource is also used by other products, such as PayMeBack, which enables cashback and thus card-linked offers, and Auth, which lets you verify the identity of your users; that’s why creating a Person is always the first step. Last but not least, Pentadata is a 100% consumer opt-in platform, so we want to make sure you’ve got your users’ permission to perform operations.
Once that’s done, the API response gives you the Person ID.
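The call itself is a plain JSON POST. In this sketch, the base URL is a placeholder, `email` is the one body field the article implies (GET /persons later returns the email you sent), and the bearer-token header is an assumption based on the OAuth/JWT scheme mentioned further down; check Pentadata’s docs for the real values.

```python
import json
import urllib.request

PENTADATA_BASE_URL = "https://pentadata.example/api"  # hypothetical placeholder


def create_person_request(jwt: str, email: str) -> urllib.request.Request:
    """Build (but do not send) a POST /persons request for one of your users."""
    return urllib.request.Request(
        PENTADATA_BASE_URL + "/persons",
        data=json.dumps({"email": email}).encode("utf-8"),  # assumed body shape
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {jwt}",  # assumed bearer-token scheme
        },
        method="POST",
    )
```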
The next step is to create another resource, the Account. That’s again just what the name says, a financial account. You do this with a POST /accounts.
Creating an account returns a URL. That’s really just a URL (over secure HTTPS). You have to open it in your app, either web or mobile. It displays a secure log-in screen where the user can authorize the financial accounts they want to share with you. After that, they are instantly redirected to a page of your choice.
Our systems are architected to integrate multiple, heterogeneous sources of data (which, incidentally, is why we are familiar with Plaid). Therefore, the account URL can change depending on which data source we are using. You have full control over that: there’s an optional argument to the POST /accounts call, named “bank,” that you may or may not send. If you know in advance which bank the user will select, we can choose a better data source for them. If you don’t, just leave it out and we will fall back on our default. Overall, we cover 97% of financial institutions in North America (as of March 2021).
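Account creation follows the same pattern. Everything in this sketch beyond the POST /accounts path and the optional `bank` argument is an assumption: the base URL, the `person_id` and `redirect_url` field names, and the bearer header are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Optional

PENTADATA_BASE_URL = "https://pentadata.example/api"  # hypothetical placeholder


def create_account_request(jwt: str, person_id: str, redirect_url: str,
                           bank: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) POST /accounts; the response should carry the log-in URL."""
    body = {"person_id": person_id, "redirect_url": redirect_url}  # assumed field names
    if bank is not None:
        # Optional hint; leave it out to let the default data source be used.
        body["bank"] = bank
    return urllib.request.Request(
        PENTADATA_BASE_URL + "/accounts",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {jwt}"},
        method="POST",
    )
```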
So far, two resources have been created: Person and Account.
What’s next? Nothing, you’re finished. Your user has already landed on the confirmation page that you told us to redirect them to. We handle the synchronization with the financial institutions transparently and populate the resources in your Pentadata profile asynchronously.
What you should do at this point is start the same kind of background process you used with Plaid, for instance via cron, and check when accounts and transactions start flowing in.
Our recommendation is a process that, every so often (more on this in a second), checks which accounts the user added (via GET /persons/<id>/accounts) and then, for every account, which transactions are new (via GET /accounts/<id>/transactions, which accepts timestamp arguments; see our docs). Again, nothing magic here, just standard CRUD operations.
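One polling pass can be sketched as follows. The two paths come from the text above; the base URL, the `from`/`to` parameter names with Unix-second values, and the `id` field in the response are hypothetical. `get_json` stands in for an authenticated HTTP GET that returns parsed JSON.

```python
from datetime import datetime, timezone
from typing import Callable
from urllib.parse import urlencode

PENTADATA_BASE_URL = "https://pentadata.example/api"  # hypothetical placeholder


def accounts_url(person_id: str) -> str:
    # GET /persons/<id>/accounts: which accounts has this person added?
    return f"{PENTADATA_BASE_URL}/persons/{person_id}/accounts"


def transactions_url(account_id: str, since: datetime, until: datetime) -> str:
    # GET /accounts/<id>/transactions within a time window.
    # "from"/"to" in Unix seconds are hypothetical; use the names and format in the docs.
    query = urlencode({"from": int(since.timestamp()), "to": int(until.timestamp())})
    return f"{PENTADATA_BASE_URL}/accounts/{account_id}/transactions?{query}"


def poll_person(person_id: str, get_json: Callable[[str], list],
                since: datetime, until: datetime) -> dict:
    """One polling pass: transactions in the window, keyed by account ID."""
    new_transactions = {}
    for account in get_json(accounts_url(person_id)):
        account_id = account["id"]  # assumed response field
        new_transactions[account_id] = get_json(transactions_url(account_id, since, until))
    return new_transactions
```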
How often should you run this process? My recommendation is to increase the interval between runs over time. It doesn’t have to be complicated: you can keep running the process every 5 minutes until you see transactions for the first time, and then run it every 30 minutes. You can even run it every 6 hours if that makes sense for you. The cost won’t change, so choose the best option for your architecture.
An improvement on this is to mark each individual account resource as “synced” or “not synced.” Every account would start in the “not synced” state, and the first time some transactions are found in it, then it would be marked as “synced” and moved on to a slower schedule thereafter. For the sake of simplicity, you can have two processes that run the same code except that one does it every five minutes and only for “not synced” accounts, and the other runs every 30 minutes for “synced” accounts.
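The “synced”/“not synced” schedule can be tracked per account, as in this sketch with hypothetical names. A single loop that checks `is_due` on every account is equivalent to the two processes described above, one on a 5-minute and one on a 30-minute schedule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

FAST = timedelta(minutes=5)   # schedule for "not synced" accounts
SLOW = timedelta(minutes=30)  # schedule for "synced" accounts


@dataclass
class TrackedAccount:
    account_id: str
    synced: bool = False  # every account starts "not synced"
    last_checked: Optional[datetime] = None

    def interval(self) -> timedelta:
        return SLOW if self.synced else FAST

    def is_due(self, now: datetime) -> bool:
        return self.last_checked is None or now - self.last_checked >= self.interval()

    def record_check(self, now: datetime, found_transactions: bool) -> None:
        self.last_checked = now
        if found_transactions:
            self.synced = True  # first transactions seen: switch to the slower schedule
```

A cron job running every minute could call `is_due(now)` for each account, poll only the due ones, and then call `record_check`.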
What data should you store? You are not required to store anything at all unless you want to. Again, because we expose resources via REST, you can just check the current state whenever you like.
Here’s what you can do if you don’t want to store anything. It’s a bit extreme, but it works:
- Get all Person resources that you have previously created (GET /persons). This returns every resource ID paired with the email you sent when you created it.
- For every such person, get all accounts (GET /persons/<id>/accounts).
- For every such account, get all transactions (GET /accounts/<id>/transactions), with whatever timestamp arguments suit you at that moment.
- Show transactions to the user, aggregate them, plot graphs, etc. Make sure you show transactions only to the correct user! To do this, you’ll have to match the emails you got in step 1.
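The steps above can be sketched as a single function that serves one logged-in user, in line with the just-in-time advice that follows. The three callables are hypothetical wrappers around the GET calls, and the `id` and `email` field names are assumptions about the response shape.

```python
from typing import Callable

# Hypothetical wrappers around the three GET calls; each returns parsed JSON.
ListPersons = Callable[[], list]          # GET /persons
ListAccounts = Callable[[str], list]      # GET /persons/<id>/accounts
ListTransactions = Callable[[str], list]  # GET /accounts/<id>/transactions


def transactions_for_user(
    email: str,
    list_persons: ListPersons,
    list_accounts: ListAccounts,
    list_transactions: ListTransactions,
) -> list:
    """Fetch one logged-in user's transactions without storing any IDs locally."""
    # Match the user to their Person by email ("id"/"email" are assumed fields).
    person_ids = [p["id"] for p in list_persons() if p["email"] == email]
    transactions = []
    for person_id in person_ids:
        for account in list_accounts(person_id):
            transactions.extend(list_transactions(account["id"]))
    return transactions
```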
If you do this, you put essentially zero load on your data storage (which can help save some money). There’s no free lunch, though: your data flow will be a bit slower every time because you have to make multiple API calls. But, of course, you would not execute this logic “for all users” and “for all accounts”; that was just an example. When a user logs in to your app, you can fetch their transactions from us just in time. There’s no need to do that for users who are not logged in.
Is your account reliable?
When you connect to Pentadata’s API, your account is as safe as it can get these days, because we use OAuth with JWT.
Your JWT expires quickly (and it’s refreshed via the API) and, more importantly, it’s not tied to any resource, so even if you lose it, it’s not too bad. You will never have to recreate an Account for the user.
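Since the JWT expires quickly, a client can read its standard `exp` claim (defined in RFC 7519) to decide when to refresh. This sketch decodes the payload without verifying the signature, which is acceptable only for scheduling a refresh, and it assumes the token carries an `exp` claim; the refresh call itself is not shown because its endpoint is not described here.

```python
import base64
import json
import time
from typing import Optional


def jwt_expires_at(token: str) -> int:
    """Return the "exp" claim (seconds since the epoch); the signature is NOT verified."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # JWTs drop base64 padding; restore it
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])


def needs_refresh(token: str, leeway_seconds: int = 60, now: Optional[float] = None) -> bool:
    """True if the token expires within `leeway_seconds` (or already has)."""
    current = time.time() if now is None else now
    return current >= jwt_expires_at(token) - leeway_seconds
```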
My recommendation would be to save the IDs for Person and Account resources; these are just random digit sequences. Every person’s ID will have a one-to-one relationship with your user ID. Then, Account IDs are in a many-to-one relation with person IDs: In simpler terms, a person can have multiple accounts, but the same account belongs to one person only.
With that in mind, and again assuming you are using a relational data model, I would add two tables: one to store the first kind of relation (your user IDs to Person IDs), and another for the second kind (each Person ID to its Account IDs).
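A minimal sketch of the two tables, assuming SQLite and hypothetical table names. A `UNIQUE` Person ID gives the one-to-one relation, and a foreign key from accounts to persons gives the many-to-one relation; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript(
    """
    -- One-to-one: each of your users maps to exactly one Pentadata Person.
    CREATE TABLE user_persons (
        user_id   TEXT PRIMARY KEY,
        person_id TEXT NOT NULL UNIQUE
    );
    -- Many-to-one: a Person can have many Accounts; each Account has one Person.
    CREATE TABLE person_accounts (
        account_id TEXT PRIMARY KEY,
        person_id  TEXT NOT NULL REFERENCES user_persons (person_id)
    );
    """
)


def accounts_for_user(user_id: str) -> list[str]:
    rows = conn.execute(
        "SELECT pa.account_id FROM person_accounts AS pa "
        "JOIN user_persons AS up ON up.person_id = pa.person_id "
        "WHERE up.user_id = ? ORDER BY pa.account_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```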
Pentadata integration review
As you just saw, integrating with Pentadata’s Transactionz API is very simple:
- Create Person resources.
- Create Account resources.
- Fetch Transactions from Accounts.
At any point in time, you can get transactions, list all your Person and Account resources, or store them in your data layer. We don’t set constraints on your architecture, but don’t hesitate to reach out and ask for recommendations. You can also use our testing environment, a safe sandbox that is free to access.