Asynchronous Synchronization For Decoupling
Problem Statement
Beanworks builds Accounts Payable Automation software that integrates with various ERPs. A major part of any integration software is data synchronization. At Beanworks, it is one-way content synchronization where the destination always remains the source of truth. Beanworks imports lists (e.g. vendors, GL accounts) and supported document types from the ERP (e.g. purchase orders or payments created in the ERP), and exports supported document types to the ERP (e.g. fully approved POs, invoices, or payments created in Beanworks).
As the number of customers grew and each customer got bigger, we started to notice a scaling problem with the existing export (data sync to ERPs) workflow. Export requests were handled synchronously, which meant the client had to wait for the server's response until the export was complete. There were three main problems with this approach:
- Blocking
The synchronous API blocks the client until it receives a response from the server. As a result, the user waits an indeterminate amount of time until every invoice selected for posting has been exported to the ERP. This greatly disrupts the user's workflow, since they cannot navigate away from the page.
- Browser timeout
A blocking operation is tolerable if the response time is 'somewhat' reasonable. However, we started running into browser timeouts as responses took too long, especially for large-volume customers. The affected customers were exporting huge batches of invoices at a time (up to a couple hundred invoices). Export was coupled with multiple expensive processing steps (serialization, PDF/CSV generation, payment matching, etc.), which dramatically increased the execution time.
- Sync collision due to concurrency
Most ERPs allow only a limited number of concurrent connections at a time. To avoid hitting the request limits set by different ERPs, we introduced a Lock table in the database that tracks locks created during sync operations, defaulting to expire in an hour. Any subsequent sync request during this time encounters the lock and throws an exception, alerting the customer to try again later. This placed the burden of retrying on the customer. A minimal sketch of the lock check appears right after this list.
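For illustration, here is a minimal sketch of that lock check, assuming a hypothetical sync_lock table keyed by ERP connection with a one-hour expiry; the table and column names are illustrative, not our actual schema.

```python
# Minimal sketch of the lock-table idea described above.
# Table and column names (sync_lock, erp_connection_id, expires_at) are hypothetical.
import sqlite3
from datetime import datetime, timedelta

LOCK_TTL = timedelta(hours=1)  # locks default to expire in an hour


class SyncLockedError(Exception):
    """Raised when another sync already holds the lock for this ERP connection."""


def acquire_sync_lock(conn: sqlite3.Connection, erp_connection_id: str) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_lock "
        "(erp_connection_id TEXT PRIMARY KEY, expires_at TEXT)"
    )
    now = datetime.utcnow()
    row = conn.execute(
        "SELECT expires_at FROM sync_lock WHERE erp_connection_id = ?",
        (erp_connection_id,),
    ).fetchone()
    if row and datetime.fromisoformat(row[0]) > now:
        # A non-expired lock exists: the customer is told to try again later.
        raise SyncLockedError(f"Sync already in progress for {erp_connection_id}")
    conn.execute(
        "INSERT OR REPLACE INTO sync_lock (erp_connection_id, expires_at) VALUES (?, ?)",
        (erp_connection_id, (now + LOCK_TTL).isoformat()),
    )
    conn.commit()
```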
Solution - Asynchronous syncs using AWS SQS
The team quickly agreed to use message queues to process sync operations asynchronously. RabbitMQ and AWS SQS were the top contenders, and we decided to go with AWS SQS.
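To make the flow concrete, here is a minimal sketch of what publishing and consuming a sync request via SQS could look like with boto3. The queue name, message shape, and the export_to_erp helper are hypothetical assumptions, not our production code.

```python
# Sketch: publish an export request to SQS instead of running it inline.
import json

import boto3

sqs = boto3.client("sqs", region_name="us-west-2")
queue_url = sqs.get_queue_url(QueueName="erp-export-requests")["QueueUrl"]


def enqueue_export(company_id: str, invoice_ids: list[str]) -> str:
    """Publish the export request and return immediately with the message id."""
    response = sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({"company_id": company_id, "invoice_ids": invoice_ids}),
    )
    return response["MessageId"]


def process_export_queue() -> None:
    """Worker loop: long-poll the queue and run the actual export in the background."""
    while True:
        messages = sqs.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
        ).get("Messages", [])
        for message in messages:
            payload = json.loads(message["Body"])
            export_to_erp(payload)  # hypothetical: the existing export logic
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
```

The client-facing request handler only calls enqueue_export and returns, while a separate worker process runs process_export_queue.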
Challenges
There were a number of challenges in implementing this change.
Multiple initiatives and feature work involving sync operations had to proceed in parallel. As a team, we had to come up with a design that allowed parallel development by adhering to the Dependency Inversion principle - high-level modules should not depend on low-level modules; both should depend on abstractions. We collaborated on a design document before any development work started, where we agreed upon a sync service class. Any other initiative that needs to call a data sync method no longer has to care whether it is done synchronously or asynchronously - it simply calls the syncToErp method, and the implementation detail of sync execution is hidden from the caller.
There are a lot of ERPs that we integrate with, which creates a challenge for QA and for the scope of the refactor. We made use of a feature flag and explicit checks in the codebase to scope it to a few ERPs in the beginning. This allowed more iterative development and faster releases. A sketch of this abstraction and the flag check appears below.
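Here is a minimal sketch of that abstraction, assuming a hypothetical feature_flag_enabled helper, an illustrative ERP allowlist, and the enqueue_export function from the SQS sketch above; syncToErp is the only name taken from the actual interface.

```python
# Sketch: callers invoke syncToErp and never know whether the sync runs inline
# or through the queue. Helpers referenced below are hypothetical illustrations.
from abc import ABC, abstractmethod

ASYNC_ENABLED_ERPS = {"erp_a", "erp_b"}  # assumption: initial rollout scope


class ErpSyncService(ABC):
    @abstractmethod
    def syncToErp(self, company_id: str, document_ids: list[str]) -> None: ...


class SynchronousSync(ErpSyncService):
    def syncToErp(self, company_id, document_ids):
        run_export_now(company_id, document_ids)  # hypothetical: existing inline path


class AsynchronousSync(ErpSyncService):
    def syncToErp(self, company_id, document_ids):
        enqueue_export(company_id, document_ids)  # see the SQS sketch above


def get_sync_service(erp_type: str) -> ErpSyncService:
    # Feature flag plus an explicit ERP check keeps the rollout scoped and reversible.
    if erp_type in ASYNC_ENABLED_ERPS and feature_flag_enabled("async_erp_sync"):
        return AsynchronousSync()
    return SynchronousSync()
```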
There is now added complexity from introducing another service into data synchronization. Before, it was a straightforward logic flow encapsulated in a single codebase. Now an external message queue is involved, which makes the debugging experience more complicated. We made sure to set up monitoring on the health of the queue and detailed logging in case things go south.
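As a rough illustration, a queue-health check could poll SQS queue attributes and log or alert on backlog; the threshold and logger setup below are assumptions, not our actual monitoring configuration, and the sqs client and queue_url come from the earlier sketch.

```python
# Sketch: poll SQS queue attributes and log when the backlog grows too large.
import logging

logger = logging.getLogger("erp_sync")
BACKLOG_ALERT_THRESHOLD = 500  # assumption: alert past this many pending messages


def check_queue_health() -> None:
    attributes = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )["Attributes"]
    visible = int(attributes["ApproximateNumberOfMessages"])
    in_flight = int(attributes["ApproximateNumberOfMessagesNotVisible"])
    logger.info("sync queue health: visible=%d in_flight=%d", visible, in_flight)
    if visible > BACKLOG_ALERT_THRESHOLD:
        logger.warning("sync queue backlog above threshold: %d messages", visible)
```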
Result
Asynchronous syncs greatly improved the user experience. We purposely did not set up a performance benchmark for this refactor, because making import and export operations asynchronous does not improve performance by itself - it simply gives us a mechanism to defer the operation and carry it out in the background. It offers a completely different (and better) user experience: we free up the client as soon as the request is made, and the results of syncs are shared asynchronously. The client now receives a response almost instantaneously!
© Hannah Kim.