The principal requirements for mission-critical payments infrastructure are:
- High Availability (HA): No single point of failure should be able to bring down the system as a whole.
- High Performance (HP): The execution of the software steps should be achieved with efficiency and minimum latency, without impacting quality.
- Scalability: The system should scale linearly as usage increases, and the additional cost of adding new customers should be negligible.
- Reliability: The system should be consistent in its performance notwithstanding load and volume.
- Security: The system should be secured against all forms of breach, follow good data governance practices for data at rest and data in motion, and follow the principles of data dignity.
- User Experience (UX): Ultimately, the systems are built for users to consume the services. The user experience can be a game changer and can even impact the business as a whole.
- Information and Analytics Services (IAS): Information and analytics services support the operations and compliance users of the OEM, the system owners, and their customers.
In part 1 of this article, we focus on resiliency, the key to mission-critical digital payment infrastructure.
We often hear that a payment system is 99.999% available, which translates to a maximum of about 5.26 minutes of downtime in a year, or ~864 milliseconds of disruption per day. Does this mean that the system never experiences a failure due to load, wrong data, or hardware faults? No. Rather, the system is resilient to such faults and can still operate without appreciable degradation of service.
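The availability arithmetic above can be checked with a short calculation. This is a minimal sketch: the function name `downtime_budget_seconds` is ours, and it assumes a 365-day year and that downtime is spread evenly across the period.

```python
def downtime_budget_seconds(availability_pct: float, period_seconds: float) -> float:
    """Maximum downtime (in seconds) allowed in a period at a given availability."""
    return period_seconds * (1.0 - availability_pct / 100.0)


SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: non-leap year

# "Five nines" availability, as quoted in the text
per_year_min = downtime_budget_seconds(99.999, SECONDS_PER_YEAR) / 60
per_day_ms = downtime_budget_seconds(99.999, SECONDS_PER_DAY) * 1000

print(f"{per_year_min:.2f} minutes per year")    # 5.26 minutes per year
print(f"{per_day_ms:.0f} milliseconds per day")  # 864 milliseconds per day
```

Each additional "nine" shrinks the budget by a factor of ten, which is why the figure matters so much when comparing systems.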
RS Software has spent more than 30 years building, enhancing, maintaining, and operating mission-critical payment system platforms, both card-based and account-based, that exhibit high resiliency.
What does it take to build such a resilient system? It starts with taking responsibility not only for the system of immediate concern but also beyond the ambit of “my” system, well into the adjacent layers of operation, and sometimes all the way to the far edges of the technology rails. Further, within the system of immediate concern, all components (online, offline, monitors, and others) need the same level of attention and resilience. Essentially, each component of the system must be capable of working in sync with all the other components. The weakest link in the chain can drastically bring down the performance of the entire system.
When payment systems are built, they must cater to existing offerings and be flexible and scalable enough to support business innovations that are dynamic and market driven.
If card payments had been built only to cater to retail payments to merchants for groceries, they could not have supported card-to-card immediate payments. As an example, in 2021, Visa Direct transactions grew 60% YoY. Similarly, if real-time payments had been built only with P2P payments in mind, the rising use case of P2M (typically small merchants) could not have been supported. For instance, in India, a large share of UPI transactions were in the P2M category as of August 2022.
A simplistic view of a payment system is one that moves money between two entities within a framework of financial governance. There are multiple steps associated with this; the salient ones are payment initiation, payment completion (settlement), notification of debit or credit or both, and statutory postings.
Today, payment initiation and notification are typically available as APIs, which helps FinTechs build innovative services for the community at large and thus fosters adoption. Forbes Business Council predicts that embedded payment revenues will grow from $43 billion in 2021 to $138 billion in 2026.
A closer look reveals that a fault could originate in any of these steps, and could be a human fault, a software system fault, a hardware system fault, a governance fault, or even a pan-systemic fault. The fault will most likely result in downtime, and the system will need time to recover (Recovery Time). An even more serious issue is data loss, where one can only go back to a certain point in the past (Recovery Point) up to which the data was saved.
Payment systems are built with a low Recovery Time Objective (RTO) and a very low Recovery Point Objective (RPO). Typical deployment architecture patterns used to maintain low RTO and RPO are Active-Passive, where one site is active and the other is a hot standby, and Active-Active, where both sites are hot and, if needed, one can take up the load of the other near instantly.
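To make Recovery Time and Recovery Point concrete, the sketch below measures both for a single incident and compares them against the objectives. It is illustrative only: the `Incident` type, its field names, and all timestamps and thresholds are hypothetical, and it assumes the recovery point is the last write confirmed at the standby site.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    last_replicated_at: datetime  # last write confirmed at the other site (assumed)
    failed_at: datetime           # moment the active site stopped serving
    restored_at: datetime         # moment service resumed


def recovery_point(incident: Incident) -> timedelta:
    """Window of data at risk: writes made after the last confirmed replication."""
    return incident.failed_at - incident.last_replicated_at


def recovery_time(incident: Incident) -> timedelta:
    """Downtime: how long it took to resume service after the failure."""
    return incident.restored_at - incident.failed_at


def meets_objectives(incident: Incident, rto: timedelta, rpo: timedelta) -> bool:
    """True when both the measured recovery time and recovery point are within target."""
    return recovery_time(incident) <= rto and recovery_point(incident) <= rpo


# Hypothetical incident: 2 seconds of unreplicated data, 45 seconds of downtime
incident = Incident(
    last_replicated_at=datetime(2022, 8, 1, 11, 59, 58),
    failed_at=datetime(2022, 8, 1, 12, 0, 0),
    restored_at=datetime(2022, 8, 1, 12, 0, 45),
)
ok = meets_objectives(incident, rto=timedelta(seconds=60), rpo=timedelta(seconds=5))
print(recovery_time(incident), recovery_point(incident), ok)
```

In an Active-Active deployment both measured values would be expected to approach zero, since the surviving site is already serving traffic and holds current data.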
RS Software designs payment systems with all the aspects mentioned above in order to maintain their resilience. We apply AI/ML at appropriate points in the system so that it can react intelligently to exigencies. If an entity, say a bank, is not responding within its “usual” response time, our systems can offer users alternative options or defer to “smart retries” in order to uphold a delightful user experience.
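The retry-then-fallback behaviour can be illustrated with a simple rule-based sketch. This is not RS Software's implementation and uses no AI/ML: it swaps in plain exponential backoff, and the names `BankTimeout` and `pay_with_retries`, the attempt count, and the delays are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class BankTimeout(Exception):
    """Hypothetical error raised when the bank does not respond in its usual time."""


def pay_with_retries(
    send: Callable[[], str],
    max_attempts: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try the payment up to max_attempts times, doubling the wait after each timeout.

    Returns the bank's response, or None when every attempt timed out, in which
    case the caller can offer the user an alternative option.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except BankTimeout:
            if attempt < max_attempts - 1:
                sleep(base_delay_s * 2 ** attempt)  # waits 0.1 s, then 0.2 s, ...
    return None


# Usage: a stand-in bank call that times out twice, then succeeds
calls = []

def flaky_send() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise BankTimeout()
    return "ACCEPTED"

delays = []  # record the waits instead of actually sleeping
result = pay_with_retries(flaky_send, sleep=delays.append)
print(result, delays)
```

In a real payment flow, each retry would normally carry the same transaction reference so that a repeated attempt cannot debit the payer twice.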
Resilience is the underlying principle that drives the design of mission-critical payment systems, and RS Software remains associated with mission-critical payment platforms that move money across the globe.
To be continued – stay tuned for part 2.