Technical challenges

For users, the value of Instant Payments lies in the promise of almost immediate completion of the payment, 24 hours a day, 7 days a week. That value comes at the price of challenging non-functional requirements.

  • Low latency

    Instant Payments need to be instant. The current EPC SCT Inst scheme rulebook specifies a 10-second target, while several communities (most notably the Netherlands and Belgium) aim for an even faster 5-second round trip. Because this is an end-to-end target, the allotted time is divided between the participants, the clearing house and the network, meaning each party should aim for sub-second processing. For a typical participant, this includes validating the payment message from the customer, checking the account balance, debiting the account in the ledger, checking the reach and availability of the beneficiary bank, fraud checking, sending the message to the clearing house and correlating the return message. Sub-second processing times can only be met if participants optimize the process for single messages instead of files and batches. They need to run processing steps in parallel (validating the message and checking reach and availability can be done in parallel with the fraud check, for instance) and, most importantly, they need to optimize database I/O, as this is typically the biggest bottleneck. This can be done in various ways: by loading static tables into memory, by optimizing indexes, or even by switching to a different database technology, such as MongoDB or Cassandra.
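
    As a minimal sketch of the parallel steps described above, the following Python example runs message validation and the reach check alongside the fraud check. Every function name, the in-memory reach table, the amount limit and the simulated delays are assumptions made for illustration; they are not taken from any scheme rulebook or product.

```python
import asyncio

# Hypothetical static table held in memory, loaded once at startup
# instead of being read from the database for every payment.
REACHABLE_BICS = frozenset({"BANKNL2A", "BANKBEBB"})


async def validate_message(payment: dict) -> bool:
    await asyncio.sleep(0.05)  # simulated processing time (assumption)
    return payment["amount"] > 0 and bool(payment["debtor_iban"])


async def check_reach(payment: dict) -> bool:
    await asyncio.sleep(0.05)  # simulated lookup time (assumption)
    return payment["creditor_bic"] in REACHABLE_BICS


async def fraud_check(payment: dict) -> bool:
    await asyncio.sleep(0.2)  # simulated scoring time (assumption)
    return payment["amount"] < 15_000  # illustrative limit only


async def pre_checks(payment: dict) -> bool:
    """Run validation plus reach check concurrently with the fraud check."""

    async def validate_then_reach() -> bool:
        return await validate_message(payment) and await check_reach(payment)

    # Both branches run concurrently, so the elapsed time is roughly that
    # of the slower branch rather than the sum of all three steps.
    path_ok, fraud_ok = await asyncio.gather(
        validate_then_reach(), fraud_check(payment)
    )
    return path_ok and fraud_ok
```

    Calling `asyncio.run(pre_checks(payment))` with a well-formed payment returns `True`; a payment to a BIC that is missing from the in-memory table is rejected without any database access.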

  • 24/7 availability

    Instant Payments not only need to be instant, they also need to be available. This requirement may have the biggest impact from a technical and operational perspective. Although running highly available systems is not complex from a technical viewpoint (we have been running 24/7 available payment processing systems in cards for decades), it does come with its own challenges. The high availability requirements can only be met by running an active-active setup, in which two datacenters process in parallel, either datacenter can handle the full load when the other goes down, and traffic switches from one datacenter to the other without delay. This setup can be made even more robust by running multiple nodes per datacenter. The datacenters need to be spaced sufficiently far apart to have different risk profiles (e.g. a grid power outage at one datacenter does not affect the second site), but close enough to maintain low latency. The multiple-datacenter setup can also be used to perform upgrades and maintenance without downtime.
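
    The sizing rule behind an active-active setup can be sketched in a few lines of Python: the remaining sites must carry the full peak load when any single site is lost, and routing only ever selects healthy sites. The `Datacenter` type, the capacity figures and the modulo-based routing below are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    name: str
    capacity_tps: float  # sustained transactions per second (assumed figure)
    healthy: bool = True


def survives_single_failure(sites: list[Datacenter], peak_tps: float) -> bool:
    """Return True if losing any one site still leaves enough capacity."""
    if len(sites) < 2:
        return False
    total = sum(site.capacity_tps for site in sites)
    return all(total - site.capacity_tps >= peak_tps for site in sites)


def route(sites: list[Datacenter], payment_id: int) -> Datacenter:
    """Spread traffic over healthy sites; a failed site is simply skipped."""
    healthy = [site for site in sites if site.healthy]
    if not healthy:
        raise RuntimeError("no healthy datacenter available")
    return healthy[payment_id % len(healthy)]
```

    With two sites of 500 TPS each, a peak of 400 TPS passes the check and a peak of 600 TPS does not, because a single surviving site could not absorb it.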

  • Connectivity

    This brings us to connectivity, as the requirements of (guaranteed) low latency and high availability both apply here. The current “go-to” network for financial messaging, SWIFT, has recently announced its entry into Instant Payments and will most likely offer the latency and availability figures that Instant Payments require. Alternatives are available in SIANet and EBICS, the latter being a self-managed network. Participants could also opt for a (dual) leased line or MPLS connection directly into their clearing and settlement mechanism (CSM).

  • Scalability

    Looking at implementations in the UK (growth from 20 million transactions per month half a year after launch to 100 million transactions per month six years later), Sweden (Swish onboarded 5 million users in 3 years) and Denmark (28 transactions per capita in 2 years, with 4% monthly growth), any Instant Payments system needs to be scalable from the start. It is critical that Instant Payments processing is designed to scale (up and out) without affecting the latency of the service, for instance by using components designed with parallelism and asynchronism in mind. A load balancer/dispatcher model provides horizontal scalability, which is crucial to keep the TCO under control: costs rise exponentially when scaling vertically, but only linearly when scaling horizontally. Additionally, techniques such as caching, asynchronous processing and striving for statelessness are crucial to obtaining cost-efficient performance and scalability.
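
    The dispatcher model mentioned above can be illustrated with a small Python sketch. Because the workers are stateless, any worker can handle any payment, and capacity is added simply by registering another worker. The `Dispatcher` class, the `make_worker` helper and the round-robin policy are assumptions chosen for brevity, not a description of a particular product.

```python
import itertools
from typing import Callable

Worker = Callable[[dict], dict]


def make_worker(name: str) -> Worker:
    """Build a stateless worker: its result depends only on the input."""

    def handle(payment: dict) -> dict:
        return {"worker": name, "payment_id": payment["id"], "status": "ACCEPTED"}

    return handle


class Dispatcher:
    """Round-robin dispatcher over a pool of interchangeable workers."""

    def __init__(self, workers: list[Worker]) -> None:
        self._workers = list(workers)
        self._counter = itertools.count()

    def add_worker(self, worker: Worker) -> None:
        # Scaling out: a new worker joins the pool without a restart.
        self._workers.append(worker)

    def dispatch(self, payment: dict) -> dict:
        index = next(self._counter) % len(self._workers)
        return self._workers[index](payment)
```

    A production dispatcher would also need health checks and back-pressure; the sketch only shows why statelessness makes adding capacity a matter of extending the pool.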