How do you build a back end that handles millions of concurrent users efficiently and consistently?

Brands such as Nike, Adidas, and Supreme have created a new trend in the market called “drops”, where they release a limited number of items. These are usually limited runs or pre-release limited offers ahead of the general release.

This poses some special challenges, because every sale is basically Black Friday: thousands (or millions) of users trying to buy a very limited number of items at the same time.

Main requirements

  • All customers can see item changes (inventory, description, etc.) in real time;
  • Handle sudden spikes in load;
  • Minimize the interaction between the client and the back end;
  • Deal with concurrency while avoiding race conditions;
  • Handle thousands of concurrent requests for the same item;
  • Fault tolerance;
  • Automatic scaling;
  • High availability;
  • Cost effectiveness.

The backend architecture

It all starts in the back office. We need to keep the drop information and the available items in appropriate data structures. We are going to use two systems: PostgreSQL and Redis.

PostgreSQL, MySQL, Cassandra, or any other persistent database engine could be used here. We just need a persistent and consistent source of truth to store our data.

Redis is not just a simple in-memory key-value store; it also offers specialized data structures (such as HASH and LIST) that can improve system performance by orders of magnitude.

Redis is going to be our workhorse and take most of our traffic. The data structures selected are a Redis HASH for each item and a LIST for each checkout queue.

Using a HASH instead of a single serialized string value, we can update individual fields of each item, such as the item's inventory. It even lets us INCR/DECR integer fields without having to read the value first.

This is very convenient, because it allows non-blocking concurrent access to our item data.
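
To make that concrete, here is a minimal sketch using the redis-py client; the key name and fields are made up for the example and are not the article's actual schema:

    import redis

    r = redis.Redis(decode_responses=True)

    # Each item lives in its own HASH, so individual fields can be changed in place.
    r.hset("item:123", mapping={
        "name": "Limited Sneaker",
        "price_cents": 25000,
        "inventory": 500,
    })

    # Atomically decrement the stock without reading the hash first.
    remaining = r.hincrby("item:123", "inventory", -1)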

Every time a back-office user makes a change, Redis needs to be updated so that it holds an accurate copy of the drop data stored in PostgreSQL.

  • 1. Get the drop information

When the user launches the application, the first thing it needs to do is request the drop payload and connect to the WebSocket.

This way, it only needs to request the information once and then receives updates in real time whenever something relevant changes, instead of constantly polling the server. This is called a push architecture, and it is particularly useful here because it gives us “real time” while minimizing network traffic. Redis caches the returned information to protect our primary database.
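
The article does not spell out the transport beyond “WebSocket”, but since Redis is later listed as the channel backend, the push side could look roughly like this (channel name and payload shape are assumptions):

    import json
    import redis

    r = redis.Redis(decode_responses=True)

    def on_item_updated(item_id: str, changes: dict) -> None:
        # Keep the cached copy in sync first...
        r.hset(f"item:{item_id}", mapping=changes)
        # ...then publish the change; WebSocket servers subscribed to this
        # channel forward it to connected clients instead of being polled.
        r.publish(f"drop-updates:{item_id}", json.dumps(changes))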

But we need to account for things that might happen to Redis, and not trust it to always have our data. It's a good habit to always treat Redis as a cache, even when it behaves like a regular database.

An easy way to do this: if Redis does not return any data, immediately load it from PostgreSQL, save it to Redis, and only then return it to the user. This way we minimize the interaction with the main database while ensuring the user always gets the correct data.
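
A minimal sketch of that read path (cache-aside), with a hypothetical PostgreSQL helper and a simple string key standing in for the real payload:

    import json
    import redis

    r = redis.Redis(decode_responses=True)

    def load_drop_from_postgres(drop_id: str) -> dict:
        """Hypothetical helper standing in for the real PostgreSQL query."""
        raise NotImplementedError

    def get_drop(drop_id: str) -> dict:
        cached = r.get(f"drop:{drop_id}")
        if cached is not None:
            return json.loads(cached)
        # Cache miss: fall back to the source of truth...
        drop = load_drop_from_postgres(drop_id)
        # ...repopulate Redis so later readers never hit PostgreSQL...
        r.set(f"drop:{drop_id}", json.dumps(drop), ex=60)
        # ...and only then return the data to the user.
        return drop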

Our traffic is usually unevenly distributed, so identify the most affected endpoints and make sure they have a scalable architecture.

  • 2. The checkout

This is the most critical and dangerous part of our system. We have to keep in mind that the market is flooded with bots built specifically to exploit these systems; an entire underground economy thrives on buying and reselling these items. For this article, we will focus on availability and consistency and not dive into the many options for securing the system against them.

When the drop starts, the checkout endpoint will be flooded with requests. This means every request needs to be handled as efficiently as possible, ideally using only the in-memory database and O(1) operations, while preserving ordering, since this is a first-come-first-served business.

This is where Redis shines: it ticks all the boxes!

  • O(1) read/write operations;
  • In-memory database;
  • Single-threaded execution, giving us ordering without adding mutex complexity to our code;
  • Very high throughput and very low latency;
  • Specialized data structures, such as HASH and LIST.

At a very high level, when a user hits the checkout endpoint, we simply check whether the drop has started (Redis), whether the item is still available (Redis), and whether the user is already in the queue (Redis); if all checks pass, we save the request to the checkout queue (Redis).

This means we can do mostly O(1) operations in milliseconds and use the in-memory database to process checkout requests without worrying about concurrency or consistency.
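
A rough sketch of those checks; the key names, the hash fields, and the extra SET used for the O(1) “already in the queue” check are assumptions made for this example:

    import redis

    r = redis.Redis(decode_responses=True)

    def request_checkout(item_id: str, user_token: str) -> str:
        if r.hget(f"item:{item_id}", "drop_started") != "1":
            return "not_started"
        if int(r.hget(f"item:{item_id}", "inventory") or 0) <= 0:
            return "sold_out"
        # SADD returns 0 when the member already exists: an O(1) dedup check.
        if r.sadd(f"queued:{item_id}", user_token) == 0:
            return "already_queued"
        # First come, first served: append the request to the item's queue.
        r.rpush(f"queue:{item_id}", user_token)
        return "queued"

In practice these steps would be wrapped in a Lua script or a MULTI/EXEC block so they execute atomically on Redis's single thread.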

  • 3. Charge the credit card and complete the transaction

Now there is just one queue per item, all the users who tried to buy that item are stored in those queues in order, and we have “all the time in the world” to process them asynchronously.

As long as we ensure that no queue is handled by more than one worker process at a time, we can have as many servers as we need and use an auto-scaling group to scale up and down to handle all the queues.

This simple architecture lets us attempt the credit card charges and even leaves some room for retries and other checks.

If the first user in the queue fails due to CC problems or other issues, we can proceed to the next user in the queue.
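
A sketch of one such worker, assuming a single worker owns each queue and using hypothetical payment helpers:

    import redis

    r = redis.Redis(decode_responses=True)

    class PaymentError(Exception):
        """Hypothetical error raised when a charge fails."""

    def charge_card(user_token: str, item_id: str) -> None:
        """Hypothetical helper standing in for the real payment integration."""
        raise NotImplementedError

    def process_queue(item_id: str) -> None:
        # This queue is owned by exactly one worker, so order is preserved.
        while int(r.hget(f"item:{item_id}", "inventory") or 0) > 0:
            popped = r.blpop(f"queue:{item_id}", timeout=5)  # block for the next user
            if popped is None:
                continue
            _, user_token = popped
            try:
                charge_card(user_token, item_id)
            except PaymentError:
                continue  # charge failed: move on to the next user in the queue
            r.hincrby(f"item:{item_id}", "inventory", -1)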

It also allows us to treat unique items and items with stock in exactly the same way (a unique item just has its inventory set to 1).

Technical challenges

  • Load balancer

If you are using an AWS-managed load balancer, you may run into problems. Before the drop begins, traffic will be very low compared to when it starts. This means your load balancer will have only a few nodes allocated, and it takes a few minutes to scale up to the capacity required during the drop.

Okay, we don’t have a few minutes, do we? … And we don’t want our load balancer to return 502 errors to the user.

We have at least two options here: warm up the load balancer cluster with simulated traffic (for example, using a Lambda) before the drop begins, or run your own load balancer cluster using HAProxy.

Both are effective, depending on the size of your team and your experience with these systems.

A third option is to contact AWS so they can pre-warm the load balancer for you, but since this is a manual process, I do not recommend it.
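
For the first option, a very rough sketch of a warm-up Lambda that fires synthetic traffic at the load balancer is shown below; the URL, request volume, and schedule are placeholders, and a real warm-up would ramp sustained traffic over several minutes:

    import concurrent.futures
    import urllib.request

    TARGET = "https://drops.example.com/health"  # placeholder warm-up endpoint

    def handler(event, context):
        # Fire a burst of cheap requests so the load balancer scales out
        # before the drop starts; errors are ignored, only load matters.
        def hit(_):
            try:
                urllib.request.urlopen(TARGET, timeout=2).read()
            except Exception:
                pass
        with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
            list(pool.map(hit, range(500)))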

  • Scaling Redis

As for Redis, if you start to have a lot of items and/or users involved, it's best to scale it out.

The best approach is to use multiple write nodes (cluster mode) rather than a primary/replica architecture, mainly to avoid replication-lag problems. Keep in mind that we want consistency without too much code complexity. Active-active multi-region is challenging and expensive, but sometimes it is the only option.

You can assign items to these nodes using a modulo or other deterministic hash function.

Asynchrony is very important here: this way, our total latency is only the maximum latency among the Redis nodes, not the sum of the latencies of all the nodes used.

In our use case, using the item ID as the partitioning key works well, because the load will be evenly distributed across the key space.
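
A sketch of that partitioning, together with the asynchronous fan-out mentioned above; the node list and the choice of hash function are assumptions:

    import asyncio
    import hashlib
    import redis.asyncio as aioredis

    # Hypothetical shard list: one client per Redis write node.
    NODES = [aioredis.Redis(host=h, decode_responses=True)
             for h in ("redis-a", "redis-b", "redis-c")]

    def node_for(item_id: str) -> aioredis.Redis:
        # A deterministic hash of the item ID picks the shard.
        digest = hashlib.sha1(item_id.encode()).hexdigest()
        return NODES[int(digest, 16) % len(NODES)]

    async def fetch_items(item_ids: list) -> list:
        # Issue all reads concurrently, so the total latency is roughly the
        # slowest node's latency rather than the sum of all of them.
        return await asyncio.gather(
            *(node_for(i).hgetall(f"item:{i}") for i in item_ids)
        )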

Redis master-master challenges

The main issues with scaling Redis to multiple data-center regions around the world:

  • The CAP theorem
  • Consistency between regions
  • Managing multiple deployments
  • No longer being able to maintain consistency with a simple Redis LIST
  • Cost
  • Multi-region failover
  • Complexity

Every time we deal with distributed systems, we have to make choices/trade-offs based on the CAP theorem. Our two main requirements are “availability” and “consistency”. We need to make sure we don't oversell our items, because most of them are in very limited stock. Most importantly, people are counting on a particular item, which means they won't accept being told later that the shoes they bought aren't actually available after all.

In practice, this means that if a region cannot communicate with the main region, it will block checkouts. It also means that if the primary region is offline, no region will work properly.

We decided to get A and C from the CAP theorem, but unfortunately that doesn’t mean we got them for free, just because we gave up the P requirement…

If we just used Cassandra or CockroachDB, we’d be off the hook!

Inspired by Google Spanner and its concept of external consistency, we decided to “use atomic clocks and GPS” as our source of truth. Fortunately, AWS released a free service called Time Sync that was perfect for our use case!

With this AWS service, all of our machines around the world keep their clocks synchronized.

Relying on the machine clock removes the need to fetch a timestamp from an API during the transaction, which reduces latency and removes the need for a circuit breaker (as would be required for an external call).

When an order arrives, we just grab the current time and send it to the main Redis instance. Using an asynchronous model is important here, because these “remote” requests can take a while to complete, which could become a bottleneck when trying to serve thousands of concurrent requests.

Previously, we used one Redis LIST per item to keep the orders in order. Now that we are using timestamps to maintain consistency across the globe, a LIST is no longer the best data structure… unless you want to re-sort it every time an out-of-order request arrives, or perform an O(N) operation whenever you need to pop the first order.

Sorted sets (ZSET) to the rescue!

Using the timestamps as scores, Redis keeps everything ordered for us. Adding a new order for an item is as simple as:

ZADD orders:<item_id> <timestamp> <user_token>

To fetch (and remove) the earliest order for an item:

ZPOPMIN orders:<item_id>
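
In application code this could look roughly like the following (redis-py client and key names assumed); the locally synchronized machine clock provides the score:

    import time
    import redis

    r = redis.Redis(decode_responses=True)  # the "main" Redis instance

    def enqueue_order(item_id: str, user_token: str) -> None:
        # The box's clock is trusted because it is synced via Amazon Time Sync,
        # so it can be used directly as the sorted-set score.
        r.zadd(f"orders:{item_id}", {user_token: time.time()})

    def next_order(item_id: str):
        # Pop the earliest order (lowest timestamp), or None if the set is empty.
        popped = r.zpopmin(f"orders:{item_id}")
        return popped[0] if popped else None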
 

Cost

Running multiple regions is usually expensive, and our case is no exception.

Replicating PostgreSQL, Redis, and EC2 across the globe is not cheap, which forced us to iterate on our solution.

Finally, we need to understand where we need to optimize and where we can compromise. I think this duality applies to almost all situations.

The user flow that requires the lowest latency is loading the application and placing orders in the queue. Everything else can lag a little behind. This means we can focus on that path and be less strict about other user interactions.

The most important thing we did was to remove the local PostgreSQL instances, adjusting the back end to use only Redis on the critical path and to tolerate eventual consistency, so that we could get rid of them. This also helped reduce the API's compute requirements (win-win!).

We also used AWS Spot Instances for the APIs and the asynchronous workers.

Finally, we tuned the PostgreSQL cluster so that we could get away with smaller-than-usual instances.

In summary, our Redis is used as:

  • A cache
  • The checkout queues
  • The asynchronous task queue
  • The channel backend for the WebSockets

This represents significant savings in cost and overall complexity.

Multi-region failover

Multi-region failover can take different forms, and unfortunately none of them can be called perfect. We need to choose where to compromise between:

  • RTO (Recovery Time Objective)
  • RPO (Recovery Point Objective)
  • Cost
  • Complexity

Every use case is different, and the solution is almost always “it depends.”

For this use case, we used three mechanisms:

  • RDS automated backups – single-region point-in-time recovery
  • Scripted manual snapshots using AWS Lambda – multi-region recovery
  • Cross-region read replica – multi-region availability and the best data durability

To avoid the high cost of a mostly idle copy of PostgreSQL, this instance is a smaller machine that can be scaled up and promoted to “primary” in an emergency. This means we have to live with a few minutes of downtime, but I think it's a good compromise for this use case.

Promoting a read replica is usually faster than restoring a snapshot, and there is less chance of losing data this way.
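
With RDS, the resize and promotion can be scripted; a minimal sketch with boto3 follows, where the instance identifier, instance class, and region are placeholders and a real runbook would wait for each operation to finish before starting the next:

    import boto3

    rds = boto3.client("rds", region_name="eu-west-1")  # placeholder failover region

    def promote_failover_replica(replica_id: str = "drops-db-replica") -> None:
        # Scale the standby replica up before it takes production traffic...
        rds.modify_db_instance(
            DBInstanceIdentifier=replica_id,
            DBInstanceClass="db.r5.2xlarge",  # placeholder size
            ApplyImmediately=True,
        )
        # ...then break replication and promote it to a standalone primary.
        rds.promote_read_replica(DBInstanceIdentifier=replica_id)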

Just make sure the smaller instance can keep up with the write load. You can check this (or better yet, configure an alert for it!) by monitoring its replication lag. A reboot will “fix” the lag, but if the size difference is the root cause, you are just postponing the problem and should increase the instance size.
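
One way to wire up that alert is a CloudWatch alarm on the RDS ReplicaLag metric; the names, threshold, and SNS topic below are placeholders:

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

    # Alert if the failover replica falls more than 5 minutes behind the primary.
    cloudwatch.put_metric_alarm(
        AlarmName="drops-db-replica-lag",
        Namespace="AWS/RDS",
        MetricName="ReplicaLag",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "drops-db-replica"}],
        Statistic="Average",
        Period=60,
        EvaluationPeriods=5,
        Threshold=300,  # seconds
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:eu-west-1:111122223333:oncall"],  # placeholder topic
    )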

Manual snapshots give you an additional layer of security and “data versioning” at the expense of S3 storage.

AWS has a very good article on DR here.

Finally, I recommend prioritizing consistency and durability, even if the price is some manual intervention in the process. If you have the engineering time and the ability to safely automate the entire process, and you run drills from time to time to make sure all the automation is reliable, go for it; that's the Holy Grail of Netflix and the like.

If not, keep in mind that losing an entire region is highly unlikely, and as long as you have alerting in place (and follow the steps in order) you can fail over manually to a new region in less than 30 minutes.

www.jdon.com/56418