Most developers learn from projects found on the Internet, and the classic first project is e-commerce; you use e-commerce apps such as Taobao and JD.com all the time in daily life. Engineers know that the business logic of e-commerce is simple, but most e-commerce systems involve high concurrency and high availability, with strict requirements on throughput and data handling. So what I'm going to talk about today is how inventory is deducted under high concurrency.

The technical points we need to pay attention to for inventory deduction are as follows:

  1. The remaining quantity must be greater than or equal to the quantity being deducted; overselling is not allowed
  2. Concurrent deductions against the same record must remain consistent
  3. Availability and performance must be guaranteed, with at least second-level response times
  4. A single deduction may target multiple SKUs and quantities
  5. If any part of a multi-item deduction fails, the whole deduction must be rolled back
  6. A return must correspond to a previous deduction
  7. A returned quantity must be added back and must not be lost
  8. A single deduction may be returned multiple times
  9. Returns must be idempotent
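One way to picture these requirements is as the shape of the deduction request itself. The sketch below is purely illustrative (the names `DeductRequest`, `serial_no` and `items` are not from the article): a deduction carries a caller-supplied globally unique number, which points 6 to 9 rely on for returns and idempotency, and may target several SKUs at once, as point 4 requires.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class DeductRequest:
    serial_no: str         # globally unique flow number supplied by the caller
    items: Dict[str, int]  # sku_id -> quantity to deduct (one or many SKUs)

req = DeductRequest(serial_no="uuid-123", items={"sku-1": 2, "sku-2": 1})
```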

The first solution: pure MySQL deduction implementation

As the name implies, the deduction business relies entirely on MySQL or a similar database, without any additional middleware or cache. The benefits of a pure database implementation are simple logic and low development and deployment costs, which makes it suitable for small and medium-sized e-commerce sites.

A pure database implementation can meet the functional requirements of the deduction business, relying mainly on two points:

  1. Optimistic locking in the database guarantees strong consistency for concurrent deductions
  2. Database transactions allow a failed batch deduction to be rolled back

Based on the above, the solution consists of a deduction service and a stock database.

If the data volume puts too much pressure on a single database, you can also shard it into multiple databases and tables, run the service as a cluster, and so on.

A complete flow starts with data validation, which covers parameter and format checks. A principle to maintain when developing interfaces is distrust: trust no incoming data, validate everything. Next, the inventory deduction can be pre-checked. For example, if only 8 units are currently in stock but the user wants to buy 10, the validation step can reject the request up front and avoid a database write. A pure read takes no locks and performs well, so this pre-check improves concurrency.

update xxx set leavedAmount = leavedAmount - currentAmount where skuid = 'xxx' and leavedAmount >= currentAmount

This SQL achieves atomicity in the style of an optimistic lock: the WHERE clause checks that the remaining quantity is greater than or equal to the required quantity. If it is not, no row is updated and the deduction fails.
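A minimal runnable sketch of this optimistic-lock deduction, using SQLite in place of MySQL (table and column names follow the SQL above; the `deduct` helper is illustrative). The affected-row count tells us whether the conditional update won:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stock (skuid text primary key, leavedAmount int)")
conn.execute("insert into stock values ('sku-1', 8)")

def deduct(skuid, amount):
    """Returns True only if enough stock remained to deduct atomically."""
    cur = conn.execute(
        "update stock set leavedAmount = leavedAmount - ? "
        "where skuid = ? and leavedAmount >= ?",
        (amount, skuid, amount))
    conn.commit()
    return cur.rowcount == 1  # zero rows affected means the deduction failed

deduct("sku-1", 5)   # 8 -> 3, succeeds
deduct("sku-1", 10)  # only 3 left: fails, no oversell
```

Note there is no separate read-then-write step: the check and the decrement happen in one statement, which is what prevents overselling under concurrency.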

After the deduction completes, flow (audit) data needs to be recorded. Each deduction requires the caller to pass in a UUID as the flow number, which must be globally unique. Passing in a unique number with each deduction serves two purposes:

  1. When the user returns a quantity, this number is passed back to identify which historical deduction the return belongs to.
  2. Idempotency control. When a call to the deduction interface times out, the caller does not know whether it succeeded, so they can retry or look up the result with this number. During retries the number acts as an anti-duplication identifier.
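The second purpose can be sketched as follows: the caller-supplied number doubles as an idempotency key, so a retry after a timeout cannot deduct twice. The in-memory dict stands in for the flow table; all names are illustrative.

```python
import uuid

_flows = {}  # flow number -> recorded outcome (stand-in for the flow table)

def deduct_once(flow_no, sku_id, amount, do_deduct):
    if flow_no in _flows:          # seen before: replay the recorded outcome
        return _flows[flow_no]
    outcome = do_deduct(sku_id, amount)
    _flows[flow_no] = outcome
    return outcome

no = str(uuid.uuid4())
calls = []
first = deduct_once(no, "sku-1", 2, lambda s, a: calls.append((s, a)) or True)
retry = deduct_once(no, "sku-1", 2, lambda s, a: calls.append((s, a)) or True)
# same number on the retry, so the real deduction ran exactly once
```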

Suppose a user buys a single item and the pre-check passes because 8 units are in stock. In the actual deduction, however, other users are deducting concurrently, so by the time the update runs the remaining stock may have dropped below one and the deduction fails. This scenario costs an extra database query and lowers overall deduction performance. This is when the MySQL architecture should be upgraded.

Upgrading the MySQL architecture

Extra queries add pressure on the database and hurt overall performance. In addition, the interface that exposes stock quantities also loads the database, and read requests far outnumber write requests.

Analyzing the business scenario: reads of stock are generally generated while customers browse goods, while calls to deduct stock are triggered when a user actually buys. A purchase request carries more business value than a read request, so write requests deserve protection. To solve the problems above, the overall MySQL architecture can be upgraded.

The overall upgrade strategy is read/write separation. Master/slave replication reuses functionality built into MySQL and similar databases, so the change is small: the deduction service just needs two configured data sources. Customer queries of remaining stock, and the pre-validation inside the deduction service, read from the replica; the real deduction uses the primary.

After read/write separation, by the 80/20 rule, 80% of traffic is reads, so the primary's load drops by roughly 80%. Read/write separation does mean reads can return slightly stale data, but stock quantities change in real time anyway; a brief discrepancy is tolerable for the business, and the actual deduction still guarantees accuracy.
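The two-data-source routing described above can be sketched like this. `Primary` and `Replica` are stand-ins for real connection pools (illustrative names): stock queries and pre-validation read from the replica, while the actual deduction always writes to the primary.

```python
class Primary:
    def execute(self, sql):
        return ("primary", sql)

class Replica:
    def execute(self, sql):
        return ("replica", sql)

class DeductService:
    def __init__(self):
        self.primary, self.replica = Primary(), Replica()

    def query_stock(self, skuid):
        # read path: slightly stale data from the replica is tolerable
        return self.replica.execute(
            f"select leavedAmount from stock where skuid = '{skuid}'")

    def deduct(self, skuid, amount):
        # write path: the real deduction must hit the primary
        return self.primary.execute(
            f"update stock set leavedAmount = leavedAmount - {amount} "
            f"where skuid = '{skuid}' and leavedAmount >= {amount}")

svc = DeductService()
```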

Building on the above, you can also upgrade further by adding a cache.

Although the pure database scheme avoids overselling and underselling, its concurrency is low and its performance unimpressive. So here is the upgrade.

The second solution: cache-based deduction

This works like the previous inventory deduction, except that the deduction service now relies on Redis instead of the database.

Since the Redis hash structure does not support batch operations across multiple keys, we can use Redis plus a Lua script: Redis executes the script on its single command thread, so the batch deduction for one request runs atomically.
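Because Redis runs a Lua script atomically on its single command thread, the script can validate every SKU first and only then deduct them all. Below is a plain Python simulation of that check-then-deduct logic (the real thing would be a Lua script run with EVAL against a Redis hash; all names here are illustrative):

```python
def batch_deduct(stock_hash, wants):
    # phase 1: every target quantity must be available
    for sku, amount in wants.items():
        if stock_hash.get(sku, 0) < amount:
            return False  # any shortfall fails the whole batch, untouched
    # phase 2: deduct all; in Redis nothing can interleave between the phases
    for sku, amount in wants.items():
        stock_hash[sku] -= amount
    return True

stock = {"sku-1": 8, "sku-2": 3}
batch_deduct(stock, {"sku-1": 2, "sku-2": 1})  # succeeds: 6 and 2 remain
batch_deduct(stock, {"sku-1": 100})            # fails, stock unchanged
```

The two-phase shape is what gives all-or-nothing semantics for a multi-SKU deduction without database transactions.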

Upgrading to a pure Redis implementation of deduction brings its own problems:

  1. Redis goes down. If it crashes before the stock-reduction operation in Redis has executed, simply return a failure to the client. If it crashes after the Redis deduction has executed, a reconciliation process is needed: compare the data in Redis against the database for consistency, combined with the deduction service's logs. When an inconsistency is found and the logs show the deduction failed to complete, the surplus the database holds relative to Redis can be used to repair the data.
  2. The Redis deduction succeeds but the asynchronous refresh of the database fails. Here the data in Redis is accurate and the database shows too much stock. Once the logs confirm that Redis deducted successfully but the asynchronous record failed, the surplus the database holds relative to Redis can be deducted from the database.

Although a pure Redis scheme improves concurrency, Redis has no transaction semantics, so in extreme cases Redis data cannot be rolled back, leading to underselling. Asynchronous database writes can also fail, leaving cached quantities that can never be recovered.
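Both failure cases above lean on the same reconciliation step: walk the database's view, flag every SKU whose cached quantity disagrees, and let the deduction logs decide which side to repair. A minimal illustrative sketch:

```python
def find_mismatches(cache, db):
    """Return {sku: (cached_quantity, db_quantity)} for every disagreement."""
    return {sku: (cache.get(sku), db_left)
            for sku, db_left in db.items()
            if cache.get(sku) != db_left}

# e.g. Redis deducted but the async DB write failed: the DB shows more stock
mismatches = find_mismatches({"sku-1": 4}, {"sku-1": 6, "sku-2": 9})
```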

The third solution: database + cache

Sequential write performance is better

When writing data to disk, continuously appending to the end of a file performs much better than random modification. On a traditional mechanical hard drive, each random update requires the head to seek to the target position on the platter before updating the data, which is very expensive. Appending to the end of the file needs only one seek to the file's tail, after which subsequent sequential writes keep appending.

On solid-state drives, although head movement is avoided, there is still an addressing cost. Moreover, randomly updating file contents resembles updating a database table: it carries the performance cost of locking.

Databases likewise perform better on inserts than on updates. To keep concurrent updates of the same row consistent, an update takes locks, which is expensive. In addition, when the update condition has no index, finding the rows to update requires scanning the whole table in O(N) time. An insert only appends at the end, which performs very well.

The architecture of sequential writing

From the theory above, we arrive at a deduction architecture that combines performance with high reliability.

The difference between this architecture and the pure cache architecture is that the database write is no longer asynchronous; it happens synchronously at deduction time. The synchronous write uses an insert, a sequential write, rather than an update that modifies existing rows, so performance is better.

The database receiving the inserts, called the task library, stores only the raw data of each deduction; it performs no actual deduction (that is, no updates). Its table structure is roughly as follows:

create table task (
    id bigint not null comment 'primary key',
    task_id bigint not null comment 'task id',
    content varchar(4096) not null comment 'raw deduction data (JSON/XML text)'
)

The content stored in the task table can be structured data such as JSON or XML. Taking JSON as an example, the data can look like this:

{
    "deductionNo": "uuid",
    "skuid1": "number",
    "skuid2": "number",
    "xxxx": "xxxx"
}

There must also be a business database that stores the real data: the flow of each deduction and the summary per SKU. Since this data sits in another library, it can only be synchronized asynchronously from the task table.

The deduction process

The difference between this and pure caching is the addition of transaction start and rollback steps, as well as the synchronous database write process

The task library stores JSON data as plain text, which cannot be used directly; it has to be dumped into the actual business library. The business library stores two kinds of data. One is the flow data of each deduction, which differs from the task-table data in being structured rather than a large JSON text field. The other is summary data: for each SKU, how much stock there is in total and how much currently remains (the remaining amount is what gets decreased when synchronizing from the task library). The flow table structure is roughly as follows:

create table (
    id bigint not null comment 'primary key',
    uuid bigint not null comment 'deduction flow number',
    sku_id bigint not null comment 'SKU id',
    num int not null comment 'deducted quantity'
) comment 'deduction flow table'

The real-time data summary table for commodities is structured as follows:

create table (
    id bigint not null comment 'primary key',
    sku_id bigint unsigned not null comment 'SKU id',
    total_num int unsigned not null comment 'total quantity',
    leaved_num int unsigned not null comment 'remaining quantity'
) comment 'real-time stock summary table'
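With both tables in place, dumping one task-table row into the business library can be sketched as follows: the JSON blob becomes one structured flow row per SKU, and each SKU's remaining quantity in the summary is decreased by the same amount. Field names are illustrative, with "deductionNo" standing for the flow number in the task JSON.

```python
import json

def apply_task(task_json, flow_rows, summary):
    task = json.loads(task_json)
    flow_no = task.pop("deductionNo")
    for sku, num in task.items():
        flow_rows.append({"uuid": flow_no, "sku_id": sku, "num": num})
        summary[sku]["leaved_num"] -= num  # summary stays in step with flows

flows = []
summary = {"sku-1": {"total_num": 10, "leaved_num": 10}}
apply_task('{"deductionNo": "u-1", "sku-1": 2}', flows, summary)
```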

The overall process reuses the pure cache architecture from the previous section. When a new item is added or an existing item is restocked, the corresponding quantity is synchronized to the cache through the binlog. During deduction, the quantity in the cache is authoritative.