Tair, Alibaba Cloud's self-developed in-memory database, was launched in 2009. It is a cloud-native in-memory database that supports high-concurrency, low-latency access, is fully compatible with Redis, and has been proven over many years of Double 11 promotions.
As a key tool for carrying peak traffic during Double 11, Tair supports the core experience scenarios of e-commerce transactions. It not only keeps latency stable at the sub-millisecond level at a peak of billions of QPS, but also delivers technical innovation in those scenarios.
The technical innovation Tair brought to the core experience scenarios of Double 11 in 2021 comes from a sub-product internally code-named TairSQL. TairSQL serves these scenarios with technologies such as persistent memory storage, an efficient transaction processing model, and lightweight user interface access. It supports the new retail real-time discount computing engine in fetching user-level asset data such as coupons, red envelopes, and points, which keeps prices consistent. For the first time, it helped the shopping cart display post-coupon prices in real time under highly concurrent traffic, so users can see the actual price of goods at a glance, which significantly improves the shopping experience and user efficiency.
During the Double 11 peak, the coupons that users automatically receive after placing orders, and the assets that are written off after successful transactions, generate corresponding write traffic to the database system. Write latency must stay low, at the millisecond level, so that users see consistent price changes in shopping guide scenarios such as product search and detail pages. At the same time, TairSQL must handle high-throughput query loads, where the latency requirements are even more demanding.
TairSQL uses persistent memory as its data storage medium, which reduces I/O latency on the access path: unlike traditional database products, it does not need to spend time frequently swapping data between disk and cache. Indexes and user data are also placed according to data access frequency, so that high-frequency index updates and queries complete in DRAM.
In a horizontally scaled cluster, each node serves dozens of partitions, and each partition uses a single-threaded transaction model, which avoids the overhead of lock contention and provides smoother P99 access latency. The lightweight user interface access technology reduces the SQL parsing and compilation overhead of each user request. Combined with the transaction processing model, user read and write requests can be processed and returned within hundreds of microseconds.
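The single-threaded partition model described above can be illustrated with a minimal sketch. This is not TairSQL's implementation: the class names (`Partition`, `PartitionedStore`), the CRC32 routing, and the `redeem_coupon` transaction are all hypothetical, chosen only to show why routing every key to one owning thread removes the need for locks on the data itself.

```python
import queue
import threading
import zlib


class Partition:
    """One partition: a private dict mutated only by its own worker thread."""

    def __init__(self, pid):
        self.pid = pid
        self.data = {}                  # touched by the worker thread only
        self.inbox = queue.Queue()      # requests are serialized here
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            item = self.inbox.get()
            if item is None:            # shutdown signal
                break
            txn, reply = item
            try:
                reply.put((True, txn(self.data)))
            except Exception as exc:    # report the failure to the caller
                reply.put((False, exc))


class PartitionedStore:
    def __init__(self, num_partitions=4):
        self.partitions = [Partition(i) for i in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash, so a key always lands on the same partition.
        index = zlib.crc32(key.encode()) % len(self.partitions)
        return self.partitions[index]

    def execute(self, key, txn):
        """Run txn(data) on the partition that owns key and wait for the result."""
        reply = queue.Queue(maxsize=1)
        self.partition_for(key).inbox.put((txn, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        for p in self.partitions:
            p.inbox.put(None)
            p.worker.join()


def redeem_coupon(user, amount):
    """Hypothetical transaction: deduct from a user's coupon balance."""
    def txn(data):
        balance = data.get(user, 0)
        if balance < amount:
            return False
        data[user] = balance - amount
        return True
    return txn
```

Because each read-modify-write runs to completion on the owning thread, concurrent callers cannot interleave inside a transaction, and tail latency is not affected by lock waits. The trade-off in this toy version is that a transaction can only touch keys in a single partition.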
TairSQL's key persistent memory storage technology is another milestone in Tair's continued innovation in applying new technology. Tair officially began investing in the research and implementation of persistent memory in 2018, based on Intel Optane™ persistent memory hardware. That same year it was successfully deployed on Double 11 in the core e-commerce product cluster, serving the KV cache scenario and greatly reducing cost. It was the first product in China to use Intel persistent memory hardware in a production environment.
Tair persistent memory is compatible with Redis, and its data persistence does not depend on traditional disks. It persists every operation while providing throughput and latency similar to Redis community edition, which greatly improves the reliability of service data. TairSQL, built on the Tair persistent memory architecture, supports SQL writes and queries and serves businesses with high throughput and demanding latency requirements. A single cluster can reach a peak of 4 million writes and 8 million reads per second with query latency stable below 1.5 ms, which further broadens the range of computing scenarios Tair supports.
Tair not only keeps exploring new computing scenarios, but also pays close attention to the details of system operation and to day-to-day user feedback in the scenarios it already covers, so that it can keep optimizing in depth:
As Tair serves more and more user scenarios on the cloud and within the group, it collects user feedback that raises the bar on the range of supported scenarios, access performance, and cost-effectiveness. In response, Tair persistent memory has developed a core optimization technique that dynamically and adaptively moves data between DRAM and persistent memory. This keeps the space occupied by the user index and the data area within a fixed proportion and meets the data storage requirements of different user scenarios.
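The idea of moving data adaptively between a fast tier and a larger tier can be sketched as follows. This is only an illustration under simplifying assumptions: two Python dicts stand in for DRAM and persistent memory, the fixed proportion is modeled as a fixed entry budget (`dram_capacity`), and promotion is driven by a plain access counter. The article does not describe Tair's actual placement policy, and every name here is hypothetical.

```python
class TieredStore:
    """Toy two-tier store: a small 'DRAM' dict in front of a 'PMem' dict."""

    def __init__(self, dram_capacity):
        self.dram_capacity = dram_capacity  # max entries kept in the fast tier
        self.dram = {}
        self.pmem = {}
        self.hits = {}                      # per-key access counter

    def put(self, key, value):
        # Writes land in the large tier unless the key is already hot.
        if key in self.dram:
            self.dram[key] = value
        else:
            self.pmem[key] = value
        self.hits.setdefault(key, 0)

    def get(self, key):
        if key not in self.hits:
            raise KeyError(key)
        self.hits[key] += 1
        if key in self.dram:
            return self.dram[key]
        value = self.pmem[key]
        self._maybe_promote(key)
        return value

    def _maybe_promote(self, key):
        if len(self.dram) < self.dram_capacity:
            self.dram[key] = self.pmem.pop(key)
            return
        # Fast tier is full: swap with the coldest resident key,
        # but only if the candidate has been accessed more often.
        coldest = min(self.dram, key=self.hits.__getitem__)
        if self.hits[key] > self.hits[coldest]:
            self.pmem[coldest] = self.dram.pop(coldest)
            self.dram[key] = self.pmem.pop(key)

    def tier_of(self, key):
        return "dram" if key in self.dram else "pmem"
```

The fast tier never exceeds its budget, and a key is demoted only when a strictly hotter key needs the slot, which avoids two equally popular keys evicting each other on every access.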
In addition, Tair is deeply integrated with the kernel technology of the Aliyun Linux operating system. This integration supports the data snapshot requirements of scenarios such as active/standby replication and real-time backup, and greatly reduces the latency impact of real-time snapshots when the memory footprint is large. Beyond covering more scenarios and optimizing the performance of high-frequency scenarios, the Tair persistent memory type also offers better cost-effectiveness: it reduces the space taken up by metadata in its self-developed persistent memory storage structure, and it compresses the lists and hashes that users access most frequently. With this transparent compaction of data structures, it achieves a data compression ratio of 1-2 times while keeping data persistence performance stable, which significantly reduces the hardware cost of the persistent edition.
TairCPC, which made its debut on Double 11 in 2020, has this year also been brought to the Tair persistent memory product. The sketch-based aggregation operators that TairCPC provides are pushed down into the storage engine as modules, which allows high-performance computation over sampled data in a small amount of space. After an incremental write, users get the real-time calculation result back directly. TairCPC is used for real-time risk control on the core real-time computing link; the risk control business is a core module of the group's transaction link and directly affects the security of all online transactions.
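To show how a sketch can answer an aggregate query from a small, bounded amount of state that is updated on every write, here is a minimal distinct-count example. It uses the K-Minimum-Values (KMV) technique as a simple stand-in; it is not the CPC algorithm and not TairCPC's API. The class name `KMVSketch` and the parameter `k` are assumptions made for this illustration.

```python
import hashlib
import heapq


class KMVSketch:
    """K-Minimum-Values distinct-count sketch (a stand-in, not CPC itself)."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # negated values: a max-heap of the k smallest hashes
        self._members = set()  # hashes currently held, to skip duplicates

    @staticmethod
    def _hash(item):
        digest = hashlib.sha256(str(item).encode()).digest()
        # Map the first 8 bytes to a float that is roughly uniform in [0, 1).
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item):
        """Incremental write: O(log k) time, state never exceeds k hashes."""
        h = self._hash(item)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest retained hash with the new, smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        """Estimated number of distinct items seen so far."""
        if len(self._heap) < self.k:
            return float(len(self._heap))   # exact while under capacity
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest
```

A caller can invoke `estimate()` immediately after any `add()`, which mirrors the write-then-read-result pattern described above. The relative error of KMV shrinks roughly as one over the square root of `k`, so accuracy is traded directly against memory.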
With the help of Tair persistent memory, this scenario saved about one third of its storage space on this year's Double 11, and the cost advantage of persistent memory greatly reduced user costs. Extensive performance optimization of TairCPC on Tair persistent memory brought performance in many scenarios level with the in-memory version and improved slow queries by an order of magnitude, which effectively improved system stability. Full data persistence (RPO=0) is achieved with little impact on performance.
Tair's innovations would not be possible without the support of Alibaba Cloud's mature infrastructure. DBaaS, the cloud-native database management and control platform, lets Tair quickly deliver the general capabilities of Alibaba Cloud databases, such as security audit, high availability, elastic scaling, and intelligent diagnosis, as well as Tair's own enterprise-level capabilities, such as data flashback and global distribution. For the Tair persistent memory type, DBaaS works with Alibaba Cloud Container Service ACK to support affinity scheduling of persistent memory resources and compute resources, which reduces persistent memory access latency. It also provides QoS policy support for persistent memory, keeping the service secure and controllable and the product experience consistent.
X-Dragon bare metal servers offer a product series equipped with persistent memory, which is the foundation for an elastic Tair service. Network optimizations targeted at burst traffic let Tair handle high-throughput scenarios, and intelligent prediction of memory and other hardware risks lets Tair anticipate and avoid risks ahead of the promotion peak.
Aliyun Linux not only adapts to persistent memory hardware, but is also optimized for Tair's persistent memory data snapshots, reducing real-time snapshot latency.
Although Tmall Double 11 2021 has come to an end, we will not stop. Tair, the cloud-native in-memory database, will continue to explore new application scenarios and provide users with better and more comprehensive database services.
This article is the original content of Aliyun and shall not be reproduced without permission.