Introduction: This paper briefly introduces the large-scale order system scheme based on MySQL and Tablestore. This solution supports big data storage, high-performance data retrieval, SQL search, real-time and full data analysis, and is simple to deploy and low cost to operate and maintain.

The author source yihong Nantucket | | ali technology to the public

A background

Order system exists in all walks of life, such as e-commerce orders, bank statements, operators’ phone bills, etc. It is a very broad and universal system. For this type of system, the development of the past decade has become a classic practice. However, with the development of the Internet and enterprises’ emphasis on data, more and more orders need to be stored and persisted, which brings new challenges to the importance of data and the expansion of data scale. First of all, the order volume brings challenges to data storage, persistence and access, which not only increases the difficulties faced by development, but also brings challenges to the operation and maintenance of the system. Secondly, with the development of big data technology and the continuous improvement of operation level, the subsequent data analysis work of order data, such as stream batch processing and ETL, is becoming more and more important, which also puts forward higher requirements for data storage system.

This paper proposes a design scheme of large-scale order system based on MySQL + Tablestore. This scheme is based on the idea of hierarchical storage, using Tablestore to assist MySQL to complete the order system support. In the system, the transaction ability of MySQL is used to deal with the write operation and partial read operation which have strong requirement for transaction. Use Tablestore’s retrieval ability, big data storage ability to make up for MySQL’s shortcomings in function. Detailed article: Evolution of data storage architecture for application systems on the cloud.

This paper is an architecture part of the large-scale order system with MySQL + Tablestore hierarchical storage architecture.

  • First elaborate, in the large-scale order system, what needs exist, what pain points exist.
  • Then compare the traditional architecture, its current situation, what are the disadvantages of each, which can not meet the requirements.
  • Then I will describe the MySQL + Tablestore architecture and explain how this architecture meets the needs of a large-scale order system.

Ii Requirement Scenario

Order system, facing c-terminal, in addition to high system performance requirements, for data storage, subsequent data calculation, real-time data processing, data batch processing have certain requirements. They also have different requirements for different roles, such as C-terminal customers, product operation, and system operation and maintenance.

1 C-end requirements

For C-side clients and c-side development, the system first needs to support high concurrency and high stability. Second, the system needs to be able to support searches based on user IDS and searches for records under user IDS that contain specific keywords. Specific requirements include:

  • Find the user’s orders in the last month based on the user ID.
  • Query order details based on the order number.
  • Search for items that users have purchased that contain a certain keyword.

This system index ability and search ability have higher requirements.

2 Operation Requirements

Operation students need to be able to use SQL to analyze real-time data without affecting the online situation, and be able to search according to non-primary key fields; They also need system support for streaming batch computing, streaming data processing for real-time data statistics, and batching for historical data statistics. Common demand scenarios of operation students are as follows:

  • Count the users who have consumed in a flagship store.
  • Count the customers who have consumed a certain product and what other products they have purchased, and then recommend products to customers.
  • Real-time statistics of the double eleven after the start of the real-time transaction amount, for the promotion of real-time data display.
  • Count the turnover of a store in the past 10 years.
  • Rely on order data to do real-time updated portrait analysis of customers to support product recommendations.

3 O&M Requirements

Operation and maintenance students pay more attention to system stability and complexity and expect low operation and maintenance costs. The order system based on MySQL + Tablestore can liberate the operation and maintenance students from the tedious operation and maintenance work and greatly reduce the operation and maintenance cost. It can do:

  • The system is highly available and has strong concurrent capability.
  • The system complexity is low, and you do not need to maintain multiple clusters or pay attention to data synchronization between clusters. Therefore, o&M is easy to perform.

Traditional order system

1 Order system architecture evolution

The simplest order system is a single point of MySQL architecture, but with the growth of order scale, users use MySQL to replace the single point of MySQL scheme. However, in this scheme, when the amount of data reaches the bottleneck of the current MySQL cluster, cluster expansion will still be quite difficult, requiring a larger cluster and a large amount of data migration. The problem of data iteration and expansion is difficult to overcome in MySQL scheme.

NoSQL was introduced, and MySQL + HBase came into being. In this solution, real-time data and historical data are stored in layers. Only real-time data is stored in MySQL, and historical data is archived in HBase. This solution solves storage and o&M problems caused by data expansion. However, data stored in HBase cannot be properly used and the solution does not support the search function.

Therefore, Elasticsearch is introduced in the architecture to form the MySQL + HBase + Elasticsearch solution. This solution takes advantage of the data retrieval capability provided by Elasticsearch to solve the order data search problem. However, in this architecture, users need to maintain HBase and Elasticsearch clusters and synchronize data to HBase and Elasticsearch. Therefore, maintenance costs are high. And this architecture still cannot support data analysis and processing such as stream batch processing and ETL.

MySQL database and table scheme

MySQL itself has powerful data query and analysis functions. The order system created based on MySQL can deal with multi-dimensional query and statistical scenarios of order data. With the increase of order data volume, users will adopt separate database and separate table solutions to solve the problem caused by data expansion through this pseudo-distributed solution. However, once the data reaches the bottleneck, it is necessary to re-create a larger scale branch library + full migration of data, and trouble will continue to appear. Data iteration and expansion are difficult to overcome in MySQL scheme. Just rely on MySQL traditional order scheme shortcomings highlighted.

1. Longitudinal data expansion (data scale) : The scheme of database and table is adopted. MySQL needs to estimate the database scale during the deployment.

2. Data horizontal (field dimension) expansion: Schema needs to be predefined, and new fields are added in iteration and change is complicated. When the dimension reaches a certain amount, the database performance will be affected. Data inflation also increases the difficulty and cost of system operation and maintenance. In addition, MySQL clusters generally use double capacity expansion strategy. In the scenario of restorage and low computing order, CPU waste is also serious.

MySQL + HBase solution

The introduction of dual data scheme arises at the historic moment. The problem of data volume expansion can be solved to a certain extent through the scheme of real-time data and historical data storage. The scheme classifies data into two parts: real-time data and historical data. At the same time, expired data can be synchronized to historical data through the data synchronization service.

1. Real-time order data (for example, orders in the last 3 months) : Real-time orders are stored in MySQL database. The speed of the total expansion of real-time orders is limited, and the multi-dimensional query and analysis of real-time data are ensured.

2. Historical order data (for example, orders made three months ago) : The historical order data is stored in HBase, and the distributed NoSQL database of HBase effectively solves the problem of order data inflation. It also ensures the persistence of historical order data;

However, this scheme sacrifices the use value of historical order data to users, merchants and platforms, and assumes that the demand frequency of historical data is extremely low. But once there is a demand, it needs to scan the full table, query speed is slow, I/O cost is high. However, maintaining data synchronization brings problems such as data consistency and soaring synchronization operation and maintenance costs.

MySQL, HBase, and Elasticsearch

The MySQL + HBase + Elasticsearch solution uses the Elasticsearch cluster to solve the data retrieval problem that other solutions cannot handle.

1. Real-time order data (for example, orders in the last 3 months) : Real-time orders are stored in MySQL database. The speed of the total expansion of real-time orders is limited, and the multi-dimensional query and analysis of real-time data are ensured.

2. Historical order data (for example, orders made three months ago) : The historical order data is stored in HBase, and the distributed NoSQL database of HBase effectively solves the problem of order data inflation. It also ensures the persistence of historical order data;

3. Data search: Data synchronization synchronizes fields to be retrieved from HBase to Elasticsearch, which provides data search capabilities for the system. Then check MySQL for complete order information if necessary;

Although this scheme solves the problem of data expansion, it can also support data retrieval. However, it can be seen that the customer needs to maintain at least two sets of clusters and pay attention to two data synchronization tasks. This solution has high system complexity and operation and maintenance costs. In addition, this solution still does not support streaming batch processing of data, data ETL reprocessing.

2. Summary of traditional order structure

In short, MySQL database and table scheme cannot solve the expansion problem caused by data expansion. The architecture based on MySQL + HBase has obvious shortcomings in data retrieval. The MySQL + HBase + Elasticsearch scheme can solve the problems of capacity expansion and data retrieval, but it has a complex system and high maintenance cost. In addition, this scheme cannot provide effective support for data analysis and data reprocessing ETL work. MySQL + Tablestore not only solves the problems of capacity expansion and retrieval, but also supports batch processing of data stream and ETL reprocessing, with low system complexity and operation and maintenance costs, which can meet various requirements of large-scale order system.

MySQL + Tablestore

Tablestore is a multi-model structured data storage developed by Ali Cloud, providing massive structured data storage and fast query and analysis services.

After MySQL + Tablestore, it can well meet various needs in large-scale order scenarios. Its overall architecture is shown in figure.

MySQL handles the writing of orders and basic reading of recent data, and uses data synchronization tools such as DTS to synchronize data to Tablestore in real time. In Tablestore, with its secondary index and multivariate index, you can handle the retrieval needs. Through DLA, you can use SQL to query Tablestore directly. The Tablestore channel service connects to Spark Streaming and Flink for real-time data analysis. Tablestore and ODPS docking, can be very convenient to achieve the ETL operation of order data. Combined with OSS and Tablestore, order data can be archived and full historical data can be analyzed in OSS.

1 Data Synchronization

In traditional order architecture, developers inevitably have to deal with data synchronization into HBase or Elasticsearch. This not only aggravates the development work of developers, but also improves the difficulty of operation and maintenance. In Tablestore, Ali Cloud provides DataX, Data Transmission Service (DTS) and Canal Data synchronization tools to complete Data synchronization from MySQL to Tablestore. Users can complete real-time data synchronization with little development and configuration work. Easy to operate, high real-time, greatly reduce the maintenance cost.

2. Data Retrieval

Tablestore provides secondary index and multivariate index to support data retrieval. Secondary indexes can perform data queries based on primary key columns and predefined columns, such as the order status of the user in the past month. Multivariate index, based on inverted index and column storage, provides a more powerful data retrieval function, which solves the complex query problems of big data. It can fulfill requirements such as searching for users who have purchased a product.

Tablestore’s multivariate index makes up for the shortcomings of MySQL and HBase in search. Compared to Elasticsearch, multivariate indexes do not require users to maintain clusters and data synchronization tasks, which is much cheaper.

3 Data analysis based on SQL

Tablestore supports SQL reading and writing data in Tablestore in a variety of ways. If you want to directly read the data in Tablestore, it is recommended to directly use Tablestore SQL support ability to operate; If you want to perform federated queries from multiple data stores, SQL supported by DLA is recommended. For both forms of SQL, Tablestore fully optimizes them with multivariate indexes. With SQL processing ability, developers can be more efficient code development, code migration work. Querying Tablestore directly with SQL will also unload traffic for the MySQL primary library.

4 Real-time data analysis

Tablestore channel service, you can pass changes in Tablestore library data into the channel. Using Spark Streaming or Flink to connect channels, real-time data analysis requirements such as real-time transaction volume statistics can be realized.

5 Analysis of historical data

Tablestore can deliver data to the OSS system, so that the order can be archived. In addition, the OSS system can be connected to Spark, so that the full historical data can be analyzed. In this way, recent data is stored in Tablestore, full historical data is stored in OSS, and analysis involving full historical data is supported by OSS.

6 ETL data reprocessing

By connecting Tablestore data to ODPS, the powerful data processing ability of ODPS can be used to do ETL operations on data more conveniently and reprocess data.

Five summarizes

This paper briefly introduces the large-scale order system scheme based on MySQL and Tablestore. This solution supports big data storage, high-performance data retrieval, SQL search, real-time and full data analysis, and is simple to deploy and low cost to operate and maintain.

The original link

This article is the original content of Aliyun and shall not be reproduced without permission.