I. Project background

Order systems exist in all walks of life: e-commerce orders, bank statements, telecom operators' phone bills, and so on. They are broad, general-purpose systems, and a decade of development has produced well-established practices for building them. However, as the Internet grows and enterprises place more value on data, more and more orders need to be stored and persisted. The growing importance and scale of that data bring new challenges.

Requirement scenarios

An e-commerce platform A needs to persist the order data generated across the whole platform. Based on the full order data, the system must provide query services for three groups of roles: consumers, shop owners, and the platform itself. Consumers can query their historical orders, merchants can count best-selling products, and the platform can analyze user behavior and transaction volume. The main query patterns cover multidimensional order retrieval as well as order data analysis and statistics. For example, for consumers: orders matching [consumer A] * [last 1 year] * [computers]; for salespersons: sales orders matching [salesperson B] * [last 1 month].

Technical points

In the order scenario, the technical points that usually need to be considered are mainly the following:

  • Query capability: a variety of query types is required, such as multidimensional, range and fuzzy queries, together with sorting, statistics and similar functions;
  • Data volume: the system must store massive data while meeting the requirements of strong consistency, high availability and low cost;
  • Service performance: the system must handle highly concurrent requests while keeping latency low.

Multi-dimensional, real-time query is the core function of an order management solution.

II. Solution evolution

For order scenarios, e-commerce platforms usually start with the traditional MySQL solution. Relying on the powerful query capability of a relational database, users can run multi-dimensional queries and statistics on order data directly with SQL statements. The difficulty is data inflation, which has a horizontal and a vertical dimension. The horizontal dimension refers to new fields introduced by continuous iteration; the vertical dimension refers to the total amount of stored data. Facing these two kinds of inflation, a MySQL-only scheme becomes increasingly difficult to sustain. The combined SQL + NoSQL scheme (hereinafter the combined scheme) emerged in response, using the respective strengths of the two databases to meet the needs of different scenarios. However, the combined scheme brings new problems: it costs extra storage space, increases development workload and operational complexity, and incurs additional overhead to keep the data consistent. Let's look at the following common schemes:

Conventional schemes

1. MySQL sharding scheme (split databases and tables)

MySQL itself has powerful query and analysis functions. An order system built on MySQL can handle multi-dimensional queries and statistics over order data. As the order volume grows, users adopt sharding (splitting databases and tables), a pseudo-distributed approach, to cope with data expansion. However, once the data reaches the capacity limit again, a larger set of shards has to be created and all data migrated, and the trouble keeps recurring. Data iteration and expansion are hard to overcome in the MySQL scheme, and the shortcomings of relying on MySQL alone become prominent:

1. Vertical expansion (data scale): with sharding, the database scale has to be estimated at deployment time.
2. Horizontal expansion (field dimensions): the schema must be predefined, adding fields during iteration is complicated, and once the number of dimensions reaches a certain level, database performance suffers.
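The migration problem described above can be illustrated with a small routing sketch. It is not taken from the original system: the class name `ShardRouter`, the `order_db_N.order_M` naming, and the use of CRC32 as the hash are all assumptions chosen for illustration. The point is only that the target shard depends on the shard count, so enlarging the cluster changes where many existing orders belong.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper: maps an order ID to a fixed database/table pair. */
final class ShardRouter {
    private final int dbCount;
    private final int tablesPerDb;

    ShardRouter(int dbCount, int tablesPerDb) {
        if (dbCount <= 0 || tablesPerDb <= 0) {
            throw new IllegalArgumentException("shard counts must be positive");
        }
        this.dbCount = dbCount;
        this.tablesPerDb = tablesPerDb;
    }

    /** Returns a location such as "order_db_1.order_3" (naming is an assumption). */
    String route(String orderId) {
        CRC32 crc = new CRC32();
        crc.update(orderId.getBytes(StandardCharsets.UTF_8));
        // The slot depends on the total number of tables, which is fixed at deployment.
        long slot = crc.getValue() % ((long) dbCount * tablesPerDb);
        int db = (int) (slot / tablesPerDb);
        int table = (int) (slot % tablesPerDb);
        return "order_db_" + db + ".order_" + table;
    }

    /** Fraction of sample IDs whose location differs between two layouts. */
    static double movedFraction(ShardRouter before, ShardRouter after, int sampleSize) {
        int moved = 0;
        for (int i = 0; i < sampleSize; i++) {
            String id = "o" + i;
            if (!before.route(id).equals(after.route(id))) {
                moved++;
            }
        }
        return (double) moved / sampleSize;
    }
}
```

Calling `ShardRouter.movedFraction(new ShardRouter(2, 4), new ShardRouter(4, 4), 1000)` gives a rough idea of how many rows would have to be relocated when the number of databases doubles, which is why the scale has to be estimated up front.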

2. MySQL + HBase scheme

The dual-data scheme emerged in response. Storing real-time data and historical data separately solves the data volume problem to a certain extent. The scheme classifies data into two parts, real-time and historical, and a data synchronization service moves expired data into the historical store.

1. Real-time order data (for example, orders from the last 3 months): stored in MySQL. The total volume of real-time orders grows only within limits, which keeps multi-dimensional query and analysis of real-time data available.
2. Historical order data (for example, orders older than 3 months): stored in HBase. As a distributed NoSQL database, HBase effectively handles order data inflation and ensures the persistence of historical orders.

However, this scheme sacrifices the value of historical order data to users, merchants and the platform, and assumes that historical data is queried extremely rarely. Once such a query is needed, it requires a full table scan, which is slow and costly in I/O. In addition, maintaining data synchronization brings data consistency problems and sharply higher operation and maintenance costs.
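As a minimal sketch of the real-time/historical split, the routing decision can be written as a cutoff on the order time. The class `OrderStoreRouter`, the enum names, and the configurable month window are hypothetical; the source only gives "3 months" as an example boundary.

```java
import java.time.LocalDateTime;

/** Hypothetical router for the real-time/historical split described above. */
final class OrderStoreRouter {
    enum Store { MYSQL_REALTIME, HBASE_HISTORY }

    private final int hotMonths;

    OrderStoreRouter(int hotMonths) {
        if (hotMonths <= 0) {
            throw new IllegalArgumentException("hotMonths must be positive");
        }
        this.hotMonths = hotMonths;
    }

    /** Orders older than the cutoff belong to the historical store. */
    Store storeFor(LocalDateTime orderTime, LocalDateTime now) {
        LocalDateTime cutoff = now.minusMonths(hotMonths);
        return orderTime.isBefore(cutoff) ? Store.HBASE_HISTORY : Store.MYSQL_REALTIME;
    }
}
```

With a 3-month window and `now` set to 2018-10-01, an order placed on 2018-07-17 is still served from MySQL, while one placed on 2018-01-01 has to be fetched from HBase. Every order crosses that boundary eventually, which is where the synchronization service and its consistency cost come from.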

3. MySQL + Elasticsearch scheme

MySQL + Elasticsearch also stores data in two parts, which solves the growth of index dimensions to a certain extent. Users maintain a data synchronization service to keep the two parts consistent.

1. Full data: the full order data is stored in MySQL, with everything other than the order ID stored as a single field. This copy serves as the persistent store and is also used for back lookup of non-indexed fields.
2. Index data: the fields that need to be retrieved are stored in Elasticsearch, a distributed index database based on Lucene.

This scheme handles the expansion of data dimensions, but as orders keep growing, the poor scalability of MySQL is exposed again. At the same time, synchronizing data to Elasticsearch has high development, operation and maintenance costs, so this choice has its own drawbacks.
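The two-step read path of this scheme (find order IDs in the index, then look the full record up by ID) can be sketched in memory. Nothing here talks to MySQL or Elasticsearch: the two maps merely stand in for them, and the class `IndexBackLookup` and its whitespace tokenization are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory stand-in for the index-then-back-lookup read path. */
final class IndexBackLookup {
    // Stands in for MySQL: order ID -> full record stored as a single field.
    private final Map<String, String> fullData = new HashMap<>();
    // Stands in for the search index: term -> IDs of orders containing it.
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Every write touches both stores; this is the synchronization burden. */
    void put(String orderId, String productName, String fullRecord) {
        fullData.put(orderId, fullRecord);
        for (String term : productName.toLowerCase(Locale.ROOT).split("\\s+")) {
            index.computeIfAbsent(term, t -> new TreeSet<>()).add(orderId);
        }
    }

    /** Step 1: resolve IDs from the index. Step 2: back lookup by ID. */
    List<String> search(String term) {
        Set<String> ids = index.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.<String>emptySet());
        List<String> records = new ArrayList<>();
        for (String id : ids) {
            records.add(fullData.get(id));
        }
        return records;
    }
}
```

The sketch keeps both writes in one method, so they cannot drift apart. In a real deployment the two writes go to separate systems, and keeping them in step is the job of the synchronization service mentioned above.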

| Capability | MySQL | HBase | Elasticsearch | TableStore |
| --- | --- | --- | --- | --- |
| Storage | Row storage | Column storage | Index storage | Column storage + index storage |
| Scalability | Single machine, poor scalability | Horizontal scaling | Horizontal scaling | (Automatic) horizontal scaling |
| Consistency | Strong consistency | Strong consistency, timing consistency | | Strong consistency, timing consistency |
| Retrieval | Weak support | Not supported | Supported | Supported |
| Data volume | ~1 TB, ~billion rows | ~10 PB, ~trillion rows | ~1 PB, ~100 billion rows | ~10 PB, ~trillion rows |

TableStore scheme

With the SearchIndex (multi-index) feature developed by TableStore, the problems of an order system with hundreds of millions of orders can be solved well. TableStore is ready to use and billed by usage. Multiple indexes can be created at any time, which makes it an excellent solution for metadata management of massive e-commerce orders. As a fully managed, distributed NoSQL data storage service provided by Alibaba Cloud, TableStore offers [massive data storage], [automatic sharding of hot data], [multidimensional retrieval over massive data] and more, which naturally addresses the explosion of order data. SearchIndex adds multi-dimensional search and statistics on top of guaranteed high availability of user data. Different indexes can be created for different scenarios to support different retrieval patterns, and users create and enable an index only when they need it. TableStore keeps the index consistent with the data, which greatly reduces the user's workload in scheme design, service operation and maintenance, and code development.

Overview of the order system page based on TableStore

The sample is embedded in the TableStore console, and users can log in to the console to try the system. (New TableStore users need to click Open service first.) Note: this sample provides order data on the scale of [million]. Official console address: project example

II. Construction preparation

If the sample of the 100-million-scale order system worked well for you and you want to build your own system, follow the steps below:

1. Enable TableStore

Enable the TableStore service through the console. TableStore works out of the box (postpaid) and is billed by usage, with a free quota provided for functional testing. See the TableStore official website console and the free quota description.

2. Create an instance

Use the console to create a TableStore instance and select a region that supports SearchIndex. (SearchIndex has not been commercialized yet. For now it is available in Beijing, Shanghai, Shenzhen and Hangzhou, and more regions will be opened gradually.)

After creating the instance, submit a ticket to apply for the SearchIndex invitation test (enabled by default after commercialization, no charge for use).

  • Invitation test address: submit a ticket, choose "TableStore" > "Product function and feature consultation" > "Create ticket", and fill in the application as follows:
  • Problem description: fill in "Application for SearchIndex test"
  • Confidential information: fill in [region + instance name], for example: Shanghai + myInstanceName

3. SDK download

Use an SDK version that supports SearchIndex (multi-index), available from the official website. For now, the new feature has been added to the Java, Go and Node.js SDKs.

Java SDK

<dependency>
    <groupId>com.aliyun.openservices</groupId>
    <artifactId>tablestore</artifactId>
    <version>4.7.4</version>
</dependency>

Go SDK

$ go get github.com/aliyun/aliyun-tablestore-go-sdk

4. Table design

An order system is more than a single order table. It should include a consumer table, salesperson table, product table, supplier table, transaction order table, payment order table and so on. This example uses only the four basic tables (consumer table, salesperson table, product table, and order table). The order table is designed as follows. Table name: order_contract

| Column name | Data type | Index type | Description |
| --- | --- | --- | --- |
| _ID (primary key column) | String | | MD5(oId), to avoid hotspots |
| oId | String | KEYWORD | Order number |
| pName | String | TEXT | Product name. A TEXT index supports fuzzy queries but cannot be sorted |
| totalPrice | double | DOUBLE | Order price |
| orderTime | long | LONG | Order time (timestamp) |
| ... | ... | ... | ... |
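The primary key in the table above is MD5(oId). A minimal sketch of computing it with the JDK follows. The helper name `OrderKeys.md5Hex` is hypothetical, and encoding the order number as UTF-8 and rendering the digest as lowercase hex are assumptions, since the source does not state how the sample derives the key.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for deriving the hashed primary key from an order number. */
final class OrderKeys {
    private OrderKeys() {}

    /** Returns the MD5 digest of the input as 32 lowercase hex characters. */
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Assumption: the order number is hashed as UTF-8 bytes.
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Sequential order numbers such as o0000000000, o0000000001 share a long common prefix and would otherwise cluster in one key range. Hashing spreads them across the key space, which is the hotspot avoidance the table refers to. The trade-off is that range scans over the primary key no longer follow order-number sequence.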

III. Start building (core code)

1. Create table

The four tables (order table, consumer table, salesperson table, and product table) live in a single instance, so users only need to maintain one instance. Create and manage the data tables on the console (users can also create data tables directly with the SDK):

2. Create index (SearchIndex)

3. Data import

Insert some test data (the console sample contains 100 million records; users can insert a small amount of test data themselves through the console):

Order table:

| Order number | Order ID (MD5) (primary key) | Consumer number | Consumer name | Salesperson number | Salesperson name | Product number | Product name | Product brand | Product type | Order time | Payment time | Payment status | Unit price | Quantity | Total price |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| o0000000000 | c49f5fd5aba33159accae0d3ecd749a7 | c0019 | Chen elimination | s0020 | The sale of chu ten | p0003004 | vivo x21 | vivo | Mobile phone | 2018-07-17 21:00:00 | | no | 2498.99 | 2 | 4997.98 |

Consumer table:

| Consumer ID (primary key) | Consumer name | Consumer points | Registration time |
| --- | --- | --- | --- |
| c0001 | Zhao a elimination | 818 | 2018-07-07 14:33:51 |

Salesperson table:

| Salesperson number (primary key) | Salesperson name | Salesperson score | Date of entry |
| --- | --- | --- | --- |
| s0001 | Sell a zhao | 613 | 2018-07-07 14:27:59 |

Product table:

| Product number (primary key) | Product name | Product brand | Product type | Unit price | Launch time |
| --- | --- | --- | --- | --- | --- |
| p0001001 | iphone 6 | apple | Mobile phone | 6969.00 | 2018-07-07 14:44:39 |

4. Data reading

Data reads fall into two categories:

Primary key read

Rows are fetched by primary key through the native TableStore interfaces: getRow, getRange, batchGetRow, etc. Primary key reads are used for the (automatic) back lookup from the index, and users can also offer a single-order query page keyed by the primary key (the order MD5). These queries are very fast even with billions of rows. A single primary key query does not support multi-dimensional retrieval.

Index read

Index reads go through the Search interface of the new SearchIndex feature. Users are free to design multi-dimensional combined queries over the indexed fields. Different query conditions and sort orders are built by choosing and setting different query parameters. Currently supported: exact (term) query, range query, prefix query, match query, wildcard query, phrase match query and word string query, combined through Boolean AND and OR. For example, the combination [consumer c0001, orders with a total price above 99.99] is built as follows:

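import java.util.ArrayList;
import java.util.List;

import com.alicloud.openservices.tablestore.model.ColumnValue;
import com.alicloud.openservices.tablestore.model.search.query.BoolQuery;
import com.alicloud.openservices.tablestore.model.search.query.Query;
import com.alicloud.openservices.tablestore.model.search.query.RangeQuery;
import com.alicloud.openservices.tablestore.model.search.query.TermQuery;

// Conditions that must all match
List<Query> mustQueries = new ArrayList<Query>();

// Exact match on the consumer ID
TermQuery termQuery = new TermQuery();
termQuery.setFieldName("cId");
termQuery.setTerm(ColumnValue.fromString("c0001"));
mustQueries.add(termQuery);

// Range condition on the order total, starting from 99.99
RangeQuery rangeQuery = new RangeQuery();
rangeQuery.setFieldName("totalPrice");
rangeQuery.setFrom(ColumnValue.fromDouble(99.99));
mustQueries.add(rangeQuery);

// Combine both conditions with Boolean AND
BoolQuery boolQuery = new BoolQuery();
boolQuery.setMustQueries(mustQueries);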
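To make the semantics of the "must" combination concrete without calling the service, the same condition can be sketched over an in-memory list. This is only an illustration and not the SearchIndex API: `MustFilter`, its nested `Order` class and `consumerOrdersAbove` are hypothetical names, and treating "above 99.99" as strictly greater is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Local illustration of AND-combined conditions; not the SearchIndex API. */
final class MustFilter {
    /** Minimal order record carrying only the two fields used in the query. */
    static final class Order {
        final String cId;
        final double totalPrice;

        Order(String cId, double totalPrice) {
            this.cId = cId;
            this.totalPrice = totalPrice;
        }
    }

    /** Keeps the orders that satisfy every condition in the list. */
    static List<Order> filter(List<Order> orders, List<Predicate<Order>> mustConditions) {
        Predicate<Order> all = o -> true;
        for (Predicate<Order> condition : mustConditions) {
            all = all.and(condition);
        }
        List<Order> matched = new ArrayList<>();
        for (Order order : orders) {
            if (all.test(order)) {
                matched.add(order);
            }
        }
        return matched;
    }

    /** Mirrors the example: one consumer, total price above a threshold. */
    static List<Order> consumerOrdersAbove(List<Order> orders, String cId, double minPrice) {
        List<Predicate<Order>> must = new ArrayList<>();
        must.add(o -> o.cId.equals(cId));            // plays the role of the term condition
        must.add(o -> o.totalPrice > minPrice);      // plays the role of the range condition
        return filter(orders, must);
    }
}
```

An order is returned only when both conditions hold. An order from c0001 with a total of 10.00, or an order above 99.99 from another consumer, is filtered out.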

Welcome aboard

With that, the core code of the system is complete. Building an order system on TableStore is quite simple, isn't it?