This article has also been posted to my GitHub: github.com/allentofigh… Stars are welcome.

Preface

Our company's business has grown rapidly in recent years, and the volume of data has exploded along with it, bringing challenges such as querying huge amounts of data. We need to return results in seconds or even milliseconds when the data reaches hundreds of millions or even billions of records, and this clearly requires the help of a search engine. Among search engines, ES (Elasticsearch) is undoubtedly the best of the bunch: it has ranked first in the DB-Engines search engine ranking for many consecutive years and is the first choice of most large companies. So what advantages does it have over a traditional DB such as MySQL? And when the data reaches the PB level, how do we keep the ES index near real time so that it better meets business needs?

In this article we share some ideas on how to build a quasi-real-time index, based on our hands-on experience with ES, and we hope it gives you some inspiration. The table of contents is as follows:

  1. Why use a search engine; isn't a traditional DB such as MySQL good enough?
    • The shortcomings of MySQL
    • An introduction to ES
  2. How ES index data is built
  3. Building a PB-level quasi-real-time ES index

Why use a search engine; isn't a traditional DB such as MySQL good enough?

The shortcomings of MySQL

MySQL is not suited to querying massive data; it is only suited to storing it. It cannot cope with queries over massive data under all kinds of complex conditions. Some people say that adding indexes avoids full table scans and speeds up queries, so why is it still unsuitable for massive-data queries? There are two reasons:

1. Indexes do indeed speed up queries, but if you add multiple indexes in MySQL, the optimizer will ultimately choose only the lowest-cost index when executing the SQL, and if no index matches the search conditions, a full table scan is triggered. Even with a composite index, the query must follow the leftmost-prefix principle to hit it, yet with massive data and all kinds of query conditions, many queries will not conform to the leftmost prefix and the index will not be used. We also know that storage has a cost: if you create an index for every case, then in InnoDB, for example, each additional index creates another B+ tree, and with massive data this adds a lot of storage cost. Someone once told me that one of their tables was only 10 GB in size while its indexes were 30 GB! What a huge cost! So do not assume that more indexes are better.

2. Some query conditions cannot be handled by MySQL indexes at all. For example, suppose I want to query all products whose title contains the keyword "Gree air conditioner". In MySQL you would write the following:

SELECT * FROM product WHERE title LIKE '%Gree air conditioner%'

This will not hit any index and will trigger a full table scan. Moreover, you cannot expect every user to type the product name correctly; people make mistakes, and a user may well mistype "Gree air conditioner" as "grid air conditioner", in which case the SQL becomes:

SELECT * FROM product WHERE title LIKE '%grid air conditioner%'

In this case, even after a full table scan you will not find any matching products.

An introduction to ES

Rather than calling the points above shortcomings of MySQL, it is fairer to say that MySQL was simply not designed for massive-data queries; that is the job of specialized technology. Massive-data queries need a dedicated search engine, and among them ES is the king. It is an open-source distributed search and analytics engine built on the Lucene engine; it provides near-real-time queries over PB-scale data and is widely used in full-text search, log analysis, and monitoring and analytics scenarios.

It mainly has the following three characteristics:

  • Easy support for all kinds of complex query conditions: it is a distributed, real-time document store in which every field is indexed (with an inverted index). With efficient inverted indexes, custom scoring and sorting, and rich analyzer (word-segmentation) plug-ins, it can satisfy full-text retrieval under arbitrarily complex query conditions (see the sketch after this list).
  • Strong scalability: it natively supports distributed storage and, with extremely simple configuration, scales out horizontally to hundreds or thousands of servers, easily handling PB-scale structured or unstructured data.
  • High availability and good disaster recovery: primary and replica copies together with automatic fault detection and recovery ensure high availability.
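As a concrete example of the first point, the "Gree air conditioner" query from earlier could be expressed against ES roughly as follows. This is only a minimal sketch, assuming the official Python client (elasticsearch-py 7.x); the index name `product` and the field `title` are hypothetical.

```python
# A minimal sketch of a full-text query against ES, assuming the
# elasticsearch-py 7.x client; index/field names are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

resp = es.search(
    index="product",
    body={
        "query": {
            # "match" analyzes the query text into terms and looks them up
            # in the inverted index; "fuzziness" tolerates minor typos such
            # as "grid air conditioner".
            "match": {
                "title": {"query": "Gree air conditioner", "fuzziness": "AUTO"}
            }
        },
        "size": 10,
    },
)

for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```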

Let's start by understanding some important ES concepts through an analogy with MySQL.

Through this analogy it is not hard to see the following ES concepts:

  1. MySQL's Database corresponds to ES's Index, a logical collection of data. ES's main job is to create indexes and query them.
  2. A database has multiple tables; likewise an Index has multiple Types.
  3. A table has multiple rows (Row); likewise a Type has multiple Documents.
  4. A table's Schema defines the table name, field names, field types, and so on; likewise a Mapping defines how the fields of a Type are processed, for example how they are indexed and whether they are analyzed (tokenized). You simply specify this in the Mapping.
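As a small illustration of the Index and Mapping concepts (and of the shard settings that come up in the architecture discussion below), here is a minimal sketch of creating an index with the Python client, assuming elasticsearch-py 7.x; the index name, field names, and setting values are hypothetical. Note that since ES 7.x the Type level has been removed, so the table-to-Type part of the analogy applies to older versions and the Mapping now attaches directly to the Index.

```python
# A minimal sketch of creating an ES index with an explicit mapping,
# assuming elasticsearch-py 7.x; index/field names are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

es.indices.create(
    index="product",
    body={
        "settings": {
            "number_of_shards": 3,    # how the index is split across nodes
            "number_of_replicas": 1,  # one replica per primary shard for HA
        },
        "mappings": {
            "properties": {
                # "text" fields are analyzed (tokenized) and inverted-indexed
                "title": {"type": "text"},
                # "keyword"/numeric fields are indexed but not tokenized
                "brand": {"type": "keyword"},
                "price": {"type": "double"},
                "stock": {"type": "integer"},
                "status": {"type": "keyword"},  # e.g. on/off the shelves
            }
        },
    },
)
```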

So how can indexes in ES be so efficient that queries over massive data return in seconds? ES uses many optimizations, but the main one is a technique called the inverted index, which avoids scanning every document. So what is an inverted index? Finding keywords by going through documents is what we call a forward index; conversely, finding documents by keyword is what we call an inverted index. Suppose we have the following three documents:

To find the documents containing "comming" with a forward index, we would have to read the content of every document and check whether the word appears, which is essentially a full scan. The inverted index works differently: it first tokenizes every document, then builds a mapping from each term to the documents that contain it; if several documents contain the term, they are ordered by importance, i.e. the documents' weight (TF-IDF is usually used for scoring). We then get the following relationship:

With this structure, finding all documents containing "comming" takes just one lookup. Queries with multiple terms also perform very well: we look up the document list for each term and take the intersection, which greatly improves query efficiency.

Voice-over: some steps are simplified here; for example, the term is actually located in the term dictionary first. These steps are fast and can be ignored here; interested readers can look them up on their own.
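To make the mechanism concrete, here is a toy, purely in-memory sketch of an inverted index with posting-list intersection for multi-term queries. The three example documents are made up (the original figure is not reproduced here), and no scoring is applied.

```python
# A toy inverted index: term -> set of document ids that contain the term.
from collections import defaultdict

docs = {
    1: "winter is comming",
    2: "ours is the fury",
    3: "the pale comming dawn",
}

inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():   # naive tokenization
        inverted[term].add(doc_id)

def search(*terms):
    """Return ids of documents containing ALL terms (posting-list intersection)."""
    postings = [inverted.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

print(search("comming"))          # {1, 3}
print(search("comming", "the"))   # {3}
```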

Besides the inverted index, the distributed architecture of ES is also naturally suited to massive-data queries. Let's take a look at ES's architecture.

An ES cluster is composed of multiple nodes, and each index's data is spread across multiple nodes in the form of shards (subsets of the index). When a query request arrives, each node queries its own data and the results are merged, so the query load is distributed across many nodes, avoiding the limits of a single node's CPU, disk, and memory.

In addition, when a new node joins, some shards are automatically migrated to it to rebalance the load; ES does this entirely by itself. Compare that with MySQL database and table sharding, where developers have to introduce middleware such as Mycat and specify sharding rules and other tedious steps; isn't this huge progress? It also means ES has very strong horizontal scalability: a cluster can easily grow to hundreds or thousands of nodes and comfortably support PB-level queries.

Of course, that is not all ES is capable of. It also uses primary and replica shards to increase search throughput, node failure detection, a Raft-style master election mechanism to improve fault tolerance, and so on. These are not the focus of this article; interested readers can look into them themselves. After the brief summary above, we only need to remember one point: the distributed architecture of ES naturally supports massive-data queries.

How ES index data is built


To build ES index data you first need a data source. Generally MySQL is the data source: you can fetch data from MySQL directly and then write it into ES.

The most common scenario in an e-commerce app is surely a user typing a keyword to search for matching products. So what information does a product involve? A product has multiple SKUs (a SKU is a variant of the same product with a different specification, for example iPhone 6, iPhone 6s, and so on), basic attributes such as price and title, a category (home appliances, clothing, etc.), a brand, stock, and so on. To keep the table design reasonable, we put these attributes into several tables, say product_sku (the SKU table), product_property (the basic attribute table), sku_stock (the stock table), and product_category (the category table). To show all this information in the product listing, we have to join these tables and then write the result into ES, so that queries against ES return the complete product information.
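In its most naive form, that would look roughly like the sketch below: run the join in MySQL and bulk-write the result into ES. This only illustrates the idea that the next paragraph rejects for online use; it assumes pymysql and elasticsearch-py 7.x, and the connection details, column names, and join conditions are hypothetical.

```python
# Naive approach (for illustration only): join product tables directly in
# MySQL and bulk-index the result into ES. Assumes pymysql and
# elasticsearch-py 7.x; schema/column names are hypothetical.
import pymysql
from elasticsearch import Elasticsearch, helpers

mysql = pymysql.connect(host="127.0.0.1", user="app", password="***",
                        db="shop", cursorclass=pymysql.cursors.DictCursor)
es = Elasticsearch(["http://localhost:9200"])

SQL = """
SELECT s.sku_id, p.title, p.brand, c.category_name, s.price, st.stock
FROM product_sku s
JOIN product_property p ON p.product_id = s.product_id
JOIN product_category c ON c.category_id = p.category_id
JOIN sku_stock st       ON st.sku_id = s.sku_id
"""

def actions():
    with mysql.cursor() as cur:
        cur.execute(SQL)              # this join is what hurts the online DB
        for row in cur:
            yield {"_index": "product", "_id": row["sku_id"], "_source": row}

helpers.bulk(es, actions())
```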

This approach is obviously not feasible, because performing the join directly in MySQL would severely hurt the performance of the online DB once products reach the tens of millions. So how do we generate the intermediate table? Since doing it directly in MySQL is not feasible, can we synchronize the MySQL data somewhere else and generate the intermediate table there, that is, add a middle layer for the processing and avoid touching the online DB directly? Speaking of which, I believe you will appreciate that famous saying in computer science even more: there is no problem that cannot be solved by adding a layer of indirection; if there is, add another layer.

This middle layer is Hive.

What is Hive

Hive is a Hadoop-based data warehouse tool used to extract, transform, and load data; it is a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Its idea is to turn easy-to-write Hive SQL into the complex MapReduce programs that would be hard to write by hand. In other words, when data volumes are large, synchronizing the MySQL data to Hive and generating the product_tmp intermediate table in Hive greatly improves performance. The Hive intermediate table is stored in HBase (a distributed, column-oriented open-source database). After the intermediate table is generated, a dump task is triggered periodically to invoke the indexer; the indexer then reads the full data from HBase, applies the business processing, and updates it into the ES index.
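The periodic dump/indexer step described above might look roughly like this sketch: scan the full intermediate table out of HBase and bulk-write it into ES. It assumes the happybase client and hypothetical table and column-family names; the business processing the real indexer performs is omitted.

```python
# Sketch of the periodic "dump" task: full-scan the Hive-generated
# intermediate table stored in HBase and rebuild the ES index.
# Assumes happybase + elasticsearch-py 7.x; all names are hypothetical.
import happybase
from elasticsearch import Elasticsearch, helpers

hbase = happybase.Connection("hbase-host")
table = hbase.table("product_tmp")          # the intermediate table
es = Elasticsearch(["http://localhost:9200"])

def actions():
    for row_key, data in table.scan(batch_size=1000):
        doc = {col.decode().split(":", 1)[1]: val.decode()
               for col, val in data.items()}   # b"cf:title" -> "title"
        yield {"_index": "product", "_id": row_key.decode(), "_source": doc}

helpers.bulk(es, actions(), chunk_size=2000)
```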

This may look like a decent way to build the index, but we need to know that joins in Hive are very time-consuming. In our production scenario, with tens of millions of rows, the join task alone usually takes dozens of minutes, and the whole process from starting the join to finally updating ES takes at least half an hour. If important fields such as a product's price, stock, or online status (for example, being taken off the shelves) change during that window, the index cannot be updated, which hurts the user experience badly. Before this optimization we often saw products in ES search results that appeared to be on sale but had actually been taken off the shelves. So how do we solve this? One feasible solution is to build a wide table.

Since the Hive join is the main performance bottleneck, can we avoid it altogether by merging tables such as product_sku, product_property, and sku_stock into one large table in MySQL (we call it a wide table)?

With the wide table, after MySQL is synchronized to Hive, Hive no longer needs to perform the time-consuming join, which greatly shortens the overall processing time: the path from MySQL-to-Hive synchronization to dumping the ES index dropped from more than half an hour to just a few minutes. That looks good, but an index delay of a few minutes is still unacceptable.

Why Hive cannot import indexes in real time

Because Hive is built on top of Hadoop, which is oriented toward static batch processing, and Hadoop usually has high latency and considerable overhead when submitting and scheduling jobs. Hive therefore cannot run fast, low-latency queries over large data sets, and importing tens of millions of rows from the indexer into the ES cluster takes minutes at best.

In addition, the wide table introduces a new maintenance problem. Imagine SKU stock changing, products being taken off the shelves, or prices being adjusted: besides modifying the records in the original tables (sku_stock, product_category, and so on), all of the corresponding records in the wide table must be updated as well. This is a maintenance nightmare, because every change to any product-related table must be followed immediately by a change to the wide table, and the wide-table update logic is tightly coupled to all of those tables!

Building a PB-level quasi-real-time ES index

So how do we solve this? If we could monitor DB field changes in real time and then synchronize the changes to ES and the wide table in real time, the problem would be solved.

How can we monitor table field changes in real time?

Answer: the binlog

Let’s review the master-slave synchronization principle of MySQL

  1. The MySQL master writes data changes to the binary log (binary log events)
  2. The MySQL slave copies the master's binary log events to its relay log
  3. The MySQL slave replays the events in the relay log, applying the data changes to its own data

As we can see, the key to master-slave replication is that the slave follows a protocol to receive the master's binlog in real time and uses it to update its own data. Could we also develop a component that follows this protocol, obtains the binlog as if it were a slave, and thereby listens for table field changes in real time? That is exactly what Canal, an Alibaba open-source project, does. It works like this (a small sketch of the same idea follows the list):

  • Canal emulates the interaction protocol of a MySQL slave, disguises itself as a MySQL slave, and sends the dump protocol to the MySQL master
  • The MySQL master receives the dump request and starts pushing the binary log to the slave (Canal)
  • Canal parses the binary log objects (originally a byte stream)
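Canal itself is a Java project, but the same idea can be illustrated in Python with the third-party python-mysql-replication library, which likewise registers as a replica and streams row-change events; the connection settings, schema, and table names below are hypothetical.

```python
# Sketch of a binlog listener that, like Canal, registers itself as a
# MySQL replica and streams row-change events. Uses the third-party
# python-mysql-replication library; settings/table names are hypothetical.
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent,
)

stream = BinLogStreamReader(
    connection_settings={"host": "127.0.0.1", "port": 3306,
                         "user": "repl", "passwd": "***"},
    server_id=100,                      # must be unique among replicas
    only_schemas=["shop"],
    only_tables=["product_sku", "sku_stock", "product_property"],
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    blocking=True,
    resume_stream=True,
)

for event in stream:
    for row in event.rows:
        if isinstance(event, UpdateRowsEvent):
            print(event.table, "updated:", row["before_values"], "->", row["after_values"])
        elif isinstance(event, WriteRowsEvent):
            print(event.table, "inserted:", row["values"])
        else:
            print(event.table, "deleted:", row["values"])
```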

With Canal we can obtain the binlog and listen for changes to the table fields we care about. But who should the binlog be delivered to? Distributing and managing it requires a unified management component, and that is what Alibaba's Otter does.

What is Otter

Otter is a distributed database synchronization system open-sourced by Alibaba. Based on parsing incremental database logs, it synchronizes data in quasi-real time to MySQL databases in the local or a remote data center. Its overall architecture is as follows:

Note: the above is our company's architecture, adapted from Otter for our business; it differs slightly from the original Otter but is essentially the same.

The main workflow is as follows

  1. In the Manager, configure the ZooKeeper (zk) address, the tables to listen on, and the Nodes responsible for listening on those tables, then synchronize the configuration to the Nodes
  2. Once a Node starts, its Canal listens on the binlog, and the data goes through four stages, S (Select), E (Extract), T (Transform), and L (Load), before being sent to MQ
  3. The business side can then subscribe to the MQ messages and do its own processing

Voice-over: ZooKeeper mainly coordinates the work between Nodes. For example, when data is synchronized across data centers, it may be synchronized from a Node in data center A to a Node in data center B, and ZooKeeper handles that coordination.

Note that a Node has four stages, S, E, T, and L. Their main functions are as follows:

  • Select stage: absorbs the differences between data sources, for example connecting to Canal to obtain incremental data, or connecting to other systems to obtain other data.

  • Extract stage: assembles and filters the data from various data sources, such as MySQL, Oracle, store, file, and so on.

  • Transform stage: extracts and converts the data into the types required by the target data source (a small sketch follows this list).

  • Load stage: loads the data, for example writing it to the target database, MQ, ES, and so on.
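As a toy illustration of the Transform stage, the sketch below converts a raw row (as it might arrive from the binlog) into the document shape a target such as ES expects; the field names and conversions are hypothetical.

```python
# Toy Transform-stage sketch: convert a raw MySQL row into the document
# shape expected by the target data source. Field names are hypothetical.
from decimal import Decimal
from datetime import datetime

def transform(row: dict) -> dict:
    return {
        "sku_id": int(row["sku_id"]),
        "title": row["title"].strip(),
        "price": float(Decimal(str(row["price"]))),      # DECIMAL -> float
        "stock": int(row["stock"]),
        "on_sale": row["status"] == 1,                    # tinyint -> bool
        "updated_at": datetime.utcnow().isoformat(),
    }

print(transform({"sku_id": "42", "title": " Gree air conditioner ",
                 "price": "2999.00", "stock": "17", "status": 1}))
```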

We call this set of data services, built on Alibaba's Otter, DTS (Data Transfer Service).

With this service in place, we can subscribe to MQ and write to ES in real time to keep the index up to date, and we can also subscribe to MQ to update the wide-table fields, achieving the wide-table and original-table consistency mentioned above. The improved index architecture based on the DTS service is as follows:
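A real-time index updater that subscribes to MQ might look roughly like the sketch below. The article does not name the MQ product, so Kafka is assumed here purely for illustration; the topic, message format, and index name are hypothetical.

```python
# Sketch of a real-time index updater: consume change messages from MQ
# and apply them to ES. Kafka is assumed as the MQ purely for illustration;
# topic, message format, and index name are hypothetical.
import json
from kafka import KafkaConsumer
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])
consumer = KafkaConsumer(
    "product-change",
    bootstrap_servers=["kafka:9092"],
    group_id="es-realtime-indexer",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for msg in consumer:
    change = msg.value                  # e.g. {"sku_id": 42, "price": 2999}
    sku_id = change.pop("sku_id")
    # Partial update: only the changed (sensitive) fields are written.
    es.update(index="product", id=sku_id, body={"doc": change})
```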

Note: the "build data" module is transparent to real-time index updates; it is mainly responsible for updating or inserting rows of the MySQL wide table. Because a wide-table row aggregates data from several tables, it does not matter which field of which table changed: the module fetches all of the table data related to the affected product and writes it back into the wide table.
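A minimal sketch of that "build data" step, assuming the wide table lives in MySQL and is keyed by SKU id; the table and column names are hypothetical. When any source table of a product changes, the joined rows for that product are re-read and upserted into the wide table.

```python
# Sketch of the "build data" module: when any source table of a product
# changes, re-read the full joined rows for that product and upsert them
# into the MySQL wide table. Table/column names are hypothetical.
import pymysql

SELECT_SQL = """
SELECT s.sku_id, s.product_id, p.title, p.brand, c.category_name,
       s.price, st.stock, p.status
FROM product_sku s
JOIN product_property p ON p.product_id = s.product_id
JOIN product_category c ON c.category_id = p.category_id
JOIN sku_stock st       ON st.sku_id = s.sku_id
WHERE s.product_id = %s
"""

UPSERT_SQL = """
REPLACE INTO product_wide
  (sku_id, product_id, title, brand, category_name, price, stock, status)
VALUES (%(sku_id)s, %(product_id)s, %(title)s, %(brand)s,
        %(category_name)s, %(price)s, %(stock)s, %(status)s)
"""

def rebuild_wide_rows(conn, product_id):
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute(SELECT_SQL, (product_id,))
        for row in cur.fetchall():
            cur.execute(UPSERT_SQL, row)
    conn.commit()
```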

In this way, MySQL wide-table full updates plus DTS-based real-time index updates solve the index delay problem, and ES index updates can reach second-level latency!

There are a couple of questions you might be interested in, so let me list them briefly.

Which fields to subscribe to

For the MySQL wide table, which keeps the full product information, we need to subscribe to all fields. For the real-time index update path (the red box in the architecture diagram), however, we only need to subscribe to fields such as stock and price, because failing to update these fields promptly has a big impact on sales; so we track only these sensitive fields for real-time indexing.
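A small sketch of that filtering, assuming the subscriber receives the before and after values of an update event (the field names are hypothetical): only changes that touch sensitive fields are forwarded to the real-time index path.

```python
# Sketch: forward an update to the real-time index path only if one of the
# "sensitive" fields actually changed. Field names are hypothetical.
SENSITIVE_FIELDS = {"price", "stock", "status"}  # e.g. status = on/off shelves

def sensitive_changes(before: dict, after: dict) -> dict:
    """Return only the sensitive fields whose values changed."""
    return {f: after[f] for f in SENSITIVE_FIELDS
            if f in after and before.get(f) != after[f]}

changed = sensitive_changes({"price": 3199, "stock": 20, "title": "a"},
                            {"price": 2999, "stock": 20, "title": "b"})
print(changed)   # {'price': 2999} -> only this triggers a real-time ES update
```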

With real-time index updates, do you still need full index updates?

Yes, for two main reasons:

  • Real-time updates rely on the message mechanism, so 100% data integrity cannot be guaranteed; a full update is needed as a backstop. Such cases are rare, and message backlogs trigger alarms, so we run the full index update only once a day
  • After an index cluster exception or crash, the indexes can be rebuilt quickly

Does the full index update overwrite the real-time index?

It can. Imagine this scenario: a real-time index update is triggered at some moment while a full index build is also running and has not yet reached the record that the real-time update just changed; when the full build finishes, it will overwrite the data written by the real-time update. A feasible approach for this situation is: if a full index build is in progress, defer the processing of real-time index update messages until the full update completes. For this reason we usually run the full index build in the early morning, an off-peak period for the business, which also helps avoid such problems.
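One way to implement "defer real-time messages while the full build runs" is a shared flag that the full-index job sets and the real-time consumer checks. The sketch below keeps the flag in Redis, which is purely an illustrative assumption; the key name is hypothetical.

```python
# Sketch: defer real-time index messages while a full rebuild is running.
# A Redis flag is assumed purely for illustration; key names are hypothetical.
import time
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

def handle_realtime_message(apply_update, message):
    # Wait until the full-index job clears its "in progress" flag, then
    # apply the real-time update so the full build cannot overwrite it.
    while r.get("full_index_building") == b"1":
        time.sleep(5)
    apply_update(message)

# The full-index job would wrap its run with:
#   r.set("full_index_building", "1")   ... build full index ...
#   r.delete("full_index_building")
```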

Conclusion

This article briefly summarized some of our company's ideas for building a quasi-real-time ES index on PB-level data, and we hope it is helpful to you. It only briefly mentioned how middleware such as ES, Canal, and Otter is applied, without going into their detailed configuration or principles: for example, ES does a great deal of work to improve search efficiency and optimize storage, Canal provides high availability, and Otter implements cross-data-center synchronization. Interested readers are encouraged to study them carefully; I believe you will benefit a lot.

Welcome to follow the "code sea" WeChat official account; let's make progress together ^_^
