This article comes from the public account “Fat Rolling Pig Learning Programming”. Please credit the source when reprinting.

Starting today, if you want to get hands-on with ElasticSearch and learn about distributed search engines, follow the fat pig!

Since this is the first ES lesson, the most important thing is to make you fall in love with it! Rather than listing the usual advantages and concepts, I will go straight to production cases from big tech companies, which should be the most convincing. Follow the big companies and you can't go wrong!

Why ES?

To study and promote a technical component in a targeted way, you first need to understand the full range of scenarios it is used in. So the first thing to figure out is why you should learn ElasticSearch at all. Its ranking is reason enough to learn it!

Why does it rank so high? Isn't it just a search engine? Indeed, when you hear ElasticSearch you probably think “search engine”, something like Baidu search or Taobao search. I am writing this article to correct that “wrong” view.

ElasticSearch did start out as a search engine, but it has since evolved into a versatile data product. So don't limit your thinking to search engines.

Typical ElasticSearch application scenarios include:

  • Real-time Log Analysis
  • Search service
  • Data analysis
  • Data monitoring
  • Query service
  • Back-end storage

ElasticSearch at Tencent

ElasticSearch is widely used at Tencent, mainly for real-time log analysis, search services, and time series data analysis.

  • Search services: For example, Tencent Docs uses ES for full-text retrieval, and product search for many e-commerce customers such as Pinduoduo and Mogujie is built on ES.
  • Log analysis: This is the most widely used area of ES. It supports full-stack log analysis, including application logs, database logs, user behavior logs, network data, security data, and so on. ES has a complete log solution with second-level latency from collection to display.
  • Time series analysis: The typical scenario is monitoring data analysis, such as cloud monitoring; all of Tencent Cloud's monitoring is based on ES. IoT scenarios also generate large amounts of time series data. Time series data is characterized by high write throughput, and ES provides rich multidimensional statistical analysis operators.

Real-time Log Analysis

Typical logs are as follows:

  • Operation logs, such as slow logs and exception logs, are used to locate service problems.
  • Business logs, such as user click and access logs, can be used to analyze user behavior.
  • Audit logs can be used for security analysis.

ES meets the need for real-time log analysis very well, and its solution has the following features:

  • The Elastic ecosystem offers a complete log analysis solution built from mature components that any developer or O&M engineer can deploy easily.

  • In the Elastic ecosystem, logs typically become searchable within 10 seconds of being generated. Compared with traditional big data solutions, which take tens of minutes or even hours, this is very timely. ES has a complete log solution (ELK) with second-level latency from collection to display.

  • With support for inverted indexes, columnar storage, and other data structures, ES provides very flexible search and analysis capabilities.

  • Interactive analysis is supported, with ES search response times measured in seconds even over trillions of log entries.

Logs are the most basic and widespread form of data in the Internet industry. ES handles real-time log analysis extremely well, which is an important reason for its rapid growth in recent years.
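To make the inverted index mentioned above more concrete, here is a minimal Python sketch. It is a toy illustration only and not how ES or Lucene actually implement it: the function names `build_inverted_index` and `search_all`, the whitespace tokenizer, and the sample log lines are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the sorted ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical log lines, invented for illustration.
LOGS = {
    1: "ERROR payment service timeout",
    2: "INFO user login success",
    3: "ERROR user service timeout",
}
INDEX = build_inverted_index(LOGS)
```

Because the index maps terms directly to documents, a query such as `search_all(INDEX, "error timeout")` only intersects two small sets instead of scanning every log line, which is the basic reason this structure suits keyword search over logs.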

Search service

Typical search service scenarios include: product search, like the product search on JD.com, Taobao, and Pinduoduo; app search, such as searching for apps in an app store; and site search, which covers forums, online documents, and similar search functions. We support a large number of search services, which have the following features:

  • High performance: a single service reaches 100,000+ QPS at peak, with an average response time of about 20 ms and a P95 latency of less than 100 ms.
  • High relevance: The search experience depends mainly on how well the results match the user's intent, which is evaluated with metrics such as precision and recall.
  • High availability: Search scenarios require high availability, including disaster recovery when a single server fails. An hour-long outage at any e-commerce service, such as Taobao, JD.com, or Pinduoduo, would make headlines.

Time series data analysis

Typical time series data includes: metrics, that is, traditional server monitoring (all of Tencent Cloud's monitoring is based on ES); APM, application performance monitoring; and IoT data, meaning sensor data generated by smart hardware, industrial IoT, and so on. Time series data is characterized by high write throughput, and ES provides rich multidimensional statistical analysis operators. This type of scenario has the following characteristics:

  • High-concurrency writes: the largest single online cluster has 600+ nodes and a write throughput of 10 million writes per second.

  • High query performance: the query latency for a single curve or time line must be around 10 ms.

  • Multi-dimensional analysis: Flexible, multi-dimensional statistical analysis is required. For example, when viewing monitoring data we can break the statistics down by region and by business module.
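As a rough illustration of the multi-dimensional analysis described above, the following Python sketch builds an Elasticsearch search request body that groups monitoring data by region, then by business module, and averages a metric in each group, using nested `terms` aggregations with an `avg` sub-aggregation. The helper name `build_monitoring_aggregation` and the field names `region`, `module`, and `cpu_usage` are hypothetical; a real index would use its own mapping. The sketch only builds the request body and does not send it to a cluster.

```python
import json


def build_monitoring_aggregation(region_field, module_field, metric_field):
    """Build a search body that groups by region, then module, and averages a metric."""
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "aggs": {
            "by_region": {
                "terms": {"field": region_field},
                "aggs": {
                    "by_module": {
                        "terms": {"field": module_field},
                        "aggs": {
                            "avg_metric": {"avg": {"field": metric_field}},
                        },
                    },
                },
            },
        },
    }


# Hypothetical field names, chosen only for this example.
body = build_monitoring_aggregation("region", "module", "cpu_usage")
print(json.dumps(body, indent=2))
```

Adding another dimension is a matter of nesting one more `terms` level, which is what makes this style of analysis flexible.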

From the Tencent case, we have seen three application scenarios:

  • Real-time log analysis scenario

  • Search service

  • Time series data analysis

In addition, we can summarize several advantages of ES from these three application scenarios:

1. High availability and high scalability;

2. Fast queries and good performance;

3. Powerful search that closely matches user intent.

So you can see that the advantages of ES in real-time log analysis and search are hard to beat! At least for now, it has no strong rival in these two areas!

ElasticSearch at JD.com

Through the case of JD.com, let's talk about the application scenarios of ES in query, retrieval, and data analysis.

Elasticsearch is used in many scenarios at JD.com thanks to its high performance and low barrier to entry. It covers multiple JD business lines and many application scenarios:

Structured data queries that complement relational databases

The main business applications are large-volume queries over products, promotions, coupons, orders, checkout, logistics, reconciliation, reviews, and similar data. The core requirements of this scenario are high performance, stability, and high availability. Some scenarios also have search requirements, where ES is used to accelerate the relational database. Business systems keep the data in sync through binlog synchronization or double-write.
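The double-write approach mentioned above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not JD.com's implementation: the class `DoubleWriter`, its retry queue, and the plain dictionaries standing in for the relational database and the ES index are all hypothetical.

```python
class DoubleWriter:
    """Write each record to a primary store first, then mirror it to a search index."""

    def __init__(self, db, index):
        self.db = db            # stands in for the relational database
        self.index = index      # stands in for the ES index
        self.retry_queue = []   # ids whose index write failed and must be retried

    def save(self, record_id, record, index_writer=None):
        """Store the record in the database, then try to mirror it to the index."""
        self.db[record_id] = record  # the database stays the source of truth
        writer = index_writer or self._write_index
        try:
            writer(record_id, record)
        except Exception:
            # Do not fail the business write; remember the id for a later retry.
            self.retry_queue.append(record_id)

    def _write_index(self, record_id, record):
        self.index[record_id] = record

    def retry_failed(self):
        """Re-send queued records to the index, using the database copy."""
        pending, self.retry_queue = self.retry_queue, []
        for record_id in pending:
            self._write_index(record_id, self.db[record_id])
```

The design choice illustrated here is that the database write always happens first, so a failed index write leaves the search copy temporarily stale rather than losing data. Binlog synchronization reaches the same goal without touching application code, by replaying the database's change log into the index.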

Full-text search function

The main application scenarios are operation logs for applications, security, risk control, transactions, and so on, as well as product search for some categories on JD.com. In the logging scenario, write requirements are high, while query performance and availability requirements are relatively low. For large services, writes reach tens of millions per second and storage is measured in petabytes. These scenarios place high demands on disk and memory, so JD.com has made corresponding optimizations to reduce memory consumption, improve overall disk utilization, and use cheaper disks to cut costs.

Real-time data analysis engine to form statistical reports

The main applications are logistics analysis, order data analysis, user profiling, and so on. Because business data is analyzed along many dimensions, streaming engines such as Flink and Storm are not well suited to some report scenarios, while batch processing is not timely enough, so the near real-time analysis offered by Elasticsearch becomes the choice for these services.

From the JD.com case, we can see that ES can replace the relational database in some scenarios. Not only that, ES also has a place in the field of real-time data analysis!

ElasticSearch at Qunar

Through the case of Qunar, let's talk about ES in query scenarios, where it can loosely be understood as “replacing” MySQL. Note the quotation marks: you cannot replace MySQL completely with your eyes closed. ES does not offer transactions, for example.

In 2015, Qunar's hotel business averaged more than 300,000 orders per day, and once orders from multiple platforms were aggregated, the daily average could reach about 1 million.

In the original hot-table database design, orders from the last 6 months were kept in one table (the hot table) and historical orders in a history table, which stored the full data set. When a user queried orders over a time span of more than 6 months, the query went to the history table. With this split, the hot table held about 40 million rows, which was manageable at the time, but it clearly could not handle the onboarding of Ctrip and eLong orders.
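The hot-table rule described above amounts to a simple routing decision by date. Here is a minimal sketch; the table names `orders_hot` and `orders_history`, the function `choose_order_table`, and the approximation of “6 months” as 183 days are assumptions made for illustration.

```python
from datetime import date, timedelta

HOT_WINDOW_DAYS = 183  # assumption: "6 months" approximated as 183 days


def choose_order_table(query_start, today):
    """Pick the hot table if the query starts inside the hot window, else the history table."""
    cutoff = today - timedelta(days=HOT_WINDOW_DAYS)
    return "orders_hot" if query_start >= cutoff else "orders_history"
```

For example, `choose_order_table(date(2015, 10, 1), date(2015, 12, 31))` selects the hot table, while a query starting a year earlier falls through to the history table. The rule is cheap, but it only splits data along the time dimension, which is the limitation the rest of this case runs into.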

If we continued with the hot table, its data volume would exceed 100 million rows, and the full table could hold more than 400 million rows for 2 years of data. An effective solution was therefore urgently needed. These estimated 400 million rows also had to be queried by many conditions: reservation date, check-in date, check-out date, order number, contact name, phone number, hotel name, order status, and so on. So simply splitting tables along a single dimension would be pointless.

MySQL does not handle this kind of large-volume, multi-condition querying well, so an Elasticsearch distributed storage cluster was introduced to solve the problem of order data storage and search.

The order model is abstracted and classified, separating the commonly searched fields from the basic attribute fields. The DB is sharded across databases and tables and stores the order details; Elasticsearch stores the search fields.

Complex order queries go directly to Elasticsearch, while simple queries by OrderNo go to the DB.
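The routing rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not Qunar's code: the function `route_order_query` and the criteria keys such as `order_no` and `hotel_name` are hypothetical.

```python
def route_order_query(criteria):
    """Return "db" for a pure order-number lookup and "es" for any other query.

    `criteria` maps field names to values; fields set to None are ignored.
    """
    fields = {name for name, value in criteria.items() if value is not None}
    if fields == {"order_no"}:
        return "db"  # simple primary-key style lookup goes to the sharded database
    return "es"      # multi-condition or non-key searches go to Elasticsearch
```

With this split, `route_order_query({"order_no": "12345"})` is served by the database, while a search such as `route_order_query({"hotel_name": "Hilton", "check_in": "2015-06-01"})` is served by Elasticsearch, so each store only handles the access pattern it is good at.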

From the Qunar case, we can see that ES can handle complex queries that a relational database struggles to support.

Conclusion

When should ElasticSearch be used?

1. Typical search scenarios: use it with your eyes closed!

2. Typical log analysis scenarios: use it with your eyes closed!

3. Relational database queries hit a bottleneck: consider using it! Why only consider? The strength of ES is in queries, but practice has shown that when it is used as a database, a document may not be visible to queries immediately after it is written.

4. Data analysis scenarios: consider using it! Why only consider? For simple, general-purpose needs it can be used at scale, but specific business domains are better served by more specialized data products. For complex aggregations over billions of rows, for example, ClickHouse is a better fit than Elasticsearch.
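The write-then-query delay mentioned in point 3 comes from the near real-time design of ES: a newly written document becomes visible to search only after the next index refresh, which by default happens roughly once per second. The toy class below simulates that behavior. It is a simplified model for illustration only; `ToyNearRealTimeIndex` and its methods are hypothetical and are not part of any ES client.

```python
class ToyNearRealTimeIndex:
    """Toy model of near real-time search: writes are invisible until refresh()."""

    def __init__(self):
        self._buffer = []      # written but not yet searchable
        self._searchable = []  # visible to search

    def write(self, doc):
        """Accept a document; it is buffered and not yet searchable."""
        self._buffer.append(doc)

    def refresh(self):
        """Make all buffered documents searchable (ES does this periodically)."""
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, predicate):
        """Return the searchable documents that satisfy the predicate."""
        return [doc for doc in self._searchable if predicate(doc)]
```

In this model, a `write` followed immediately by a `search` finds nothing until `refresh` runs, which is why read-your-own-write workflows, such as showing an order right after it is placed, are usually served from the database rather than from ES.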

What are the advantages of ElasticSearch?

1. Very simple horizontal scaling: thanks to its distributed architecture, ES can easily scale resources both horizontally and vertically to match the hardware needs of different data volumes and query scenarios. A cluster can grow from hundreds up to 10,000 machines to support fast search over petabytes of data, or run as a single-machine service for a small company.

2. Fast query speed: ES uses Lucene as its underlying search engine and adds multiple optimizations on top of it to meet users' query needs. It can “replace” a traditional relational database, and can also be used for complex data analysis and near real-time processing of massive data.

3. High relevance: ES provides a well-developed built-in scoring mechanism that ranks documents by term frequency and other information, so that more relevant documents rank higher. It also offers a variety of query types, including fuzzy, prefix, and wildcard queries, to help users search quickly and efficiently.

4. Many features but relatively simple to use: it works out of the box, and performance tuning is relatively straightforward.

5. Rich ecosystem and active community, with integrations for many tools. For example, to process logs and output them to Elasticsearch, you can use logging tools such as Logstash (www.elastic.co/products/lo… Kibana, the legendary ELK stack. In addition, almost all mainstream big data frameworks support ES; Flink and ES, for example, are a perfect match.
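To give a feel for the fuzzy, prefix, and wildcard queries mentioned in point 3, the sketch below builds the corresponding Elasticsearch Query DSL request bodies as plain Python dictionaries. The helper `term_level_query` and the field name `hotel_name` are hypothetical, and the sketch only builds the bodies without sending them to a cluster.

```python
SUPPORTED_KINDS = ("fuzzy", "prefix", "wildcard")


def term_level_query(kind, field, value):
    """Build a search body for a fuzzy, prefix, or wildcard query on one field."""
    if kind not in SUPPORTED_KINDS:
        raise ValueError(f"unsupported query kind: {kind}")
    return {"query": {kind: {field: {"value": value}}}}


# Hypothetical field name and values, chosen only for this example.
fuzzy_body = term_level_query("fuzzy", "hotel_name", "hiltn")       # tolerates typos
prefix_body = term_level_query("prefix", "hotel_name", "hil")       # starts with "hil"
wildcard_body = term_level_query("wildcard", "hotel_name", "h*n")   # pattern match
```

All three share the same shape and differ only in the query type, which is part of why ES feels simple to use despite its many features.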

References for this article:

Tencent's Trillion-Scale Elasticsearch Technology Revealed

How to Choose a Search Engine? Elasticsearch in Practice for Ctrip Hotel Orders

Elasticsearch Usage Scenarios at JD.com
