I have been studying Elasticsearch for some time. This post walks through its core concepts from the following 9 aspects. Discussion is welcome.

#0. Hit the road with questions: how did ES come about?

## (1) Thinking: how do we retrieve large-scale data?

When a system holds 1 billion or 10 billion records, we usually weigh the following points when designing the architecture:

1) Which database to use (MySQL, Sybase, Oracle, Dameng, Shentong, MongoDB, HBase…);
2) How to eliminate single points of failure (LVS, F5, A10, ZooKeeper, MQ);
3) How to keep data safe (hot backup, cold backup, multi-site active-active);
4) How to solve the retrieval problem (database proxy middleware: mysql-Proxy, Cobar, MaxScale, etc.);
5) How to solve the statistical analysis problem (offline, near real time).
For relational data, we usually adopt the following or a similar architecture to relieve the query bottleneck and the write bottleneck. Key points:

1) Solve the data safety problem through master/slave backup;
2) Solve single points of failure through heartbeat monitoring in the database proxy middleware;
3) Use the proxy middleware to distribute query statements to the slave nodes and merge the results.
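The third point, fanning a query out to several nodes and merging what comes back, can be sketched in a few lines. This is only an illustration, not how mysql-Proxy, Cobar, or MaxScale are implemented: the `Node` class, its `query` method, and the sample rows are hypothetical, and the sketch assumes each node holds a different slice of the rows.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical stand-in for one database node holding a slice of the rows."""

    def __init__(self, rows):
        self.rows = rows

    def query(self, keyword):
        # A real node would run SQL; here we simply filter an in-memory list.
        return [row for row in self.rows if keyword in row["title"]]


def scatter_gather(nodes, keyword, limit=10):
    """Send the same query to every node in parallel and merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(lambda node: node.query(keyword), nodes))
    merged = [row for partial in partials for row in partial]
    # The proxy layer is responsible for the final ordering and truncation.
    merged.sort(key=lambda row: row["id"])
    return merged[:limit]


nodes = [
    Node([{"id": 1, "title": "search engine"}, {"id": 4, "title": "database proxy"}]),
    Node([{"id": 2, "title": "full text search"}, {"id": 3, "title": "cold backup"}]),
]
hits = scatter_gather(nodes, "search")
print([row["id"] for row in hits])
```

Note that every node still scans its own rows for the keyword, which is why this layout relieves load but does not by itself make text retrieval fast.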
As the discussion above shows, neither keeping the data in memory nor keeping it out of memory fully solves the problem. Holding everything in memory solves the speed problem, but then cost becomes the problem. To address this at the source, the following approaches are usually used:

1.
2. Separate data from indexes;
3. Compress data.

This brings us to Elasticsearch.
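To make "separate data from indexes" concrete, here is a toy sketch of an inverted index kept apart from the documents it describes. It is a simplified illustration of the general idea, not Lucene's actual data structures; the sample documents and function names are made up for the example, and tokenization is just lowercase whitespace splitting.

```python
from collections import defaultdict

# Raw data: document ID -> original text, stored once.
documents = {
    1: "Elasticsearch is a distributed search engine",
    2: "Lucene is a search library",
    3: "MySQL is a relational database",
}


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, docs, term):
    """Look the term up in the index, then fetch the matching documents from the store."""
    return [docs[doc_id] for doc_id in index.get(term.lower(), [])]


inverted_index = build_inverted_index(documents)
results = search(inverted_index, documents, "search")
print(results)
```

A lookup touches only the small term-to-ID structure and then fetches the matching documents, instead of scanning every document for the keyword.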
#1. All the ES basics
Elasticsearch is an open-source, highly scalable, distributed full-text search engine that can store and retrieve data in near real time. It scales well: a cluster can grow to hundreds of servers and handle petabytes of data. Elasticsearch is written in Java and uses Lucene at its core for all indexing and searching, but it aims to hide Lucene's complexity behind a simple RESTful API so that full-text search becomes easy.
##1.2 Lucene and ES

1) Lucene is just a library. To use it, you have to develop in Java and integrate it directly into your application. Worse, Lucene is complex enough that you need a solid background in information retrieval to understand how it works.
2) Elasticsearch is also written in Java and uses Lucene at its core for all indexing and searching, but it hides Lucene's complexity behind a simple RESTful API, which makes full-text search easy.
##1.3 ES mainly solves the following problems

1) Retrieve relevant data;
2) Return statistical results;
3) Be fast.
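Once an Elasticsearch node starts, it uses multicast (or unicast, if the user has changed the configuration) to find and connect to the other nodes in the cluster (multicast was the default in the 1.x releases; from 2.0 on, unicast is the default and multicast is available only as a plugin). This process is shown below: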
##1.6 Key concepts of the ES data architecture (compared with the relational database MySQL)
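As a rough orientation, the comparison usually drawn for the ES 2.x series can be written down as a simple lookup. This is the commonly cited analogy, offered as an approximation and not as an exact equivalence; the dictionary and helper names are made up for illustration, and later Elasticsearch versions dropped the "type" level.

```python
# Commonly cited analogy between MySQL and Elasticsearch 2.x concepts.
# This is an approximation for orientation, not an exact equivalence.
MYSQL_TO_ES = {
    "database": "index",
    "table": "type",
    "row": "document",
    "column": "field",
    "schema": "mapping",
}


def es_term(mysql_term):
    """Return the roughly corresponding ES concept, or None if none is listed."""
    return MYSQL_TO_ES.get(mysql_term.lower())


for mysql_term, es_concept in MYSQL_TO_ES.items():
    print(f"{mysql_term:<10} -> {es_concept}")
```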
##1.7 What is ELK?

ELK = Elasticsearch + Logstash + Kibana

Elasticsearch: back-end distributed storage and full-text search;
Logstash: log processing;
Kibana: data visualization.

The ELK architecture forms a powerful management chain covering distributed data storage, visual querying, and log parsing. The three components complement one another and together handle distributed big-data processing.
#2. ES Features and Advantages

1) Distributed real-time document storage: every field can be indexed so that it can be searched.
2) A distributed search engine with real-time analytics. Distributed here means that an index is split into shards, and each shard can have zero or more replicas. Every data node in the cluster can host one or more shards and coordinates and handles the various operations; load rebalancing and routing happen automatically in most cases.
3) Scales to hundreds of servers and petabytes of structured or unstructured data. It can also run on a single PC (I have tested this).
4) Supports a plugin mechanism: word segmentation plugins, data synchronization plugins, Hadoop plugins, visualization plugins, and so on.
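The sharding described in point 2 rests on a simple routing rule: the shard for a document is a hash of its routing value (by default the document ID) modulo the number of primary shards. The sketch below illustrates that rule only; CRC32 is a stand-in chosen for the example (Elasticsearch uses a different hash function), and the document IDs and shard count are hypothetical.

```python
import zlib
from collections import Counter


def pick_shard(routing_key, number_of_primary_shards):
    """Return the primary shard number for a routing key.

    Follows the rule shard = hash(routing) % number_of_primary_shards.
    CRC32 is a stand-in here; Elasticsearch uses its own hash function.
    """
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


# Hypothetical document IDs spread over a 5-shard index.
doc_ids = [f"doc-{n}" for n in range(1000)]
distribution = Counter(pick_shard(doc_id, 5) for doc_id in doc_ids)
print(sorted(distribution.items()))
```

Because the result depends on the primary shard count, changing that count would send existing documents to different shards, which is why it is chosen when the index is created.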
(1) Hardware configuration:

CPU: 16 cores, AuthenticAMD
Total memory: 32 GB
Total hard disk: 500 GB, non-SSD

1) Average index throughput: 12307 docs/s (document size: 40 B/doc)
2) Average CPU usage: 887.7% (16 cores, 55.48% per core on average)
3) Build index size:
4) Total write volume: 20.2123 GB
5) Total test time: 28m 54s
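The per-core CPU figure follows directly from the totals, and the same numbers give a rough idea of how many documents were indexed. The short calculation below uses only the figures reported above; the document count is an estimate that assumes the average throughput held for the whole run.

```python
# Figures copied from the test results above.
docs_per_second = 12307
total_cpu_percent = 887.7
cores = 16
minutes, seconds = 28, 54

# Per-core CPU usage: total percentage divided by the number of cores.
avg_cpu_per_core = total_cpu_percent / cores

# Approximate document count, assuming the average throughput held for the whole run.
elapsed_seconds = minutes * 60 + seconds
estimated_docs = docs_per_second * elapsed_seconds

print(f"average CPU per core: {avg_cpu_per_core:.2f}%")
print(f"elapsed: {elapsed_seconds} s, estimated documents indexed: {estimated_docs}")
```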
##3.2 Performance testing with the esrally tool (recommended)

Usage reference: blog.csdn.net/laoyang360/…
#4. Why use ES?

1) GitHub: in early 2013, GitHub dropped Solr in favor of Elasticsearch. GitHub uses Elasticsearch to search 20 TB of data, including 1.3 billion files and 130 billion lines of code.
2) Wikipedia: launched a core search architecture based on Elasticsearch.
3) SoundCloud: “SoundCloud uses Elasticsearch to deliver instant and accurate music search to 180 million users.”
4) Baidu: Baidu currently uses Elasticsearch widely for text data analysis. It collects all kinds of metric data and user-defined data from every Baidu server and, through multi-dimensional analysis and display, helps locate and analyze instance-level or business-level anomalies. It currently covers more than 20 business lines within Baidu (including Casio, cloud analysis, network alliance, prediction, library, direct number, wallet, risk control, etc.), with up to 100 machines and 200 ES nodes in a single cluster, and imports 30 TB+ of data every day.
In practice, almost every system has a search function. Once search grows past a certain scale, it becomes hard to maintain and extend, so many companies split search out into a separate module built on Elasticsearch or a similar engine.
Elasticsearch has grown beyond its original role as a pure search engine and now includes aggregation and visualization features. If you have millions of documents that need to be found by keyword, Elasticsearch is an excellent choice. And if your documents are JSON, you can also treat Elasticsearch as a “NoSQL database” and use its aggregation feature to analyze the data along multiple dimensions.
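As an illustration of combining keyword search with aggregation, the sketch below builds the JSON body of a search request that matches a keyword and counts the hits per value of another field. It only constructs the body, which would typically be sent to an index's `_search` endpoint; the field names (`message`, `level`) and the helper function are hypothetical.

```python
import json


def terms_aggregation_request(field, keyword_field, keyword, buckets=10):
    """Build a search body that filters by keyword and counts hits per value of `field`."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {"match": {keyword_field: keyword}},
        "aggs": {
            "by_" + field: {
                "terms": {"field": field, "size": buckets}
            }
        },
    }


# Hypothetical log index with `message` and `level` fields.
body = terms_aggregation_request(field="level", keyword_field="message", keyword="timeout")
print(json.dumps(body, indent=2))
```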
I think using Elasticsearch as a replacement for internal storage is fine in some scenarios, provided your business has no special transactional requirements for its operations and does not need fine-grained permission management, because ES's permission support is still incomplete. Our own use of ES is limited to aggregating data over a given time window, without a large volume of single-document requests (such as looking up a user's document by userID, the typical NoSQL use case), so whether ES can replace NoSQL in your case is something you need to test yourself. If I had the choice, I would try ES in place of a traditional NoSQL store, because its horizontal scaling mechanism is so convenient.
#5. What is the application scenario of ES?

We usually face two situations:

1) A new system under development tries ES as its storage and retrieval server;
2) An existing system is being upgraded and needs a full-text retrieval service, which calls for ES.

The use of these two architectures is described in more detail at the following link: blog.csdn.net/laoyang360/…
1) How to analyze and process 3.2 billion real-time logs: dockone.io/article/505
2) Ali ES: building their own log collection and analysis system.
3) Business log processing: tech.youzan.com/you-zan-ton…
4) ES for on-site search: www.wtoutiao.com/p/13bkqiZ.h…
#6. How do I deploy ES?

##6.1 ES deployment (no installation required)

1) Zero configuration, works out of the box.
2) No cumbersome installation or configuration.
3) Java version requirement: 1.7 at minimum; I use 1.8.

    [root@laoyang config_lhy]# echo $JAVA_HOME
    /opt/jdk1.8.0_91

4) Download address: download.elastic.co/elasticsear…

Start ES from the install directory (/usr/local/elasticSearch-2.3.5 in my case), either in the foreground or, with -d, in the background:

    ./bin/elasticsearch
    bin/elasticsearch -d
For detailed installation and usage of the essential plugins (Head, Kibana, IK for Chinese word segmentation, graph, and others), see: blog.csdn.net/column/deta…
I wrote a BAT script for one-click installation on Windows:

1) Installs ES and the essential plugins (head, Kibana, IK, Logstash, etc.) in one step;
2) Runs ES as a service after installation;
3) Saves at least 2 hours compared with fumbling through the installation by hand.

Script description: blog.csdn.net/laoyang360/…
##1) Java API interface

www.ibm.com/developerwo…

##2) RESTful API interface

Implementation of the common create, delete, update, and query operations: blog.csdn.net/laoyang360/…
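For a feel of what the create, read, update, and delete calls look like over REST, the sketch below builds the four HTTP requests without sending them. It assumes a local node on port 9200 and uses the `/{index}/{type}/{id}` URL layout of the 2.x series used in this post (later versions changed this layout); the index name `blog`, the type `article`, the sample fields, and the `build_request` helper are all made up for illustration.

```python
import json
from urllib.request import Request

# Assumptions: a local node on the default HTTP port, and made-up index/type names.
BASE_URL = "http://localhost:9200"
INDEX, DOC_TYPE = "blog", "article"


def build_request(method, doc_id, body=None, suffix=""):
    """Create (but do not send) an HTTP request for one document."""
    url = f"{BASE_URL}/{INDEX}/{DOC_TYPE}/{doc_id}{suffix}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(url, data=data, headers=headers, method=method)


crud_requests = {
    "create": build_request("PUT", 1, {"title": "ES basics", "views": 0}),
    "read": build_request("GET", 1),
    "update": build_request("POST", 1, {"doc": {"views": 1}}, suffix="/_update"),
    "delete": build_request("DELETE", 1),
}

for name, req in crud_requests.items():
    print(f"{name:<6} {req.get_method():<6} {req.full_url}")

# To actually send one: urllib.request.urlopen(crud_requests["read"]) with a node running.
```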
#8. What if ES encounters a problem?

1) International: discuss.elastic.co/
2) China: elasticsearch.cn/
#9. Anything else?

“Elasticsearch Methodology: the 10 most effective ways for the average programmer to improve” (free full version): blog.csdn.net/laoyang360/…
Author: Ming yi. When reprinting, please credit the source. Original address: blog.csdn.net/laoyang360/…