What is Elasticsearch?
Elasticsearch (ES for short) is an open-source, distributed full-text search engine and document database built on Lucene and accessed through a RESTful interface. It is distributed by design, highly available, and scalable, and it can store, search, and analyze large amounts of data in a short amount of time.
What is full text search?
Full-text search, also called full-text retrieval, scans every word in an article and builds an index entry for each word, recording how many times the word occurs in the article and at which positions. When a user submits a keyword query, the search engine looks the keyword up in this pre-built index and returns the matching results to the user. There are two key concepts here: word segmentation and indexing. Elasticsearch does both internally, splitting the stored text into terms according to its rules and indexing those terms for users to query.
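The two steps described above, word segmentation and indexing, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `segment` and `index_positions` helpers and the sample sentence are made up for this example, and the regular-expression splitter is a naive stand-in for the analyzers that Elasticsearch actually uses.

```python
import re
from collections import defaultdict


def segment(text):
    """Split text into lowercase word tokens (a naive stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_positions(text):
    """Map each term to the list of token positions where it occurs."""
    positions = defaultdict(list)
    for pos, term in enumerate(segment(text)):
        positions[term].append(pos)
    return dict(positions)


# Hypothetical sample article, used only for illustration.
article = "Search engines index every word, and every word points back to the article."
postings = index_positions(article)

for term, where in postings.items():
    # The number of occurrences is simply the length of the position list.
    print(f"{term}: count={len(where)}, positions={where}")
```

Each entry records both the number of occurrences and the positions of a word, which is the information the paragraph above says a full-text index keeps.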
What is an inverted index?
In full-text search, an index that maps each keyword to the texts containing it is called an inverted index. By contrast, an index that records the forward relationship “text content – keyword” is called a forward index; it will be introduced later. Here’s an example:
- Text 1: I have a friend who loves smile
- Text 2: I have a dream today
After English word segmentation and inverted index construction, a simple “keyword – text” mapping is obtained as follows:
Keyword | Text numbers |
---|---|
I | 1, 2 |
have | 1, 2 |
a | 1, 2 |
friend | 1 |
who | 1 |
loves | 1 |
smile | 1 |
dream | 2 |
today | 2 |
With this mapping table, a search for the keyword “have” immediately returns the two records with ids 1 and 2, and a search for “today” returns the record with id 2. Because each query is a direct lookup rather than a scan of every text, search performance is very high. The inverted index maintained by Elasticsearch contains more information than this; the example is only a brief introduction to how it works.
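The mapping table and the two lookups above can be reproduced with a minimal Python sketch. The `build_inverted_index` and `search` helpers are hypothetical names for this illustration, and the sketch assumes whitespace splitting and lowercasing, which is far simpler than what Elasticsearch does internally.

```python
from collections import defaultdict

# The two example texts, keyed by their text number.
texts = {
    1: "I have a friend who loves smile",
    2: "I have a dream today",
}


def build_inverted_index(docs):
    """Build a keyword -> set of text numbers mapping."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: whitespace splitting and lowercasing stand in for real segmentation.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, keyword):
    """Return the sorted text numbers that contain the keyword."""
    return sorted(index.get(keyword.lower(), set()))


index = build_inverted_index(texts)
print(search(index, "have"))   # [1, 2]
print(search(index, "today"))  # [2]
```

A query never scans the texts themselves; it only reads one entry of the mapping, which is why the lookup is fast.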
What scenarios does Elasticsearch apply to?
Common scenarios
- Search scenarios: common search scenarios include e-commerce websites, recruitment websites, news websites, and various apps.
- Log analysis scenarios: the classic ELK combination (Elasticsearch/Logstash/Kibana) covers the basic functions of log collection, log storage, and a query interface for log analysis. This approach is very popular, and most corporate log analysis systems use it.
- Data early-warning platforms and data analysis scenarios: for example, an e-commerce price alert can be set on supported e-commerce platforms, so that when the discounted price falls below a certain value, a notification is triggered to prompt the user to buy. Common data analysis tasks include finding the top 10 brands by sales volume on an e-commerce platform, or the top 10 items by followers, comments, and visits on blog systems and news sites.
- Business BI systems: for example, a large retail supermarket needs to analyze users' spending in the last quarter, their age groups, and the distribution of visitor numbers across each period of each day, output the corresponding report data, predict the hot products for the next quarter, and recommend suitable products by age group. Elasticsearch does the data analysis and mining; Kibana does the data visualization.
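To make the price alert and "top 10 brands" scenarios above concrete, here is a small in-memory sketch. It is plain Python, not Elasticsearch code: in a real deployment this kind of grouping and filtering would be done by queries and aggregations on the cluster. The records, field names, and the `top_brands` and `price_alerts` helpers are all hypothetical.

```python
from collections import Counter

# Hypothetical order records; brand names and quantities are made up.
orders = [
    {"brand": "Acme", "quantity": 3},
    {"brand": "Globex", "quantity": 5},
    {"brand": "Acme", "quantity": 4},
    {"brand": "Initech", "quantity": 1},
]


def top_brands(records, n=10):
    """Return the n brands with the highest total quantity sold."""
    totals = Counter()
    for record in records:
        totals[record["brand"]] += record["quantity"]
    return totals.most_common(n)


def price_alerts(prices, threshold):
    """Return the products whose current price is below the alert threshold."""
    return [name for name, price in prices.items() if price < threshold]


print(top_brands(orders, n=2))
print(price_alerts({"laptop": 899.0, "phone": 499.0}, threshold=500.0))
```

The first call ranks brands by sales volume; the second returns the products that should trigger a notification.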
Common cases
- Wikipedia, Baidu Encyclopedia: full text search, highlighting, search recommendations
- Stack Overflow: With full-text search, you can search for solutions based on the key information of error messages.
- GitHub: Search through hundreds of billions of lines of code for the key code you want.
- Log analysis systems: ELK platforms built by enterprises.
- And so on.
Elasticsearch architecture diagram
Simple definition of architecture components:
- Gateway: the underlying storage system, usually a file system; multiple storage types are supported.
- Distributed Lucene Directory: a distributed framework based on Lucene that encapsulates inverted index construction, data storage, the translog, segments, and other implementation details.
- Module layer: the main modules of ES include the index module, search module, and mapping module.
- Discovery: the cluster node discovery module, used for communication between cluster nodes, elections, and coordinating node operations. It supports several discovery mechanisms, such as Zen and EC2.
- Script: the script parsing module, which supports scripts written in query statements, in languages such as Painless, Groovy, and Python.
- Plugins: third-party plug-ins that provide advanced functionality and support customization.
- Transport/JMX: the communication module for data transmission; it uses the Netty framework underneath.
- Restful/Node: the interfaces for accessing the Elasticsearch cluster.
- X-Pack: an Elasticsearch extension that integrates security, alerting, monitoring, graph, and reporting capabilities, with seamless integration and a pluggable design.
Installing Elasticsearch
Official website
https://www.elastic.co/cn/ provides the download links for each version, the official documentation, and usage examples. Download the installation package from there.
Source code
https://github.com/elastic/elasticsearch hosts the source code for every version, and you can switch to a specific version. This article uses version 6.3.1.
Installation steps
- JDK 1.8 or later is required
- Download the installation package from the official website and decompress it in a specified directory
- Run bin/elasticsearch (Linux) or bin\elasticsearch.bat (Windows)
- Run curl http://localhost:9200/ or open http://localhost:9200/ in your browser; you should see the following response:
{ "name" : "node-1", "cluster_name" : "hy-application", "cluster_uuid" : "lJ4DRWOvQauAy-VEYiZc2g", "version" : { "number" : "6.3.1", "build_flavor" : "default", "build_type" : "tar", "build_hash" : "eb782d0", "build_date" : "2018-06-29T21:59:26.107521Z", "build_snapshot" : false, "lucene_version" : "...", "minimum_wire_compatibility_version" : "5.6.0", "minimum_index_compatibility_version" : "5.0.0" }, "tagline" : "You Know, for Search" }
- Run bin/kibana (Linux) or bin\kibana.bat (Windows). If Kibana and Elasticsearch are deployed on the same machine, the default configuration file can be used.
- Open http://192.168.17.137:5601/ in a browser to verify Kibana. If the following interface appears, the startup succeeded: figure 2
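The check against http://localhost:9200/ in the steps above can also be scripted. The following is a minimal sketch using only the Python standard library. The `fetch_root` and `summarize` helpers are hypothetical names, and the sketch assumes a node listening on the default port without authentication. To keep the example runnable without a live node, it summarizes a trimmed copy of the response shown earlier instead of calling the server.

```python
import json
from urllib.request import urlopen


def fetch_root(url="http://localhost:9200/", timeout=5):
    """GET the root endpoint of a running node and return the decoded JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize(info):
    """Build a one-line summary from the root endpoint response."""
    version = info["version"]["number"]
    return f'{info["name"]} ({info["cluster_name"]}) runs Elasticsearch {version}'


# A trimmed copy of the response from the installation steps, used instead of a live call.
sample = json.loads(
    '{"name": "node-1", "cluster_name": "hy-application", '
    '"version": {"number": "6.3.1"}, "tagline": "You Know, for Search"}'
)
print(summarize(sample))  # node-1 (hy-application) runs Elasticsearch 6.3.1
```

Against a running node, `summarize(fetch_root())` would print the same kind of line using the live response.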
Summary
This first article introduced Elasticsearch and walked through installing and verifying it. For learning purposes, or for small and medium-sized applications where the data volume is small and the operations are not very complex, Elasticsearch can simply be started as-is. Unless otherwise specified, version 6.3.1 is used as the example for learning Elasticsearch.