1. Core concepts of ElasticSearch
- What is Elasticsearch
ES is a real-time distributed search and analytics engine used for full-text search, structured search, and analytics. It provides a RESTful API, a Java API (as well as APIs for other languages), and is easy to use
- Near real time
There is a small delay (about 1 second) between writing data and it becoming searchable; searches and analyses over ES complete within seconds (thanks to the inverted index)
- Cluster
A cluster contains multiple nodes, and which cluster each node belongs to is determined by configuration. For small and medium-sized applications, starting with one node per cluster is normal
- Node
Each node in a cluster also has a name (randomly assigned by default). The node name is important when performing operations and maintenance. By default a node joins a cluster named elasticsearch
- Index
An index contains a collection of structured document data; it is analogous to a database in a relational database system
- Type
Each index can have one or more types; a type is a logical category of data within an index. Types are still supported in 6.x; in 7.x the type is fixed to _doc, and in 8.x types are removed completely
- Document
A document is the smallest data unit in ES; a document can be, say, one customer record or one order record. It is analogous to a row in a relational database
- Field
A field is the smallest unit in ES. A document has multiple fields, each holding one piece of data; a field is analogous to a column in a relational database
- Mapping
How data is stored in an index is defined by a mapping configuration; this is analogous to the schema of a relational database
- Shard
Because ES is a distributed search engine, an index is usually split into several parts whose data is distributed across different nodes; these parts are called shards. ES automatically manages and organizes the shards and rebalances shard data when necessary, so users do not need to worry about the details of shard handling. By default, a single shard can hold at most about 2 billion documents
- Replica
By default, ES creates one replica for each primary shard. In versions before 7.x an index is given 5 primary shards by default (1 since 7.x), so such an index has 5 primary shards, each with a corresponding replica
2. Curl command & Kibana developer tools
2.1 The curl command
- Introduction to curl
curl is an open-source file transfer tool that uses URL syntax to access URLs from the command line. With curl you can easily issue common GET/POST requests
- curl format
```
curl -X<request type> <request URL> -d '<data>'
```
For more details on curl, see the curl documentation
- Operating Elasticsearch with curl
Because ES exposes a RESTful interface, you can operate on ES with curl:
```
curl -XPUT    http://192.168.64.129:9200/index11?pretty
curl -XDELETE http://192.168.64.129:9200/index11?pretty
```
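When a request needs a body, pass it with -d. A minimal sketch (assuming the same node at 192.168.64.129:9200 and the index11 index from above; ES 6+ also requires the Content-Type header):

```
# Search index11 with a JSON body
curl -XPOST http://192.168.64.129:9200/index11/_search?pretty \
     -H 'Content-Type: application/json' \
     -d '{ "query": { "match_all": {} } }'
```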
2.2 Kibana developer tools
You can also work with Elasticsearch using the Kibana developer tools (Dev Tools); this is the most common way to debug ES requests
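In the Dev Tools console you type the method and path directly, with no curl command or host prefix; for example, this checks cluster health:

```
GET _cluster/health
```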
Build ElasticSearch and Kibana
3. Index management
The ES index management API mainly includes the following operations:
- Create Index
Create index:
```
PUT /index
{
  "settings": {                       // @1
    "number_of_shards": 1
  },
  "mappings": {                       // @2
    "_doc": {
      "properties": {
        "field1": { "type": "text" }
      }
    }
  },
  "aliases": {                        // @3
    "alias_1": {},
    "alias_2": {
      "filter": {
        "term": { "user": "kimchy" }
      },
      "routing": "kimchy"
    }
  }
}
```
Indexes are created using a PUT request
@1: index configuration (settings)
@2: defines the mapping, somewhat like defining a table structure in a relational database
@3: specifies alias settings for the index
Note: creating an index that already exists is an error.
- Delete Index
Delete index:
```
DELETE /index
```
A DELETE request is used to delete an index; this operation is straightforward
- Get index
Get index:
```
GET /index
```
Getting the index is simple
- View index list
```
GET /_cat/indices?v
```
- Update the number of replicas
```
PUT /index/_settings
{
  "number_of_replicas": 1
}
```
- View index settings
```
GET /index/_settings
```
4. term & match_all queries
- term query
```
# Insert a piece of test data first
PUT /index/_doc/1
{
  "name": "中国人",
  "code": "CHN"
}

# term query
GET /index/_search
{
  "query": {
    "term": {
      "name": {
        "value": "国"
      }
    }
  },
  "from": 0,
  "size": 1,
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
```
Text stored in ES is analyzed (segmented into words) by default; for example, "中国人" is split into the three terms "中", "国", and "人", so a term query against the name field with the single term "国" still matches the document.
Note: if paging is required, use the `from` and `size` fields to page through the results; if highlighting is needed, add the `highlight` section.
- match_all
As the name suggests, this query returns all documents:
```
GET /index/_search
{
  "query": {
    "match_all": {}
  }
}
```
5. range & exists queries
- range query
As the name suggests, this queries by a range of values; the operation is relatively simple:
```
# Range query: age greater than or equal to 30 and less than or equal to 100
GET /index/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 30,
        "lte": 100
      }
    }
  }
}
```
The parameters acceptable for range query are:
gte: greater than or equal to; gt: greater than; lte: less than or equal to; lt: less than; boost: sets the boost value of the query, default 1.0
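For example, boost can raise the weight of one range clause relative to other clauses in a larger query; a minimal sketch reusing the age field from above:

```
GET /index/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 30,
        "boost": 2.0
      }
    }
  }
}
```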
- exists query
The exists query returns documents that contain a value for the given field:
```
GET /index/_search
{
  "query": {
    "exists": {
      "field": "age"
    }
  }
}
```
6. match query
```
GET /index/_search
{
  "query": {
    "match": {
      "name": "中李"
    }
  }
}
```
Explanation: the query text "中李" is segmented by the analyzer into the two terms "中" and "李"; any document whose name contains either term is returned. In other words, a match query first segments the query content and then searches with the resulting terms
7. Boolean queries
Why use a bool query? Because a bool query makes it easy to intersect several result sets and to apply filtering, which is exactly what multi-condition search on an e-commerce site needs.
The bool query consists of the following types:
must:[] : the returned documents must satisfy the conditions of the must clauses, which also contribute to the score
filter:[] : the returned documents must satisfy the filter clauses, but unlike must these do not contribute to the score
should:[] : the returned documents may satisfy the conditions of the should clauses. In a bool query with no must or filter clause, at least one should clause must match; the minimum_should_match parameter defines the minimum number of clauses that must match
must_not:[] : the returned documents must not satisfy the conditions defined by must_not
7.1 bool.must query
```
GET /index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "中美"
          }
        },
        {
          "range": {
            "age": {
              "gte": 30,
              "lte": 100
            }
          }
        }
      ]
    }
  }
}
```
Explanation: the must clause contains two conditions, a match and a range; the result is the intersection of the two data sets, and each hit carries a _score value, where a larger score indicates a stronger match
7.2 bool.filter query
The query from 7.1 rewritten with the range condition in a filter clause:
```
GET /index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "中美"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "age": {
              "gte": 30,
              "lte": 100
            }
          }
        }
      ]
    }
  }
}
```
filter has the same filtering effect as must, except that it does not participate in the score calculation
7.3 bool.must_not query
A must_not query:
```
GET /index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "中国李美"
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "age": {
              "gte": 30,
              "lte": 100
            }
          }
        }
      ]
    }
  }
}
```
Explanation: the returned documents satisfy the must condition but not the must_not condition; the query computes a set difference. This is often used to filter out part of a batch of data after an initial query.
7.4 bool.should query
```
GET /index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "中"
          }
        },
        {
          "range": {
            "age": {
              "gte": 30,
              "lte": 60
            }
          }
        }
      ]
    }
  }
}
```
The should conditions are OR-ed: a document matching any one of them is returned
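When should clauses appear alongside a must or filter clause, they become optional score boosters; to force at least one of them to match, set minimum_should_match explicitly. A minimal sketch reusing the name, age, and code fields from the examples above:

```
GET /index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "中" } }
      ],
      "should": [
        { "range": { "age": { "gte": 30 } } },
        { "match": { "code": "CHN" } }
      ],
      "minimum_should_match": 1
    }
  }
}
```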
8. Dynamic mapping
One of the most important features of Elasticsearch is that it lets you start exploring your data as quickly as possible. To index a document you do not have to create the index, define a mapping type, and define each field first; you simply index the document, and the index, type, and fields are created automatically. The process of inferring a field's type from its value during this automatic mapping is the mechanism this section focuses on: dynamic mapping.
- With dynamic set to strict, the object throws an exception when a new field is encountered
- An inner object (stash in the sketch below) with dynamic set to true creates new fields dynamically when they are encountered
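A minimal sketch of such a mapping (my_index, title, and stash are illustrative names):

```
PUT /my_index
{
  "mappings": {
    "dynamic": "strict",            // a new top-level field throws an exception
    "properties": {
      "title": { "type": "text" },
      "stash": {
        "type": "object",
        "dynamic": true             // new fields under stash are created on the fly
      }
    }
  }
}
```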
Dynamic mapping mechanisms include the following mapping rules: Dynamic field mappings and Dynamic templates
8.1 Dynamic field mappings
The dynamic field mapping rules: by default, when Elasticsearch finds a previously unseen field in a document, it adds a new field to the type mapping, with its type inferred as follows:
| JSON datatype | Elasticsearch datatype |
|---|---|
| null | no field mapping is added |
| true or false | boolean |
| floating point number | float |
| integer | long |
| object | object |
| array | determined by the first non-null value in the array |
| string | date, double, long, or text (with a keyword sub-field), depending on the value |
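To see these rules in action, index a document into a brand-new index (dyn_index is a hypothetical name) and inspect the mapping ES inferred:

```
# count -> long, price -> float, active -> boolean, note -> text (with keyword sub-field)
PUT /dyn_index/_doc/1
{
  "count": 5,
  "price": 10.5,
  "active": true,
  "note": "hello"
}

# Inspect the inferred mapping
GET /dyn_index/_mapping
```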
8.2 Dynamic templates
With dynamic field mappings, types are inferred from field values using Elasticsearch's default rules. Dynamic templates let you define custom mapping rules of your own, applied to newly detected fields based on their names and inferred types:
```
PUT /index
{
  "mappings": {
    "dynamic_templates": [
      {
        "es": {
          "match": "*_es",                  // @1
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "analyzer": "spanish"
          }
        }
      },
      {
        "en": {
          "match": "*",                     // @2
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }
    ]
  }
}
```
@1: matches string fields whose names end with _es
@2: matches all other string fields
9. Static mapping
Static mapping defines field mappings explicitly; only some of the common fields are listed here.
9.1 Modify the mapping
```
PUT /index/_mapping
{
  "properties": {
    "age": {
      "type": "long"
    },
    "name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "email": {
      "type": "keyword"
    },
    "is_handle": {
      "type": "boolean"
    }
  }
}
```
9.2 Main field types
- long: integer values
- text: analyzed strings (text is segmented into words)
- boolean
- date
- keyword: non-analyzed strings (for exact-match values such as email addresses or phone numbers)
"name" : {
"type" : "text"."fields" : {
"keyword" : {
"type" : "keyword"."ignore_above" : 256}} # where text is the type of the name field and keyword is the extended attribute, as long as it does not exceed the specified value256Characters are also indexed with the keyword once to ensure that exact matches are matchedCopy the code
Searching against the keyword sub-field looks like this:
```
GET /index/_search
{
  "query": {
    "term": {
      "name.keyword": {
        "value": "中国人"
      }
    }
  }
}
```
For more details, please refer to the official Field documentation
10. Shards & replicas
- View the shard settings of an index
```
GET /index/_settings
```
number_of_shards is the number of primary shards in the current index, and number_of_replicas is the number of replicas
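To see where each shard is actually allocated, the _cat shards API can be used (a sketch for the index named index used above):

```
GET /_cat/shards/index?v
```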
- Set the number of shards and replicas
When creating an index, you can set the number of shards and replicas:
```
PUT /test_index
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 2
    }
  }
}
```
- Modify the number of replicas
```
PUT /test_index/_settings
{
  "index": {
    "number_of_replicas": "3"
  }
}
```
11. Analyzers (word segmentation)
11.1 Default analyzers
- Concepts
Analysis: text analysis is the process of converting full text into a series of terms (term/token), also called word segmentation. Analysis is carried out by an Analyzer, which consists of three types of building blocks: character filters, tokenizers, and token filters.
Character filters: pre-process the text before it is tokenized. The most common uses are stripping HTML tags (`<span>hello</span>` → `hello`) and replacing characters (`&` → `and`, so `I&you` → `I and you`).
Tokenizer: splits the text into individual terms (tokens).
Token filters: process the emitted tokens by changing case (e.g., converting "Quick" to lowercase), removing words (e.g., stop words like "a", "and", "the"), or adding words (e.g., synonyms like "jump" and "leap").
The three are applied in the order Character Filters → Tokenizer → Token Filters, so: Analyzer = CharFilters (0 or more) + Tokenizer (exactly one) + TokenFilters (0 or more)
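Putting the three building blocks together: a minimal sketch of a custom analyzer (analyzer_demo, and_char_filter, and my_analyzer are illustrative names) that strips HTML, maps & to and, tokenizes with standard, and lowercases:

```
PUT /analyzer_demo
{
  "settings": {
    "analysis": {
      "char_filter": {
        "and_char_filter": {
          "type": "mapping",
          "mappings": ["& => and"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "and_char_filter"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

POST /analyzer_demo/_analyze
{
  "analyzer": "my_analyzer",
  "text": "<span>I & you Quick</span>"
}
```

The _analyze call should return the terms i, and, you, and quick.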
- Elasticsearch's built-in analyzers
  - Standard Analyzer: the default; splits on word boundaries and lowercases
  - Simple Analyzer: splits on non-letters (symbols are filtered out) and lowercases
  - Stop Analyzer: lowercases and filters stop words (the, a, is)
  - Whitespace Analyzer: splits on whitespace, no lowercasing
  - Keyword Analyzer: performs no tokenization; treats the whole input as a single token
  - Pattern Analyzer: splits by regular expression, default \W+ (non-word characters)
  - Language Analyzers: analyzers for more than 30 common languages
  - Custom Analyzer: a user-defined analyzer
11.2 Testing an analyzer
```
POST _analyze
{
  "analyzer": "standard",
  "text": "Like X for the Fourth of July."
}
```
11.4 The IK analyzer
The most widely recommended Chinese analyzer is the IK analyzer
- Installation
IK on GitHub: github.com/medcl/elast… Note that the IK analyzer version must match the version of the ES you have installed
Unzip the IK package into the plugins directory of the ES installation, then restart ES so that it loads the plugin files
- Usage
IK provides two analyzers (compared in the sketch below):
ik_smart: performs the coarsest-grained split
ik_max_word: performs the finest-grained split
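To compare the two, run _analyze against each (this assumes the IK plugin is installed; the sample sentence is illustrative):

```
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "中华人民共和国"
}

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国"
}
```

For the same text, ik_max_word typically emits more, overlapping terms than ik_smart.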
11.5 Hot Word Configuration
Go to the config directory of the IK plugin under the plugins directory of ES:
```
[root@localhost config]# cd /data/elasticsearch-7.15.0/plugins/ik/config/
[root@localhost config]#
[root@localhost config]# cat IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extended dictionary here -->
    <entry key="ext_dict">extar_my.dic</entry>
    <!-- Users can configure their own extended stop-word dictionary here -->
    <entry key="ext_stopwords">stop_my.dic</entry>
    <!-- Users can configure a remote extended dictionary here -->
    <entry key="remote_ext_dict">http://192.168.64.129/hot.dic</entry>
    <!-- Users can configure a remote extended stop-word dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
[root@localhost config]#
```
- Extended hot words
Add a .dic file (here extar_my.dic) to this directory; its contents can be customized. Then modify the IKAnalyzer.cfg.xml file to point the extended-dictionary entry at it:
```
<entry key="ext_dict">extar_my.dic</entry>
```
- Stop words
Likewise, add a .dic file (here stop_my.dic) to the directory and reference it in IKAnalyzer.cfg.xml. Once the stop-word dictionary is configured, words in it will no longer be produced as terms:
```
<entry key="ext_stopwords">stop_my.dic</entry>
```
- Remotely loaded hot words
When a remote dictionary file is configured, ES polls the remote file every few seconds, so when hot words change you only need to modify the remote file; no ES restart is required:
```
<entry key="remote_ext_dict">http://192.168.64.129/hot.dic</entry>
```
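A quick way to verify that the remote dictionary is reachable before wiring it into the config (the URL is the one configured above):

```
curl http://192.168.64.129/hot.dic
```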
Copy the code