1. Core concepts of ElasticSearch

  • What is Elasticsearch

ES is a real-time distributed search and analytics engine used for full-text search, structured search, and analytics. It provides a RESTful API and a Java API (as well as clients for other languages) and is easy to use

  • Near real time

There is a small delay (about one second) between when data is written and when it becomes searchable. Search and analysis based on ES runs at second-level speed (backed by the inverted index)
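If a newly indexed document must be visible to search immediately (for example in a test), you can trigger a refresh by hand; a minimal sketch, assuming an index named my_index:

POST /my_index/_refresh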

  • Cluster

A cluster contains multiple nodes, and a configuration setting determines which cluster each node belongs to. For small and medium-sized applications, starting with one node per cluster is normal

  • Node

Each node in a cluster has a name (randomly assigned by default). The node name matters when performing O&M operations. By default, a node joins the cluster named elasticsearch
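Both names can be set explicitly; a minimal sketch of the relevant elasticsearch.yml settings (the values here are illustrative):

cluster.name: my-cluster
node.name: node-1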

  • Index

An index contains a collection of structured documents. It is analogous to a database in a relational database

  • Type

Each index can have one or more types; a type is a logical category of data within an index. Custom types are still supported in 6.x, in 7.x the only type is _doc, and in 8.x types are removed completely

  • Document

The document is the smallest data unit in ES; a document could be one customer record or one order record. It is analogous to a row in a relational database

  • Field

A field is the smallest unit in ES. A document has multiple fields, each holding a single piece of data. Fields are analogous to columns in a relational database

  • Mapping

A mapping configuration defines how data is stored and indexed within an index. It is analogous to a schema in a relational database

  • Shard

Because ES is a distributed search engine, an index is usually split into several parts, and these pieces of data distributed across different nodes are called shards. ES automatically manages and organizes shards and rebalances shard data when necessary, so users do not have to worry about the details of shard handling. By default, a single shard can hold at most about 2 billion documents

  • Replica

By default, ES creates five primary shards for each index and one replica of each primary shard. That is, an index has five primary shards, and each primary shard has a corresponding copy. (Note: since ES 7.x the default is one primary shard per index.)

2. Curl command & Kibana developer tools

2.1 The curl command

  • Introduction to curl

curl is an open source file-transfer tool that uses URL syntax to work with URLs from the command line. With curl you can easily issue common GET/POST requests

  • curl syntax

curl -X <request type> <request URL> -d '<data>'

For more details about the curl command, see the curl documentation

  • Operating Elasticsearch with curl

Because ES exposes a RESTful API, you can operate on it with curl:

curl -XPUT http://192.168.64.129:9200/index11?pretty
curl -XDELETE http://192.168.64.129:9200/index11?pretty
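Requests that carry a JSON body also need a Content-Type header; a minimal sketch for indexing a document (the host and index reuse the example above, the document body is illustrative):

curl -XPOST http://192.168.64.129:9200/index11/_doc?pretty \
     -H 'Content-Type: application/json' \
     -d '{"name": "test"}'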

2.2 Kibana developer tools

You can also debug Elasticsearch using the Kibana developer tools (Dev Tools console), which is the most common way to work with it interactively

For setup, see: Build ElasticSearch and Kibana

3. Index management

The ES index management API mainly provides the following operations:

  • Create Index

Create index:

PUT /index
{
    "settings" : {                 // @1
        "number_of_shards" : 1
    },
    "mappings" : {                 // @2
        "_doc" : {
            "properties" : {
                "field1" : { "type" : "text" }
            }
        }
    },
    "aliases" : {                  // @3
        "alias_1" : {},
        "alias_2" : {
            "filter" : {
                "term" : { "user" : "kimchy" }
            },
            "routing" : "kimchy"
        }
    }
}

Indexes are created using a PUT request

@1: index settings
@2: mapping definition, somewhat similar to defining a table structure in a relational database
@3: alias settings for the index

Note: Creating an existing index is an error.

  • Delete Index

Delete index:

DELETE /index

A DELETE request is used to delete an index, which is relatively simple

  • Get index

Get index:

GET /index

Getting the index is simple

  • View the index list
GET /_cat/indices?v

  • Update the number of replicas
PUT /index/_settings
{
  "number_of_replicas": 1
}
  • View the index configuration
GET /index/_settings


4. Term & match_all query

  • The term query
# First index a sample document
PUT /index/_doc/1
{
  "name": "中国人",
  "code": "CHN"
}

# term query
GET /index/_search
{
  "query": {
    "term": {
      "name": {
        "value": "国"
      }
    }
  },
  "from": 0,
  "size": 1,
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}

Strings stored in ES are analyzed (split into terms) by default. For example, "中国人" is split into three terms: "中", "国", and "人". That is why a term query on the name field with the single term "国" still finds the document.

Note: if paginated queries are needed, the from and size fields handle paging; if highlighting is needed, add a highlight clause

  • match_all

As the name suggests, match_all queries all documents:

GET /index/_search
{
  "query": {
    "match_all": {}
  }
}

5. Range & exists queries

  • Range queries

As the name suggests, this queries by range. It is relatively simple to use:

# range query: age greater than or equal to 30 and less than or equal to 100
GET /index/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 30,
        "lte": 100
      }
    }
  }
}

The parameters acceptable for range query are:

gte: greater than or equal to
gt: greater than
lte: less than or equal to
lt: less than
boost: sets the boost value of the query; the default is 1.0
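For example, a range query with an explicit boost might look like this (a minimal sketch based on the query above):

GET /index/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 30,
        "lte": 100,
        "boost": 2.0
      }
    }
  }
}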

  • The exists query

exists queries for documents that contain a given field:

GET /index/_search
{
  "query": {
    "exists": {
      "field": "age"
    }
  }
}

6. The match query

GET /index/_search
{
  "query": {
    "match": {
      "name": "中李"
    }
  }
}

Explanation: the query text "中李" for the name field is first split by the analyzer into two terms, "中" and "李". Any document whose name contains either of these terms is matched. In other words, a match query analyzes the query text first and then searches with the resulting terms
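By default the resulting terms are combined with OR; to require all terms to match, the match query accepts an operator parameter (a minimal sketch):

GET /index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "中李",
        "operator": "and"
      }
    }
  }
}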

7. Boolean queries

Why use a bool query? Because it makes it easy to intersect multiple result sets and to apply filtering, which is exactly what multi-condition searches (for example in e-commerce) require.

The bool query consists of the following types:

must: [] : the returned document must satisfy the conditions of the must clauses, and they participate in the score calculation

filter: [] : the returned document must satisfy the filter clauses, but unlike must they do not participate in the score calculation

should: [] : the returned document may satisfy the conditions of the should clauses. In a bool query with no must or filter, at least one should clause must match; the minimum_should_match parameter defines how many should clauses must match at minimum.

must_not: [] : the returned document must not satisfy the conditions defined by must_not

7.1 The bool.must query

GET /index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "中美"
          }
        },
        {
          "range": {
            "age": {
              "gte": 30,
              "lte": 100
            }
          }
        }
      ]
    }
  }
}

Explanation: the must clause contains two conditions, a match and a range; the query returns the intersection of the two result sets. Each hit has a _score value, and a higher score indicates a stronger match

7.2 The bool.filter query

# bool filter query, based on the example in 7.1
GET /index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "中美"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "age": {
              "gte": 30,
              "lte": 100
            }
          }
        }
      ]
    }
  }
}

filter has the same matching effect as must, except that it does not participate in the score calculation

7.3 The bool.must_not query

# must_not query
GET /index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "中李美"
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "age": {
              "gte": 30,
              "lte": 100
            }
          }
        }
      ]
    }
  }
}

Explanation: the returned documents satisfy the must condition but not the must_not condition. The result is the difference of the two sets, which is often used to filter a batch of documents out of a previously matched batch.

7.4 The bool.should query

GET /index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "中"
          }
        },
        {
          "range": {
            "age": {
              "gte": 30,
              "lte": 60
            }
          }
        }
      ]
    }
  }
}

should conditions are combined with OR: a document that satisfies any one of them is returned
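To control how many should clauses must match, add minimum_should_match; a minimal sketch based on the query above, requiring both clauses to match:

GET /index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "中"
          }
        },
        {
          "range": {
            "age": {
              "gte": 30,
              "lte": 60
            }
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}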

8. Dynamic mapping

One of Elasticsearch's most important features is letting you start exploring your data as quickly as possible. To index a document you do not have to create the index, define a mapping type, and define the fields first: just index the document, and the index, type, and fields are created automatically. The mechanism that infers field types from document values is the subject of this section: dynamic mapping.

  • An object with dynamic set to strict throws an exception when it encounters a new field
  • An inner object (stash in the sketch below) with dynamic set to true still creates new fields dynamically when they are encountered
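A minimal sketch of such a mapping, adapted from the classic example in the Elasticsearch Definitive Guide (index, type, and field names are illustrative; the type wrapper matches the 6.x style used elsewhere in this article):

PUT /my_index
{
  "mappings": {
    "my_type": {
      "dynamic": "strict",
      "properties": {
        "title": { "type": "text" },
        "stash": {
          "type": "object",
          "dynamic": true
        }
      }
    }
  }
}

With this mapping, indexing a document with an unknown top-level field fails, while a new field inside stash is mapped automatically.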

Dynamic mapping mechanisms include the following mapping rules: Dynamic field mappings and Dynamic templates

8.1 Dynamic field mappings

Dynamic field mapping rules. By default, Elasticsearch adds a new field to the type map when it finds a field in a document that has not been seen before

JSON datatype            Elasticsearch datatype
null                     no field mapping is added
true or false            boolean
floating point number    float
integer                  long
object                   object
array                    determined by the first non-null value in the array
string                   date, double, long, or text (with a keyword subfield), depending on the value
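To see these rules in action, index a document with mixed value types and then inspect the generated mapping; a minimal sketch (my_index is illustrative):

PUT /my_index/_doc/1
{
  "count": 5,
  "price": 10.5,
  "active": true,
  "note": "hello"
}

GET /my_index/_mapping

count should be mapped as long, price as float, active as boolean, and note as text with a keyword subfield.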

8.2 Dynamic templates

By default, dynamic field mappings infer the mapping from the data types ElasticSearch supports. Dynamic templates let you define custom rules, based on a field's name or detected type, that control which mapping is applied to dynamically added fields.

PUT /index
{
    "mappings": {
        "index": {
            "dynamic_templates": [
                {
                    "es": {
                        "match":              "*_es",    // @1
                        "match_mapping_type": "string",
                        "mapping": {
                            "type":     "string",
                            "analyzer": "spanish"
                        }
                    }
                },
                {
                    "en": {
                        "match":              "*",       // @2
                        "match_mapping_type": "string",
                        "mapping": {
                            "type":     "string",
                            "analyzer": "english"
                        }
                    }
                }
            ]
        }
    }
}

@1: matches string fields whose names end with _es
@2: matches all other string fields
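With these templates in place, a string field named title_es would get the Spanish analyzer via rule @1, while any other string field such as title would get the English analyzer via rule @2. For example (field names are illustrative):

PUT /index/_doc/1
{
  "title_es": "hola mundo",
  "title":    "hello world"
}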

9. Static mapping

Static (explicit) mapping lets you define the field types yourself. Only some of the common field types are covered here.

9.1 Modify the mapping

PUT /index/_mapping
{
  "properties" : {
    "age" : {
      "type" : "long"
    },
    "name" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "email": {
      "type": "keyword"
    },
    "is_handle": {
      "type": "boolean"
    }
  }
}

9.2 Main field types

  • long: integer
  • text: string (text values are analyzed into terms)
  • boolean
  • date
  • keyword: string that is not analyzed (suited to exact values such as email addresses or phone numbers)
"name" : {
  "type" : "text"."fields" : {
    "keyword" : {
      "type" : "keyword"."ignore_above" : 256}} # where text is the type of the name field and keyword is the extended attribute, as long as it does not exceed the specified value256Characters are also indexed with the keyword once to ensure that exact matches are matchedCopy the code

Searching on the keyword sub-field looks like this:

GET /index/_search
{
  "query": {
    "term": {
      "name.keyword": {
        "value": "中国人"
      }
    }
  }
}

For more details, please refer to the official Field documentation

10. Shards & replicas

  • View the sharding of the index
GET /index/_settings

number_of_shards indicates the number of shards in the current index, and number_of_replicas indicates the number of replicas

  • Set the number of shards and replicas

When creating an index, you can set the number of shards:

PUT /test_index
{
  "settings" : {
    "index" : {
      "number_of_shards" : 2,
      "number_of_replicas" : 2
    }
  }
}
  • Modify the number of replicas (the shard count of an existing index cannot be changed; only the replica count can)
PUT /test_index/_settings
{
  "index": {
    "number_of_replicas" : "3"
  }
}

11. Word segmentation (analyzers)

11.1 The default analyzer

  • Concepts

Analysis: text analysis is the process of converting full text into a series of terms (tokens), also called word segmentation. Analysis is carried out by an analyzer, which consists of three kinds of building blocks: character filters, tokenizers, and token filters.

Character filters: pre-process the text before tokenization. The most common uses are stripping HTML tags (<b>hello</b> --> hello) and replacing characters (& --> and, so I&you --> I and you).
Tokenizers: split the text into individual terms, for example on whitespace or punctuation.
Token filters: post-process the resulting terms, for example changing case ("Quick" --> "quick"), removing words (stop words such as "a", "and", "the"), or adding words (synonyms such as "jump" and "leap").

The processing order is: character filters --> tokenizer --> token filters.
Analyzer = CharFilters (0 or more) + Tokenizer (exactly one) + TokenFilters (0 or more)
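A minimal sketch of a custom analyzer that combines all three building blocks (the index and analyzer names are illustrative):

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}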

  • Elasticsearch's built-in analyzers

Standard Analyzer – the default; splits on word boundaries, lowercases
Simple Analyzer – splits on non-letter characters (symbols are filtered out), lowercases
Stop Analyzer – lowercases and filters stop words (the, a, is)
Whitespace Analyzer – splits on whitespace, no lowercasing
Keyword Analyzer – no tokenization; treats the whole input as a single term
Pattern Analyzer – splits by regular expression, default \W+ (non-word characters)
Language – provides analyzers for more than 30 common languages
Custom Analyzer – a user-defined analyzer

11.2 Word segmentation analysis

POST _analyze
{
  "analyzer": "standard",
  "text":     "Like X for the Fourth of July."
}
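The standard analyzer lowercases and splits on word boundaries, so the response should contain the tokens: like, x, for, the, fourth, of, july.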

11.4 The IK analyzer

The most widely recommended Chinese analyzer is the IK analyzer

  • Installation

IK on GitHub: github.com/medcl/elast… Note that the IK analyzer version must match the version of ES you have installed

Unzip the IK package, copy it into the plugins directory of the ES installation, and then restart ES so that it loads the plugin files

  • Usage

ik_smart: performs a coarse-grained split

ik_max_word: performs the most fine-grained split of the text
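The two modes are easy to compare with the _analyze API (a minimal sketch, assuming the IK plugin is installed):

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "中华人民共和国"
}

Running the same text through ik_max_word returns considerably more, overlapping terms.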

11.5 Hot Word Configuration

Go to the config directory of the IK plugin inside the plugins directory of ES

[root@localhost config]# cd /data/elasticsearch-7.15.0/plugins/ik/config/
[root@localhost config]#
[root@localhost config]# cat IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extension dictionary here -->
    <entry key="ext_dict">extar_my.dic</entry>
    <!-- Users can configure their own extension stop-word dictionary here -->
    <entry key="ext_stopwords">stop_my.dic</entry>
    <!-- Users can configure a remote extension dictionary here -->
    <entry key="remote_ext_dict">http://192.168.64.129/hot.dic</entry>
    <!-- Users can configure a remote extension stop-word dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
[root@localhost config]#
  • Adding extension hot words

Add a .dic file (extar_my.dic in the configuration above) to the config directory; its contents can be customized. Then edit IKAnalyzer.cfg.xml so that the configured hot-word file is referenced via the ext_dict entry:

<entry key="ext_stopwords">extar_my.dic</entry>
  • Stop words

Similarly, add a .dic file (stop_my.dic in the configuration above) with custom contents, then edit IKAnalyzer.cfg.xml to reference it via the ext_stopwords entry. Words in the stop-word dictionary will no longer be produced as terms

<entry key="ext_stopwords">stop_my.dic</entry>
  • Loading hot words remotely

When a remote dictionary file is configured, ES polls the remote file every few seconds. So when there are new hot words, you only need to update the remote file; no ES restart is required.

<entry key="remote_ext_dict">http:/ / 192.168.64.129 / hot dic < / entry >
Copy the code
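One caveat from the IK plugin's documentation: the HTTP endpoint serving the remote dictionary should return Last-Modified or ETag headers, since IK uses a change in either one to decide when to fetch the dictionary again.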