Hi, everyone. I'm Kaka. After finishing the MySQL articles, I'm starting a series on ElasticSearch. This first article walks you through the basics of ElasticSearch and lays the groundwork for the more detailed articles that follow.
1. Basic Concepts
Documents
ElasticSearch is document-oriented; a document is the smallest searchable unit of data, analogous to a single row in MySQL
Documents are serialized to JSON and stored in ElasticSearch
Every document has a unique ID, comparable to the primary key ID in MySQL
JSON document
A document contains a series of fields, like the columns of a record in a database
JSON documents have a flexible format; you do not need to predefine a schema
In the previous article, we converted CSV files into JSON files using Logstash and stored them in ElasticSearch
Metadata for the document
_index: the name of the index the document belongs to
_type: the type name of the document
_id: the unique ID of the document
_source: the raw JSON data of the document
_version: the version of the document
_score: the relevance score
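For a concrete picture, here is a sketch of where this metadata appears in a get-by-ID response (the movie values are illustrative, and _score only appears in search results):
get /movies/_doc/1

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "Toy Story",
    "year" : 1995
  }
}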
Indexes
An index is a container for documents: a collection of documents of the same kind. Every index has its own mapping definition, which describes the fields of its documents and their types
Each index can define a mapping and settings: the mapping defines the field types, while the settings define how the data is distributed
{
  "movies" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "@version" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "genre" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "year" : {
          "type" : "long"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1641637408626",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "gf0M2BgnStGZZHsIJD6otQ",
        "version" : {
          "created" : "7010099"
        },
        "provided_name" : "movies"
      }
    }
  }
}
Type
Prior to 7.0, an index could contain multiple types, so an index was often compared to a database and a type to a table
After 7.0, an index can contain only the single type '_doc'
If this is hard to picture, compare ElasticSearch with MySQL:
MySQL | ElasticSearch |
---|---|
Table | Index (Type) |
Row | Document |
Column | Field |
Schema | Mapping |
SQL | DSL |
Nodes
A node is an instance of ElasticSearch, essentially a Java process. One machine can run multiple ElasticSearch processes, but in production it is recommended to run one instance per server
Each node has a name, specified in the configuration file or at startup with -E node.name=node1
After a node starts, it is assigned a UID that is stored in its data directory
Master node: master
By default, any node in the cluster may be elected master. The master node is responsible for creating and deleting indexes, tracking the nodes in the cluster, and deciding which shards to allocate to which nodes. Indexing data and serving searches consume a lot of memory, CPU, and I/O, so to keep the cluster stable it is good practice to separate the master role from the data role.
Data node: data
As the name suggests, a data node stores index data. It mainly handles document create, delete, update, search, and aggregation operations. Data nodes demand a lot of memory, CPU, and I/O, so monitor their status during optimization and add nodes to the cluster when resources run short.
Coordinating node: client
A coordinating-only node handles request routing, search reduction, and the distribution of bulk indexing operations, acting much like an Nginx load balancer. Standalone coordinating nodes are very useful in larger clusters: they learn the cluster state from the master and data nodes and use it to route requests directly.
Preprocessing node: ingest
An ingest node can pre-process documents before they are indexed. All nodes have the ingest role by default, and you can also dedicate specific nodes to it.
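As a sketch, roles can be dedicated in elasticsearch.yml (ES 7.x style settings; each block below is a separate node's config, and the combinations should be adjusted to your own topology):
# elasticsearch.yml for a dedicated master-eligible node
node.master: true
node.data: false
node.ingest: false

# elasticsearch.yml for a dedicated data node
node.master: false
node.data: true
node.ingest: false

# elasticsearch.yml for a coordinating-only (client) node
node.master: false
node.data: false
node.ingest: false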
Shards
Shards come in two kinds: primary shards and replica shards
Primary shard: solves horizontal scaling by distributing data across the nodes of the cluster. A shard is a running instance of Lucene (the underlying search engine). The number of primary shards is fixed when the index is created and cannot be changed later without a reindex
Replica shard: solves high availability and can be understood as a copy of a primary shard. Increasing the number of replicas also improves the availability of the service to a certain extent
How to set the number of shards in production
If there are too few shards, adding nodes cannot scale the cluster horizontally, and redistributing a single oversized shard takes time. Suppose an index is set to three primary shards: however many instances you add to the cluster, that index can only live on three servers
Too many shards on a single node wastes resources and hurts performance
As of ElasticSearch 7.0, the default number of primary shards is 1, which mitigates the over-sharding problem
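A minimal sketch of fixing the shard layout at creation time (the index name kaka_index is made up; replicas can still be changed later, primaries cannot):
put /kaka_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

# Changing the replica count afterwards is allowed
put /kaka_index/_settings
{
  "number_of_replicas": 2
}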
Check the cluster health status
Request:
get _cluster/health
Green: all primary and replica shards are allocated normally
Yellow: all primary shards are allocated, but some replica shards are not
Red: at least one primary shard is unallocated, for example when an index is created while the server's disk usage already exceeds 85%
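A trimmed sketch of the response (values illustrative; a single-node cluster holding replicas typically reports yellow):
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 5,
  "active_shards" : 5,
  "unassigned_shards" : 5,
  "active_shards_percent_as_number" : 50.0
}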
2. REST API
Interface | Role |
---|---|
get movies | View index information |
get movies/_count | View the total number of documents in the index |
post movies/_search | View the first 10 documents |
get /_cat/indices/movies?v&s=index | Get index status |
get /_cat/indices?v&health=green | View indexes whose status is green |
get /_cat/indices?v&s=docs.count:desc | Sort indexes by document count, descending |
get /_cat/indices/kibana*?pri&v&h=health,index,pri,rep,docs.count,mt | View specific fields of an index |
get /_cat/indices?v&h=i,tm&s=tm:desc | View the memory occupied by each index |
get _cluster/health | Check the cluster health status |
3. Basic CRUD Operations for Documents
Create a document
You can automatically generate a document ID or specify a document ID
To have the system auto-generate the document ID, call POST /movies/_doc
When you create a document with PUT /movies/_create/1, the URL explicitly specifies _create; if a document with that ID already exists, the operation fails
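A sketch of both styles (the document body is made up for illustration):
# Auto-generated ID
post /movies/_doc
{
  "title": "kaka test",
  "year": 2022
}

# Explicit ID; returns a 409 conflict if document 1 already exists
put /movies/_create/1
{
  "title": "kaka test",
  "year": 2022
}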
Index a document
Index differs from Create: if the document does not exist, it is indexed as a new document; otherwise the existing document is deleted, the new document is indexed, and the version is incremented by 1
Because a document with id=1 already existed, the document is updated to the latest content (niuniu), and you can see that the version has also increased by 1
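A sketch of the index operation described above (body illustrative):
put /movies/_doc/1
{
  "title": "niuniu"
}

# If document 1 already existed, the old document is removed,
# the new one is indexed, and _version is incremented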
Update a document
The update method does not delete and recreate the original document; it performs a partial update of the data in place
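A sketch of such a partial update (field values are made up):
post /movies/_update/1
{
  "doc": {
    "title": "niuniu",
    "year": 2022
  }
}

Only the fields inside doc are changed; the rest of the document is kept.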
Get a document
If the document is found, status code 200 is returned along with the document's metadata. Note the version: for a given document ID, the version keeps increasing even after deletions
If the document is not found, status code 404 is returned
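For example (document IDs illustrative):
# Found: 200 plus metadata and _source
get /movies/_doc/1

# Missing: 404
get /movies/_doc/999999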
Bulk API
Supports operating on different indexes in a single API call, with four action types: index, create, update, and delete
You can specify the index in the URL or in the payload of the request
The failure of a single operation does not affect other operations, and the return result includes the result of each operation
Multi-index Bulk batch operations:
post _bulk
{"index": {"_index": "test1", "_id": "1"}}
{"name": "kaka_bulk"}
{"delete": {"_index": "test1", "_id": "2"}}
{"create": {"_index": "test2", "_id": "3"}}
{"name": "kaka_create"}
{"update": {"_id": "1", "_index": "test1"}}
{"doc": {"name": "kaka_bulk"}}
Returns the result
{
  "took" : 165,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "test1",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "delete" : {
        "_index" : "test1",
        "_type" : "_doc",
        "_id" : "2",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 404
      }
    },
    {
      "create" : {
        "_index" : "test2",
        "_type" : "_doc",
        "_id" : "3",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : {
        "_index" : "test1",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "noop",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "status" : 200
      }
    }
  ]
}
Note that the bulk API has strict JSON syntax requirements: each JSON object must sit on a single line and cannot wrap, and consecutive JSON objects must be separated by a newline.
Single-index bulk operations
When every operation targets the same index, the bulk request can also be written like this
post test1/_bulk
{"index": {"_id": "1"}}
{"name": "kaka_bulk"}
{"delete": {"_id": "2"}}
{"create": {"_id": "3"}}
{"name": "kaka_create"}
{"update": {"_id": "1"}}
{"doc": {"name": "kaka_bulk"}}
Run it yourself and compare the responses to see the difference between single-index and multi-index bulk requests.
The optimal bulk size
Bulk requests are loaded into memory, so an oversized payload actually hurts performance. You have to experiment to find the optimal bulk size, which is usually somewhere between 5 and 15 MB, and adjust the number of requests per batch based on your current data volume.
Batch read: _mget
The reasoning is the same as in MySQL: batching within a reasonable range reduces the overhead of network connections and thus improves performance
Note that the docs entries in a batch get must be separated by commas; otherwise a JSON parsing exception is raised
get /_mget
{
  "docs": [
    {"_index": "test", "_id": "1"},
    {"_index": "movies", "_id": "2"}
  ]
}
Batch search _msearch
post kibana_sample_data_ecommerce/_msearch
{}
{"query": {"match_all": {}}, "size": 1}
{"index": "kibana_sample_data_flights"}
{"query": {"match_all": {}}, "size": 1}
Common error statuses
Problem | Cause |
---|---|
Unable to connect | The network is faulty or the cluster is down |
Connection cannot be closed | The network is faulty or a node has failed |
429 | The cluster is too busy |
4xx | The request body format is wrong |
500 | Cluster internal error |
4. Inverted Index
An inverted index consists of a term dictionary and postings lists; the term dictionary records the terms of all documents and associates each term with its postings list
A postings list records the documents a term appears in and is made up of inverted index entries, each containing the document ID, the term frequency (TF), the position, and the character offset
Case study:
Document ID | Document content |
---|---|
1 | kaka ElasticSearch |
2 | ElasticSearch kaka |
3 | ElasticSearch niuniu |
The postings list for the term "elasticsearch" is:
Document ID | TF | Position | Offset |
---|---|---|---|
1 | 1 | 1 | <5,18> |
2 | 1 | 0 | <0,13> |
3 | 1 | 0 | <0,13> |
ElasticSearch builds an inverted index for every field of a JSON document by default, but you can specify that certain fields are not indexed
Not indexing a field saves storage space, but that field can no longer be searched
5. Using Analyzers for Word Segmentation
The process of converting a whole text into a series of terms is called word segmentation
Word segmentation is performed by an Analyzer; you can use ElasticSearch's built-in analyzers or define custom ones
Besides converting text when it is written, the same analyzer is applied to query statements at search time
Example: ElasticSearch Kaka
is converted into the terms elasticsearch and kaka; note that the terms are also lowercased
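You can verify this with the _analyze API (a sketch using the default standard analyzer):
get _analyze
{
  "analyzer": "standard",
  "text": "ElasticSearch Kaka"
}

# Produces the lowercased terms "elasticsearch" and "kaka"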
The composition of the Analyzer
Character Filters: pre-process the raw text, for example stripping HTML
Tokenizer: splits the text into terms according to rules
Token Filters: process the terms produced by the tokenizer, for example lowercasing, removing stopwords, and adding synonyms
ElasticSearch's built-in analyzers
# Standard Analyzer - the default analyzer; splits text into words and lowercases them
get _analyze
{
  "analyzer": "standard",
  "text": "If you don't expect quick success, you'll get a pawn every day"
}

# Simple Analyzer - splits on anything that is not a letter (symbols are filtered out), lowercases
# All non-letter characters (such as the 3 below) are dropped
get _analyze
{
  "analyzer": "simple",
  "text": "3 If you don't expect quick success, you'll get a pawn every day kaka-niuniu"
}

# Whitespace Analyzer - splits on spaces only, no lowercasing
get _analyze
{
  "analyzer": "whitespace",
  "text": "3 If you don't expect quick success, you'll get a pawn every day"
}

# Stop Analyzer - splits on non-letters, lowercases, and filters stopwords (the, a, is)
# Compared with the Simple Analyzer, it also removes modifiers such as the, a, is
get _analyze
{
  "analyzer": "stop",
  "text": "4 If you don't expect quick success, you'll get a pawn every day"
}

# Keyword Analyzer - treats the whole input as a single term, no word segmentation
# Use this if you don't want any analysis at all
get _analyze
{
  "analyzer": "keyword",
  "text": "5 If you don't expect quick success, you'll get a pawn every day"
}

# Pattern Analyzer - splits by regular expression, default \W+ (non-word characters)
get _analyze
{
  "analyzer": "pattern",
  "text": "6 If you don't expect quick success, you'll get a pawn every day"
}

# Language analyzers - word segmentation for more than 30 common languages
# The english analyzer, for example, reduces plurals to singular and strips "ing" endings
get _analyze
{
  "analyzer": "english",
  "text": "7 If you don't expect quick success, you'll get a pawn every day kakaing kakas"
}

# Chinese word segmentation - the ICU analyzer plugin must be installed first,
# then ElasticSearch restarted:
# bin/elasticsearch-plugin install analysis-icu
# bin/elasticsearch > /dev/null 2>&1 &
get _analyze
{
  "analyzer": "icu_analyzer",
  "text": "Hello, I'm Kaka."
}
Other Chinese analyzers
The most widely used is the IK analyzer, which supports custom dictionaries and hot updates of the dictionary
THULAC, a Chinese lexical analyzer from Tsinghua University's natural language processing lab
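A sketch of installing IK, assuming ES 7.1.0; pick the release matching your cluster version from the plugin's GitHub releases page:
bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.1.0/elasticsearch-analysis-ik-7.1.0.zip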
6. Search API
Search via URL query
For example:
get /movies/_search?q=2012&df=title&sort=year:desc
q: specifies the query statement, using Query String Syntax
df: the default field to query; if not specified, all fields are queried
sort: sorting; from and size are used for paging
profile: shows how the query was executed
Specified-field queries vs. generic queries
A specified-field query such as q=2012&df=title searches for 2012 only in the title field
A specified-field query can also be written like this:
get /movies/_search?q=2012&df=title
{
  "profile": true
}
A generic query such as q=2012 looks for 2012 in all fields
Grouped and quoted queries
If the query value is Beautiful Mind, it is treated as Beautiful OR Mind, similar to an OR condition in MySQL
If the query value is "Beautiful Mind" (with quotes), it is treated as the phrase Beautiful AND Mind, similar to an AND condition in MySQL: the field must contain both Beautiful and Mind, in that order
Note: the two look identical at first glance; the difference is the quotation marks
# PhraseQuery
# title must contain both beautiful and mind, next to each other
# "description" : """title:"beautiful mind""""
get /movies/_search?q=title:"Beautiful Mind"
{
  "profile": "true"
}

# TermQuery
# title must contain beautiful or mind
# "type" : "BooleanQuery",
# "description" : "title:beautiful title:mind",
get /movies/_search?q=title:(Beautiful Mind)
{
  "profile": "true"
}
Boolean operations
You can use AND / OR / NOT, or their equivalents && / || / !. Note that the operators must be uppercase. In addition, + means must and - means must_not
# title must contain both beautiful and mind
# "description" : "+title:beautiful +title:mind"
get /movies/_search?q=title:(Beautiful AND Mind)
{
  "profile": "true"
}

# title must contain beautiful and must not contain mind
# "description" : "title:beautiful -title:mind"
get /movies/_search?q=title:(Beautiful NOT Mind)
{
  "profile": "true"
}

# title contains beautiful and must also contain mind (%2B is the URL-encoded +)
# "description" : "title:beautiful +title:mind"
get /movies/_search?q=title:(Beautiful %2BMind)
{
  "profile": "true"
}

Range queries, wildcard queries, and fuzzy matching
# Films with year greater than 1996
# In range syntax, [] is inclusive and {} is exclusive
# "description" : "year:[1997 TO 9223372036854775807]"
get /movies/_search?q=year:>1996
{
  "profile": "true"
}

# title contains a term starting with b
# "description" : "title:b*"
get /movies/_search?q=title:b*
{
  "profile": "true"
}

# Fuzzy matching is very useful because users mistype; this matches approximately
# "description" : "(title:beautiful)^0.875"
get /movies/_search?q=title:beautifl~1
{
  "profile": "true"
}
7. Request Body Search
In daily development, queries are most often written in the request body; the following are hands-on examples
Normal query
sort: the field(s) to sort by
_source: which fields to return
from: the starting offset for paging
size: the number of documents per page
post movies/_search
{
  "profile": "true",
  "sort": [{"year": "desc"}],
  "_source": ["year"],
  "from": 0,
  "size": 2,
  "query": {"match_all": {}}
}
Script fields
A matching scenario is the foreign-currency feature Kaka worked on recently: every contract has its own exchange rate, so a computed field is needed to work out each contract amount
post /movies/_search
{
  "script_fields": {
    "new_field": {
      "script": {
        "lang": "painless",
        "source": "doc['year'].value + 'year'"
      }
    }
  },
  "query": {"match_all": {}}
}
In this example, each document's year is concatenated with the string 'year' into the new field and returned; the result looks like this
{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "3844",
  "_score" : 1.0,
  "fields" : {
    "new_field" : [
      "1989year"
    ]
  }
}
As the result above shows, only the script field is returned; the original fields are not.
If you want the original fields as well, add _source: use "_source": "*" for all fields, or a list such as "_source": ["id", "title"] for specific ones
post /movies/_search
{
  "_source": "*",
  "script_fields": {
    "new_field": {
      "script": {
        "lang": "painless",
        "source": "doc['year'].value + 'year'"
      }
    }
  },
  "query": {"match_all": {}}
}
View the returned result
{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "3843",
  "_score" : 1.0,
  "_source" : {
    "year" : 1983,
    "@version" : "1",
    "genre" : [
      "Horror"
    ],
    "id" : "3843",
    "title" : "Sleepaway Camp"
  },
  "fields" : {
    "new_field" : [
      "1983year"
    ]
  }
}
Match query expressions
# title contains sleepaway or camp
# Same behavior as the grouped URL query get /movies/_search?q=title:(Beautiful Mind)
# "description" : "title:sleepaway title:camp"
get /movies/_doc/_search
{
  "query": {"match": {"title": "Sleepaway Camp"}},
  "profile": "true"
}

# title must contain both sleepaway and camp
# Same behavior as get /movies/_search?q=title:(Beautiful AND Mind)
# "description" : "+title:sleepaway +title:camp"
get /movies/_doc/_search
{
  "query": {"match": {"title": {"query": "Sleepaway Camp", "operator": "AND"}}},
  "profile": "true"
}

# Phrase query: one arbitrary term may appear between sleepaway and camp
# Compare the fuzzy URL query get /movies/_search?q=title:beautifl~1
# "description" : """title:"sleepaway camp"~1"""
get /movies/_doc/_search
{
  "query": {"match_phrase": {"title": {"query": "Sleepaway Camp", "slop": 1}}},
  "profile": "true"
}
8. Query String and Simple Query String
query_string supports AND / OR / NOT in the same way as URL query strings
# title must contain both sleepaway and camp
# Same behavior as the URL query get /movies/_search?q=title:(Beautiful AND Mind)
# "description" : "+title:sleepaway +title:camp"
post /movies/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "Sleepaway AND Camp"
    }
  },
  "profile": "true"
}
# simple_query_string does not treat AND as an operator
# title contains sleepaway, and, or camp
# "description" : "title:sleepaway title:and title:camp"
post /movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "Sleepaway AND Camp",
      "fields": ["title"]
    }
  },
  "profile": "true"
}
If you want simple_query_string to perform Boolean operations, add default_operator
# title must contain both sleepaway and camp
# "description" : "+title:sleepaway +title:camp"
post /movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "Sleepaway Camp",
      "fields": ["title"],
      "default_operator": "AND"
    }
  },
  "profile": "true"
}
9. Mapping and common field types
What is a Mapping
A mapping is similar to a schema in a database: it defines the index's field names and data types, and configures inverted-index settings for each field
What is Dynamic Mapping
Mapping has an attribute called dynamic, which controls how new fields in an incoming document are handled. Three values are available, with true as the default
true: as soon as a document containing a new field is written, the mapping is updated to include it
false: the mapping is not updated and the new field cannot be indexed or searched, but it still appears in _source
strict: writing a document that contains a new field fails
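A sketch of the strict behavior (the index name kaka_strict is made up):
put kaka_strict
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": {"type": "text"}
    }
  }
}

# Fails with strict_dynamic_mapping_exception because new_field is not mapped
post /kaka_strict/_doc/1
{
  "title": "ok",
  "new_field": "rejected"
}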
Common types
JSON type | ElasticSearch type |
---|---|
string | a string matching a date format becomes date; a float becomes float; an integer becomes long; everything else becomes text with a keyword subfield |
boolean | boolean |
floating-point number | float |
integer | long |
object | object |
array | takes the type of the first non-null element |
null | ignored |
put kaka/_doc/1
{
  "text": "kaka",
  "int": 10,
  "boole_text": "false",
  "boole": true,
  "float_text": "1.234",
  "float": 1.234,
  "loginData": "2005-11-24T22:20"
}

# Get the mapping of the kaka index
get kaka/_mapping
From the result you can tell that "false" and "true" in quotes are mapped as text; that is the one thing to watch out for
{
  "kaka" : {
    "mappings" : {
      "properties" : {
        "boole" : {
          "type" : "boolean"
        },
        "boole_text" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "float" : {
          "type" : "float"
        },
        "float_text" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "int" : {
          "type" : "long"
        },
        "loginData" : {
          "type" : "date"
        },
        "text" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
Custom Mapping
Setting a field to not be indexed
To keep a field out of the inverted index, add "index": false to that field's definition; pay attention to the mapping format
If you follow the steps below and then search the field, you get an error such as Cannot search on field [mobile] since it is not indexed, which means an unindexed field cannot be searched
put kaka
{
  "mappings": {
    "properties": {
      "firstName": {
        "type": "text"
      },
      "lastName": {
        "type": "text"
      },
      "mobile": {
        "type": "text",
        "index": false
      }
    }
  }
}

post /kaka/_doc/1
{
  "firstName": "kaka",
  "lastName": "Niu",
  "mobile": "123456"
}

get /kaka/_search
{
  "query": {
    "match": {
      "mobile": "123456"
    }
  }
}
Set copy_to
With the following setting, searches can be run directly against the field you define in copy_to
put kaka
{
  "mappings": {
    "properties": {
      "firstName": {
        "type": "text",
        "copy_to": "allSearch"
      },
      "lastName": {
        "type": "text",
        "copy_to": "allSearch"
      }
    }
  }
}
For ease of view, insert two more pieces of data here
post /kaka/_doc/1
{
  "fitstName": "kaka",
  "lastName": "niuniu"
}

post /kaka/_doc/2
{
  "fitstName": "kaka",
  "lastName": "kaka niuniu"
}
The query below returns only the document with id 2: with copy_to, a search against allSearch matches documents whose copied fields contain the search term
post /kaka/_search
{
  "query": {"match": {"allSearch": "kaka"}},
  "profile": "true"
}
Custom word segmentation
An analyzer is composed of Character Filters, a Tokenizer, and Token Filters
Character Filters replace, add, or delete characters in the raw text, and several can be configured in sequence. Note that character filters affect the position and offset information the tokenizer later produces
The built-in character filters include html_strip (removes HTML tags), mapping (string replacement), and pattern_replace (regex replacement)
The Tokenizer handles the word segmentation itself; the many built-in tokenizers were covered in the analyzer section above
Token Filters add, modify, or delete the terms produced by the tokenizer, for example lowercasing, the stop filter for removing modifiers, and synonyms
Customize Character Filters
# Character Filters: HTML strip
# Removes all HTML tags from the text
post /_analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<b>kaka chat</b>"
}

# Character Filters: mapping replacement
# Replaces "i" with "kaka" and "hope" with "wish" in the text
post /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": ["i => kaka", "hope => wish"]
    }
  ],
  "text": "I hope,if you don't expect quick success, you'll get a pawn every day."
}

# Character Filters: regular expression
# Uses a regex to extract the domain name
post /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "http://(.*)",
      "replacement": "$1"
    }
  ],
  "text": "http://www.kakaxiantan.com"
}
Custom Token Filters
The tokenizer used below is whitespace, which splits on spaces. Suppose we also want to lowercase the terms and filter out stopwords; this is how:
post /_analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop", "lowercase"],
  "text": "If on you don't expect quick success, you'll get a pawn every day"
}
To save space, only a representative part of the response is shown
{
  "tokens" : [
    {
      "token" : "if",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "you",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "word",
      "position" : 2
    }
  ]
}
Practice: a custom analyzer
An analyzer combines Character Filters, a Tokenizer, and Token Filters, and every part of it can be customized
A custom analyzer definition must include the analyzer, tokenizer, char_filter, and filter sections
Each part has to be defined by the rules below before it can be used; see the full version for the detailed definition code
Don't try to memorize this configuration; use it often and it will stick
# Custom analyzer
put kaka
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["emoticons"],
          "tokenizer": "punctuation",
          "filter": ["lowercase", "english_stop"]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "keyword"
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": ["123 => Kaka", "456 => xian tan"]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}
# Run the custom analyzer
post /kaka/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": " 123 456"
}
The uppercase letters are converted to lowercase:
{
  "tokens" : [
    {
      "token" : " kaka xian tan",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "word",
      "position" : 0
    }
  ]
}
10. Index Template
When a new index is created and a document inserted, the default settings and mappings are applied; if you have configured your own settings and mappings, they override the defaults
Create an index and insert a document
post /kaka/_doc/1
{
  "gongzhonghao": "123"
}

# Get the settings and mappings
get /kaka
The following configuration is the default
# Default settings/mappings
{
  "kaka" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "gongzhonghao" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1642080577305",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "JJWsGYcrTam0foEQxuZqGQ",
        "version" : {
          "created" : "7010099"
        },
        "provided_name" : "kaka"
      }
    }
  }
}
Next, create a template of your own
Set up a template that applies to any index whose name starts with test. In this template, numbers inside double quotes are parsed as long instead of text
put /_template/kaka_tmp
{
  "index_patterns": ["test*"],
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  },
  "mappings": {
    # Do not parse date-like strings as date; keep them as text
    "date_detection": false,
    # Parse numbers inside double quotes as long instead of text
    "numeric_detection": true
  }
}
Create an index
post /test_kaka/_doc/1
{
  "name": "123",
  "date": "2022/01/13"
}

get /test_kaka
Returns the result
{
  "test_kaka" : {
    "aliases" : { },
    "mappings" : {
      "date_detection" : false,
      "numeric_detection" : true,
      "properties" : {
        "date" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "long"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1642081053006",
        "number_of_shards" : "1",
        "number_of_replicas" : "2",
        "uuid" : "iCcaa_8-TXuymhfzQi31yA",
        "version" : {
          "created" : "7010099"
        },
        "provided_name" : "test_kaka"
      }
    }
  }
}
Persisting in learning, writing, and sharing is the belief Kaka has upheld since the start of his career. May these articles on the vast internet help you a little. I'm Kaka, and see you next time.