Hi, everyone. I'm Kaka. After finishing the MySQL articles, I'm starting a series on ElasticSearch. This first article walks you through the basics of ElasticSearch and lays the groundwork for the more detailed articles that follow.
1. Basic Concepts
Documents
ElasticSearch is document-oriented; a document is the smallest searchable unit of data, analogous to a single row in MySQL
Documents are serialized to JSON and stored in ElasticSearch
Every document has a unique ID, comparable to the primary key ID in MySQL
JSON document
A document contains a series of fields, like the columns of a record in a database
JSON documents have a flexible format; you do not need to predefine a schema
In the previous article, we converted CSV files into JSON files using Logstash and stored them in ElasticSearch
Metadata for the document
_index: the name of the index the document belongs to
_type: the type name of the document
_id: the unique ID of the document
_source: the raw JSON data of the document
_version: the version of the document
_score: the relevance score
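For a concrete picture, here is a sketch of where this metadata appears in a get-by-ID response (the movie values are illustrative, and _score only appears in search results):
get /movies/_doc/1

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "Toy Story",
    "year" : 1995
  }
}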
Indexes
An index is a container for documents: a collection of documents of the same kind. Every index has its own mapping definition, which describes the fields of its documents and their types
Each index can define a mapping and settings: the mapping defines the field types, while the settings define how the data is distributed
{
  "movies" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "@version" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "genre" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "year" : {
          "type" : "long"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1641637408626",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "gf0M2BgnStGZZHsIJD6otQ",
        "version" : {
          "created" : "7010099"
        },
        "provided_name" : "movies"
      }
    }
  }
}
Type
Prior to 7.0, an index could contain multiple types, so an index was often compared to a database and a type to a table
After 7.0, an index can contain only the single type '_doc'
If this is hard to picture, compare ElasticSearch with MySQL:
MySQL | ElasticSearch |
---|---|
Table | Index (Type) |
Row | Document |
Column | Field |
Schema | Mapping |
SQL | DSL |
Nodes
A node is an instance of ElasticSearch, essentially a Java process. One machine can run multiple ElasticSearch processes, but in production it is recommended to run one instance per server
Each node has a name, specified in the configuration file or at startup with -E node.name=node1
After a node starts, it is assigned a UID that is stored in its data directory
Master node: master
By default, any node in the cluster may be elected master. The master node is responsible for creating and deleting indexes, tracking the nodes in the cluster, and deciding which shards to allocate to which nodes. Indexing data and serving searches consume a lot of memory, CPU, and I/O, so to keep the cluster stable it is good practice to separate the master role from the data role.
Data node: data
As the name suggests, a data node stores index data. It mainly handles document create, delete, update, search, and aggregation operations. Data nodes demand a lot of memory, CPU, and I/O, so monitor their status during optimization and add nodes to the cluster when resources run short.
Coordinating node: client
A coordinating-only node handles request routing, search reduction, and the distribution of bulk indexing operations, acting much like an Nginx load balancer. Standalone coordinating nodes are very useful in larger clusters: they learn the cluster state from the master and data nodes and use it to route requests directly.
Preprocessing node: ingest
An ingest node can pre-process documents before they are indexed. All nodes have the ingest role by default, and you can also dedicate specific nodes to it.
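As a sketch, roles can be dedicated in elasticsearch.yml (ES 7.x style settings; each block below is a separate node's config, and the combinations should be adjusted to your own topology):
# elasticsearch.yml for a dedicated master-eligible node
node.master: true
node.data: false
node.ingest: false

# elasticsearch.yml for a dedicated data node
node.master: false
node.data: true
node.ingest: false

# elasticsearch.yml for a coordinating-only (client) node
node.master: false
node.data: false
node.ingest: false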
Shards
Shards come in two kinds: primary shards and replica shards
Primary shard: solves horizontal scaling by distributing data across the nodes of the cluster. A shard is a running instance of Lucene (the underlying search engine). The number of primary shards is fixed when the index is created and cannot be changed later without a reindex
Replica shard: solves high availability and can be understood as a copy of a primary shard. Increasing the number of replicas also improves the availability of the service to a certain extent
How to set the number of shards in production
If there are too few shards, adding nodes cannot scale the cluster horizontally, and redistributing a single oversized shard takes time. Suppose an index is set to three primary shards: however many instances you add to the cluster, that index can only live on three servers
Too many shards on a single node wastes resources and hurts performance
As of ElasticSearch 7.0, the default number of primary shards is 1, which mitigates the over-sharding problem
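A minimal sketch of fixing the shard layout at creation time (the index name kaka_index is made up; replicas can still be changed later, primaries cannot):
put /kaka_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

# Changing the replica count afterwards is allowed
put /kaka_index/_settings
{
  "number_of_replicas": 2
}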
Check the cluster health status
Request:
get _cluster/health
Green: all primary and replica shards are allocated normally
Yellow: all primary shards are allocated, but some replica shards are not
Red: at least one primary shard is unallocated, for example when an index is created while the server's disk usage already exceeds 85%
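A trimmed sketch of the response (values illustrative; a single-node cluster holding replicas typically reports yellow):
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 5,
  "active_shards" : 5,
  "unassigned_shards" : 5,
  "active_shards_percent_as_number" : 50.0
}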
2. REST API
Interface | Role |
---|---|
get movies | View index information |
get movies/_count | View the total number of documents in the index |
post movies/_search | View the first 10 documents |
get /_cat/indices/movies?v&s=index | Get index status |
get /_cat/indices?v&health=green | View indexes whose status is green |
get /_cat/indices?v&s=docs.count:desc | Sort indexes by document count, descending |
get /_cat/indices/kibana*?pri&v&h=health,index,pri,rep,docs.count,mt | View specific fields of an index |
get /_cat/indices?v&h=i,tm&s=tm:desc | View the memory occupied by each index |
get _cluster/health | Check the cluster health status |
3. Basic CRUD Operations for Documents
Create a document
You can automatically generate a document ID or specify a document ID
To have the system auto-generate the document ID, call POST /movies/_doc
When you create a document with PUT /movies/_create/1, the URL explicitly specifies _create; if a document with that ID already exists, the operation fails
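A sketch of both styles (the document body is made up for illustration):
# Auto-generated ID
post /movies/_doc
{
  "title": "kaka test",
  "year": 2022
}

# Explicit ID; returns a 409 conflict if document 1 already exists
put /movies/_create/1
{
  "title": "kaka test",
  "year": 2022
}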
Index a document
Index differs from Create: if the document does not exist, it is indexed as a new document; otherwise the existing document is deleted, the new document is indexed, and the version is incremented by 1
Because a document with id=1 already existed, the document is updated to the latest content (niuniu), and you can see that the version has also increased by 1
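A sketch of the index operation described above (body illustrative):
put /movies/_doc/1
{
  "title": "niuniu"
}

# If document 1 already existed, the old document is removed,
# the new one is indexed, and _version is incremented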
Update a document
The update method does not delete and recreate the original document; it performs a partial update of the data in place
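A sketch of such a partial update (field values are made up):
post /movies/_update/1
{
  "doc": {
    "title": "niuniu",
    "year": 2022
  }
}

Only the fields inside doc are changed; the rest of the document is kept.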
Get a document
If the document is found, status code 200 is returned along with the document's metadata. Note the version: for a given document ID, the version keeps increasing even after deletions
If the document is not found, status code 404 is returned
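For example (document IDs illustrative):
# Found: 200 plus metadata and _source
get /movies/_doc/1

# Missing: 404
get /movies/_doc/999999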
Bulk API
Supports operating on different indexes in a single API call, with four action types: index, create, update, and delete
You can specify the index in the URL or in the payload of the request
The failure of a single operation does not affect other operations, and the return result includes the result of each operation
Multi-index Bulk batch operations:
post _bulk
{"index": {"_index": "test1", "_id": "1"}}
{"name": "kaka_bulk"}
{"delete": {"_index": "test1", "_id": "2"}}
{"create": {"_index": "test2", "_id": "3"}}
{"name": "kaka_create"}
{"update": {"_id": "1", "_index": "test1"}}
{"doc": {"name": "kaka_bulk"}}
Returns the result
{
  "took" : 165,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "test1",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "delete" : {
        "_index" : "test1",
        "_type" : "_doc",
        "_id" : "2",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 404
      }
    },
    {
      "create" : {
        "_index" : "test2",
        "_type" : "_doc",
        "_id" : "3",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : {
        "_index" : "test1",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "noop",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "status" : 200
      }
    }
  ]
}
Note that the bulk API has strict JSON syntax requirements: each JSON object must sit on a single line and cannot wrap, and consecutive JSON objects must be separated by a newline.
Single-index bulk operations
When every operation targets the same index, the bulk request can also be written like this
post test1/_bulk
{"index": {"_id": "1"}}
{"name": "kaka_bulk"}
{"delete": {"_id": "2"}}
{"create": {"_id": "3"}}
{"name": "kaka_create"}
{"update": {"_id": "1"}}
{"doc": {"name": "kaka_bulk"}}
Run it yourself and compare the responses to see the difference between single-index and multi-index bulk requests.
The optimal bulk size
Bulk requests are loaded into memory, so an oversized payload actually hurts performance. You have to experiment to find the optimal bulk size, which is usually somewhere between 5 and 15 MB, and adjust the number of requests per batch based on your current data volume.
Batch read: _mget
The reasoning is the same as in MySQL: batching within a reasonable range reduces the overhead of network connections and thus improves performance
Note that the docs entries in a batch get must be separated by commas; otherwise a JSON parsing exception is raised
get /_mget
{
  "docs": [
    {"_index": "test", "_id": "1"},
    {"_index": "movies", "_id": "2"}
  ]
}
Batch search _msearch
post kibana_sample_data_ecommerce/_msearch
{}
{"query": {"match_all": {}}, "size": 1}
{"index": "kibana_sample_data_flights"}
{"query": {"match_all": {}}, "size": 1}
Common error statuses
Problem | Cause |
---|---|
Unable to connect | The network is faulty or the cluster is down |
Connection cannot be closed | The network is faulty or a node has failed |
429 | The cluster is too busy |
4xx | The request body format is wrong |
500 | Cluster internal error |
4. Inverted Index
An inverted index consists of a term dictionary and postings lists; the term dictionary records the terms of all documents and associates each term with its postings list
A postings list records the documents a term appears in and is made up of inverted index entries, each containing the document ID, the term frequency (TF), the position, and the character offset
Case study:
Document ID | Document content |
---|---|
1 | kaka ElasticSearch |
2 | ElasticSearch kaka |
3 | ElasticSearch niuniu |
The postings list for the term "elasticsearch" is:
Document ID | TF | Position | Offset |
---|---|---|---|
1 | 1 | 1 | <5,18> |
2 | 1 | 0 | <0,13> |
3 | 1 | 0 | <0,13> |
ElasticSearch builds an inverted index for every field of a JSON document by default, but you can specify that certain fields are not indexed
Not indexing a field saves storage space, but that field can no longer be searched
5. Using Analyzers for Word Segmentation
The process of converting a whole text into a series of terms is called word segmentation
Word segmentation is performed by an Analyzer; you can use ElasticSearch's built-in analyzers or define custom ones
Besides converting text when it is written, the same analyzer is applied to query statements at search time
Example: ElasticSearch Kaka
is converted into the terms elasticsearch and kaka; note that the terms are also lowercased
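You can verify this with the _analyze API (a sketch using the default standard analyzer):
get _analyze
{
  "analyzer": "standard",
  "text": "ElasticSearch Kaka"
}

# Produces the lowercased terms "elasticsearch" and "kaka"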
The composition of the Analyzer
Character Filters: pre-process the raw text, for example stripping HTML
Tokenizer: splits the text into terms according to rules
Token Filters: process the terms produced by the tokenizer, for example lowercasing, removing stopwords, and adding synonyms
ElasticSearch's built-in analyzers
# Standard Analyzer - the default analyzer; splits text into words and lowercases them
get _analyze
{
  "analyzer": "standard",
  "text": "If you don't expect quick success, you'll get a pawn every day"
}

# Simple Analyzer - splits on anything that is not a letter (symbols are filtered out), lowercases
# All non-letter characters (such as the 3 below) are dropped
get _analyze
{
  "analyzer": "simple",
  "text": "3 If you don't expect quick success, you'll get a pawn every day kaka-niuniu"
}

# Whitespace Analyzer - splits on spaces only, no lowercasing
get _analyze
{
  "analyzer": "whitespace",
  "text": "3 If you don't expect quick success, you'll get a pawn every day"
}

# Stop Analyzer - splits on non-letters, lowercases, and filters stopwords (the, a, is)
# Compared with the Simple Analyzer, it also removes modifiers such as the, a, is
get _analyze
{
  "analyzer": "stop",
  "text": "4 If you don't expect quick success, you'll get a pawn every day"
}

# Keyword Analyzer - treats the whole input as a single term, no word segmentation
# Use this if you don't want any analysis at all
get _analyze
{
  "analyzer": "keyword",
  "text": "5 If you don't expect quick success, you'll get a pawn every day"
}

# Pattern Analyzer - splits by regular expression, default \W+ (non-word characters)
get _analyze
{
  "analyzer": "pattern",
  "text": "6 If you don't expect quick success, you'll get a pawn every day"
}

# Language analyzers - word segmentation for more than 30 common languages
# The english analyzer, for example, reduces plurals to singular and strips "ing" endings
get _analyze
{
  "analyzer": "english",
  "text": "7 If you don't expect quick success, you'll get a pawn every day kakaing kakas"
}

# Chinese word segmentation - the ICU analyzer plugin must be installed first,
# then ElasticSearch restarted:
# bin/elasticsearch-plugin install analysis-icu
# bin/elasticsearch > /dev/null 2>&1 &
get _analyze
{
  "analyzer": "icu_analyzer",
  "text": "Hello, I'm Kaka."
}
Other Chinese analyzers
The most widely used is the IK analyzer, which supports custom dictionaries and hot updates of the dictionary
THULAC, a Chinese lexical analyzer from Tsinghua University's natural language processing lab
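A sketch of installing IK, assuming ES 7.1.0; pick the release matching your cluster version from the plugin's GitHub releases page:
bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.1.0/elasticsearch-analysis-ik-7.1.0.zip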
6. Search API
Search via URL query
For example:
get /movies/_search?q=2012&df=title&sort=year:desc
q: specifies the query statement, using Query String Syntax
df: the default field to query; if not specified, all fields are queried
sort: sorting; from and size are used for paging
profile: shows how the query was executed
Specified-field queries vs. generic queries
A specified-field query such as q=2012&df=title searches for 2012 only in the title field
A specified-field query can also be written like this:
get /movies/_search?q=2012&df=title
{
  "profile": true
}
A generic query such as q=2012 looks for 2012 in all fields
Grouped and quoted queries
If the query value is Beautiful Mind, it is treated as Beautiful OR Mind, similar to an OR condition in MySQL
If the query value is "Beautiful Mind" (with quotes), it is treated as the phrase Beautiful AND Mind, similar to an AND condition in MySQL: the field must contain both Beautiful and Mind, in that order
Note: the two look identical at first glance; the difference is the quotation marks
# PhraseQuery
# title must contain both beautiful and mind, next to each other
# "description" : """title:"beautiful mind""""
get /movies/_search?q=title:"Beautiful Mind"
{
  "profile": "true"
}

# TermQuery
# title must contain beautiful or mind
# "type" : "BooleanQuery",
# "description" : "title:beautiful title:mind",
get /movies/_search?q=title:(Beautiful Mind)
{
  "profile": "true"
}
Boolean operations
You can use AND / OR / NOT, or their equivalents && / || / !. Note that the operators must be uppercase. In addition, + means must and - means must_not
# title must contain both beautiful and mind
# "description" : "+title:beautiful +title:mind"
get /movies/_search?q=title:(Beautiful AND Mind)
{
  "profile": "true"
}

# title must contain beautiful and must not contain mind
# "description" : "title:beautiful -title:mind"
get /movies/_search?q=title:(Beautiful NOT Mind)
{
  "profile": "true"
}

# title contains beautiful and must also contain mind (%2B is the URL-encoded +)
# "description" : "title:beautiful +title:mind"
get /movies/_search?q=title:(Beautiful %2BMind)
{
  "profile": "true"
}

Range queries, wildcard queries, and fuzzy matching
# Films with year greater than 1996
# In range syntax, [] is inclusive and {} is exclusive
# "description" : "year:[1997 TO 9223372036854775807]"
get /movies/_search?q=year:>1996
{
  "profile": "true"
}

# title contains a term starting with b
# "description" : "title:b*"
get /movies/_search?q=title:b*
{
  "profile": "true"
}

# Fuzzy matching is very useful because users mistype; this matches approximately
# "description" : "(title:beautiful)^0.875"
get /movies/_search?q=title:beautifl~1
{
  "profile": "true"
}
7. Request Body Search
In daily development, queries are most often written in the request body; the following are hands-on examples
Normal query
sort: the field(s) to sort by
_source: which fields to return
from: the starting offset for paging
size: the number of documents per page
post movies/_search
{
  "profile": "true",
  "sort": [{"year": "desc"}],
  "_source": ["year"],
  "from": 0,
  "size": 2,
  "query": {"match_all": {}}
}
Script fields
A matching scenario is the foreign-currency feature Kaka worked on recently: every contract has its own exchange rate, so a computed field is needed to work out each contract amount
post /movies/_search
{
  "script_fields": {
    "new_field": {
      "script": {
        "lang": "painless",
        "source": "doc['year'].value + 'year'"
      }
    }
  },
  "query": {"match_all": {}}
}
In this example, each document's year is concatenated with the string 'year' into the new field and returned; the result looks like this
{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "3844",
  "_score" : 1.0,
  "fields" : {
    "new_field" : [
      "1989year"
    ]
  }
}
As the result above shows, only the script field is returned; the original fields are not.
If you want the original fields as well, add _source: use "_source": "*" for all fields, or a list such as "_source": ["id", "title"] for specific ones
post /movies/_search
{
  "_source": "*",
  "script_fields": {
    "new_field": {
      "script": {
        "lang": "painless",
        "source": "doc['year'].value + 'year'"
      }
    }
  },
  "query": {"match_all": {}}
}
View the returned result
{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "3843",
  "_score" : 1.0,
  "_source" : {
    "year" : 1983,
    "@version" : "1",
    "genre" : [
      "Horror"
    ],
    "id" : "3843",
    "title" : "Sleepaway Camp"
  },
  "fields" : {
    "new_field" : [
      "1983year"
    ]
  }
}
Match query expressions
# title contains sleepaway or camp
# Same behavior as the grouped URL query get /movies/_search?q=title:(Beautiful Mind)
# "description" : "title:sleepaway title:camp"
get /movies/_doc/_search
{
  "query": {"match": {"title": "Sleepaway Camp"}},
  "profile": "true"
}

# title must contain both sleepaway and camp
# Same behavior as get /movies/_search?q=title:(Beautiful AND Mind)
# "description" : "+title:sleepaway +title:camp"
get /movies/_doc/_search
{
  "query": {"match": {"title": {"query": "Sleepaway Camp", "operator": "AND"}}},
  "profile": "true"
}

# Phrase query: one arbitrary term may appear between sleepaway and camp
# Compare the fuzzy URL query get /movies/_search?q=title:beautifl~1
# "description" : """title:"sleepaway camp"~1"""
get /movies/_doc/_search
{
  "query": {"match_phrase": {"title": {"query": "Sleepaway Camp", "slop": 1}}},
  "profile": "true"
}
8. Query String and Simple Query String
query_string supports AND / OR / NOT in the same way as URL query strings
# title must contain both sleepaway and camp
# Same behavior as the URL query get /movies/_search?q=title:(Beautiful AND Mind)
# "description" : "+title:sleepaway +title:camp"
post /movies/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "Sleepaway AND Camp"
    }
  },
  "profile": "true"
}
# simple_query_string does not treat AND as an operator
# title contains sleepaway, and, or camp
# "description" : "title:sleepaway title:and title:camp"
post /movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "Sleepaway AND Camp",
      "fields": ["title"]
    }
  },
  "profile": "true"
}
If you want simple_query_string to perform Boolean operations, add default_operator
# title must contain both sleepaway and camp
# "description" : "+title:sleepaway +title:camp"
post /movies/_search
{
  "query": {
    "simple_query_string": {
      "query": "Sleepaway Camp",
      "fields": ["title"],
      "default_operator": "AND"
    }
  },
  "profile": "true"
}
9. Mapping and common field types
What is a Mapping
A mapping is similar to a schema in a database: it defines the index's field names and data types, and configures inverted-index settings for each field
What is Dynamic Mapping
Mapping has an attribute called dynamic, which controls how new fields in an incoming document are handled. Three values are available, with true as the default
true: as soon as a document containing a new field is written, the mapping is updated to include it
false: the mapping is not updated and the new field cannot be indexed or searched, but it still appears in _source
strict: writing a document that contains a new field fails
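A sketch of the strict behavior (the index name kaka_strict is made up):
put kaka_strict
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": {"type": "text"}
    }
  }
}

# Fails with strict_dynamic_mapping_exception because new_field is not mapped
post /kaka_strict/_doc/1
{
  "title": "ok",
  "new_field": "rejected"
}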
Common types
JSON type | ElasticSearch type |
---|---|
string | a string matching a date format becomes date; a float becomes float; an integer becomes long; everything else becomes text with a keyword subfield |
boolean | boolean |
floating-point number | float |
integer | long |
object | object |
array | takes the type of the first non-null element |
null | ignored |
put kaka/_doc/1
{
  "text": "kaka",
  "int": 10,
  "boole_text": "false",
  "boole": true,
  "float_text": "1.234",
  "float": 1.234,
  "loginData": "2005-11-24T22:20"
}

# Get the mapping of the kaka index
get kaka/_mapping
From the result you can tell that "false" and "true" in quotes are mapped as text; that is the one thing to watch out for
{
  "kaka" : {
    "mappings" : {
      "properties" : {
        "boole" : {
          "type" : "boolean"
        },
        "boole_text" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "float" : {
          "type" : "float"
        },
        "float_text" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "int" : {
          "type" : "long"
        },
        "loginData" : {
          "type" : "date"
        },
        "text" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
Custom Mapping
Setting a field to not be indexed
To keep a field out of the inverted index, add "index": false to that field's definition; pay attention to the mapping format
If you follow the steps below and then search the field, you get an error such as Cannot search on field [mobile] since it is not indexed, which means an unindexed field cannot be searched
put kaka
{
  "mappings": {
    "properties": {
      "firstName": {
        "type": "text"
      },
      "lastName": {
        "type": "text"
      },
      "mobile": {
        "type": "text",
        "index": false
      }
    }
  }
}

post /kaka/_doc/1
{
  "firstName": "kaka",
  "lastName": "Niu",
  "mobile": "123456"
}

get /kaka/_search
{
  "query": {
    "match": {
      "mobile": "123456"
    }
  }
}
Set copy_to
With the following setting, searches can be run directly against the field you define in copy_to
put kaka
{
  "mappings": {
    "properties": {
      "firstName": {
        "type": "text",
        "copy_to": "allSearch"
      },
      "lastName": {
        "type": "text",
        "copy_to": "allSearch"
      }
    }
  }
}
For ease of view, insert two more pieces of data here
post /kaka/_doc/1
{
  "fitstName": "kaka",
  "lastName": "niuniu"
}

post /kaka/_doc/2
{
  "fitstName": "kaka",
  "lastName": "kaka niuniu"
}
The query below returns only the document with id 2: with copy_to, a search against allSearch matches documents whose copied fields contain the search term
post /kaka/_search
{
  "query": {"match": {"allSearch": "kaka"}},
  "profile": "true"
}
Custom word segmentation
An analyzer is composed of Character Filters, a Tokenizer, and Token Filters
Character Filters replace, add, or delete characters in the raw text, and several can be configured in sequence. Note that character filters affect the position and offset information the tokenizer later produces
The built-in character filters include html_strip (removes HTML tags), mapping (string replacement), and pattern_replace (regex replacement)
The Tokenizer handles the word segmentation itself; the many built-in tokenizers were covered in the analyzer section above
Token Filters add, modify, or delete the terms produced by the tokenizer, for example lowercasing, the stop filter for removing modifiers, and synonyms
Customize Character Filters
# Character Filters: HTML strip
# Removes all HTML tags from the text
post /_analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<b>kaka chat</b>"
}

# Character Filters: mapping replacement
# Replaces "i" with "kaka" and "hope" with "wish" in the text
post /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": ["i => kaka", "hope => wish"]
    }
  ],
  "text": "I hope,if you don't expect quick success, you'll get a pawn every day."
}

# Character Filters: regular expression
# Uses a regex to extract the domain name
post /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "http://(.*)",
      "replacement": "$1"
    }
  ],
  "text": "http://www.kakaxiantan.com"
}
Custom Token Filters
The tokenizer used below is whitespace, which splits on spaces. Suppose we also want to lowercase the terms and filter out stopwords; this is how:
post /_analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop", "lowercase"],
  "text": "If on you don't expect quick success, you'll get a pawn every day"
}
To save space, only a representative part of the response is shown
{
  "tokens" : [
    {
      "token" : "if",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "you",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "word",
      "position" : 2
    }
  ]
}
Practice: a custom analyzer
An analyzer combines Character Filters, a Tokenizer, and Token Filters, and every part of it can be customized
A custom analyzer definition must include the analyzer, tokenizer, char_filter, and filter sections
Each part has to be defined by the rules below before it can be used; see the full version for the detailed definition code
Don't try to memorize this configuration; use it often and it will stick
# Custom analyzer
put kaka
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["emoticons"],
          "tokenizer": "punctuation",
          "filter": ["lowercase", "english_stop"]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "keyword"
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": ["123 => Kaka", "456 => xian tan"]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}
# Run the custom analyzer
post /kaka/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": " 123 456"
}
The uppercase letters are converted to lowercase:
{
  "tokens" : [
    {
      "token" : " kaka xian tan",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "word",
      "position" : 0
    }
  ]
}
10. Index Template
When a new index is created and a document inserted, the default settings and mappings are applied; if you have configured your own settings and mappings, they override the defaults
Create an index and insert a document
post /kaka/_doc/1
{
  "gongzhonghao": "123"
}

# Get the settings and mappings
get /kaka
The following configuration is the default
# Default settings/mappings
{
  "kaka" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "gongzhonghao" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1642080577305",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "JJWsGYcrTam0foEQxuZqGQ",
        "version" : {
          "created" : "7010099"
        },
        "provided_name" : "kaka"
      }
    }
  }
}
Next, create a template of your own
Set up a template that applies to any index whose name starts with test. In this template, numbers inside double quotes are parsed as long instead of text
put /_template/kaka_tmp
{
  "index_patterns": ["test*"],
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  },
  "mappings": {
    # Do not parse date-like strings as date; keep them as text
    "date_detection": false,
    # Parse numbers inside double quotes as long instead of text
    "numeric_detection": true
  }
}
Create an index
post /test_kaka/_doc/1
{
  "name": "123",
  "date": "2022/01/13"
}

get /test_kaka
Returns the result
{
  "test_kaka" : {
    "aliases" : { },
    "mappings" : {
      "date_detection" : false,
      "numeric_detection" : true,
      "properties" : {
        "date" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "long"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1642081053006",
        "number_of_shards" : "1",
        "number_of_replicas" : "2",
        "uuid" : "iCcaa_8-TXuymhfzQi31yA",
        "version" : {
          "created" : "7010099"
        },
        "provided_name" : "test_kaka"
      }
    }
  }
}
Persisting in learning, writing, and sharing is the belief Kaka has upheld since the start of his career. May these articles on the vast internet help you a little. I'm Kaka, and see you next time.