ElasticSearch: I’ve been working with ElasticSearch for a few days now. This is a learning and reference document that doesn’t go into ElasticSearch internals. I hope it helps you get started with ElasticSearch as quickly as possible. If there are mistakes, corrections are welcome.
The quickest way to learn something is to use it
Deployment
Deploying ES is very simple: download the archive for the version you need from the official website, extract it, and it is ready to use.
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.3-linux-x86_64.tar.gz
tar xzvf elasticsearch-7.9.3-linux-x86_64.tar.gz
mv elasticsearch-7.9.3 <target_dir>
# ES cannot be started as the root user, so create a dedicated user
groupadd es && useradd -r es -g es
chown -R es:es <target_dir>
Chinese analyzer
The most common Chinese analyzer, IK, is used as an example.
Installation
elasticsearch-plugin install \
https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.9.3/elasticsearch-analysis-ik-7.9.3.zip
Usage
- Through the mapping API, specify which fields use the IK analyzer
- Through the POST _analyze API, you can specify an analyzer and inspect its tokenization results
curl -X POST /_analyze?pretty -d '{"analyzer": "ik_smart", "text": "Aluminum nonstick micropressure 6L"}'
Custom dictionary
cd <es_dir>/analysis-ik/
# create my.dic and add one custom word per line
vim <es_dir>/analysis-ik/IKAnalyzer.cfg.xml  # reference my.dic here
Configuration
For learning purposes, ES needs very little configuration; even the defaults alone will do. High availability is a design goal of ES, so cluster deployment is supported and requires only a few extra settings. ES also supports dynamic configuration at runtime.
Location of the configuration file
With the archive installation, the default configuration directory is the config folder inside the ES directory. You can customize it by setting the environment variable ES_PATH_CONF=/path/to/my/config
- elasticsearch.yml: the ES configuration file
- jvm.options: the JVM configuration
- log4j2.properties: the log configuration
Common Configuration Items
# Only nodes with the same cluster name can form a cluster; default: elasticsearch
cluster.name: my-es
# Master-eligible nodes used to bootstrap the cluster when it first starts
cluster.initial_master_nodes: ["node1"]
node.name: node1 # The name of the current node
path.data: /path/to/es/data # Data storage path
path.logs: /path/to/es/log # Log storage path
# Which addresses can access ES; default: _local_
# _local_ 127.0.0.1
# _site_ intranet access
# _global_ globally accessible
network.host: _site_
http.port: 9200 # REST port, default 9200
transport.port: 9300 # Internal communication port between nodes
transport.compress: true # Enable compression between nodes; default false
# cluster ips
discovery.zen.ping.unicast.hosts:
- "192.168.0.1"
- "192.168.0.2:9300"
- "my.els.com"
# Minimum number of master-eligible nodes. To prevent split-brain,
# this should be at least half the cluster node count + 1.
# A cluster should have at least 3 nodes; with only 2, a split can occur
discovery.zen.minimum_master_nodes: 2
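The "half + 1" rule behind minimum_master_nodes can be sketched in a few lines (plain Python, purely illustrative; the function name is made up, not an ES API):

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Majority quorum: more than half of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1

# 3 nodes -> quorum 2: one node can fail without risking split-brain.
# 2 nodes -> quorum is also 2, so a 2-node cluster cannot lose any node,
# which is why at least 3 nodes are recommended.
for n in (2, 3, 5):
    print(n, "->", minimum_master_nodes(n))
```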
Startup errors and fixes
max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
vim /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
# Log in again after saving; verify with ulimit -S -n / ulimit -H -n
max number of threads [1024] for user [elasticsearch] is too low, increase to at least [4096]
vim /etc/security/limits.d/90-nproc.conf
* soft nproc 4096
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
vim /etc/sysctl.conf
vm.max_map_count=262144
sysctl -p
system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
# Set the following in elasticsearch.yml
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
CRUD operations on documents
In older versions (prior to 6.0), the ES terminology mapped to MySQL as follows:
index => database; type => table; doc => row.
In 7.0 and later, the concept of type was removed; an index is now best understood as a collection of documents, i.e. a table, and a doc still corresponds to a row of data.
When it comes to indexes in ES, the inverted index is a very important concept: it is a big part of why ES can search and sort quickly. ES builds an inverted index for every field of a document that needs to be queried (unless the index mapping marks the field as not indexed). The principle of the inverted index has been explained in detail elsewhere; please refer to the official documentation or other articles.
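The idea behind an inverted index can be sketched in a few lines of plain Python (a toy model for intuition, not ES's actual data structure): each term maps to the set of document ids containing it, so a term lookup is a dictionary access instead of a scan over every document.

```python
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick dogs and lazy foxes",
}

# Build: term -> set of doc ids (toy analyzer: lowercase + whitespace split)
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].add(doc_id)

def search(term: str) -> set:
    """A term query is a single dictionary lookup, not a scan of all docs."""
    return inverted.get(term.lower(), set())

print(search("quick"))  # doc ids 1 and 3
```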
Create indexes
There are two ways to create an index:
- Explicitly, through the mapping API (more on that later)
- Automatically: when a document is first indexed, ES creates the index and infers the field types from the document values
The examples are built around a fictional user-profile record, so the index to be created is user. ES provides RESTful APIs, and parameters are accepted in JSON format.
CRUD on a single document is done through the _doc API, i.e.: /{index}/_doc/{id}
Create a document
# The pretty parameter makes the response human-readable
curl -X PUT /user/_doc/1?pretty -d '{"name": "Wang Laowu", "intro": "Lonely, fond of poetry", "gender": "M", "age": 26, "created": "2020-11-11 10:23:34"}'
# If the following is returned, the document was created successfully
{
"_index": "user",
"_type": "_doc",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
Document creation via PUT requires the document ID in the URL. You can also use a POST request without specifying an ID:
curl -X POST /user/_doc?pretty -d '{"name": "Wang Laoliu", "intro": "Once a big shot at school", "gender": "M", "age": 21, "created": "2020-11-11 10:30:34"}'
# return
{
  "_index": "user",
  "_type": "_doc",
  "_id": "S6vfznUBBA-Q0zh_ydNu", // this time the id is generated by ES
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 2, "failed": 0 },
  "_seq_no": 1,
  "_primary_term": 1
}
You can see the newly created index via the index list API:
curl -X GET /_cat/indices?pretty
# return
green open user Me4Qr0m6SWybZqE2W6O8NA 1 1 1 0 12.5kb 6.2kb
Get the document
Using the GET request doc API, you can retrieve a document with a specified ID
curl -X GET /user/_doc/1?pretty
# return
{
  "_index": "user",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "name": "Wang Laowu",
    "intro": "Lonely, fond of poetry",
    "gender": "M",
    "age": 26,
    "created": "2020-11-11 10:23:34"
  }
}
In the index list output, the first column indicates health status: green means all primary and replica shards are allocated; yellow means all primaries are allocated but some replicas are not; red means some primary shards are unallocated, so some data is unavailable.
Update the document
There are two ways to update a document:
- the POST _update API
- calling the create-document API again
The first way can update just some of the document's fields, but either way ES actually replaces the original document and rebuilds the index.
curl -X POST /user/_update/1?pretty -d '{"doc": {"name": "Mr. Wang Laowu"}}'
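The merge behavior of _update can be pictured like this (a sketch of the semantics only, not ES internals; the function name is made up): the partial doc is merged into the stored _source, then the whole document is re-indexed and _version is bumped.

```python
def apply_partial_update(stored: dict, partial_doc: dict) -> dict:
    """Merge a partial 'doc' into the stored _source; the merged result
    replaces the whole document, which is why the index entry is rebuilt."""
    merged = {**stored["_source"], **partial_doc}
    return {"_source": merged, "_version": stored["_version"] + 1}

stored = {"_source": {"name": "Wang Laowu", "age": 26}, "_version": 1}
updated = apply_partial_update(stored, {"name": "Mr. Wang Laowu"})
print(updated["_source"])   # age is kept, name is replaced
print(updated["_version"])  # 2
```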
Delete the document
curl -X DELETE /user/_doc/1?pretty
# return
{
"_index": "user",
"_type": "_doc",
"_id": "1",
"_version": 3,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 4,
"_primary_term": 1
}
Search
DSL query syntax
Search is the core function of ES. It has a very powerful DSL query syntax, supports aggregations, and ES even has built-in SQL support for searching.
An example provides a comprehensive description of the most commonly used DSL query syntax:
{
  // The query section describes the query criteria
  "query": {
    "match_all": {}, // Empty condition, i.e. match all documents; mutually exclusive with the conditions below
    /* Query conditions */
    /* Query conditions are divided into basic queries (single condition)
       and compound queries (bool queries, e.g. And(must)) */
    // Basic queries can also be nested inside bool queries
    // Whether the field exists
    "exists": {
      "field": "field_name" // or ["field_names", ...]
    },
    // Full-text search
    // Either specify k:v directly
    "match": {
      "field_name": "value value2" // Multiple values separated by spaces; the default relation is or
    },
    // Or set additional attributes
    "match": {
      "field_name": {
        "query": "keyword",
        "operator": "or", // Operator defining the relation between the query terms; default or, can be changed to and
        "boost": 1.0, // Weight of this clause when scoring, default 1.0
        "minimum_should_match": 1 // Minimum-match parameter, a number or percentage; when operator is or, at least this many terms must match
      }
    },
    // Exact match
    "term": {"field_name": "keyword"}, // Single-value exact match
    "terms": {"field_name": ["keywords"]}, // Multi-value exact match
    // Range query, usually nested inside filter
    // This query is equivalent to field_name >= 1 && field_name < 10
    // range on text or keyword fields additionally requires search.allow_expensive_queries
    "range": {
      "field_name": {
        "lt": 10,
        "gte": 1
      }
    },
    // Multi-field match
    "multi_match": {
      "query": "keyword keyword2", // Field values, separated by spaces
      "type": "best_fields", // best_fields (default), most_fields, cross_fields
      "operator": "or", // Operator, default or, can be changed to and
      "boost": 1, // Clause weight used when computing the score
      "minimum_should_match": 1 // Minimum number of matching terms
    },
    /* Boolean queries combine nested queries */
    // bool sub-clauses can nest any basic query as well as other bool queries
    "bool": {
      "must": [], // AND
      "must_not": [], // NOT; does not contribute to scoring
      "should": [], // OR
      "filter": [] // Filters the results without contributing to scoring
    }
  },
  /* limit */
  "from": 0,
  "size": 10,
  /* sort */
  // By default results are sorted by relevance score
  // If you specify a sort field and _score is not among the sort keys, the query does not compute relevance scores
  "sort": {
    "num": "asc",
    "_score": "desc"
  },
  // Select the fields to return
  "_source": ["field1", "field2"]
  // TODO: aggregations
}
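In practice, bodies like the one above are assembled programmatically. A small sketch with plain Python dicts (illustrative only; the field names intro and age follow the user index used in this document) combining a scoring match with a score-free filter:

```python
import json

def build_search_body(keyword: str, min_age: int, size: int = 10) -> dict:
    return {
        "query": {
            "bool": {
                # must contributes to the relevance score
                "must": [{"match": {"intro": keyword}}],
                # filter only restricts results, with no scoring cost
                "filter": [{"range": {"age": {"gte": min_age}}}],
            }
        },
        "from": 0,
        "size": size,
        "_source": ["name", "intro", "age"],
    }

body = build_search_body("poetry", 18)
print(json.dumps(body, indent=2))
```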
Query using SQL syntax
Recent versions of ES ship with X-Pack, which supports SQL queries by internally translating SQL statements into DSL queries, including full-text search and aggregation syntax. Note that SQL only supports queries; insert, update, and delete are not supported.
- SQL statements do not end with a semicolon
- SQL statements do not support returning array fields, nor do they support specifying return object fields (SELECT object from table).
curl -X POST /_xpack/sql?format=txt -d '{ "query": "DESC user" }'
# return
column | type | mapping
---------------+---------------+---------------
age |BIGINT |long
created |VARCHAR |text
created.keyword|VARCHAR |keyword
gender |VARCHAR |text
gender.keyword |VARCHAR |keyword
intro |VARCHAR |text
intro.keyword |VARCHAR |keyword
name |VARCHAR |text
name.keyword |VARCHAR |keyword
Use the full-text function MATCH(field[s], text, [options]):
SELECT * FROM user WHERE MATCH(intro, 'poetry')
# Multiple options are separated by semicolons
SELECT * FROM user WHERE MATCH(name, 'wang', 'operator=or;cutoff_frequency=0.2')
# There is also QUERY(expr, [options])
SELECT * FROM user WHERE QUERY('name:wang')
ES also provides a translate API to convert SQL statements into DSL:
curl -X POST /_sql/translate -d '
{
  "query": "SELECT * FROM user"
}'
Index mapping
An index mapping configures the document field types of an index, the analyzers for full-text fields, and so on. Calling the mapping API creates the index, so rebuilding an index requires deleting the old one first (zero-downtime index rebuilding is not covered here).
Remove the index
curl -X DELETE /user
Mapping the index
curl -X PUT /user -d '
{
  "mappings": {
    "properties": {
      // Plain field
      "name": {
        "type": "text",
        "index": true, // Whether the field is indexed (searchable), default true
        "analyzer": "ik_max_word", // Analyzer used at index time
        "search_analyzer": "ik_max_word", // Analyzer used at search time; defaults to analyzer
        // Multi-fields
        // A single field can be indexed in different ways; name can be both text and keyword
        "fields": {
          "raw": { // This sub-field is addressed as name.raw
            "type": "keyword"
            // Nested settings ...
          }
        }
      },
      // Array or object field
      "field2_array_or_object": {
        "dynamic": false, // Whether new sub-fields are indexed dynamically; default true
        "properties": {
          "field2.item1": {} // Same as a plain field configuration
          // ...
        }
      }
    }
  }
}'
Common mapping field types
Detailed field documentation can be found here
-
Keyword: a keyword type for values that do not need analysis, such as IDs, statuses, and tags
When do you need the keyword type?
- You will not search the identifier data with range queries
- You need fast retrieval: term queries on keyword fields are generally faster than term queries on numeric fields
- If you are not sure, use multi-fields to index it both ways ↓
-
Text: a text type that is analyzed and can be searched with full-text queries
analyzer sets the analyzer, e.g. ik_smart or ik_max_word; the default is the standard analyzer
-
Boolean: the boolean type
-
Numbers: numeric types
Common numeric types:
- long => int64
- integer => int32
- double => float64
- float => float32
-
Date: the date type
{
  "created": {
    "type": "date",
    // The format field specifies the parsable formats
    // This can parse both "2020-11-11 10:23:34" and epoch timestamps
    // Dates before 1970 are only supported as formatted dates
    // epoch_second accepts timestamps with second precision
    "format": "yyyy-MM-dd HH:mm:ss||epoch_second"
  }
}
alias
An alias pointing to another field
{
"field1": {
"type": "long"
},
"field1_alias": {
    "type": "alias",
    "path": "field1"
  }
}
-
Object: the JSON object type
- Indexing an object actually indexes each of its leaf fields by dotted path
- An object with too many fields may cause a mapping explosion
- The flattened type can index the entire object as a single field
-
Array: the array type
- All elements of an array should have the same type
- Array elements can be objects
- Searching arrays of objects may not return the expected results; the nested type can help
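The ||-separated date format list shown earlier means ES tries each format in turn. A rough Python analogue of that fallback behavior (illustrative only, not ES code; the function name is made up), assuming a list like yyyy-MM-dd HH:mm:ss||epoch_second:

```python
from datetime import datetime, timezone

def parse_es_date(value: str) -> datetime:
    """Try the formatted pattern first, then fall back to epoch seconds,
    mimicking a '||'-separated format list."""
    try:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return datetime.fromtimestamp(int(value), tz=timezone.utc)

print(parse_es_date("2020-11-11 10:23:34"))
print(parse_es_date("1605090214"))  # epoch seconds
```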