ElasticSearch: I’ve been working with Elasticsearch for a few days now. This is a learning-notes document that doesn’t go into the internals of Elasticsearch; I hope it helps you get started with Elasticsearch as quickly as possible. If there are mistakes, corrections are welcome.

The quickest way to learn something is to use it

Deployment

Deploying ES is very simple: download the archive of the required version from the official website, decompress it, and it is ready to use.

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.3-linux-x86_64.tar.gz
tar xzvf elasticsearch-7.9.3-linux-x86_64.tar.gz
mv elasticsearch-7.9.3 <target_dir>
# ES cannot be started as root, so create a new user
groupadd es && useradd -r es -g es
chown -R es:es <target_dir>

Chinese analyzer

The most commonly used Chinese analyzer, IK, serves as the example here.

Installation

elasticsearch-plugin  install \
https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.9.3/elasticsearch-analysis-ik-7.9.3.zip

Setup

  • Through the index mapping, choose which fields use the IK analyzer
  • Through POST _analyze, you can specify an analyzer and inspect the tokenization result

curl -X POST /_analyze -d '{"analyzer": "ik_smart", "text": "Aluminum nonstick micropressure 6L"}'

Custom dictionary

cd <es_dir>/analysis-ik/
# create my.dic and add your custom words
vim <es_dir>/analysis-ik/IKAnalyzer.cfg.xml
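For reference, a sketch of what IKAnalyzer.cfg.xml looks like with the custom dictionary wired in (the entry keys shown are the ones shipped with the ik plugin; verify the exact file against your plugin version):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionaries, multiple files separated by semicolons -->
    <entry key="ext_dict">my.dic</entry>
    <!-- local extension stop-word dictionaries -->
    <entry key="ext_stopwords"></entry>
</properties>
```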

Configuration

If you are only using ES for learning, it needs very little configuration, or even just the defaults. High availability is a design goal of ES, so it supports cluster deployment with only a few extra settings. ES also supports dynamic configuration at runtime.

Location of the configuration file

With the archive installation, the default configuration directory is the config folder. You can customize the configuration directory by setting the environment variable ES_PATH_CONF=/path/to/my/config

  • elasticsearch.yml  The main ES configuration file
  • jvm.options        The JVM configuration for ES
  • log4j2.properties  The logging configuration

Common Configuration Items

# Only nodes with the same cluster name can form a cluster; default "elasticsearch"
cluster.name: my-es
# Initial master-eligible nodes, used to bootstrap the cluster on first startup
cluster.initial_master_nodes: ["node1"]

node.name: node1  # The name of the current node

path.data: /path/to/es/data # Data storage path
path.logs: /path/to/es/log  # Log storage path

# Which addresses can reach ES. Default: _local_
# _local_  127.0.0.1 only
# _site_   reachable on the local network
# _global_ reachable from anywhere
network.host: _site_

http.port: 9200 # REST port, default 9200

transport.port: 9300     # Internal communication port between nodes
transport.compress: true # Enable compression between nodes. Default is false

# cluster ips
discovery.zen.ping.unicast.hosts:
    - "192.168.0.1"
    - "192.168.0.2:9300"
    - "my.els.com"
# Minimum number of master-eligible nodes. To prevent split brain, this should be at least (nodes / 2) + 1
# A cluster should have at least 3 master-eligible nodes; with only 2, a split is possible
discovery.zen.minimum_master_nodes: 2

Startup errors and fixes

  1. max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
    vim /etc/security/limits.conf
    *    soft    nofile    65536
    *    hard    nofile    65536
    # Save, then log in again as the user; check with ulimit -S -n / ulimit -H -n that it took effect
  2. max number of threads [1024] for user [elasticsearch] is too low, increase to at least [4096]
    vim /etc/security/limits.d/90-nproc.conf
    *          soft    nproc     4096
  3. max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
    vim /etc/sysctl.conf
    vm.max_map_count=262144
    sysctl -p
  4. system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
    # Set bootstrap.system_call_filter to false in elasticsearch.yml
    bootstrap.memory_lock: false
    bootstrap.system_call_filter: false

CRUD operations on documents

In older versions (prior to 5.0), the ES terminology corresponds to MySQL:

index => database; type => table; doc => row.

In versions 7.0 and later, the concept of type was removed; an index is now effectively a collection of documents, i.e. the table, and a doc still corresponds to a row of data.

When it comes to indexing in ES, the inverted index is a very important concept and a big reason why ES can search and sort quickly. ES indexes every field of a document that needs to be queried (unless the index mapping specifies that a field is not searchable). The principle of the inverted index has been explained in detail elsewhere; please refer to the official documentation or other articles.
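As a toy sketch (not ES's actual implementation), an inverted index maps each term to the set of documents containing it, so a term lookup is a single dictionary access rather than a scan of every document:

```python
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick dog",
}

# Build the inverted index: term -> set of doc ids containing that term
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted[term].add(doc_id)

def search(term):
    """Look up a term in the inverted index; returns sorted doc ids."""
    return sorted(inverted.get(term, set()))

print(search("quick"))  # [1, 3]
print(search("dog"))    # [2, 3]
```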

Create indexes

There are two ways to create an index:

  1. Explicitly, through the mapping API (more on that later)
  2. When a document is created for the first time, ES automatically creates the index and infers each field's type from the document values.

The examples are based on a fictional user-information index, user. ES provides RESTful APIs, and parameters are accepted in JSON format.

CRUD on individual documents can be done through the _doc API, that is: /{index}/_doc/{id}
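As a tiny sketch of that path scheme (the helper name is my own, not an ES API), the same endpoint serves PUT/GET/DELETE with an id, or POST without one:

```python
def doc_url(index, doc_id=None):
    """Build the single-document API path: /{index}/_doc/{id}.

    Omitting doc_id gives the POST form, where ES generates the id.
    """
    base = f"/{index}/_doc"
    return f"{base}/{doc_id}" if doc_id is not None else base

print(doc_url("user", 1))  # /user/_doc/1
print(doc_url("user"))     # /user/_doc
```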

Create a document

# Add the pretty parameter to get the response in a readable format
curl -X PUT /user/_doc/1?pretty -d '{"name": "Wang Laowu", "intro": "Lonely and fond of poetry", "gender": "M", "age": 26, "created": "2020-11-11 10:23:34"}'
# A response like the following means the creation succeeded
{
    "_index": "user",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}

Document creation via PUT requires the document ID in the URL. You can also use a POST request without specifying an ID, and ES will generate one.

curl -X POST /user/_doc?pretty -d '{"name": "Wang Laoliu", "intro": "A top student back in school", "gender": "M", "age": 21, "created": "2020-11-11 10:30:34"}'
# Return
{
    "_index": "user",
    "_type": "_doc",
    "_id": "S6vfznUBBA-Q0zh_ydNu",  // this time the id is generated by ES
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 1,
    "_primary_term": 1
}

You can see the newly created index by viewing the index list API

curl -X GET /_cat/indices?pretty
# Return
green open user Me4Qr0m6SWybZqE2W6O8NA 1 1 1 0 12.5kb 6.2kb

Get the document

With a GET request to the _doc API, you can retrieve the document with a given ID

curl -X GET /user/_doc/1?pretty
# Return
{
    "_index": "user",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "_seq_no": 0,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "name": "Wang Laowu",
        "intro": "Lonely and fond of poetry",
        "gender": "M",
        "age": 26,
        "created": "2020-11-11 10:23:34"
    }
}

In the index list output above, the first column is the index health: green means all primary and replica shards are allocated; yellow means the primaries are allocated but some replicas are not; red means some primary shards are not allocated.

Update the document

There are two ways to update documents

  1. POST _update API
  2. Call the Create document API again

The first way updates only some of the document's fields, but either way ES actually replaces the original document and rebuilds its index.

curl -X POST /user/_update/1?pretty -d '{"doc": {"name": "Mr. Wang Laowu"}}'
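A rough Python sketch of what the _update API does internally (a shallow approximation with hypothetical field values; ES merges object fields recursively and then re-indexes the whole document):

```python
# Existing document source, as stored in the index
old_source = {"name": "Wang Laowu", "age": 26, "gender": "M"}

# Partial update body, as sent in {"doc": {...}}
partial = {"name": "Mr. Wang Laowu"}

# ES effectively merges the partial doc over the old source and
# writes the result back as a brand-new version of the document
new_source = {**old_source, **partial}

print(new_source["name"])  # Mr. Wang Laowu
print(new_source["age"])   # 26 -- untouched fields are preserved
```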

Delete the document

curl -X DELETE /user/_doc/1?pretty
# Return
{
    "_index": "user",
    "_type": "_doc",
    "_id": "1",
    "_version": 3,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 4,
    "_primary_term": 1
}

Search

DSL query syntax

Search is the core function of ES. It has a very powerful DSL query syntax, supports aggregation queries, and even has built-in SQL support for searching.

A single annotated example covers the most commonly used DSL query syntax:

{
    // The query section describes the query criteria
    "query": {
        // Empty condition, i.e. match all records; mutually exclusive with the conditions below
        "match_all": {},

        /* Query conditions come in two kinds: basic conditions (a single condition) */
        /* and compound conditions (bool queries, e.g. And(must)), which can nest further bool queries */

        // Whether the field exists
        "exists": {
            "field": "field_name" // or ["field_names", ...]
        },

        // Full-text search
        // Either specify k:v directly
        "match": {
            "field_name": "value value2" // Multiple values are separated by spaces; the default relation is or
        },
        // Or set additional attributes
        "match": {
            "field_name": {
                "query": "keyword",
                "operator": "or",          // Relation between the field values, default or, can be and
                "boost": 1.0,              // Weight of this clause when scoring, default 1.0
                "minimum_should_match": 1  // Minimum match, a number or percentage; with operator or, at least this many terms must match
            }
        },

        // Exact match
        "term": {"field_name": "keyword"},      // Single-value exact match
        "terms": {"field_name": ["keywords"]},  // Multi-value exact match

        // Range query, usually nested in a filter
        // This query is equivalent to field_name >= 1 && field_name < 10
        // On text or keyword fields, range additionally requires search.allow_expensive_queries
        "range": {
            "field_name": {
                "lt": 10,
                "gte": 1
            }
        },

        // Multi-field match
        "multi_match": {
            "query": "keyword keyword2",  // Field values, separated by spaces
            "type": "best_fields",        // best_fields (default), most_fields, or cross_fields
            "operator": "or",             // Operator, default or, can be and
            "boost": 1,                   // Clause weight, used when scoring
            "minimum_should_match": 1     // Minimum number of matching terms
        },

        /* A bool query is a combination of nested queries */
        // bool subclauses can nest any basic query as well as further bool queries
        // Use bool whenever you need a compound query
        "bool": {
            "must": [],     // AND
            "must_not": [], // NOT, does not participate in scoring
            "should": [],   // OR
            "filter": []    // Filters the query results without participating in scoring
        }
    },

    /* limit */
    "from": 0,
    "size": 10,

    /* sort */
    // By default results are sorted by relevance
    // If you specify a sort field and relevance does not participate in the sort,
    // the query result will not compute relevance scores
    "sort": {
        "num": "asc",
        "_score": "desc"
    },

    // Select the fields to return
    "_source": ["field1", "field2"]
    // TODO aggregations
}
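Putting a few of the pieces above together, here is a sketch of a complete search body built in Python (field names are from the user example; "poetry" is just a sample keyword):

```python
import json

# A bool query combining a full-text match, an exact term, and a range
# filter, plus paging and sorting -- mirroring the DSL skeleton above
search_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"intro": "poetry"}},  # full-text, scored
            ],
            "filter": [
                {"term": {"gender": "M"}},       # exact match, not scored
                {"range": {"age": {"gte": 20, "lt": 30}}},
            ],
        }
    },
    "from": 0,
    "size": 10,
    "sort": [{"age": "asc"}, "_score"],
    "_source": ["name", "age"],
}

# This is the JSON you would send to POST /user/_search
print(json.dumps(search_body, indent=2))
```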

Query using SQL syntax

Newer versions of ES ship with X-Pack, which supports SQL-syntax queries by internally translating SQL statements into DSL queries, including support for full-text search and aggregation syntax. Note that SQL only supports queries; insert, update, and delete are not supported.

  1. SQL statements do not end with a semicolon
  2. SQL statements do not support returning array fields, nor do they support specifying return object fields (SELECT object from table).
curl -X POST /_xpack/sql?format=txt -d '{ "query": "DESC user" }'
# Return
    column     |     type      |    mapping    
---------------+---------------+---------------
age            |BIGINT         |long           
created        |VARCHAR        |text           
created.keyword|VARCHAR        |keyword        
gender         |VARCHAR        |text           
gender.keyword |VARCHAR        |keyword        
intro          |VARCHAR        |text           
intro.keyword  |VARCHAR        |keyword        
name           |VARCHAR        |text           
name.keyword   |VARCHAR        |keyword        

Full-text search uses MATCH(field[s], text, [options]):

SELECT * FROM user WHERE MATCH(intro, 'lonely')
SELECT * FROM user WHERE MATCH(name, 'wang', 'operator=or;cutoff_frequency=0.2')
# There is also QUERY(expr, [options])
SELECT * FROM user WHERE QUERY('name:wang')

ES also provides a translate API that translates SQL statements into the DSL:

curl -X POST /_sql/translate -d
{
    "query": "SELECT * FROM user"
}

Index mapping

The index mapping configures the document field types of an index, the analyzers used by full-text fields, and so on. Calling the mapping interface creates the index, so rebuilding an index requires deleting the old one first (smooth index rebuilding is not covered here).

Delete the index

curl -x DELETE /user

Create the index mapping

curl -X PUT /user -d '
{
    "mappings": {
        "properties": {
            // Ordinary field
            "name": {
                "type": "text",                   // Field type
                "index": true,                    // Whether to index this field, default true
                "analyzer": "ik_max_word",        // Analyzer used when indexing
                "search_analyzer": "ik_max_word", // Analyzer used when searching, defaults to analyzer
                // Multi-fields: a single field can be indexed in different ways,
                // so name can be both text and keyword
                "fields": {
                    "raw": {               // Accessible as name.raw
                        "type": "keyword"  // Nested settings ...
                    }
                }
            },
            // Array or object
            "field2_array_or_object": {
                "dynamic": false, // Whether newly seen sub-fields are indexed dynamically
                "properties": {
                    "field2.item1": {} // Same configuration as an ordinary field
                    // ...
                }
            }
        }
    }
}'

Common mapping field types

Detailed field documentation can be found here

  • Keyword A keyword type that stores values such as ids, statuses, and labels that do not need to be analyzed

    When are keyword types needed?

    • You are not going to use a range query to search for identifier data
    • Quick retrieval is required. Term query searches on keyword fields are generally faster than term searches on number fields.
    • If you are not sure how to use it, use multiple index types ↓
  • Text A text type whose values are analyzed and can be searched full-text

    The analyzer option sets the tokenizer, e.g. ik_smart or ik_max_word; the default is the standard (English-oriented) analyzer

  • Boolean Indicates the Boolean type

  • Numbers Number type

    Common numeric types

    • long => int64
    • integer => int32
    • double => float64
    • float => float32
  • Date Date type

{
    "created": {
        "type": "date",
        // The format field specifies the formats that can be parsed
        // This one parses both "2020-11-11 10:23:34" and second-precision timestamps
        // Dates before 1970 are only supported in the formatted form
        // epoch_second accepts timestamps accurate to the second
        "format": "yyyy-MM-dd HH:mm:ss||epoch_second"
    }
}
  • alias An alias for another field
{
    "field1": {
        "type": "long"
    },
    "field1_alias": {
        "type": "alias",
        "path": "field1"
    }
}
  • Object The JSON object type

    • An object is indexed by flattening it into its individual leaf fields
    • If an object has too many fields it may cause a mappings explosion; the flattened type indexes the entire object as a single field
  • Array Array type

    • All elements of an array should be of the same type
    • Array elements may be objects
    • Searching an array of objects may not return the result you expect; the nested type addresses this
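That last point can be seen in a toy sketch of how ES stores an array of objects without the nested type: the objects are flattened into parallel per-field value arrays, losing the association between fields of the same object (names below are illustrative):

```python
# A document containing an array of objects
doc = {"user": [
    {"first": "John",  "last": "Smith"},
    {"first": "Alice", "last": "White"},
]}

# Without the nested type, ES flattens the array into per-field
# value arrays, losing which "first" belonged with which "last"
flattened = {
    "user.first": [u["first"] for u in doc["user"]],
    "user.last":  [u["last"]  for u in doc["user"]],
}

def matches(first, last):
    """A bool/must query over the flattened form: each value only has
    to appear somewhere in the array, not on the same object."""
    return first in flattened["user.first"] and last in flattened["user.last"]

print(matches("Alice", "Smith"))  # True -- a false positive; no such person exists
```

The nested type indexes each object in the array as its own hidden document, so a nested query only matches when both conditions hold on the same object.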