Using a script

"script": { "lang": "...", "source" | "id": "...", "params": { ... } }

Parameter description:

lang: The language the script is written in. Defaults to painless.

source: The body of an inline script. To run a stored script instead, pass its id in place of source.

params: Named parameters passed to the script as variables.
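To make the shape of this object concrete, here is a minimal Python sketch that builds it. The helper name build_script is hypothetical, and leaving lang out when a stored script is referenced by id is an assumption about how Elasticsearch treats stored scripts, so check it against your version.

```python
def build_script(source=None, script_id=None, params=None, lang="painless"):
    """Build a script object; exactly one of source / script_id must be given."""
    if (source is None) == (script_id is None):
        raise ValueError("provide exactly one of 'source' or 'script_id'")
    if source is not None:
        # Inline script: the language travels with the source text.
        script = {"lang": lang, "source": source}
    else:
        # Stored script: referenced by id only (assumption: lang is not repeated here).
        script = {"id": script_id}
    if params:
        script["params"] = dict(params)
    return script


inline = build_script(source="ctx._source.tags.add(params.tag)", params={"tag": "phone"})
stored = build_script(script_id="calculate-discount", params={"discount": 0.8})
```

Passing values through params rather than concatenating them into source keeps the script text identical between calls.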

Other scripting languages:

expression: Low overhead per document. Expressions do more work up front, so they execute very quickly, even faster than native scripts. They support a subset of JavaScript syntax: a single expression. Disadvantages: they can only access numeric, boolean, date, and geo_point fields, and stored fields are not available.

mustache: Provides templates for parameterized queries.

Java:

Painless:

Painless is a simple scripting language built for Elasticsearch and used for both inline and stored scripts. Its syntax is similar to Java, with its own comments, keywords, types, variables, functions, and so on. It is the default scripting language for Elasticsearch and can be used safely for inline and stored scripts.

www.elastic.co/guide/en/el…

Usage examples

Update: Updates a document

POST /product2/_update/2
{
  "script": {
    "source": "ctx._source.tags.add(params.tag)", 
    "lang": "painless",
    "params": {
      "tag": "phone"
    }
  }
}
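As a rough picture of what this update script does to the document, the Python sketch below applies the same change to a plain dict. The function name and the sample document are hypothetical; this only mimics the effect of the Painless line, it is not how Elasticsearch executes it.

```python
def apply_add_tag(source_doc, params):
    """Mimic `ctx._source.tags.add(params.tag)` on a plain dict."""
    updated = dict(source_doc)
    # Append the new tag to a copy of the existing tags list.
    updated["tags"] = list(source_doc["tags"]) + [params["tag"]]
    return updated


before = {"name": "example", "tags": ["mobile"]}  # hypothetical document
after = apply_add_tag(before, {"tag": "phone"})
```

Like the Painless script, the sketch assumes the document already has a tags list; here a missing field raises a KeyError.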

Reindex: Copies documents from one index to another

POST test/_bulk
{"index":{"_id":"1"}}
{"counter":1,"tags":["red"]}
{"index":{"_id":"2"}}
{"counter":2,"tags":["green"]}
{"index":{"_id":"3"}}
{"counter":3,"tags":["blue"]}
{"index":{"_id":"4"}}
{"counter":4,"tags":["white"]}
DELETE test_2

POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test_2"
  },
  "script": {
    "source": "if (ctx._source.tags.contains('red')) { ctx._source.tags.remove(ctx._source.tags.indexOf('red')) }",
    "lang": "painless"
  }
}

GET /test_2/_search
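The effect of the reindex script can be sketched in Python as a per-document transform applied while copying. The names strip_red, test_index, and test_2 are illustrative only; the data mirrors the bulk request above.

```python
def strip_red(source_doc):
    """Mimic the reindex script: drop the first 'red' entry from tags, if any."""
    tags = list(source_doc["tags"])
    if "red" in tags:
        # remove(indexOf('red')) removes by position, so only the first match goes.
        tags.pop(tags.index("red"))
    return {**source_doc, "tags": tags}


test_index = [
    {"counter": 1, "tags": ["red"]},
    {"counter": 2, "tags": ["green"]},
    {"counter": 3, "tags": ["blue"]},
    {"counter": 4, "tags": ["white"]},
]
test_2 = [strip_red(doc) for doc in test_index]
```

Every document is still copied; only the one tagged red arrives in the destination with a changed tags list.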

Script query: filters documents with a script, usually inside a filter clause. The source must return a boolean.

GET test/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": "doc['counter'].value > 3",
            "lang": "painless"
          }
        }
      }
    }
  }
}
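In Python terms, the script acts as a predicate evaluated once per document. The sketch below assumes the four documents from the first bulk request (counter 1 to 4); script_filter is a hypothetical helper.

```python
docs = {
    "1": {"counter": 1},
    "2": {"counter": 2},
    "3": {"counter": 3},
    "4": {"counter": 4},
}


def script_filter(docs, predicate):
    """Return the ids of documents for which the predicate is true."""
    return [doc_id for doc_id, doc in docs.items() if predicate(doc)]


# Equivalent of "doc['counter'].value > 3"
matches = script_filter(docs, lambda doc: doc["counter"] > 3)
```

Because the script runs for each candidate document, it is usually worth combining it with cheaper filters that narrow the set first.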

Script score query / function score query: lets you adjust document scores at search time so that you can influence how the results are ranked.

POST test/_bulk
{"index":{"_id":"1"}}
{"counter":1,"msg":"hello"}
{"index":{"_id":"2"}}
{"counter":2,"msg":"hello,hello"}
{"index":{"_id":"3"}}
{"counter":3,"msg":"hello,welcome"}
{"index":{"_id":"4"}}
{"counter":4,"msg":"ello"}
GET test/_search
{
  "query": {
    "match": {
      "msg": "hello"
    }
  }
}
#script score query
GET test/_search
{
  "query": {
    "script_score": {
      "query": {
        "match": {
          "msg": "hello"
        }
      },
      "script": {
        "source": "doc['counter'].value + _score "
      }
    }
  }
}
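The re-scoring step can be sketched in Python as follows. The base scores are made-up values standing in for whatever the match query would return; only the formula counter + _score comes from the request above.

```python
# Documents matched by the "hello" query, with hypothetical base scores.
hits = [
    {"_id": "1", "counter": 1, "_score": 0.5},
    {"_id": "2", "counter": 2, "_score": 0.6},
    {"_id": "3", "counter": 3, "_score": 0.4},
]


def rescore(hit):
    """Equivalent of "doc['counter'].value + _score"."""
    return hit["counter"] + hit["_score"]


ranked = sorted(hits, key=rescore, reverse=True)
ranked_ids = [hit["_id"] for hit in ranked]
```

With these sample values the counter dominates the text score, so the document with the largest counter moves to the top.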

Search template: a search request with placeholders that are filled in from params at run time.

POST test/_bulk
{"index":{"_id":"1"}}
{"counter":1,"msg":"hello"}
{"index":{"_id":"2"}}
{"counter":2,"msg":"hello,hello"}
{"index":{"_id":"3"}}
{"counter":20,"msg":"hello,welcome"}
{"index":{"_id":"4"}}
{"counter":4,"msg":"ello"}

# define the search template
POST _scripts/my_search_template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "msg": "{{query_string}}"
        }
      }
    }
  }
}

# view the template
GET _scripts/my_search_template

# search using the stored template
GET test/_search/template
{
  "id": "my_search_template",
  "params": {
    "query_string": "hello"
  }
}

# search using an inline template
GET test/_search/template
{
  "source": """
  {
    "query": {
      "match": {
        "msg": "{{query_string}}"
      }
    }
  }
  """,
  "params": {
    "query_string": "hello"
  }
}

# delete the template
DELETE _scripts/my_search_template
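The substitution a search template performs can be approximated in a few lines of Python. This is a deliberately simplified sketch that only handles plain {{name}} placeholders; real Mustache has sections, escaping, and other features that are not modeled here, and render is a hypothetical helper.

```python
import json
import re


def render(template, params):
    """Replace each {{name}} placeholder with the matching value from params."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(params[match.group(1)]),
        template,
    )


template = '{"query": {"match": {"msg": "{{query_string}}"}}}'
rendered = json.loads(render(template, {"query_string": "hello"}))
```

After rendering, the result is an ordinary search body, which is what actually gets executed against the index.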

Stored scripts: scripts saved under an id and reused by reference; they can be thought of as script templates.

POST _scripts/calculate-discount
{
  "script": {
    "lang": "painless",
    "source": "doc['price'].value * params.discount"
  }
}

GET _scripts/calculate-discount

GET product2/_search
{
  "script_fields": {
    "discount_price": {
      "script": {
        "id": "calculate-discount",
        "params": {
          "discount": 0.8
        }
      }
    }
  }
}

DELETE _scripts/calculate-discount
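The store-once, call-by-id pattern can be sketched in Python with a small registry. The names stored_scripts, put_script, and run_script are hypothetical, and the price of 100.0 is a sample value, not data from the product2 index.

```python
stored_scripts = {}


def put_script(script_id, func):
    """Register a script under an id (stands in for POST _scripts/<id>)."""
    stored_scripts[script_id] = func


def run_script(script_id, doc, params):
    """Look the script up by id and evaluate it for one document."""
    return stored_scripts[script_id](doc, params)


# Equivalent of "doc['price'].value * params.discount"
put_script("calculate-discount", lambda doc, params: doc["price"] * params["discount"])

discount_price = run_script("calculate-discount", {"price": 100.0}, {"discount": 0.8})
```

The caller only supplies the id and the params, so the discount can change per request without touching the stored script.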

Reading and writing principles

Write data steps:

1. The client picks a node and sends the request to it; that node acts as the coordinating node.
2. The coordinating node routes the document and forwards the request to the node that holds the corresponding primary shard.
3. The primary shard on that node processes the request and then synchronizes the data to its replica shards.
4. Once the primary shard and all of its replicas have completed the write, the coordinating node returns the response to the client.

1. Data is first written to an in-memory buffer, where it cannot yet be searched; at the same time it is appended to the translog file. When the buffer is nearly full, or after a certain amount of time has elapsed, the buffer data is written into a new segment file. It does not go straight to the segment file on disk; it enters the OS cache first. This process is called refresh. Every second, ES writes the data in the buffer into a new segment file, so each segment file holds the data written to the buffer during the last second.
2. By default a refresh runs once per second and produces a new segment file. Before data reaches a disk file it passes through the OS cache, a memory cache at the operating-system level. As soon as the refresh has moved the buffer data into the OS cache, that data can be searched.
3. Why is ES called near real-time (NRT)? Because the default refresh interval is 1 second, newly written data can take up to 1 second to become visible to search. You can trigger a refresh manually through the REST API or the Java API, pushing the buffer data into the OS cache so that it can be searched immediately. Once the data is in the OS cache the buffer is cleared: there is no need to keep it, because the data has also been recorded in the translog.
4. The steps above repeat. New data keeps entering the buffer and the translog, and the buffer data keeps being written to new segment files. As this goes on, the translog grows larger and larger. When it reaches a certain size, a commit operation is triggered.
5. The first step of a commit is to refresh the data currently in the buffer into the OS cache. Then a commit point is written to disk, identifying all the segment files that belong to it, and all data currently in the OS cache is forced to disk with fsync. Finally the existing translog is emptied, a new translog is started, and the commit is complete.
6. This commit operation is called flush. Flush runs automatically every 30 minutes by default, and is also triggered when the translog grows too large. You can trigger a flush manually through the ES API to fsync the data in the OS cache to disk. The translog itself is also written to the OS cache first. With asynchronous durability (index.translog.durability set to async) it is fsynced to disk every 5 seconds (index.translog.sync_interval), so up to 5 seconds of data may exist only in memory; if the machine goes down at that moment, that data is lost. The trade-off is better performance at the cost of losing at most 5 seconds of data. With request durability, which is the default in recent Elasticsearch versions, the translog is fsynced before each write request is acknowledged, which is safer but considerably slower.

Summary: data is written to the buffer and refreshed into the OS cache every second, at which point it becomes searchable (this is why ES is near real-time). In async mode the translog is fsynced to disk every 5 seconds, so if the machine goes down and the in-memory data is gone, at most 5 seconds of data is lost. When the translog reaches a certain size, or every 30 minutes by default, a commit is triggered and the segment data is flushed to segment files on disk. The inverted index is built when the data is written into a segment file.
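The buffer, refresh, and flush cycle described above can be modeled with a small Python class. This is a toy simulation under simplifying assumptions: ShardSim and its fields are hypothetical names, the OS cache and the disk are not distinguished, and timers are replaced by explicit calls.

```python
class ShardSim:
    """Toy model of one shard's write path."""

    def __init__(self):
        self.buffer = []    # in-memory buffer: not searchable
        self.translog = []  # operation log used for recovery
        self.segments = []  # refreshed segments: searchable
        self.commits = 0    # number of commit points written

    def index(self, doc):
        # Every write goes to both the buffer and the translog.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Turn the buffer into a new searchable segment and clear it.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Commit: refresh, record a commit point, then start a new translog.
        self.refresh()
        self.commits += 1
        self.translog.clear()

    def search(self):
        return [doc for segment in self.segments for doc in segment]


shard = ShardSim()
shard.index({"id": 1})
visible_before = shard.search()  # nothing yet: the doc is only in the buffer
shard.refresh()
visible_after = shard.search()   # the doc is now in a segment
translog_before_flush = len(shard.translog)
shard.flush()
```

The doc becomes searchable after refresh, while the translog keeps its entry until flush; that gap is what allows recovery of data that was refreshed but not yet committed.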

Read data steps:

A document can be read by its doc ID: the ID is hashed to work out which shard the document was assigned to when it was written, and the read is served from that shard.

1. The client sends the request to any node, which becomes the coordinating node.
2. The coordinating node hashes the doc ID to route the request and forwards it to the corresponding node. At this point it picks one copy at random from the primary shard and all of its replicas, so that read requests are load-balanced.
3. The node that receives the request returns the document to the coordinating node.
4. The coordinating node returns the document to the client.
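The routing step can be sketched in Python as hash-then-modulo followed by a random choice among the shard's copies. zlib.crc32 is used here only as a stand-in for a stable hash; Elasticsearch uses its own hash function on the routing value, and shard_for and pick_copy are hypothetical helpers.

```python
import random
import zlib


def shard_for(doc_id, num_primary_shards):
    """Map a doc ID to a shard number with a stable hash (crc32 as a stand-in)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def pick_copy(shard_copies, rng=random):
    """Choose the primary or one of its replicas to serve a read."""
    return rng.choice(shard_copies)


shard_number = shard_for("42", num_primary_shards=3)
copy = pick_copy(["primary", "replica-1", "replica-2"])
```

Because the shard number depends on the number of primary shards, the same doc ID always lands on the same shard only as long as that number does not change.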

The most powerful feature of ES is full-text search.

1. The client sends the request to a coordinating node.
2. The coordinating node forwards the search request to every shard, using either the primary shard or one of its replicas; either will do.
3. Query phase: each shard returns its own search results (in practice just doc IDs) to the coordinating node, which merges, sorts, and paginates them to produce the final result.
4. Fetch phase: the coordinating node then pulls the actual document data from the relevant nodes by doc ID and returns it to the client.
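The two phases can be sketched in Python as a per-shard top-k followed by a merge and a lookup. The shard contents and scores are invented for illustration, and query_phase and search are hypothetical helpers.

```python
import heapq

# Each shard maps doc ID -> (document, hypothetical relevance score).
shards = {
    "shard0": {"a": ({"msg": "hello"}, 0.9), "b": ({"msg": "hello,welcome"}, 0.2)},
    "shard1": {"c": ({"msg": "hello,hello"}, 0.7), "d": ({"msg": "hello there"}, 0.5)},
}


def query_phase(shard, size):
    """Per shard: return only (score, doc_id) pairs for its top hits."""
    return heapq.nlargest(size, ((score, doc_id) for doc_id, (_, score) in shard.items()))


def search(shards, size):
    # Query phase: collect lightweight (score, id, shard) entries from every shard.
    candidates = [
        (score, doc_id, name)
        for name, shard in shards.items()
        for score, doc_id in query_phase(shard, size)
    ]
    # The coordinating node merges and sorts the candidates globally.
    top = heapq.nlargest(size, candidates)
    # Fetch phase: pull the full documents only for the final hits.
    return [(doc_id, shards[name][doc_id][0]) for _, doc_id, name in top]


results = search(shards, size=2)
```

Only IDs and scores travel in the first phase, so the full documents are fetched for the final page of hits rather than for every candidate.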

Delete/update data underlying principles

When a doc is deleted, a .del file is generated at commit time that marks the doc as deleted; searches consult the .del file to tell whether a doc has been deleted. An update marks the original doc as deleted and writes a new doc.

A segment file is generated on every buffer refresh, so by default one is produced every second and segment files keep accumulating. ES therefore merges them periodically. When several segment files are merged into one, docs marked as deleted are physically removed and the new segment file is written to disk. A commit point is then written that identifies all the new segment files, the new segment file is opened for search, and the old segment files are deleted.
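The mark-then-merge behavior can be sketched in Python. Each segment here is a dict with its docs and a set of deleted IDs standing in for the .del file; the structure and the name merge_segments are illustrative assumptions.

```python
def merge_segments(segments):
    """Merge segments into one, physically dropping docs marked as deleted."""
    merged_docs = {}
    for segment in segments:
        for doc_id, doc in segment["docs"].items():
            if doc_id not in segment["deleted"]:
                merged_docs[doc_id] = doc
    return {"docs": merged_docs, "deleted": set()}


# Doc "1" was updated: the old copy is marked deleted, the new copy is in a later segment.
old_segment = {"docs": {"1": {"v": 1}, "2": {"v": 1}}, "deleted": {"1"}}
new_segment = {"docs": {"1": {"v": 2}}, "deleted": set()}

merged = merge_segments([old_segment, new_segment])
```

Until the merge runs, the old copy of doc "1" still occupies space in its segment; it is only filtered out at search time.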
