Elasticsearch

Elasticsearch is a JSON-based, distributed, scalable, real-time, RESTful search and data analytics engine that can handle a wide variety of use cases. At the heart of the Elastic Stack, it stores your data centrally so that you can find what you expect and uncover what you don't.

Elasticsearch is developed alongside Logstash, a data collection and log-parsing engine, and Kibana, an analytics and visualization platform. The three products are designed to be used together as an integrated solution called the “Elastic Stack” (formerly the “ELK Stack”).

Elasticsearch can be used to search many kinds of documents. It provides scalable, near-real-time search and supports multi-tenancy. Elasticsearch is distributed, which means an index can be split into shards, and each shard can have zero or more replicas. Each node hosts one or more shards and acts as a coordinator that delegates operations to the correct shards. Rebalancing and routing are done automatically. Related data is usually stored in the same index, which consists of one or more primary shards and zero or more replica shards. Once an index has been created, its number of primary shards cannot be changed.
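The rule about primary shards follows from how documents are routed. Below is a minimal Python sketch that assumes a simplified routing rule of hash(document ID) % number_of_primary_shards. Elasticsearch actually uses a Murmur3 hash and, in 7.x, an extra routing-shards factor, so `zlib.crc32` is only a stand-in here. The function names are hypothetical.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick the primary shard for a document ID.

    Simplified stand-in: crc32 instead of Elasticsearch's Murmur3 hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def moved_documents(doc_ids, old_count: int, new_count: int) -> list:
    """IDs whose target shard changes when the primary shard count changes."""
    return [d for d in doc_ids
            if shard_for(d, old_count) != shard_for(d, new_count)]


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(1000)]
    moved = moved_documents(ids, 5, 6)
    # Many documents would now map to a different shard, breaking lookups by ID
    print(f"{len(moved)} of {len(ids)} documents change shard (5 -> 6 shards)")
```

If the shard count changed, most existing documents would hash to a different shard than the one they were stored on. That is why the count is fixed when the index is created.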

Elasticsearch is built on Lucene and exposes all of its features through JSON and Java APIs. It supports faceting and percolation. Percolation is useful for notifications because it tells you when a new document matches a registered query. Another feature, called “gateways”, handles long-term persistence of indexes: for example, if a server crashes, its indexes can be recovered from the gateway. Elasticsearch supports real-time GET requests and is suitable as a NoSQL data store, but it lacks distributed transactions.
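Here is a toy Python sketch of the percolation idea: queries are registered first, and each new document is then checked against them. It is a deliberate simplification in which a "query" is just a set of words that must all appear. Elasticsearch's real percolator stores full query DSL in an index field of type `percolator`. `ToyPercolator` is a hypothetical name.

```python
def tokenize(text: str) -> set:
    # Very rough tokenizer: lowercase, then split on whitespace
    return set(text.lower().split())


class ToyPercolator:
    """Registered queries are matched against each incoming document."""

    def __init__(self) -> None:
        self.queries = {}  # query name -> set of required words

    def register(self, name: str, words: str) -> None:
        self.queries[name] = tokenize(words)

    def percolate(self, document: str) -> list:
        """Return the names of queries whose words all appear in the document."""
        tokens = tokenize(document)
        return sorted(name for name, required in self.queries.items()
                      if required <= tokens)


if __name__ == "__main__":
    p = ToyPercolator()
    p.register("disk-alert", "disk full")
    p.register("cpu-alert", "cpu high")
    print(p.percolate("Disk full on node-1"))  # ['disk-alert']
```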

Elasticsearch 7 environment setup

Note: This document is based on Elasticsearch 7.12.1 and requires Java 8 or later. Make sure that Elasticsearch, Kibana, and the IK plug-in all use the same version.

1. Install Elasticsearch

Download ElasticSearch

Decompress the archive and run elasticsearch.bat in the bin directory.

If everything is working, opening http://127.0.0.1:9200 in a browser should show something like this:

{
  "name" : "DESKTOP-V4GSUJH",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "4tnI-jAtTXqXbMDJ8CVRjQ",
  "version" : {
    "number" : "7.12.1",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "3186837139b9c6b6d23c3200870651f10d3343b7",
    "build_date" : "2021-04-20T20:56:39.040728659Z",
    "build_snapshot" : false,
    "lucene_version" : "8.8.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
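If you prefer to script this check, the following Python sketch (standard library only) fetches the root endpoint and extracts the fields shown above. It assumes the node listens on http://127.0.0.1:9200 with no authentication configured. `summarize_root_info` and `fetch_root_info` are hypothetical helper names.

```python
import json
import urllib.request


def summarize_root_info(body: str) -> dict:
    """Extract the fields worth checking from the root endpoint's JSON."""
    info = json.loads(body)
    return {
        "node": info["name"],
        "cluster": info["cluster_name"],
        "version": info["version"]["number"],
        "lucene": info["version"]["lucene_version"],
    }


def fetch_root_info(url: str = "http://127.0.0.1:9200") -> dict:
    """GET the root endpoint of a running node (assumes no authentication)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return summarize_root_info(resp.read().decode("utf-8"))
```

Call `fetch_root_info()` while Elasticsearch is running. For the response above, it should report version 7.12.1 and Lucene 8.8.0.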

Elasticsearch directory layout

  • bin: executable scripts, such as elasticsearch.bat.
  • config: configuration files. elasticsearch.yml holds the global settings and your own settings, such as the data path, log path, and ports. JVM options are set in jvm.options in the same directory.
  • data: your index data, that is, where the documents you search are stored.
  • jdk: the bundled JDK; you can usually ignore it.
  • lib: JAR files.
  • logs: log files.
  • modules: built-in modules; do not delete them.
  • plugins: installed plugins, such as third-party analyzers (word segmenters).

2. Install the elasticsearch-head plugin

Unzip the head plugin, change into the elasticsearch-head directory, and run the following commands (Node.js and npm are required):

npm install
npm run start    # start the plugin

Once it has started, open http://localhost:9100 in a browser.

Resolve cross-origin (CORS) problems

Add the following to Elasticsearch's config/elasticsearch.yml so that the head plugin can connect, then restart Elasticsearch:

http.cors.enabled: true
http.cors.allow-origin: "*"
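If you script your setup, a small Python helper can append these two settings only when they are missing. This is a hypothetical convenience function that edits the file's text; it is not an Elasticsearch tool. Note that `"*"` allows any origin, which is only appropriate for local development.

```python
# The two CORS settings shown above, as key -> YAML value
CORS_SETTINGS = {
    "http.cors.enabled": "true",
    "http.cors.allow-origin": '"*"',
}


def add_cors_settings(yml_text: str) -> str:
    """Return elasticsearch.yml text with any missing CORS settings appended."""
    existing = {
        line.split(":", 1)[0].strip()
        for line in yml_text.splitlines()
        if ":" in line and not line.lstrip().startswith("#")  # skip comments
    }
    lines = [yml_text.rstrip("\n")] if yml_text.strip() else []
    for key, value in CORS_SETTINGS.items():
        if key not in existing:
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"
```

Running it twice leaves the file unchanged the second time, so it is safe to use in a setup script.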

3. Install Kibana

Download Kibana

Kibana is an open-source analytics and visualization platform for Elasticsearch. You can use Kibana to search, view, and interact with data stored in Elasticsearch indexes, and it makes it easy to perform advanced data analysis and visualize the results as charts.

① Unzip Kibana.

② Go to kibana/bin and run kibana.bat. Once it has started, Kibana is available at http://localhost:5601 by default.

To switch the Kibana interface to Chinese, open config/kibana.yml, add i18n.locale: "zh-CN" at the end of the file, and restart Kibana.

4. Install the IK Chinese analyzer

Download elasticsearch-analysis-ik. Its version must match your Elasticsearch version (here 7.12.1).

Create an ik folder under elasticsearch-7.12.1/plugins/, unzip the plugin directly into it, and restart Elasticsearch.

You can test it in the Kibana console (Dev Tools):

  • ik_smart: coarsest-grained segmentation (fewest words)
  • ik_max_word: finest-grained segmentation (splits the text into as many words as possible)
GET _analyze
{
  "analyzer": "ik_smart",
  "text": ["I'm a good student."]
}

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": ["I'm a good student."]
}

You can also write your own dictionary, such as my.dic, in the IK plugin's config directory and register it in IKAnalyzer.cfg.xml (the ext_dict entry). Separate multiple dictionaries with a semicolon.
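Here is a toy Python sketch that illustrates the difference between coarse and fine segmentation. It is not IK's actual algorithm. `coarse_segment` uses greedy forward maximum matching, which is closer in spirit to `ik_smart`. `fine_segment` lists every dictionary word found anywhere in the text, which is closer in spirit to `ik_max_word`. The small word set stands in for a dictionary such as my.dic, and both function names are hypothetical.

```python
def coarse_segment(text: str, dictionary: set) -> list:
    """Greedy forward maximum matching: take the longest dictionary word at each position."""
    max_len = max(len(w) for w in dictionary)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def fine_segment(text: str, dictionary: set) -> list:
    """Every dictionary word that occurs in the text, including overlapping ones."""
    max_len = max(len(w) for w in dictionary)
    return [text[i:j]
            for i in range(len(text))
            for j in range(i + 1, min(len(text), i + max_len) + 1)
            if text[i:j] in dictionary]


if __name__ == "__main__":
    words = {"我", "是", "好学生", "学生"}
    text = "我是好学生"  # "I am a good student"
    print(coarse_segment(text, words))  # ['我', '是', '好学生']
    print(fine_segment(text, words))    # ['我', '是', '好学生', '学生']
```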

III. Basic operations

REST style

REST is a software architectural style, not a standard: it provides a set of design principles and constraints. It is mainly used for software in which clients and servers interact. Software designed in this style tends to be simpler and more layered, and mechanisms such as caching are easier to implement. The basic REST commands are listed below. (Types are deprecated as of Elasticsearch 7, so instead of a type name you write _doc in the URL.)

Method  URL                                                      Description
PUT     localhost:9200/index_name/type_name/document_id          Create a document (specified document ID)
POST    localhost:9200/index_name/type_name                      Create a document (random document ID)
POST    localhost:9200/index_name/type_name/document_id/_update  Update a document
DELETE  localhost:9200/index_name/type_name/document_id          Delete a document
GET     localhost:9200/index_name/type_name/document_id          Get a document by ID
POST    localhost:9200/index_name/type_name/_search              Search all data
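The table maps each operation to an HTTP method and path, which can be written as a simple lookup. Below is a minimal Python sketch that assumes the typeless 7.x endpoints: `_doc` replaces the type name, and partial updates use `/<index>/_update/<id>`. `doc_request` and its action names are hypothetical.

```python
from typing import Optional, Tuple

# Operations that address one specific document and therefore need an ID
NEEDS_ID = {"create_with_id", "update", "delete", "get"}


def doc_request(action: str, index: str, doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP method, path) for a basic document operation (typeless 7.x form)."""
    if index != index.lower():
        raise ValueError("index names must be lowercase")
    if action in NEEDS_ID and doc_id is None:
        raise ValueError(f"{action} needs a document ID")
    routes = {
        "create_with_id": ("PUT", f"/{index}/_doc/{doc_id}"),
        "create_auto_id": ("POST", f"/{index}/_doc"),
        "update": ("POST", f"/{index}/_update/{doc_id}"),
        "delete": ("DELETE", f"/{index}/_doc/{doc_id}"),
        "get": ("GET", f"/{index}/_doc/{doc_id}"),
        "search": ("POST", f"/{index}/_search"),
    }
    if action not in routes:
        raise ValueError(f"unknown action: {action}")
    return routes[action]
```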

Basic concepts

Node and Cluster

Elastic is essentially a distributed database: multiple servers can work together, and each server can run multiple Elastic instances.

A single Elastic instance is called a node. A group of nodes forms a cluster.

Index

Elastic indexes all fields and, after processing, writes them into an inverted index. When searching for data, it looks up this index directly.

The top-level unit of data management in Elastic is therefore called an Index, which is roughly equivalent to a single database. The name of each Index (that is, database) must be lowercase.
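An inverted index maps each term to the documents that contain it, so a search only needs to look up the term. Below is a toy Python sketch that assumes a very crude analyzer (lowercase, strip commas, split on whitespace). Lucene's real analysis and index structures are far more elaborate, and the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each lowercase term to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().replace(",", " ").split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def lookup(index: dict, term: str) -> list:
    """Look the term up directly instead of scanning every document."""
    return index.get(term.lower(), [])


if __name__ == "__main__":
    idx = build_inverted_index({
        "1": "Database management, software development",
        "2": "System Management",
    })
    print(lookup(idx, "Management"))  # ['1', '2']
```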

The following command displays all indexes of the current node.

GET _cat/indices?v

Document

Each record in an Index is called a Document, and many Documents together form an Index.

Documents are represented in JSON format. Here is an example:

{
  "user": "Zhang",
  "title": "Engineer",
  "desc": "Database Management"
}

Documents in the same Index are not required to have the same structure (schema), but keeping them the same is better because it improves search efficiency.

Type

Documents can be grouped. For example, in a weather Index they can be grouped by city (Beijing and Shanghai) or by climate (sunny and rainy days). This grouping is called a Type: a virtual, logical grouping used to filter documents.

Different types should have similar schemas. For example, an ID field cannot be a string in one group and a number in another. This is a difference from tables in a relational database. Data of completely different natures (such as products and logs) should be stored as two indexes instead of two Types in one Index (although that is possible).

The following command lists the types contained in each Index.

GET _mapping?pretty=true

Following Elastic's plan, 6.x allows only one Type per Index, and Types are then removed entirely: they are deprecated in 7.x and removed in 8.x.

Operations on indexes

Create an index

PUT /test1/_doc/1
{
  "name": "Wang",
  "age": 18
}

_doc is the default type, and the field types are inferred automatically.

Data types:

  • String types: text, keyword
  • Numeric types: long, integer, short, byte, double, float, half_float, scaled_float
  • Date type: date
  • Boolean type: boolean
  • Binary type: binary
  • etc.

Create mapping rules

PUT /test2
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      }
    }
  }
}
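A mapping body like the one above can be generated from a field-to-type table. Below is a minimal Python sketch; `mapping_body` is a hypothetical helper. It only accepts simple types from the list above and leaves out `scaled_float`, because that type also requires a `scaling_factor` parameter.

```python
import json

# Simple field types from the list above. scaled_float is left out because it
# also needs a scaling_factor parameter, which this sketch does not handle.
SIMPLE_TYPES = {"text", "keyword", "long", "integer", "short", "byte",
                "double", "float", "half_float", "date", "boolean", "binary"}


def mapping_body(fields: dict) -> dict:
    """Build the body of a PUT /<index> request that declares field types."""
    unknown = sorted(set(fields.values()) - SIMPLE_TYPES)
    if unknown:
        raise ValueError(f"unsupported field types: {unknown}")
    return {"mappings": {"properties": {name: {"type": ftype}
                                        for name, ftype in fields.items()}}}


if __name__ == "__main__":
    # Same body as the PUT /test2 example above
    print(json.dumps(mapping_body({"name": "text", "age": "long"}), indent=2))
```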

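To view an index's settings and mappings, send a GET request for the index. The _cat commands return other information about the cluster, such as the installed plugins:

GET test2
GET _cat/plugins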

Modify a document

1. Method 1 (overwrite the document with PUT):

PUT /test1/_doc/1
{
  "name": "Wang",
  "age": 19
}

This replaces the whole document, so any field you leave out is removed from it.

2. Method 2 (partial update with _update; without _update, the fields you leave out would be removed):

POST /test1/_update/1
{
  "doc": {"age": 20}
}
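The difference between the two methods can be modelled in a few lines of Python. This is a simplified model of the behaviour described above, not Elasticsearch code. A PUT replaces the stored source. `_update` with a `doc` merges the given fields into it: nested objects are merged recursively, and other values are overwritten. The function names are hypothetical.

```python
import copy


def put_replace(stored: dict, new_doc: dict) -> dict:
    """PUT /index/_doc/id: the body becomes the whole document (stored is ignored)."""
    return copy.deepcopy(new_doc)


def partial_update(stored: dict, doc: dict) -> dict:
    """POST /index/_update/id with {"doc": ...}: merge the given fields in."""
    merged = copy.deepcopy(stored)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)  # merge nested objects
        else:
            merged[key] = copy.deepcopy(value)  # overwrite or add the field
    return merged


if __name__ == "__main__":
    stored = {"name": "Wang", "age": 19}
    print(put_replace(stored, {"age": 20}))     # {'age': 20}
    print(partial_update(stored, {"age": 20}))  # {'name': 'Wang', 'age': 20}
```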

Operations on documents

Return all records

Send a GET request directly to /Index/Type/_search, and all records are returned.

GET accounts/person/_search
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "AV3qGfrC6jMbsbXb6k1p",
        "_score": 1.0,
        "_source": {
          "user": "Bill",
          "title": "Engineer",
          "desc": "System Management"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "user": "Zhang",
          "title": "Engineer",
          "desc": "Database management, software development"
        }
      }
    ]
  }
}

In the response above, the took field gives the operation time (in milliseconds), the timed_out field indicates whether the operation timed out, and the hits field contains the matching records. The fields inside hits mean the following:

  • total: the number of matching records, 2 in this example.
  • max_score: the highest matching score, 1.0 in this example.
  • hits: an array of the returned records.

Each returned record has a _score field that indicates how well it matches. By default, records are sorted by this field in descending order.
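A client usually reads these fields programmatically. Below is a minimal Python sketch with hypothetical helper names that operate on an already-parsed response. It also accepts the 7.x form of `hits.total`, which is an object such as `{"value": 2, "relation": "eq"}` rather than the bare number shown in the older output above.

```python
def total_hits(response: dict) -> int:
    """Number of matching records, for both old and 7.x response formats."""
    total = response["hits"]["total"]
    # 7.x: {"value": n, "relation": "eq"}; older versions: a bare number
    return total["value"] if isinstance(total, dict) else total


def ranked_sources(response: dict) -> list:
    """(_id, _score, _source) for each hit, highest score first."""
    hits = response["hits"]["hits"]
    return sorted(((h["_id"], h["_score"], h["_source"]) for h in hits),
                  key=lambda item: item[1], reverse=True)
```

Elasticsearch already returns hits in descending score order. Re-sorting on the client is only needed if you merge results from several requests.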

Full-text search

For more query syntax, see the official documentation.

Elastic's queries are distinctive in that they use their own query syntax, which is sent as the body of a GET request.

GET accounts/person/_search
{
  "query": { "match": { "desc": "Software" } }
}

The code above uses a match query, which requires the desc field to contain the word “software”. The result is as follows:

{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.28582606,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 0.28582606,
        "_source": {
          "user": "Zhang",
          "title": "Engineer",
          "desc": "Database management, software development"
        }
      }
    ]
  }
}
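Some HTTP clients cannot attach a body to a GET request, and Elasticsearch also accepts POST for _search. Below is a minimal Python sketch (standard library only) that builds such a request without sending it. The host address and the `search_request` name are assumptions, and the path uses the typeless 7.x form /index/_search.

```python
import json
import urllib.request


def search_request(index: str, body: dict,
                   host: str = "http://127.0.0.1:9200") -> urllib.request.Request:
    """Build (but do not send) a POST /<index>/_search request with a JSON body."""
    return urllib.request.Request(
        f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it against a running node, use `urllib.request.urlopen(search_request("accounts", {"query": {"match": {"desc": "Software"}}}))` and parse the response with `json.load`.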

By default Elastic returns 10 results at a time, which can be changed with the size field.

GET accounts/person/_search
{
  "query": { "match": { "desc": "Management" } },
  "size": 1
}

The code above specifies that only one result is returned at a time.

You can also specify an offset with the from field.

GET accounts/person/_search
{
  "query": { "match": { "desc": "Management" } },
  "from": 1,
  "size": 1
}

The code above specifies that starting at position 1 (default starting at position 0), only one result is returned.
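from and size together give simple paging. Below is a minimal Python sketch that converts a 1-based page number into these two fields; `page_params` is a hypothetical helper.

```python
def page_params(page: int, page_size: int = 10) -> dict:
    """Translate a 1-based page number into the from/size fields of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}


if __name__ == "__main__":
    body = {"query": {"match": {"desc": "Management"}}}
    body.update(page_params(page=2, page_size=1))  # from 1, size 1, as in the example above
    print(body)
```

By default, from + size cannot exceed 10,000 (the index.max_result_window setting), so deep paging needs a different approach such as search_after.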

Logical operations

If there are multiple search keywords, Elastic combines them with OR.

GET accounts/person/_search
{
  "query": { "match": { "desc": "Software System" } }
}

The code above searches for documents whose desc contains “software” or “system”.

If you want to perform an AND search for multiple keywords, you must use a Boolean query.

GET accounts/person/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "desc": "Software" } },
        { "match": { "desc": "System" } }
      ]
    }
  }
}
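The two query shapes above can be generated from a word list. Below is a minimal Python sketch with hypothetical helper names. `match_any` produces the default OR behaviour, and `match_all_words` puts one match clause per word in bool.must, so every word has to match.

```python
import json


def match_any(field: str, words: list) -> dict:
    """Default match query: the words are combined with OR."""
    return {"query": {"match": {field: " ".join(words)}}}


def match_all_words(field: str, words: list) -> dict:
    """bool query with one match clause per word in must (AND)."""
    return {"query": {"bool": {"must": [{"match": {field: w}} for w in words]}}}


if __name__ == "__main__":
    # Produces the same body as the bool query above
    print(json.dumps(match_all_words("desc", ["Software", "System"]), indent=2))
```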
