Elasticsearch
Elasticsearch is a JSON-based, distributed, scalable, real-time, RESTful search and data analysis engine that addresses a wide variety of use cases. As the heart of the Elastic Stack, it stores your data centrally and helps you find both what you expect and what you do not expect.
Elasticsearch is developed alongside Logstash, a data collection and log parsing engine, and Kibana, an analysis and visualization platform. The three products are designed as an integrated solution called the "Elastic Stack" (formerly the "ELK Stack").
Elasticsearch can search all kinds of documents. It provides scalable, near-real-time search and supports multi-tenancy. Elasticsearch is distributed, which means an index can be split into shards, each with zero or more replicas. Each node hosts one or more shards and acts as a coordinator, delegating operations to the correct shards. Rebalancing and routing are done automatically. Related data is usually stored in the same index, which consists of one or more primary shards and zero or more replica shards. Once an index is created, the number of primary shards cannot be changed.
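Why the primary shard count is fixed becomes clearer with a small model of routing. The sketch below is a simplification, not Elasticsearch's actual implementation: the function name pick_shard is made up for illustration, and zlib.crc32 stands in for the hash Elasticsearch really uses. The idea it shows is that a document's shard is derived from a hash of its routing key (by default the document ID) modulo the number of primary shards.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a primary shard.

    Assumption: crc32 is only a stand-in for Elasticsearch's real hash.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]

# Where each document lives with 5 primary shards, and where the same
# formula would look for it if the count were changed to 6.
placement_5 = {d: pick_shard(d, 5) for d in doc_ids}
placement_6 = {d: pick_shard(d, 6) for d in doc_ids}

moved = sum(1 for d in doc_ids if placement_5[d] != placement_6[d])
print(f"{moved} of {len(doc_ids)} documents would be looked up on a different shard")
```

Because most documents would be looked up on the wrong shard after such a change, growing an index normally means creating a new index and reindexing into it.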
Elasticsearch uses Lucene and exposes all of its features through JSON and Java APIs. It supports faceting and percolation; percolation is useful for sending notifications when a new document matches a registered query. Another feature, called "gateways", handles long-term persistence of indexes; for example, after a server crash, indexes can be recovered from the gateway. Elasticsearch supports real-time GET requests and is suitable as a NoSQL data store, but it lacks distributed transactions.
Elasticsearch 7 environment setup
Note: This document is based on Elasticsearch 7.12.1 and requires Java 8 or later. Make sure all Elasticsearch-related software and plugins use the same version.
1. Install Elasticsearch
Download Elasticsearch
Unzip the archive and run elasticsearch.bat in the bin directory.
If all is well, opening http://127.0.0.1:9200 in a browser shows something like this:
{
  "name" : "DESKTOP-V4GSUJH",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "4tnI-jAtTXqXbMDJ8CVRjQ",
  "version" : {
    "number" : "7.12.1",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "3186837139b9c6b6d23c3200870651f10d3343b7",
    "build_date" : "2021-04-20T20:56:39.040728659Z",
    "build_snapshot" : false,
    "lucene_version" : "8.8.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
Elasticsearch file directory
- bin: executable files
- config: Elasticsearch global settings and your own settings. JVM options, the data path, the log path, ports, and so on are all changed here.
- data: your index data, that is, where the data you search is stored
- jdk: the bundled JDK, can be ignored
- lib: JAR packages
- logs: log files
- modules: built-in modules, which must not be deleted
- plugins: where plugins are installed, such as third-party analyzers
2. Install the elasticsearch-head plugin
Download the elasticsearch-head plugin
Unzip the head plugin, go to the elasticsearch-head directory, and run the following commands to start it:
npm install
npm run start   # start the plugin
After the startup is successful, visit http://localhost:9100
Resolve cross-origin (CORS) problems
Add the following to Elasticsearch's config/elasticsearch.yml, then restart Elasticsearch:
http.cors.enabled: true
http.cors.allow-origin: "*"
3. Install Kibana
Download Kibana
Kibana is an open source analysis and visualization platform for Elasticsearch. You can use Kibana to search, view and interact with data stored in the Elasticsearch index. You can easily implement advanced data analysis and visualization in the form of charts.
① Unzip Kibana
② Go to kibana/bin and run kibana.bat
To localize the interface, open config/kibana.yml and add i18n.locale: "zh-CN" at the end of the file. This switches the Kibana interface to Chinese.
4. Install the IK Chinese analyzer
Download elasticsearch-analysis-ik
Unzip it into Elasticsearch/elasticsearch-7.12.1/plugins/ik (create the ik folder under plugins first).
This can be tested in the Kibana console:
- ik_smart: coarsest-grained segmentation (fewest tokens)
- ik_max_word: finest-grained segmentation (most tokens)
GET _analyze
{
  "analyzer": "ik_smart",
  "text": ["I'm a good student."]
}
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": ["I'm a good student."]
}
You can also write your own dictionary, for example my.dic, in the plugin's directory under plugins. When registering several dictionaries, separate them with semicolons.
Basic operations
REST style description
REST is a software architectural style rather than a standard. It provides a set of design principles and constraints and is mainly used for software in which a client and a server interact. Software designed in this style tends to be simpler, more clearly layered, and easier to extend with mechanisms such as caching.
Basic REST commands (types are deprecated as of Elasticsearch 7, so URLs should either omit the type name or use _doc):
Method | URL | Description |
---|---|---|
PUT | localhost:9200/<index>/<type>/<document id> | Create a document (with a specified document ID) |
POST | localhost:9200/<index>/<type> | Create a document (with a random document ID) |
POST | localhost:9200/<index>/<type>/<document id>/_update | Modify a document |
DELETE | localhost:9200/<index>/<type>/<document id> | Delete a document |
GET | localhost:9200/<index>/<type>/<document id> | Get a document by its ID |
POST | localhost:9200/<index>/<type>/_search | Query all data |
Basic concepts
Node and Cluster
Elastic is essentially a distributed database that allows multiple servers to work together and each server can run multiple Elastic instances.
A single Elastic instance is called a node. A group of nodes forms a cluster.
Index
Elastic indexes every field and, after processing, writes the result to an inverted index. Searches look up this index directly.
So the top-level unit of Elastic data management is called an Index. It is roughly the counterpart of a single database. The name of each Index (that is, database) must be lowercase.
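To make the idea of an inverted index concrete, here is a minimal sketch. It is a toy model, not how Lucene stores data: the function build_inverted_index is a hypothetical helper, and the tokenizer only lowercases, splits on whitespace, and strips commas and periods. It maps each term to the set of document IDs that contain it, so a lookup by term does not need to scan every document.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            # Strip simple punctuation so "management," matches "management".
            index[token.strip(".,")].add(doc_id)
    return dict(index)


docs = {
    "1": "Database management, software development",
    "2": "System Management",
}
index = build_inverted_index(docs)

print(sorted(index["management"]))  # → ['1', '2']
print(sorted(index["software"]))    # → ['1']
```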
The following command displays all indexes of the current node.
GET _cat/indices?v
Document
A single record inside an Index is called a Document. Many Documents make up an Index.
A Document is represented in JSON format. Here is an example.
{
  "user": "Zhang",
  "title": "Engineer",
  "desc": "Database Management"
}
Documents in the same Index are not required to have the same structure (schema), but keeping it the same improves search efficiency.
Type
Documents can be grouped. For example, in a weather Index they can be grouped by city (Beijing and Shanghai) or by climate (sunny and rainy days). Such a group is called a Type: a virtual, logical grouping used to filter Documents.
Different types should have similar schemas. For example, an ID field cannot be a string in one group and a number in another. This is a difference from tables in a relational database. Data of completely different natures (such as products and logs) should be stored as two indexes instead of two Types in one Index (although that is possible).
The following command lists the types contained in each Index.
GET _mapping?pretty=true
As planned, Elastic 6.x allows only one Type per Index. 7.x deprecates Types in the APIs, and 8.0 removes them entirely.
Operations on indexes
Create an index
PUT /test1/_doc/1
{
  "name": "Wang",
  "age": 18
}
_doc is the default type. Because no mapping was given, the field types are inferred automatically.
Data types:
- String types: text, keyword
- Numeric types: long, integer, short, byte, double, float, half_float, scaled_float
- Date type: date
- Boolean type: boolean
- Binary type: binary
- and others
Create an index with mapping rules
PUT /test2
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      }
    }
  }
}
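Information about indexes and the cluster is available through the _cat commands. For example, the following lists the installed plugins (GET _cat/indices?v lists the indexes):
GET _cat/plugins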
Modify a document
1. Method 1:
PUT /test1/_doc/1
{
  "name": "Wang",
  "age": 19
}
This changes the value directly, but it overwrites the whole document: any field left out of the request body is lost.
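The difference between overwriting a document with PUT and merging fields with _update (method 2 below) can be modelled with a plain dictionary. This is only a sketch of the semantics, not client code: store, index_document, and update_document are hypothetical names, and the merge here is shallow, whereas Elasticsearch merges nested objects recursively.

```python
def index_document(store: dict, doc_id: str, body: dict) -> None:
    """Model of PUT /<index>/_doc/<id>: the stored document is replaced wholesale."""
    store[doc_id] = dict(body)


def update_document(store: dict, doc_id: str, partial: dict) -> None:
    """Model of _update with {"doc": ...}: given fields are merged, others are kept."""
    store[doc_id] = {**store[doc_id], **partial}


store = {}
index_document(store, "1", {"name": "Wang", "age": 18})

# Partial update: only age changes, name survives.
update_document(store, "1", {"age": 20})
print(store["1"])  # → {'name': 'Wang', 'age': 20}

# Full overwrite with a body that omits name: name is gone.
index_document(store, "1", {"age": 19})
print(store["1"])  # → {'age': 19}
```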
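2. Method 2: a partial update with _update, which keeps the fields you do not mention (without _update, the other fields are lost). In 7.x the typeless path POST /test1/_update/1 is preferred; the older POST /test1/_doc/1/_update still works but is deprecated.
POST /test1/_update/1
{
  "doc": {
    "age": 20
  }
}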
Operations on documents
Return all records
Using the GET method, request /Index/Type/_search directly and all records will be returned.
GET accounts/person/_search

Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "AV3qGfrC6jMbsbXb6k1p",
        "_score": 1.0,
        "_source": {
          "user": "Bill",
          "title": "Engineer",
          "desc": "System Management"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "user": "Zhang",
          "title": "Engineer",
          "desc": "Database management, software development"
        }
      }
    ]
  }
}
In the response above, the took field is the time the operation took (in milliseconds), the timed_out field indicates whether the operation timed out, and the hits field holds the matching records. The subfields of hits have the following meanings.
- total: the number of returned records, 2 in this example.
- max_score: the highest matching score, 1.0 in this example.
- hits: an array of the returned records.
Each returned record has a _score field that indicates how well it matches. By default, results are sorted by this field in descending order.
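As a rough illustration of reading these fields in client code, the sketch below parses a trimmed copy of the response above with Python's standard json module. The helper summarize is hypothetical. One assumption worth flagging: on Elasticsearch 7.x the search API reports hits.total as an object such as {"value": 2, "relation": "eq"} by default, so the helper accepts both that shape and the bare number shown in this document.

```python
import json

# Trimmed copy of the response shown above.
raw = """
{"took": 2, "timed_out": false,
 "hits": {"total": 2, "max_score": 1.0, "hits": [
   {"_id": "AV3qGfrC6jMbsbXb6k1p", "_score": 1.0, "_source": {"user": "Bill"}},
   {"_id": "1", "_score": 1.0, "_source": {"user": "Zhang"}}]}}
"""


def summarize(response: dict) -> dict:
    """Pull the total, the best score, and the user names out of a search response."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):
        # 7.x default shape: {"value": N, "relation": "eq"}
        total = total["value"]
    return {
        "total": total,
        "max_score": hits["max_score"],
        "users": [hit["_source"]["user"] for hit in hits["hits"]],
    }


print(summarize(json.loads(raw)))
# → {'total': 2, 'max_score': 1.0, 'users': ['Bill', 'Zhang']}
```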
Full-text search
For more on the query syntax, see the official website.
Elastic’s queries are unique in that they use their own query syntax and require GET requests with data bodies.
GET accounts/person/_search
{
  "query": { "match": { "desc": "Software" } }
}
The code above uses a match query whose condition is that the desc field contains the word "Software". The result is as follows.
{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.28582606,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 0.28582606,
        "_source": {
          "user": "Zhang",
          "title": "Engineer",
          "desc": "Database management, software development"
        }
      }
    ]
  }
}
By default Elastic returns 10 results at a time, which can be changed with the size field.
GET accounts/person/_search
{
  "query": { "match": { "desc": "Management" } },
  "size": 1
}
The code above specifies that only one result is returned at a time.
You can also specify the offset with the from field.
GET accounts/person/_search
{
  "query": { "match": { "desc": "Management" } },
  "from": 1,
  "size": 1
}
The code above returns only one result, starting at position 1 (the default starting position is 0).
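The from and size fields are usually derived from a page number. The following sketch shows that arithmetic; page_body is a hypothetical helper that only builds the request body as a dictionary and does not talk to a server.

```python
def page_body(query: dict, page: int, page_size: int = 10) -> dict:
    """Build a search body for a 1-based page number.

    from is the number of results to skip, so page 1 starts at offset 0.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"query": query, "from": (page - 1) * page_size, "size": page_size}


# Page 2 with one result per page is the request shown above.
body = page_body({"match": {"desc": "Management"}}, page=2, page_size=1)
print(body)
# → {'query': {'match': {'desc': 'Management'}}, 'from': 1, 'size': 1}
```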
Logical operations
If there are multiple search keywords, Elastic treats them as an OR relationship.
GET accounts/person/_search
{
  "query": { "match": { "desc": "Software System" } }
}
The code above searches for documents whose desc field contains Software or System.
To perform an AND search for multiple keywords, use a bool query.
GET accounts/person/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "desc": "Software" } },
        { "match": { "desc": "System" } }
      ]
    }
  }
}
For more on the query syntax, see the official website.
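When the keywords come from user input, the bool query above can be assembled programmatically. The sketch below is one possible way to do that; all_keywords_query is a hypothetical helper that only builds the request body, with one match clause per keyword inside must.

```python
import json


def all_keywords_query(field: str, keywords: list) -> dict:
    """Build a bool query that requires every keyword to match the field (AND)."""
    return {
        "query": {
            "bool": {
                # Every clause in "must" has to match for a document to be returned.
                "must": [{"match": {field: keyword}} for keyword in keywords]
            }
        }
    }


body = all_keywords_query("desc", ["Software", "System"])
print(json.dumps(body, indent=2))
```

The printed JSON has the same structure as the bool request shown earlier and can be sent as the body of a _search request.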