The main contents of this article are as follows:

Preface
In our projects, we often use the Kibana interface to search the logs of the test or production environment for abnormal information. Kibana is the K in ELK.

The Kibana interface looks like this:
But how does this log retrieval work? This is where Elasticsearch comes in.
I am going to break this topic down into three parts: how Elasticsearch works, how it is used in practice, and how it is deployed. This article is the first part, an introduction to ES:
- Part 1 (this article): how to use and configure Elasticsearch (ES).
- Part 2: practical ES applications.
- Part 3: cluster deployment of ES.

We split the topic into three parts because each one is quite long and has a different focus.
I. Introduction to Elasticsearch
1.1 What is Elasticsearch?
Elasticsearch is a distributed, open-source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured data. In short, ES can handle just about anything related to search and analysis.
1.2 What is Elasticsearch used for?
Elasticsearch performs well in terms of speed and scalability and can index many types of content, which means it suits a wide range of use cases:
- An online store, where you let customers search the products you sell. Here you can use Elasticsearch to store the entire product catalog and inventory and provide search and autocomplete suggestions.
- Collecting log or transaction data and analyzing it for trends, statistics, summaries, or anomalies. Here you can use Logstash (part of the Elasticsearch/Logstash/Kibana stack) to collect, aggregate, and parse the data, and then have Logstash feed it into Elasticsearch. Once the data is in Elasticsearch, you can run searches and aggregations to mine whatever information interests you.
1.3 How Does Elasticsearch Work?
Elasticsearch is built on top of Lucene, and ES adds many enhancements to Lucene.

Lucene is a sub-project of the Apache Software Foundation's Jakarta Project. It is an open-source full-text search engine toolkit: not a complete full-text search engine, but a full-text search engine architecture that provides a complete query engine and indexing engine, as well as a partial text analysis engine (for Western languages such as English and German). The purpose of Lucene is to give software developers an easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it. (From Baidu Baike)
Where does the original data for Elasticsearch come from?
Raw data is entered into Elasticsearch from multiple sources, including logs, system metrics, and web applications.
How does Elasticsearch collect data?
Data ingestion is the process of parsing, normalizing, and enriching the raw data before indexing it in Elasticsearch. Once the data is indexed in Elasticsearch, users can run complex queries against it and use aggregations to retrieve complex summaries of their data. This is where Logstash comes in, which I will talk about later.
How do I visually view the data I want to retrieve?
This is where Kibana comes in: it lets users search and visualize their data.
1.4 What is the index of Elasticsearch?
An Elasticsearch index is a collection of documents that are related to each other. Elasticsearch stores data as JSON documents. Each document maps a set of keys (the names of fields or properties) to their corresponding values (strings, numbers, booleans, dates, arrays of values, geolocations, or other types of data).

Elasticsearch uses a data structure called an inverted index, which is designed for very fast full-text searches. An inverted index lists every unique term that appears in any document and identifies all of the documents each term occurs in.

During indexing, Elasticsearch stores documents and builds an inverted index so that users can search document data in near real time. Indexing starts with the index API, which lets you add JSON documents to a specific index or change existing ones.
1.5 What is Logstash used for?
Logstash is just the L in ELK.
Logstash is one of the core products of Elastic Stack. It is used to aggregate and process data and send it to Elasticsearch. Logstash is an open source, server-side data processing pipeline that allows you to gather data from multiple sources simultaneously, augment and transform it before indexing it to Elasticsearch.
1.6 What is Kibana used for?
Kibana is a data visualization and management tool for Elasticsearch that provides real-time histograms, linear graphs, and more.
1.7 Why Use Elasticsearch?
- ES is a fast, near real-time search platform.
- ES has a distributed nature.
- ES includes a wide range of functions, such as data aggregation and index lifecycle management.
Official document: www.elastic.co/cn/what-is/…
II. Basic Concepts of ES
2.1 Index
As a verb: equivalent to INSERT in MySQL.

As a noun: equivalent to a database in MySQL.

Comparison with MySQL (a small sketch follows the table):
No. | MySQL | Elasticsearch |
---|---|---|
1 | MySQL service | ES cluster service |
2 | Database | Index |
3 | Table | Type |
4 | Row | Document (JSON format) |
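To make the verb/noun distinction concrete, here is a minimal sketch in Kibana Dev Tools syntax (the index name user and the document content are hypothetical): indexing a document (the verb) is the ES counterpart of a MySQL INSERT, and the index user (the noun) plays the role of a database.

```
# "Index" as a verb: store (insert) a document into the "user" index
PUT user/_doc/1
{
  "name": "wukong",
  "age": 18
}

# "Index" as a noun: the container that holds the documents, similar to a database in MySQL
GET user/_doc/1
```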
2.2 Inverted index
Suppose the database has the following movie records:

1 - Big Talk Westward Journey (A Chinese Odyssey)

2 - Big Talk Westward Journey Side Story

3 - Analysis of Big Talk Westward Journey

4 - Westward Journey: Conquering the Demons

5 - Exclusive Analysis of Fantasy Westward Journey
Word segmentation: breaking an entire sentence into individual words.
No. | Term stored in ES | Movie records containing the term |
---|---|---|
A | Westward Journey | 1, 2, 3, 4, 5 |
B | Big Talk | 1, 2, 3 |
C | Side Story | 2, 4, 5 |
D | Analysis | 3, 5 |
E | Conquering Demons | 4 |
F | Fantasy | 5 |
G | Exclusive | 5 |
Search: Exclusive Big Talk Westward Journey

The search phrase "Exclusive Big Talk Westward Journey" is segmented into the words Exclusive, Big Talk, and Westward Journey.

Terms A, B, and G in ES correspond to these three words, so records 1, 2, 3, 4, and 5 all contain at least one matching term.

Record 1 hits terms A and B (2 hits) and contains 2 terms; relevance score: 2 hits / 2 terms = 1.

Record 2 hits terms A and B (2 hits) and contains 3 terms; relevance score: 2 hits / 3 terms = 0.67.

Record 3 hits terms A and B (2 hits) and contains 3 terms; relevance score: 2 hits / 3 terms = 0.67.

Record 4 hits term A only (1 hit) and contains 3 terms; relevance score: 1 hit / 3 terms = 0.33.

Record 5 hits terms A and G (2 hits) and contains 4 terms; relevance score: 2 hits / 4 terms = 0.5.
So the retrieved records are returned in the following order:

1 - Big Talk Westward Journey (score: 1)

2 - Big Talk Westward Journey Side Story (score: 0.67)

3 - Analysis of Big Talk Westward Journey (score: 0.67)

5 - Exclusive Analysis of Fantasy Westward Journey (score: 0.5)

4 - Westward Journey: Conquering the Demons (score: 0.33)
III. Building the Environment with Docker

3.1 Setting Up the Elasticsearch Environment
To set up the virtual machine environment and install Docker, you can refer to the documents written before:
- 01. Quickly Setting up a Linux Environment – Essential for O&M
- 02. Configure the VM network
- 03. Install Docker
1) Download the image file
```bash
docker pull elasticsearch:7.4.2
```
2) Create an instance
- Create the mapping folders and the configuration file

```bash
# Create the config mapping folder
mkdir -p /mydata/elasticsearch/config
# Create the data mapping folder
mkdir -p /mydata/elasticsearch/data
# Allow any user to read and write the folders
chmod -R 777 /mydata/elasticsearch
# Configure http.host
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml
```
- Start the Elasticsearch container

```bash
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms64m -Xmx128m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:7.4.2
```
- Access the Elasticsearch service

Visit: http://192.168.56.10:9200

The response returned:

```json
{
  "name" : "8448ec5f3312",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "xC72O3nKSjWavYZ-EPt9Gw",
  "version" : {
    "number" : "7.4.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
    "build_date" : "2019-10-28T20:40:44.881551Z",
    "build_snapshot" : false,
    "lucene_version" : "8.2.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
```
Visit http://192.168.56.10:9200/_cat/nodes to view the node information:

```
127.0.0.1 62 90 0 0.06 0.10 0.05 dilm * 8448ec5f3312
```
3.2 Setting Up the Kibana Environment

```bash
docker pull kibana:7.4.2
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.56.10:9200 -p 5601:5601 -d kibana:7.4.2
```

Visit Kibana: http://192.168.56.10:5601/
IV. Basic Retrieval
4.1. _cat usage
```
GET /_cat/nodes: lists all nodes
GET /_cat/health: shows the health status of ES
GET /_cat/master: shows the master node
GET /_cat/indices: lists all indices
```

A summary of all _cat query endpoints:

```
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates
```
4.2. Indexing a Document (Save)
Example: store a record with id 1 under the external type of the member index.

- Using Kibana Dev Tools:

```
PUT member/external/1
{
  "name": "jay huang"
}
```
Response:

```
{
  "_index": "member",      // which index
  "_type": "external",     // which type
  "_id": "2",              // record id
  "_version": 7,           // version number
  "result": "updated",     // operation type
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 9,
  "_primary_term": 1
}
```
- Records can also be created by sending requests through the Postman tool.
Note:

Both PUT and POST can create records.

POST: if no id is specified, an id is generated automatically. If an id is specified and the record already exists, the record is modified and the version number is incremented.

PUT: an id must be specified. If the record does not exist, it is created; if it exists, it is updated.
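A minimal sketch of the difference, using the same member/external index as above (the document bodies are just placeholders):

```
# POST without an id: ES generates the id automatically; the result is "created"
POST member/external
{
  "name": "jay huang"
}

# POST with an existing id: the record is modified and _version is incremented
POST member/external/1
{
  "name": "jay huang"
}

# PUT must always specify an id: creates the record if it does not exist, otherwise updates it
PUT member/external/1
{
  "name": "jay huang"
}
```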
4.3 Querying documents
GET request: http://192.168.56.10:9200/member/external/2

Response:

```
{
  "_index": "member",      // which index
  "_type": "external",     // which type
  "_id": "2",              // record id
  "_version": 7,           // version number
  "_seq_no": 9,            // concurrency-control field, incremented by 1 on every update; used for optimistic locking
  "_primary_term": 1,      // same purpose as above; changes when the primary shard is reallocated, e.g. after a restart
  "found": true,
  "_source": {             // the actual content
    "name": "jay huang"
  }
}
```
Using _seq_no as an optimistic lock

After each update, _seq_no is incremented by 1, so it can be used for concurrency control.

If _seq_no differs from the expected value, the record has been updated at least once in the meantime, and the update is rejected.
The usage is as follows:

Request to update record 2: http://192.168.56.10:9200/member/external/2?if_seq_no=9&if_primary_term=1

Returned result:

```json
{
  "_index": "member",
  "_type": "external",
  "_id": "2",
  "_version": 9,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 11,
  "_primary_term": 1
}
```
If we now send another update with if_seq_no=10 and if_primary_term=1, it fails, because the current _seq_no is already 11:

```json
{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[2]: version conflict, required seqNo [10], primary term [1]. current document has seqNo [11] and primary term [1]",
        "index_uuid": "CX6uwPBKRByWpuym9rMuxQ",
        "shard": "0",
        "index": "member"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[2]: version conflict, required seqNo [10], primary term [1]. current document has seqNo [11] and primary term [1]",
    "index_uuid": "CX6uwPBKRByWpuym9rMuxQ",
    "shard": "0",
    "index": "member"
  },
  "status": 409
}
```
4.4 Updating documents
- Usage

A POST update that goes through _update: if the source data has not changed, the result field in the response is noop (no operation is performed) and the version number does not change.

The request body must wrap the data in a doc field.

POST request: http://192.168.56.10:9200/member/external/2/_update

```json
{
  "doc": {
    "name": "jay huang"
  }
}
```

Response:

```json
{
  "_index": "member",
  "_type": "external",
  "_id": "2",
  "_version": 12,
  "result": "noop",
  "_shards": {
    "total": 0,
    "successful": 0,
    "failed": 0
  },
  "_seq_no": 14,
  "_primary_term": 1
}
```

Usage scenario: for heavy concurrent updates, it is recommended not to go through _update; for scenarios with heavy querying and only occasional updates, _update can be used so that ES compares the content before updating. A sketch comparing the two styles follows.
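For comparison, a rough sketch of the two styles against the same document as above:

```
# Without _update: the document is overwritten and _version is incremented,
# even if the content has not changed
POST member/external/2
{
  "name": "jay huang"
}

# With _update: ES compares the content first; if nothing changed,
# the result is "noop" and the version number stays the same
POST member/external/2/_update
{
  "doc": { "name": "jay huang" }
}
```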
- Add attributes when updating
Added the age attribute to the request body
http://192.168.56.10:9200/member/external/2/_update

Request:

```json
{
  "doc": {
    "name": "jay huang",
    "age": 18
  }
}
```

Response:

```json
{
  "_index": "member",
  "_type": "external",
  "_id": "2",
  "_version": 13,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 15,
  "_primary_term": 1
}
```
4.5 Deleting Documents and Indexes
- Delete the document
DELETE request: http://192.168.56.10:9200/member/external/2

Response:

```json
{
  "_index": "member",
  "_type": "external",
  "_id": "2",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}
```
- Remove the index
DELETE request: http://192.168.56.10:9200/member

Response:

```json
{
  "acknowledged": true
}
```
- There is no API for deleting a type.
4.6 Importing Data in Batches
Using Kibana's Dev Tools, enter the following statements:

```
POST /member/external/_bulk
{"index": {"_id": "1"}}
{"name": "Jay Huang"}
{"index": {"_id": "2"}}
{"name": "Jackson Huang"}
```
The execution result is as shown in the figure below:
- Copy official sample data
```
https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json
```
- Execute scripts in Kibana
```
POST /bank/account/_bulk
{"index": {"_id": "1"}}
{"account_number": 1, "balance": 39225, "firstname": "Amber", "lastname": "Duke", "age": 32, "gender": "M", "address": "880 Holmes Lane", "employer": "Pyrami", "email": "[email protected]", "city": "Brogan", "state": "IL"}
{"index": {"_id": "6"}}
...
```
- View all indexes
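The command (listed in section 4.1) is:

```
GET /_cat/indices
```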
You can see from the returned results that the bank index has 1000 pieces of data, occupying 440.2 KB of storage space.
V. Advanced Retrieval
5.1 Two Query Methods
5.1.1 Passing Query Parameters in the URL

```
GET bank/_search?q=*&sort=account_number:asc
```

This queries all of the data: 1,000 records in total, found in 1 ms, with only 10 records returned (ES paginates results).

Attribute descriptions:

- took: how long ES took to run the search, in milliseconds
- timed_out: whether the search timed out
- _shards: how many shards were searched
- max_score: the highest relevance score
- hits.total.value: how many records were hit
- hits.sort: the sort key of the results
- hits._score: the relevance score

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-search.html
5.1.2 URL plus Request Body (QueryDSL)
The query conditions are written in the request body
Syntax:

```
GET bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
```

For example, query everything, sorted in ascending order by account_number and then in descending order by balance:
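A sketch of that two-key sort in Query DSL:

```
GET bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" },
    { "balance": "desc" }
  ]
}
```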
5.2 Query DSL in Detail
DSL: Domain Specific Language
5.2.1 Match All: match_all

Example: query all records, sort by balance in descending order, return only records 11 through 20, and show only the balance and firstname fields.

```
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ],
  "from": 10,
  "size": 10,
  "_source": ["balance", "firstname"]
}
```
5.2.2 Match Query: match

- Basic types (non-string): exact match

```
GET bank/_search
{
  "query": {
    "match": {
      "account_number": "30"
    }
  }
}
```

- Strings: full-text search

```
GET bank/_search
{
  "query": {
    "match": {
      "address": "mill road"
    }
  }
}
```

Full-text search results are sorted by relevance score, and the search terms are matched after word segmentation.

This queries all records whose address contains mill, road, or mill road, and gives each a relevance score.

Address "990 Mill Road" scores 8.926605; address "198 Mill Lane" scores 5.4032025.
5.2.3 Phrase Match: match_phrase

The value to be matched is treated as a whole phrase and is not tokenized.

```
GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill road"
    }
  }
}
```

This finds all records whose address contains the phrase "mill road" and gives each a relevance score.
5.2.4 Multi-Field Match: multi_match

```
GET bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill land",
      "fields": [
        "state",
        "address"
      ]
    }
  }
}
```

The query string in multi_match is also tokenized.

This queries records whose state contains mill or land, or whose address contains mill or land.
5.2.5 Compound Query bool
A compound query can combine any other query statements, including other compound queries. Compound statements can be nested inside each other, so they can express complex logic.

It uses must, must_not, and should.

must: the conditions specified in must have to be satisfied. (Contributes to the relevance score.)

must_not: the conditions specified in must_not must not be satisfied. (Does not contribute to the relevance score.)

should: if the should conditions are satisfied, the score is boosted; records that do not satisfy them can still match. (Contributes to the relevance score.)

Example: query records whose address contains mill, whose gender is M, whose age is not 28, and preferably whose firstname contains Winnie.

```
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "gender": "M" } }
      ],
      "must_not": [
        { "match": { "age": "28" } }
      ],
      "should": [
        { "match": { "firstname": "Winnie" } }
      ]
    }
  }
}
```
5.2.6 Filter: filter

A filter does not affect the relevance score; it simply selects the records that satisfy the filter conditions.

It is used inside a bool query.

```
GET bank/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "age": {
              "gte": 18,
              "lte": 40
            }
          }
        }
      ]
    }
  }
}
```
5.2.7 Term Query: term

Matches the exact value of an attribute.

match is used for full-text (text) fields; term is used for non-text fields.

keyword: exact match of the whole text value (see the sketch after the example below).

match_phrase: phrase match within a text value.

Non-text fields are matched exactly:

```
GET bank/_search
{
  "query": {
    "term": {
      "age": "20"
    }
  }
}
```
5.2.8 Aggregations

Aggregation: grouping data and extracting statistics from it, similar to GROUP BY and aggregate functions in SQL.

Elasticsearch can return hit results and multiple aggregation results at the same time.

Aggregation syntax:

```
"aggregations" : {
  "<aggregation_name>" : {
    "<aggregation_type>" : {
      <aggregation_body>
    }
    [,"meta" : { [<meta_data_body>] } ]?
    [,"aggregations" : { [<sub_aggregation>]+ } ]?
  }
  [,"<aggregation_name_2>" : { ... } ]*
}
```
- Example 1: search for the age distribution (top 10) of everyone whose address contains mill, as well as their average age and average balance.

```
GET bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  },
  "aggs": {
    "ageAggr": {
      "terms": {
        "field": "age",
        "size": 10
      }
    },
    "ageAvg": {
      "avg": {
        "field": "age"
      }
    },
    "balanceAvg": {
      "avg": {
        "field": "balance"
      }
    }
  }
}
```

The search results are as follows:

The hits records are returned along with the three aggregation results: the average age is 34, the average balance is 25208.0, and the age distribution is two people aged 38, one aged 28, and one aged 32.
If you do not want to return the hits, you can set "size": 0 at the end:

```
GET bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  },
  "aggs": {
    "ageAggr": {
      "terms": {
        "field": "age",
        "size": 10
      }
    }
  },
  "size": 0
}
```
- Example 2: aggregate by age and query the average balance for each age group (a sketch of the query follows).
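A sketch of what this query might look like, following the pattern of Example 3 below: a terms aggregation on age with an avg sub-aggregation on balance (size is set to 100 here so that all age buckets are returned):

```
GET bank/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "ageAggr": {
      "terms": {
        "field": "age",
        "size": 100
      },
      "aggs": {
        "balanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}
```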
From the results, we can see that there are 61 people aged 31, with an average balance of 28312.9; the aggregation results for the other ages are similar.

- Example 3: group by age, then group each age bucket by gender, and query the average balance for each of those groups.
```
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAggr": {
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs": {
        "genderAggr": {
          "terms": {
            "field": "gender.keyword",
            "size": 10
          },
          "aggs": {
            "balanceAvg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
```
From the results, we can see that there are 61 people aged 31. Among them, 35 are male (M) with an average balance of 29565.6, and 26 are female (F) with an average balance of 26626.6. The aggregations for the other ages are similar.
5.2.9 Mapping
Mapping is used to define a document and how the fields it contains are stored and indexed.
- Defines which string properties should be treated as full text fields
- Define which attributes contain numbers, dates, or geographic locations
- Defines whether all attributes in a document can be indexed (_all configuration)
- Date format
- Custom mapping rules for dynamically added properties

Elasticsearch 7 removes the concept of type:

In a relational database, two tables are independent of each other, even if they have columns with the same name. That is not the case in ES. Elasticsearch is a search engine built on Lucene, and fields with the same name in different types of one index are treated as the same field by Lucene.

To distinguish fields with the same name under different types, Lucene would have to handle the conflict, which reduces retrieval efficiency.

ES 7.x: the type parameter in the URL is optional.

ES 8.x: the type parameter in the URL is no longer supported.

All field types are listed in the reference documentation: www.elastic.co/guide/en/el…
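For example, in 7.x the recommended typeless form uses _doc in place of a custom type name. A minimal sketch with a hypothetical index my-index-2 (remember that one index can hold only a single type):

```
# 7.x typeless form: _doc replaces the custom type name in the URL
PUT my-index-2/_doc/1
{
  "name": "jay huang"
}
```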
- Query the mapping of the index
For example, query the mapping of my-index
```
GET /my-index/_mapping
```

Returned result:

```json
{
  "my-index" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "email" : {
          "type" : "keyword"
        },
        "employee-id" : {
          "type" : "keyword",
          "index" : false
        },
        "name" : {
          "type" : "text"
        }
      }
    }
  }
}
```
- Create the index and specify the mapping
For example, create a my-index index with the fields age, email, and name, and specify their types as integer, keyword, and text:

```
PUT /my-index
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "email": { "type": "keyword" },
      "name": { "type": "text" }
    }
  }
}
```

Returned result:

```json
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my-index"
}
```
- Add a new field mapping
For example, add an employee-id field to my-index and specify its type as keyword, with index set to false:

```
PUT /my-index/_mapping
{
  "properties": {
    "employee-id": {
      "type": "keyword",
      "index": false
    }
  }
}
```
- Update a mapping

Existing mapping fields cannot be updated. To change them, you must create a new index and migrate the data.
- Data migration
```
POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
```
VI. Chinese Word Segmentation

ES ships with a variety of built-in tokenizers, but they are not friendly to Chinese, so we need a third-party Chinese tokenizer toolkit.

6.1 How Tokenization Works in ES

6.1.1 The Concept of a Tokenizer in ES

A tokenizer in ES takes a stream of characters, splits it into individual tokens, and then outputs a token stream.

ES provides many built-in tokenizers, which you can also use to build custom analyzers.

6.1.2 How the Standard Tokenizer Works

The standard tokenizer, for example, splits text into words when it encounters whitespace. The tokenizer is also responsible for recording the order or position of each term (used for phrase queries and word-proximity queries) and the character offsets of each word (used for highlighting the searched content).
6.1.3 Example: Tokenizing English and Punctuation

Here is an example:

```
POST _analyze
{
  "analyzer": "standard",
  "text": "Do you know why I want to study ELK? 2, 3, 33..."
}
```

Query result:

```
do, you, know, why, i, want, to, study, elk, 2, 3, 33
```

From the query result, we can see:

(1) Punctuation does not produce tokens.

(2) Numbers are tokenized.
6.1.4 Example: Tokenizing Chinese

However, this tokenizer does not handle Chinese well: it splits Chinese text into individual characters. In the example below, the Chinese phrase for "Wukong Chat Structure" is split into the single characters wu, kong, chat, frame, structure, whereas the expected tokens are Wukong, chat, structure.

```
POST _analyze
{
  "analyzer": "standard",
  "text": "Wukong Chat Structure"
}
```

We can install the IK tokenizer for friendlier Chinese word segmentation.
6.2 Installing the IK Tokenizer

6.2.1 Where to Get the IK Tokenizer

IK tokenizer releases:

```
https://github.com/medcl/elasticsearch-analysis-ik/releases
```

First check the ES version. The version I installed is 7.4.2, so we install IK tokenizer 7.4.2 as well.

```
http://192.168.56.10:9200/
```

```json
{
  "name" : "8448ec5f3312",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "xC72O3nKSjWavYZ-EPt9Gw",
  "version" : {
    "number" : "7.4.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
    "build_date" : "2019-10-28T20:40:44.881551Z",
    "build_snapshot" : false,
    "lucene_version" : "8.2.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
```
6.2.2 Ways to Install the IK Tokenizer

6.2.2.1 Method 1: Install the IK tokenizer inside the container

- Enter the plugins directory inside the ES container

```bash
docker exec -it <container ID> /bin/bash
cd /usr/share/elasticsearch/plugins
```

- Download the IK tokenizer package

```bash
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
```

- Decompress the IK package

```bash
unzip elasticsearch-analysis-ik-7.4.2.zip
```

- Delete the downloaded package

```bash
rm -rf *.zip
```
6.2.2.2 Method 2: Install the IK tokenizer through the mapped folder

- Go to the mapped folder

```bash
cd /mydata/elasticsearch/plugins
```

- Download the installation package

```bash
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
```

- Decompress the IK package

```bash
unzip elasticsearch-analysis-ik-7.4.2.zip
```

- Delete the downloaded package

```bash
rm -rf *.zip
```
6.2.2.3 Method 3: Upload the package to the mapped directory with Xftp

Use the XShell tool to connect to the virtual machine (for details, see the earlier article [02. Setting up a Linux Environment – O&M Prerequisite](www.jayh.club/#/05. Deployment /01. Environment Constructs)), then copy the downloaded package to the VM with Xftp.
6.3 Decompressing the IK Package in the Container

- If the unzip tool is not installed, install it first.

```bash
apt install unzip
```

- Unzip the IK package into an ik folder in the current directory.

Command format: unzip <IK package> -d ./ik

Example:

```bash
unzip elk-ikv7.4.2.zip -d ./ik
```

- Change the folder permissions to readable and writable.

```bash
chmod -R 777 ik/
```

- Delete the IK package.

```bash
rm elk-ikv7.4.2.zip
```
6.4 Checking the IK Tokenizer Installation

- Enter the container

```bash
docker exec -it <container ID> /bin/bash
```

- View the Elasticsearch plugins

```bash
elasticsearch-plugin list
```

The following result shows that the IK tokenizer is installed. Isn't that simple?

```
ik
```

Then exit the Elasticsearch container and restart it:

```bash
exit
docker restart elasticsearch
```
6.5 Using the IK Chinese Tokenizer

The IK tokenizer has two modes:

- Smart segmentation mode (ik_smart)
- Maximum-combination segmentation mode (ik_max_word)

Let's look at smart segmentation first. For example, for the phrase "a little star", the Chinese tokenizer produces two tokens: "one" and "little star".

Enter the following query in the Dev Tools console:

```
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "A little star"
}
```

We get the following result: the phrase is segmented into "one" and "little star".

Now let's look at the maximum-combination mode. Enter the following query statement:

```
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "A little star"
}
```

This time "a little star" is split into six tokens: one, one, one, little star, little star, star.

Let's look at another Chinese phrase. For example, searching for "Wukong brother chat structure", we expect three tokens: Wukong brother, chat, structure.

The actual result is four tokens: wu, kongge, chat, structure. The IK tokenizer splits "Wukong brother" apart because it treats "kongge" as a word. So we need to tell the IK tokenizer that "Wukongge" (Wukong brother) is a single word that should not be split. How do we do that?
6.6 Custom Tokenizer Dictionary

6.6.1 Scheme for the Custom Dictionary

- Scheme

Create a new dictionary file and point the IK configuration file at it. You can specify a local path or a remote server file path. Here we use the remote-server-file scheme, because it supports hot updates (update the file on the server and the IK tokenizer reloads the dictionary).

- Modify the configuration file

The path of the IK tokenizer configuration file inside the container:

```
/usr/share/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
```

You can modify it through the mapped file on the host, at:

```
/mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
```

Edit the configuration file:

```bash
vim /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
```
The contents of the configuration file are as follows:
```xml
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extension dictionary here -->
    <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
    <!-- Users can configure their own extension stop-word dictionary here -->
    <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
    <!-- Users can configure a remote extension dictionary here -->
    <entry key="remote_ext_dict">location</entry>
    <!-- Users can configure a remote extension stop-word dictionary here -->
    <entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>
```
Modify the remote_ext_dict entry to point to a remote file URL, for example www.xxx.com/ikwords.tex…
Here we can build our own nginx environment and place ikwords.text in the nginx root directory.
6.6.2 Setting Up the Nginx Environment

The approach: start a temporary nginx container first and copy its configuration files out to the host, then delete the original nginx container and start a new nginx container that uses the mapped folders.
- Install nginx with Docker.

```bash
docker run -p 80:80 --name nginx -d nginx:1.10
```

- Copy the nginx container's configuration files into a conf folder under the mydata directory

```bash
cd /mydata
docker container cp nginx:/etc/nginx ./conf
```

- Create an nginx directory under the mydata directory

```bash
mkdir nginx
```

- Move the conf folder into the nginx mapping folder

```bash
mv conf nginx/
```

- Stop and delete the original nginx container

```bash
docker stop nginx
docker rm <container ID>
```

- Start a new container

```bash
docker run -p 80:80 --name nginx \
  -v /mydata/nginx/html:/usr/share/nginx/html \
  -v /mydata/nginx/logs:/var/log/nginx \
  -v /mydata/nginx/conf:/etc/nginx \
  -d nginx:1.10
```

- Access the nginx service

```
http://192.168.56.10
```
If the browser reports 403 Forbidden from nginx/1.10.3, the nginx service has started correctly. The 403 error occurs because there are no files under the nginx html directory yet.
- Create a new HTML file in the nginx html directory

```bash
cd /mydata/nginx/html
vim index.html
```

Content of index.html:

```
hello passjava
```
- Access the Nginx service again
The browser prints "hello passjava". The nginx service page is now accessible without problems.
- Create the IK tokenizer dictionary file

```bash
cd /mydata/nginx/html
mkdir ik
cd ik
vim ik.txt
```

Add the word "Wukongge" (Wukong brother) to the file and save it.

- Access the dictionary file

```
http://192.168.56.10/ik/ik.txt
```

The browser will show some garbled characters; that can be ignored for now. It means the dictionary file is accessible.
- Modify the IK tokenizer configuration to point remote_ext_dict at the dictionary URL above

```bash
cd /mydata/elasticsearch/plugins/ik/config
vim IKAnalyzer.cfg.xml
```
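For reference, the modified entry in IKAnalyzer.cfg.xml would look roughly like this (using the dictionary URL served by nginx above):

```xml
<entry key="remote_ext_dict">http://192.168.56.10/ik/ik.txt</entry>
```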
- Restart the ElasticSearch container and set it to start every time the machine restarts.
```bash
docker restart elasticsearch
docker update elasticsearch --restart=always
```
- Query the segmentation result again
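For example, re-run the analyzer query (a sketch; the text stands in for the Chinese phrase that translates to "Wukong brother chat structure"):

```
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "Wukong brother chat structure"
}
```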
We can see that "Wukong brother chat structure" is now segmented into three tokens: Wukong brother, chat, and structure, which shows that the custom dictionary entry for "Wukongge" is taking effect.
Write at the end
The middle and final parts are still being written. Stay tuned!

- Part 2: practical ES applications.
- Part 3: cluster deployment of ES.

Hello, I am Wukong: 7 years of project development experience, full-stack engineer, development team leader, and a fan of explaining underlying principles with diagrams.

I have also hand-written two mini-programs: a Java quiz mini-program and a PMP quiz mini-program. Click the menu of my official account to open them! In addition, 111 architect documents and 1,000 Java interview questions are available as PDFs. Follow the official account "Wukong Chat Structure" and reply "Wukong" to get these quality materials.
“Retweet -> Look -> Like -> Favorites -> Comment!!” Is the biggest support for me!
I am Wukong. I keep working hard to become stronger and turn Super Saiyan! See you next time!