First, basic knowledge

1. Origin

  • Elasticsearch is a distributed, RESTful full-text search engine based on Lucene. Every field is indexed and searchable, so massive amounts of data can be stored, searched, and analyzed quickly.
  • Full-text search builds an index for every word, recording how often and where each word appears in a document. When a query arrives, the pre-built index is searched and the matching results are returned to the user. The process is similar to looking up a word in a dictionary through its index table.

2. Basic concepts

  • (1) Cluster: an ES cluster, uniquely identified by its name (cluster.name).
  • (2) Node: a physical concept. One machine is one node, and a cluster is composed of nodes.
  • (3) Index: an index is similar to a database in MySQL. An index in Elasticsearch is where data is stored; it contains a collection of documents with similar structure.
    • Index name specification:
      • Must be lowercase
      • Cannot start with _, -, or +
      • Cannot contain spaces or the characters \, /, *, ?, ", <, >, |, ,
  • (4) Type: a type defines a data structure and can be thought of as a table in MySQL; a type is a logical data category within an index.
    • Version differences:
      • 5.x: an index can have multiple types
      • 6.x: an index can have only one type
      • 7.x: the concept of type is removed
  • (5) Document: similar to a row in MySQL. The difference is that each document in ES can have different fields, but fields with the same name should have the same data type. A document is the smallest data unit in ES; one document can be regarded as one record.
  • (6) Field: a field is the smallest unit in Elasticsearch; a document contains multiple fields.
  • (7) Shard: a single machine cannot store a large amount of data. ES can split the data in one index into multiple shards and distribute them across multiple servers. Shards enable horizontal scaling: more data can be stored, and search and analysis operations are distributed across multiple servers, improving throughput and performance.
  • (8) Replica: any server may fail or go down at any time, and a shard on it may be lost. Therefore, multiple replica copies can be created for each shard. A replica provides backup when a shard fails, ensuring data is not lost, and multiple replicas also improve the throughput and performance of search operations. By default (before 7.0), each index has 10 shards: 5 primary shards and 5 replica shards, and the minimum high-availability configuration is two servers. (Since 7.0 the default is 1 primary and 1 replica.) See the sketch after this list.
    • Benefits of replicas:
      • Improved fault tolerance
      • Improved query efficiency
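A minimal sketch of these concepts using the official Python client (elasticsearch-py, 8.x-style API). The cluster URL, index name, and document fields are illustrative assumptions, not part of the original text:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# 5 primary shards, each with 1 replica -> 10 shards in total,
# matching the pre-7.0 defaults described above.
es.indices.create(
    index="articles",
    settings={
        "number_of_shards": 5,
        "number_of_replicas": 1,
    },
)

# A document (one "row") with several fields, stored in the index.
es.index(index="articles", id="1", document={"title": "hello", "views": 42})
```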

3. Core data types

  • (1) String types
    • text: an analyzed text field, segmented into terms and searchable by full text
    • keyword: a field that is not analyzed and must be matched exactly
  • (2) Numeric and other simple types
    • Numeric – byte/short/integer/long/float/double/half_float/scaled_float
    • Date type – date
    • Boolean type – boolean
    • Binary type – binary
    • Range type – range
  • (3) Complex data types
    • Array type – array
    • Object type – object
    • Nested type – nested
  • (4) Geographical data types
    • Geo-point type – geo_point
    • Geo-shape type – geo_shape
  • (5) Specialized data types
    • IP type – ip
    • Token count type – token_count
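A hypothetical mapping that exercises the field types listed above. The index and field names are made up for illustration:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="typed-demo",
    mappings={
        "properties": {
            "title":     {"type": "text"},     # analyzed full text
            "tag":       {"type": "keyword"},  # exact-match string
            "views":     {"type": "integer"},
            "price":     {"type": "scaled_float", "scaling_factor": 100},
            "created":   {"type": "date"},
            "published": {"type": "boolean"},
            "raw":       {"type": "binary"},   # base64-encoded bytes
            "age_range": {"type": "integer_range"},
            "author":    {"type": "object"},
            "comments":  {"type": "nested"},   # array of independent objects
            "location":  {"type": "geo_point"},
            "client_ip": {"type": "ip"},
            "title_len": {"type": "token_count", "analyzer": "standard"},
        }
    },
)
```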

4. Forward vs. inverted indexes

  • (1) Forward index: maps from doc ID to document content/words
  • (2) Inverted index: maps from a word/term to (doc, frequency) pairs
  • In a search engine, each document has a document ID, and the document content is represented as a set of keywords. For example, after word segmentation, 20 keywords may be extracted from a document, and each keyword records its frequency and positions within that document. An inverted index is then a mapping from keywords to document IDs: each keyword corresponds to the series of documents in which it appears. With inverted indexes, search engines can respond to user queries efficiently.
  • Note two important details about inverted indexes:
    • All terms in an inverted index correspond to one or more documents
    • Terms in an inverted index are arranged in ascending lexicographical order
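A toy, pure-Python illustration (not ES code) of the two structures and the two details above; the documents are invented:

```python
from collections import defaultdict

docs = {1: "the quick brown fox", 2: "the lazy brown dog"}

# Forward index: doc ID -> terms.
forward = {doc_id: text.split() for doc_id, text in docs.items()}

# Inverted index: term -> postings of (doc ID, frequency).
inverted = defaultdict(dict)
for doc_id, terms in forward.items():
    for term in terms:
        inverted[term][doc_id] = inverted[term].get(doc_id, 0) + 1

for term in sorted(inverted):  # terms in ascending lexicographical order
    print(term, sorted(inverted[term].items()))
# brown [(1, 1), (2, 1)]
# dog   [(2, 1)]
# fox   [(1, 1)] ...
```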

5. DocValues

  • If we need to aggregate over the data, e.g. sorting or grouping, Lucene internally iterates over the sort field of every document in the collection and then builds a final sorted list of documents, all of which is maintained in memory. If the volume of data to sort is large, this easily causes memory overflow (OOM) and poor performance.
  • DocValues is a forward index that ES builds at the same time as the inverted index: it stores the mapping from docId to each field value. It can be seen as looking at the data from the document dimension, which is what enables sorting and aggregating by a specified field.
  • In addition, DocValues are stored on disk. When the DocValues are larger than the node's available memory, ES loads and evicts them through the operating system's page cache, avoiding out-of-memory exceptions. When the DocValues are far smaller than the node's available memory, the operating system naturally keeps them all in memory (off-heap), giving very fast access.
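A sketch of the related mapping knob: doc_values are enabled by default for most field types, and can be disabled for fields that are never sorted or aggregated on. Index and field names are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="docvalues-demo",
    mappings={
        "properties": {
            "status":  {"type": "keyword"},                       # sortable/aggregatable
            "session": {"type": "keyword", "doc_values": False},  # search-only, saves disk
        }
    },
)

# Sorting and aggregating read the doc-values (forward) index,
# not the inverted index.
es.search(index="docvalues-demo", sort=[{"status": "asc"}],
          aggs={"by_status": {"terms": {"field": "status"}}})
```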

6. Text VS keyword

  • The main difference between the two: the keyword type is not analyzed; the inverted index is built directly from the raw string, so a keyword field can only be searched by its exact value. The text type is analyzed when stored into Elasticsearch, and the inverted index is built from the resulting terms. A common pattern combining both is sketched below.
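A sketch of the common multi-field pattern: index the same string as both text (analyzed) and keyword (exact value). Names and the sample document are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="products",
    mappings={
        "properties": {
            "name": {
                "type": "text",                          # analyzed for full-text search
                "fields": {"raw": {"type": "keyword"}},  # un-analyzed copy for exact match
            }
        }
    },
)

es.index(index="products", id="1", document={"name": "Red Wireless Mouse"})
es.indices.refresh(index="products")  # make the document searchable now

es.search(index="products", query={"match": {"name": "wireless mouse"}})          # hits
es.search(index="products", query={"term": {"name.raw": "Red Wireless Mouse"}})   # exact hit
```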

7. What are stop words

  • In information retrieval, to save storage space and improve search efficiency, certain words are automatically filtered out before or after processing natural-language data (text). These words are called stop words.
  • Stop words can be regarded as meaningless words, such as “of” and “and”, which do not need to be indexed.
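A sketch of a custom analyzer with an English stop-word filter, so such words are dropped before indexing. Index and analyzer names are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="stop-demo",
    settings={
        "analysis": {
            "analyzer": {
                "my_stop_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop"],  # "stop" defaults to English stop words
                }
            }
        }
    },
    mappings={"properties": {"body": {"type": "text", "analyzer": "my_stop_analyzer"}}},
)

# Inspect what actually gets indexed: the stop words are filtered out.
es.indices.analyze(index="stop-demo", analyzer="my_stop_analyzer",
                   text="the quick and the dead")  # -> quick, dead
```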

8. Query VS Filter

  • (1) Query: performs the query and also calculates a relevance score for each document;
  • (2) Filter: only determines whether documents satisfy the query conditions, without calculating any score or caring about result ordering. In addition, filter results can be cached, improving performance.
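A sketch showing both roles in one bool query: the must clause scores, the filter clauses are cacheable yes/no checks. Index and field names are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.search(
    index="articles",
    query={
        "bool": {
            # "must" clauses are queries: they match AND contribute to _score.
            "must": [{"match": {"title": "elasticsearch"}}],
            # "filter" clauses are yes/no only: no scoring, results cacheable.
            "filter": [
                {"term": {"status": "published"}},
                {"range": {"created": {"gte": "2023-01-01"}}},
            ],
        }
    },
)
```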

9. Match VS term

  • (1) match: analyzed (“fuzzy”) full-text matching
  • (2) term: exact matching
  • term means exact matching: the search term is not analyzed, and the document must contain the entire search term.
  • The difference is that a term query does not apply any analyzer to the input, while a match query analyzes the input first and then matches documents containing the resulting terms.
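A sketch of how the same input behaves under the two queries, assuming a hypothetical "title" text field whose stored value "Distributed Search Engine" was analyzed into ["distributed", "search", "engine"]:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# match: the input is analyzed into ["distributed", "engine"], and documents
# containing any of those terms match.
es.search(index="articles", query={"match": {"title": "Distributed Engine"}})  # hits

# term: the input is NOT analyzed; the inverted index only contains lowercased
# single words, never the term "Distributed Engine", so nothing matches.
es.search(index="articles", query={"term": {"title": "Distributed Engine"}})   # no hits
```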

Second, write process

1. Process

  • (1) The client sends the request to a node of its choice; this node acts as the coordinating node;
  • (2) The coordinating node routes the document with shard = hash(routing) % number_of_primary_shards and forwards the request to the node holding the corresponding primary shard (a toy version of this formula is sketched below);
  • (3) The primary shard on that node processes the request and synchronizes the data to the replica shards;
  • (4) Once the primary and all replica shards have succeeded, the coordinating node returns the result to the client.
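A toy illustration of the routing formula (the real implementation hashes the routing value, which defaults to the document _id, with murmur3; crc32 is a stand-in here):

```python
import zlib

NUMBER_OF_PRIMARY_SHARDS = 5  # fixed at index creation: changing it would break
                              # this formula, which is why the primary-shard
                              # count cannot be changed afterwards

def pick_shard(routing: str) -> int:
    # Stand-in hash for illustration; ES actually uses murmur3.
    return zlib.crc32(routing.encode()) % NUMBER_OF_PRIMARY_SHARDS

print(pick_shard("doc-1"))  # the same routing value always maps to the same shard
```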

2. Principle

  • (1) Data is first written to the in-memory buffer; then, periodically (every 1 s by default), the data is written to a new segment file in the filesystem cache (and the memory buffer is emptied). This process is called refresh.
    • Near-real-time behaviour of ES: data in the memory buffer cannot be searched; it becomes searchable only after being refreshed into the filesystem cache, and refresh happens every second, which is why ES is called near real-time. You can also trigger a refresh manually through the ES API to make data searchable immediately.
  • (2) Since both the memory buffer and the filesystem cache are memory-based, data would be lost if the server went down. Therefore, ES uses a translog file to guarantee reliability: data is also written to the translog, and if the machine crashes and restarts, ES automatically replays the translog to restore the data into the memory buffer and filesystem cache.
    • ES data-loss window: the translog itself is first written to the filesystem cache and flushed to disk every 5 seconds by default. So, by default, up to 5 seconds of data may exist only in the memory buffer, the filesystem cache, or an unflushed translog, not on disk; if the machine goes down at that moment, those 5 seconds of data may be lost. You can configure the translog so that every write is fsynced directly to disk, but performance becomes much worse.
  • (3) Flush: as the steps above repeat, the translog grows larger and larger. When 30 minutes have passed (default) or the translog exceeds its threshold (512 MB by default), a commit (flush) operation is triggered:
    • Refresh the memory buffer into the filesystem cache, emptying the buffer.
    • Create a new commit point and forcibly fsync all data currently in the filesystem cache to disk segment files.
    • Delete the old translog file and create a new one; at this point the commit is complete.
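A sketch of the knobs behind this process, using the defaults mentioned above (index name is illustrative; translog settings are shown at index creation since some are not dynamically updatable on all versions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="articles-nrt",
    settings={
        "refresh_interval": "1s",                  # buffer -> new searchable segment
        "translog.sync_interval": "5s",            # translog fsync cadence (async mode)
        "translog.durability": "async",            # "request" = fsync on every write:
                                                   # safer, but much slower
        "translog.flush_threshold_size": "512mb",  # translog size that triggers a flush
    },
)

# Make freshly written data searchable immediately instead of waiting ~1s.
es.indices.refresh(index="articles-nrt")
# Force a commit: fsync segments to disk and start a new translog.
es.indices.flush(index="articles-nrt")
```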

Third, update and delete process

  • Delete and update are both write operations, but documents in Elasticsearch are immutable, so they cannot be deleted or modified in place. Instead, ES uses a .del file to record which documents have been deleted; each segment on disk has a corresponding .del file.

1. Delete operations

  • The document is not actually removed; it is only marked as deleted in the .del file. The document still matches queries, but is filtered out of the results.

2. Update operation

  • Mark the old doc as deleted and create a new doc.

  • Each refresh of the memory buffer generates a segment file, by default one per second, so more and more segment files accumulate.

  • ES therefore periodically merges multiple segment files: docs marked as deleted are physically removed, the new segment is written to disk, a commit point is written to identify all new segment files, the new segments are opened for search, and the old segment files are deleted.
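Merging happens automatically in the background, but it can also be forced, e.g. on an index that is no longer written to; deleted docs are physically dropped during the merge. A sketch (index name is illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.forcemerge(
    index="articles-2023.01",
    max_num_segments=1,          # merge down to a single segment
    only_expunge_deletes=False,  # True would only rewrite segments with deletes
)
```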

Fourth, search process

  • The search is performed as a two-phase process called Query Then Fetch:

1. Query phase

  • The client sends the request to the coordinating node, which broadcasts the search to the primary or replica copy of every shard. Each shard executes the search locally and builds a priority queue of matching documents of size from + size. Each shard then returns the IDs and sort values of the documents in its queue to the coordinating node, which merges, sorts, and paginates them to produce the final result.

2. Fetch phase

  • The coordinating node fetches the actual document data from the relevant nodes by doc ID, then returns the results to the client.
    • The coordinating node hashes each doc ID for routing and forwards the request to the corresponding node. Round-robin random polling is used to pick one among the primary shard and all its replicas, balancing the read load.
    • The node receiving the request returns the document to the coordinating node.
    • The coordinating node returns the documents to the client.
  • With the default Query Then Fetch search type, document relevance is scored using each shard's local term/document frequencies, which may be inaccurate when the number of documents is small. DFS Query Then Fetch adds a pre-query phase that first gathers the global term and document frequencies from all shards; scoring is more accurate, but performance is worse.
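A sketch of both search types (index and query are illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Default query-then-fetch: each shard builds a from+size priority queue
# (here 20+10 = 30 entries), returns IDs + sort values, and the coordinating
# node merges, sorts, and pages.
es.search(index="articles", query={"match": {"title": "search"}},
          from_=20, size=10)

# DFS variant: first collect global term/document frequencies, then score.
es.search(index="articles", query={"match": {"title": "search"}},
          search_type="dfs_query_then_fetch")
```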

Fifth, tuning methods

1. Design stage

  • Based on the service's data-growth requirements, create indexes from date-based templates and roll them over through the rollover API.
  • Use aliases for index management.
  • Perform force_merge during off-peak hours (e.g. early morning) every day to release space.
  • Store hot data on SSDs to improve retrieval efficiency; periodically shrink cold data to reduce storage.
  • Use curator for index lifecycle management.
  • Set analyzers only for the fields that genuinely need word segmentation, and choose them carefully.
  • In the mapping stage, plan each field's attributes deliberately, such as whether it needs to be retrievable or stored.

2. Write phase

  • Before writing, set the number of replicas to 0.
  • Before writing, disable refresh by setting refresh_interval to -1.
  • During writing, use bulk requests.
  • After writing, restore the number of replicas and the refresh interval.
  • Use automatically generated document IDs whenever possible. (The whole recipe is sketched below.)
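A sketch of this write-phase recipe: drop replicas and refresh, bulk load with auto-generated IDs, then restore the settings. Index name, document shape, and batch sizes are illustrative:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
index = "bulk-demo"

es.indices.put_settings(index=index, settings={
    "number_of_replicas": 0, "refresh_interval": "-1"})

# No "_id" in the actions: letting ES auto-generate IDs skips the
# "does this ID already exist?" check and speeds up indexing.
actions = ({"_index": index, "_source": {"n": i}} for i in range(100_000))
helpers.bulk(es, actions, chunk_size=5_000)  # tune toward ~5-15 MB per batch

es.indices.put_settings(index=index, settings={
    "number_of_replicas": 1, "refresh_interval": "1s"})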

3. Query phase

  • Disable wildcard queries.
  • Avoid terms queries with huge value lists (hundreds or thousands of terms).
  • Make full use of the inverted index: use the keyword type wherever possible.
  • When the data volume is large, first narrow the target indexes down by time, then search.
  • Set up a proper routing mechanism.

Sixth, how to ensure read and write consistency under high concurrency

1. Update operation

  • Optimistic concurrency control via version numbers can be used to ensure a new version is never overwritten by an old one.
  • Each document has a _version number that is incremented by 1 whenever the document is changed. Elasticsearch uses _version to make sure changes are applied in the correct order: if an older version arrives after a newer one, it is simply ignored.
  • The advantage of using _version is that no data is lost to conflicting modifications: for example, a change can specify the document version it is based on, and if that version is no longer current, the request fails. (A sketch follows.)
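A sketch of optimistic concurrency control. Modern ES (6.7+) exposes _seq_no/_primary_term for conditional writes; this works the same way conceptually as the _version mechanism described above. IDs and values are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

doc = es.get(index="articles", id="1")

# Re-index only if nobody changed the document since we read it; otherwise
# ES rejects the write with a 409 version-conflict error.
es.index(
    index="articles",
    id="1",
    document={"title": "updated title"},
    if_seq_no=doc["_seq_no"],
    if_primary_term=doc["_primary_term"],
)
```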

2. Write operation

  • The consistency level supports quorum/one/all and defaults to quorum, i.e. a write is allowed only when most shards are available. Even then, a replica write may fail for network reasons; that replica is then marked faulty and the shard is rebuilt on another node.
    • one: the write can proceed as long as one primary shard is active and available
    • all: the write can proceed only when all primary and replica shards are active
    • quorum: the default; the majority of all shard copies must be active and available for the write to proceed

3. Read operation

  • You can set replication to sync (the default), which makes the operation return only after both primary and replica shards have completed. If replication is set to async, you can still read the latest version by setting the search request parameter _preference to primary, so that the query goes to the primary shard.

Seventh, Master node election (HA)

1. Elasticsearch is distributed

  • Elasticsearch splits the stored data into shards and keeps multiple copies of each shard, ensuring high availability in a distributed environment. In Elasticsearch the nodes are peers; the nodes elect a Master among themselves, and the Master is responsible for cluster-state changes and for synchronizing them to the other nodes.
  • Data writes follow a simple routing rule and can be sent to any node in the cluster, so write pressure is spread across the whole cluster.

2. How does Elasticsearch elect the Master

  • In Elasticsearch, the ZenDiscovery module is responsible for master election: it handles Ping (the RPC by which nodes discover each other) and Unicast (a unicast host list controlling which nodes to ping).
    • Determine the minimum number of votes needed from master-candidate nodes, set by discovery.zen.minimum_master_nodes in elasticsearch.yml;
    • All master-candidate nodes (node.master: true) are put in a sorted order, and each node votes for the first node (position 0) in that order, tentatively treating it as the master;
    • If a node receives votes reaching the threshold and it also voted for itself, it becomes the master. Otherwise a new election round starts, until the conditions are met.
  • The master node manages clusters, nodes, and indexes, but does not manage documents. Data nodes can have the HTTP function disabled.

3. How does Elasticsearch avoid split brain

  • (1) When the cluster has at least 3 master-candidate nodes (node.master: true), split brain can be avoided by setting the minimum number of votes (discovery.zen.minimum_master_nodes) to more than half of all candidate nodes, i.e. (N/2) + 1;
  • (2) When the cluster has only 2 master-candidate nodes, the situation is unreasonable; it is better to change one of them to node.master: false. If the node settings are kept and the same (N/2) + 1 formula is applied, discovery.zen.minimum_master_nodes should be set to 2, but then if one of the two master candidates fails, no master can be elected.
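The quorum formula in one line, as a worked example of the two cases above:

```python
# With N master-candidate nodes, require floor(N/2) + 1 votes so that two
# disjoint halves of a partitioned cluster can never both elect a master.
def minimum_master_nodes(candidates: int) -> int:
    return candidates // 2 + 1

print(minimum_master_nodes(3))  # 2 -> one node may fail; no split brain
print(minimum_master_nodes(2))  # 2 -> any failure blocks election, which is
                                #      why 2-candidate clusters are discouraged
```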

Eighth, performance improvement methods in the index building stage

  • (1) Use SSD storage media.
  • (2) Use bulk requests and tune their size: 5-15 MB per batch is a good starting point.
  • (3) For bulk imports, consider setting index.number_of_replicas: 0 to disable replicas.
  • (4) If you don’t need near-real-time visibility of search results, consider changing each index’s index.refresh_interval to 30s.
  • (5) Segments and merging: Elasticsearch’s default merge throttle is 20 MB/s, but if you use SSDs, consider raising it to 100-200 MB/s. For bulk imports where search does not matter at all, merge throttling can be disabled entirely.
  • (6) Increase index.translog.flush_threshold_size from the default 512 MB to something larger, such as 1 GB.
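A sketch matching the numbers above. The per-index settings are standard dynamic settings; the merge/store throttle is a legacy cluster setting (removed in ES 2.0+, where merging is auto-throttled), shown here only for the older versions this advice targets:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# (3), (4), (6): per-index settings for a bulk import; index name is illustrative.
es.indices.put_settings(index="bulk-import", settings={
    "number_of_replicas": 0,                   # restore replicas after the import
    "refresh_interval": "30s",                 # relax near-real-time visibility
    "translog.flush_threshold_size": "1gb",    # fewer, larger flushes
})

# (5): raise the merge/store throttle on SSDs (legacy setting; newer versions
# would reject it and auto-throttle merges instead).
es.cluster.put_settings(transient={
    "indices.store.throttle.max_bytes_per_sec": "150mb"})
```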

Ninth, deep paging and scroll search

1. Deep paging

  • Deep paging means paging far into the results: pages 1, 2, 10, 20 are shallow; pages 10,000 or 20,000 are deep. Paging too deep causes performance problems, consuming memory and CPU, and for this reason ES by default does not support paging queries beyond 10,000 results.
  • To deal with deep paging, avoid it: limit how far users can page, for example to a maximum of 100 pages, showing nothing from page 101 onward; after all, users rarely search that deep.

2. Scroll

  • Querying 10,000+ records at once usually hurts performance because there is too much data. In such cases, scroll search can be used: fetch a batch of data first, then keep fetching the following batches.
  • The first query returns a scroll ID, which works like an anchor; each subsequent scroll request passes in the scroll ID from the previous search and continues from there. Every batch is served from a historical snapshot of the data, so changes made to the data during the scroll do not affect the results. (A sketch follows.)
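A sketch of scrolling through a large result set batch by batch; index name and batch size are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# First search opens the scroll context (a snapshot) and returns a scroll_id.
resp = es.search(index="articles", scroll="2m", size=1000,
                 query={"match_all": {}})

while hits := resp["hits"]["hits"]:
    for hit in hits:
        print(hit["_id"])  # placeholder for real per-document work
    # Fetch the next batch using the scroll_id from the previous response.
    resp = es.scroll(scroll_id=resp["_scroll_id"], scroll="2m")

es.clear_scroll(scroll_id=resp["_scroll_id"])  # free server-side resources
```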