Description: Lucene is the most popular search library, and Elasticsearch is a distributed search engine built on Lucene.
2) Elasticsearch is also written in Java and uses Lucene at its core for indexing and searching, but it aims to make full-text search simple by hiding the complexity of Lucene behind a simple RESTful API.
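As a rough illustration of what "a simple RESTful API" means in practice, the sketch below builds (but does not send) a full-text search request for the `_search` endpoint using only the Python standard library. The node address `http://localhost:9200`, the index name `articles`, and the field name `title` are assumptions made up for this example.

```python
import json
from urllib.request import Request

# Hypothetical values for illustration: a local node and an index named "articles".
ES_URL = "http://localhost:9200"
INDEX = "articles"


def build_search_request(field: str, text: str) -> Request:
    """Build (but do not send) a full-text `match` query for the _search endpoint."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("title", "insight data")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it to a running node; the tokenizing, scoring, and index lookups all happen inside Lucene and never appear in the request.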
1) An index in Elasticsearch is a logical namespace that organizes data (like a database).
Each Elasticsearch index has one or more shards (the default was 5 before version 7.0 and has been 1 since). Each shard is a Lucene index, where the data is actually stored, and is itself a search engine. Each shard has zero or more replicas (one by default).
2) An Elasticsearch index also contains “types” (similar to tables), which logically separate the data in the index. Mapping types were phased out starting in 6.x and removed in 8.0.
3) Within an Elasticsearch index, all documents of a given type have the same properties (similar to a schema).
Indices == Databases
Types == Tables
Properties == Schema
Figure A shows an Elasticsearch index with three shards, each with one replica. These shards make up the Elasticsearch index, and each shard is itself a Lucene index.
Figure B shows the logical relationships between Elasticsearch indexes, shards, Lucene indexes, and documents.
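The layout described for Figure A can be written down as index settings. The sketch below uses the real setting names `number_of_shards` and `number_of_replicas`, while the helper `total_lucene_indexes` is a hypothetical function written for this example. It counts how many Lucene indexes back one Elasticsearch index.

```python
# Index settings as they would appear in the body of an index-creation request.
index_settings = {
    "settings": {
        "number_of_shards": 3,    # primary shards, as in Figure A
        "number_of_replicas": 1,  # replicas of each primary shard
    }
}


def total_lucene_indexes(settings: dict) -> int:
    """Every primary shard and every replica is its own Lucene index."""
    s = settings["settings"]
    return s["number_of_shards"] * (1 + s["number_of_replicas"])


print(total_lucene_indexes(index_settings))  # 6
```

Three primaries with one replica each give six Lucene indexes in total, which the cluster can spread across its nodes.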
Elasticsearch uses Apache Lucene, a full-text search library written in Java and created by Doug Cutting. Lucene internally uses a data structure called an inverted index, which is designed to serve full-text search results with low latency.
A document is the unit of data in Elasticsearch. The text of each document is split into terms, and a sorted list of all unique terms is built. Each term is then associated with the list of documents in which it occurs, and this mapping forms the inverted index.
Suppose there are two documents:
- Insight Data Engineering Fellows Program
- Insight Data Science Fellows Program
Term | Documents |
---|---|
data | Doc 1, Doc 2 |
engineering | Doc 1 |
fellows | Doc 1, Doc 2 |
insight | Doc 1, Doc 2 |
program | Doc 1, Doc 2 |
science | Doc 2 |
When we want to find documents that contain the term “insight”, we can look up “insight” in the inverted index and return the document IDs (Doc 1 and Doc 2) associated with it.
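The lookup just described can be sketched in a few lines of Python. This is a toy model rather than Lucene's actual on-disk structure: the function name `build_inverted_index` and the use of a plain dictionary of sets are assumptions made for illustration, and terms are simply lowercased and split on whitespace.

```python
from collections import defaultdict

documents = {
    1: "Insight Data Engineering Fellows Program",
    2: "Insight Data Science Fellows Program",
}


def build_inverted_index(docs: dict) -> dict:
    """Map each lowercased term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


inverted_index = build_inverted_index(documents)
print(sorted(inverted_index["insight"]))  # [1, 2]
print(sorted(inverted_index["science"]))  # [2]
```

Finding the documents for a term is a single dictionary lookup, so the cost does not depend on how many documents have to be read, which is the property that makes inverted indexes suitable for low-latency search.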
To improve searchability (for example, returning the same results whether a query is typed in upper or lower case), documents are analyzed before they are indexed. Analysis consists of two steps:
1) Tokenize sentences into individual terms.
2) Normalize each term to a standard form.
By default, Elasticsearch uses the standard analyzer, which applies:
1) The standard tokenizer, which splits text on word boundaries
2) The lowercase token filter, which converts terms to lowercase
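The two analysis steps can be approximated as follows. This is only a sketch: the real standard tokenizer follows Unicode text segmentation rules, whereas this version assumes a simple `\w+` regular expression is close enough for plain English text. The function names are hypothetical.

```python
import re


def tokenize(text: str) -> list:
    """Step 1: split text into terms on (approximate) word boundaries."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens: list) -> list:
    """Step 2: normalize every term to lowercase."""
    return [token.lower() for token in tokens]


def analyze(text: str) -> list:
    """Run the tokenizer, then the lowercase filter, in that order."""
    return lowercase_filter(tokenize(text))


print(analyze("Insight Data Engineering Fellows Program"))
# ['insight', 'data', 'engineering', 'fellows', 'program']
```

Because the same analysis is applied to the query text at search time, a search for "INSIGHT" and a search for "insight" resolve to the same term in the inverted index.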