What is an inverted index
Compare a search engine to a book. The table of contents at the front, which maps each chapter to its page number, is a forward index. The keyword index at the back of the book, which maps each keyword to the pages where it appears, is an inverted index.
Converting a forward index into an inverted index
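The conversion can be sketched in a few lines of Python. This is only an illustration, not how Elasticsearch implements it: the sample titles, the whitespace tokenization, and the name `build_inverted_index` are all assumptions made for the example.

```python
from collections import defaultdict

# Forward index: document ID -> document content.
# The three titles are sample data chosen for illustration.
forward_index = {
    1: "Mastering Elasticsearch",
    2: "Elasticsearch Server",
    3: "Elasticsearch Essentials",
}


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs that contain it."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase, then split on whitespace.
        for term in text.lower().split():
            inverted[term].add(doc_id)
    return {term: sorted(doc_ids) for term, doc_ids in inverted.items()}


inverted_index = build_inverted_index(forward_index)
print(inverted_index["elasticsearch"])  # [1, 2, 3]
print(inverted_index["mastering"])      # [1]
```

The forward index answers "what does document 2 contain?", while the inverted index answers "which documents contain this term?", which is the question a search needs answered.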
Core components of an inverted index
The inverted index has two parts:
- Term dictionary: records every term that appears in any document, and maps each term to its posting list
  - Term dictionaries are generally large; they can be implemented with a B+ tree or a hash table with separate chaining to support fast insertion and lookup
- Posting list: records the set of documents that contain a term; it is made up of postings (inverted index entries)
  - Each posting contains:
    - Document ID
    - Term frequency (TF): the number of times the term appears in the document, used for relevance scoring
    - Position: the position of the term among the document's tokens, used for phrase queries
    - Offset: the start and end character offsets of the term, used for highlighting
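As a rough sketch of how these pieces fit together, the Python below keeps a term dictionary as a plain `dict` (a hash table) whose values are posting lists. The names `Posting`, `index_documents`, and `phrase_match`, the regex tokenization, and the sample titles are assumptions for illustration; real engines store this data in far more compact structures.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Posting:
    """One inverted index entry: where a term occurs inside one document."""
    doc_id: int
    positions: list = field(default_factory=list)  # token positions, for phrase queries
    offsets: list = field(default_factory=list)    # (start, end) character offsets, for highlighting

    @property
    def term_frequency(self):
        # TF is the number of occurrences of the term in this document.
        return len(self.positions)


def index_documents(docs):
    """Build a term dictionary: term -> {doc_id: Posting}."""
    term_dictionary = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text)):
            term = match.group().lower()
            posting = term_dictionary[term].setdefault(doc_id, Posting(doc_id))
            posting.positions.append(position)
            posting.offsets.append((match.start(), match.end()))
    return term_dictionary


def phrase_match(term_dictionary, first, second):
    """Return IDs of documents in which `second` directly follows `first`."""
    hits = []
    first_postings = term_dictionary.get(first, {})
    second_postings = term_dictionary.get(second, {})
    for doc_id, posting in first_postings.items():
        other = second_postings.get(doc_id)
        if other and any(p + 1 in other.positions for p in posting.positions):
            hits.append(doc_id)
    return sorted(hits)


docs = {1: "Mastering Elasticsearch", 2: "Elasticsearch Essentials"}
index = index_documents(docs)
posting = index["elasticsearch"][1]
print(posting.term_frequency, posting.positions, posting.offsets)  # 1 [1] [(10, 23)]
print(phrase_match(index, "mastering", "elasticsearch"))           # [1]
```

The positions make the phrase check possible: a match requires the second term to sit exactly one position after the first, which document IDs alone cannot tell you.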
Each field of a JSON document in Elasticsearch has its own inverted index.
You can specify that certain fields should not be indexed:
- Advantage: saves storage space
- Disadvantage: the field cannot be searched
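The trade-off can be illustrated with a toy per-field index in Python. In Elasticsearch itself this is controlled through the `index` mapping parameter; everything else here (the `mapping` dict, the field name `internal_note`, the functions `index_by_field` and `search`) is hypothetical and only models the idea.

```python
from collections import defaultdict

# Hypothetical mapping: "title" gets an inverted index, "internal_note" does not.
mapping = {
    "title": {"index": True},
    "internal_note": {"index": False},
}


def index_by_field(docs, mapping):
    """Build one inverted index per field, skipping fields marked index=False."""
    field_indexes = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in docs.items():
        for field_name, value in doc.items():
            if not mapping.get(field_name, {}).get("index", True):
                continue  # no index is built, so no space is used for this field
            for term in str(value).lower().split():
                field_indexes[field_name][term].add(doc_id)
    return field_indexes


def search(field_indexes, field_name, term):
    """Look up a term in a single field's inverted index."""
    if field_name not in field_indexes:
        return []  # the field was never indexed, so it cannot be searched
    return sorted(field_indexes[field_name].get(term.lower(), set()))


docs = {1: {"title": "Mastering Elasticsearch", "internal_note": "draft copy"}}
field_indexes = index_by_field(docs, mapping)
print(search(field_indexes, "title", "elasticsearch"))  # [1]
print(search(field_indexes, "internal_note", "draft"))  # []
```

The second lookup returns nothing even though the document contains the word, because no index was ever built for that field.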
Inverted index demonstration
```
POST _analyze
{
  "analyzer": "standard",
  "text": "Mastering Elasticsearch"
}

# Response
{
  "tokens" : [
    {
      "token" : "mastering",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "elasticsearch",
      "start_offset" : 10,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
```

```
POST _analyze
{
  "analyzer": "standard",
  "text": "Elasticsearch Essentials"
}

# Response
{
  "tokens" : [
    {
      "token" : "elasticsearch",
      "start_offset" : 0,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "essentials",
      "start_offset" : 14,
      "end_offset" : 24,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
```
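To see where the `token`, `start_offset`, `end_offset`, and `position` values in the `_analyze` output come from, here is a rough Python approximation. It is not the standard analyzer: the real one follows Unicode text segmentation rules, while this hypothetical `analyze` function only splits on non-word characters and lowercases, which happens to be enough for simple English titles.

```python
import re


def analyze(text):
    """Split text into lowercased tokens, recording offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),  # normalized term stored in the dictionary
            "start_offset": match.start(),   # character offset where the term begins
            "end_offset": match.end(),       # character offset just past the term
            "position": position,            # ordinal position among the tokens
        })
    return tokens


for token in analyze("Mastering Elasticsearch"):
    print(token)
# {'token': 'mastering', 'start_offset': 0, 'end_offset': 9, 'position': 0}
# {'token': 'elasticsearch', 'start_offset': 10, 'end_offset': 23, 'position': 1}
```

Note that the offsets refer to characters in the original text, while the position counts tokens; that is why highlighting uses offsets and phrase queries use positions.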
Further reading
- Wikipedia: Inverted index
- IO: a description of the inverted index