This is the seventh day of my participation in the August More Text Challenge. For details, see: August More Text Challenge


What is an inverted index

Compare a book to a search engine. The table of contents, which associates page numbers with the content on those pages, is the forward index. The index at the back of the book, which associates keywords with the page numbers where they appear, is the inverted index.

Converting a forward index into an inverted index
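The following Python sketch illustrates the transformation using two book titles as hypothetical documents: the forward index maps each document ID to the words it contains, and inverting it yields a map from each word to the IDs of the documents that contain it. The function name `invert` and the sample data are illustrative only.

```python
from collections import defaultdict


def invert(forward_index: dict[int, list[str]]) -> dict[str, list[int]]:
    """Turn a forward index (doc ID -> words) into an inverted index (word -> doc IDs)."""
    inverted: dict[str, set[int]] = defaultdict(set)
    for doc_id, words in forward_index.items():
        for word in words:
            inverted[word].add(doc_id)
    # Sort terms alphabetically and doc IDs ascending, like a term dictionary and posting list
    return {word: sorted(ids) for word, ids in sorted(inverted.items())}


forward = {
    1: ["mastering", "elasticsearch"],
    2: ["elasticsearch", "essentials"],
}
print(invert(forward))
# {'elasticsearch': [1, 2], 'essentials': [2], 'mastering': [1]}
```

Looking up a word is now a single dictionary access instead of a scan over every document, which is the whole point of inverting the index.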

The core components of an inverted index

  • An inverted index has two parts

    • Term Dictionary: records every word that appears in the documents and links each word to its posting list
      • The term dictionary is usually large, so it is typically implemented with a B+ tree or a hash table with separate chaining to support fast inserts and lookups
    • Posting List: records the set of documents that contain a word; it is made up of postings (inverted index entries)
      • Each posting contains
        • Document ID
        • Term frequency (TF): the number of times the word occurs in the document, used for relevance scoring
        • Position: the position of the word in the document's token stream, used for phrase queries
        • Offset: the start and end character offsets of the word, used for highlighting
  • In Elasticsearch, each field of a JSON document has its own inverted index

  • You can specify that certain fields are not indexed

    • Advantage: saves storage space
    • Disadvantage: the field can no longer be searched
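To make the structure concrete, here is a minimal Python sketch of a per-field inverted index. It assumes a naive tokenizer (`\w+` matches, lowercased) rather than Elasticsearch's real analyzers, and the names `Posting`, `index_document`, and the `not_indexed` set are hypothetical; in Elasticsearch itself, a field is excluded from indexing with the `"index": false` mapping parameter. Each field gets its own term dictionary, each term points to a posting list, and each posting stores the document ID, term frequency, positions, and offsets listed above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Posting:
    doc_id: int
    tf: int = 0  # term frequency: how often the term occurs in this field of the doc
    positions: list[int] = field(default_factory=list)  # token positions, for phrase queries
    offsets: list[tuple[int, int]] = field(default_factory=list)  # (start, end) chars, for highlighting


# Hypothetical layout: field name -> term dictionary (term -> posting list)
FieldIndex = dict[str, dict[str, list[Posting]]]

TOKEN_RE = re.compile(r"\w+")  # naive tokenizer, an assumption for this sketch


def index_document(index: FieldIndex, doc_id: int, doc: dict[str, str],
                   not_indexed: set[str]) -> None:
    """Add one document to the per-field index; doc IDs must arrive in ascending order."""
    for field_name, text in doc.items():
        if field_name in not_indexed:
            continue  # no inverted index is built: saves space, but the field is unsearchable
        terms = index.setdefault(field_name, {})
        for position, match in enumerate(TOKEN_RE.finditer(text)):
            postings = terms.setdefault(match.group().lower(), [])
            if not postings or postings[-1].doc_id != doc_id:
                postings.append(Posting(doc_id))  # first occurrence of the term in this doc
            posting = postings[-1]
            posting.tf += 1
            posting.positions.append(position)
            posting.offsets.append(match.span())


index: FieldIndex = {}
index_document(index, 1, {"title": "Mastering Elasticsearch", "isbn": "123"}, {"isbn"})
index_document(index, 2, {"title": "Elasticsearch Essentials", "isbn": "456"}, {"isbn"})

for posting in index["title"]["elasticsearch"]:
    print(posting)
print("isbn" in index)
# Posting(doc_id=1, tf=1, positions=[1], offsets=[(10, 23)])
# Posting(doc_id=2, tf=1, positions=[0], offsets=[(0, 13)])
# False
```

Because `isbn` was skipped, no term dictionary exists for it, so nothing could ever match a query on that field; the `title` field, by contrast, can answer term, phrase, and highlighting requests from its postings alone.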

Inverted index demonstration

POST _analyze
{
  "analyzer": "standard",
  "text": "Mastering Elasticsearch"
}

Response:

{
  "tokens" : [
    {
      "token" : "mastering",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "elasticsearch",
      "start_offset" : 10,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

====================================================

POST _analyze
{
  "analyzer": "standard",
  "text": "Elasticsearch Essentials"
}

Response:

{
  "tokens" : [
    {
      "token" : "elasticsearch",
      "start_offset" : 0,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "essentials",
      "start_offset" : 14,
      "end_offset" : 24,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
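The token list above carries exactly what a posting needs: the term, its position, and its offsets. As a rough illustration, this Python sketch reproduces those tokens with a simplified stand-in for the standard analyzer (`\w+` word splitting plus lowercasing; the real standard tokenizer follows Unicode word-boundary rules and also reports a token `type`, which is omitted here), then uses the offsets to highlight a matching term. `analyze` and `highlight` are hypothetical helpers, not Elasticsearch APIs.

```python
import re


def analyze(text: str) -> list[dict]:
    # Simplified stand-in for the "standard" analyzer: \w+ word splitting plus lowercasing
    return [
        {"token": m.group().lower(), "start_offset": m.start(),
         "end_offset": m.end(), "position": pos}
        for pos, m in enumerate(re.finditer(r"\w+", text))
    ]


def highlight(text: str, term: str) -> str:
    # Wrap every occurrence of `term` in <em> tags, slicing the original text by its offsets
    out, last = [], 0
    for tok in analyze(text):
        if tok["token"] == term:
            out.append(text[last:tok["start_offset"]])
            out.append("<em>" + text[tok["start_offset"]:tok["end_offset"]] + "</em>")
            last = tok["end_offset"]
    out.append(text[last:])
    return "".join(out)


print(analyze("Elasticsearch Essentials"))
print(highlight("Mastering Elasticsearch", "elasticsearch"))
# Mastering <em>Elasticsearch</em>
```

Note that the token is stored lowercased, but highlighting slices the original text by offset, so the user still sees the original capitalization.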
