What is ES? Stop asking such low-level questions after reading this article!
- Non-leaf nodes store only keys (index information), not data records.
- All leaf nodes are linked to one another by pointers.
- Data records are stored in leaf nodes.
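The three properties above are the ones usually associated with a B+ tree. Assuming that is the structure meant here, the following minimal sketch shows why they matter: the search descends through key-only inner nodes, then a range query simply walks the linked leaves. The class names `Leaf` and `Internal`, the helper functions, and the placeholder records `r1` to `r6` are all hypothetical, and the tree is built by hand with no insertion or splitting logic.

```python
from bisect import bisect_right


class Leaf:
    """Leaf node: holds the actual records and a pointer to the next leaf."""

    def __init__(self, keys, records):
        self.keys = keys
        self.records = records
        self.next = None


class Internal:
    """Non-leaf node: holds only separator keys and child pointers."""

    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def find_leaf(node, key):
    # Child i holds keys in [keys[i-1], keys[i]); bisect_right picks that child.
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root, low, high):
    """Return records with low <= key <= high by walking the leaf chain."""
    found = []
    leaf = find_leaf(root, low)
    while leaf is not None:
        for key, record in zip(leaf.keys, leaf.records):
            if key > high:
                return found
            if key >= low:
                found.append(record)
        leaf = leaf.next
    return found


# A tiny hand-built tree: three linked leaves under one key-only root.
leaves = [
    Leaf([1, 2], ["r1", "r2"]),
    Leaf([3, 4], ["r3", "r4"]),
    Leaf([5, 6], ["r5", "r6"]),
]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([3, 5], leaves)
```

Calling `range_scan(root, 2, 5)` descends once to the leaf containing key 2 and then follows the `next` pointers, without going back through the root.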
| ID | Name | Sex |
|----|------|-----|
| 1 | Kate | Female |
| 2 | John | Male |
| 3 | Bill | Male |

| Term | Posting List |
|------|--------------|
| Kate | 1 |
| John | 2 |
| Bill | 3 |

| Term | Posting List |
|------|--------------|
| Female | 1 |
| Male | [2, 3] |
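As a rough illustration of how the two posting-list tables above follow from the record table, here is a minimal sketch that builds one inverted index per field. The function name `build_inverted_index` and the dictionary layout are assumptions made for this example, not Elasticsearch's actual API.

```python
from collections import defaultdict

# The three records from the table above.
records = [
    {"id": 1, "name": "Kate", "sex": "Female"},
    {"id": 2, "name": "John", "sex": "Male"},
    {"id": 3, "name": "Bill", "sex": "Male"},
]


def build_inverted_index(rows, field):
    """Map each term found in `field` to the sorted list of IDs containing it."""
    index = defaultdict(list)
    for row in rows:
        index[row[field]].append(row["id"])
    return {term: sorted(ids) for term, ids in index.items()}


name_index = build_inverted_index(records, "name")
sex_index = build_inverted_index(records, "sex")
```

Here `sex_index["Male"]` holds the posting list `[2, 3]`, matching the last table.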
The Term Dictionary:
The Term Index:
Finite State Transducers (FST)
- Each edge has two attributes: a label (one element of the key) and an output value (out).
- Each node has two properties: Final=true/false (true if a key ends at this node) and, when Final is true, a FinalOut, where FinalOut = entry value - the sum of the out values along the path.
- For example, for node 8, the key of the corresponding entry is “do” and the value is 15; the sum of the out values along the path is 2, so FinalOut = 15 - 2 = 13.
- Where do the out values come from?
- When only one entry has been written, such as cat, the out value of the first byte (“c”) equals the value of the entry (5).
- When deep is written, the out value of “d” is 10, because no other data starting with “d” has been written yet.
- When do is written, “d” is already 10, so “o” = 15 - 10 = 5.
- When dog is written, “d” = 10 and “o” = 5 already exceed dog's value of 2. In this case “d” is set to 2 and “o” is set to 0, so that the path for dog sums to 2.
- The out values of deep and do must then be reassigned:
- The whole path of deep must sum to 10; given that “d” = 2, “e” takes the remaining 8.
- The whole path of do must sum to 15; “d” = 2, “o” = 0, and there is no further label, so FinalOut = 15 - 2 - 0 = 13.
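The out values in the walk-through above can be reproduced with a small sketch. It is a simplification under stated assumptions: it uses a plain trie with no suffix sharing or minimization, so it is not how Lucene builds its FST, and it assumes non-negative entry values. Each node remembers the smallest entry value among the keys passing through it (`low`, a hypothetical name); an edge's out value is then the difference between the child's and the parent's `low`, and FinalOut is the entry value minus the path sum.

```python
class Node:
    def __init__(self, low=0):
        self.children = {}   # label -> child Node
        self.low = low       # smallest entry value among keys passing through here
        self.final = False   # True if some key ends at this node
        self.value = None    # entry value of the key ending here, if any


def insert(root, key, value):
    """Add key -> value, lowering `low` along the path where needed."""
    node = root
    for label in key:
        child = node.children.get(label)
        if child is None:
            child = node.children[label] = Node(low=value)
        else:
            child.low = min(child.low, value)
        node = child
    node.final = True
    node.value = value


def edge_out(parent, label):
    """Out value of the edge `label` leaving `parent`."""
    return parent.children[label].low - parent.low


def final_out(node):
    """FinalOut = entry value minus the sum of out values on the path."""
    return node.value - node.low


def lookup(root, key):
    """Sum the out values along the path, then add FinalOut."""
    node, total = root, 0
    for label in key:
        if label not in node.children:
            return None
        total += edge_out(node, label)
        node = node.children[label]
    return total + final_out(node) if node.final else None


root = Node()
for key, value in [("cat", 5), ("deep", 10), ("do", 15), ("dog", 2)]:
    insert(root, key, value)
```

After the four inserts, the edge “d” carries 2, “e” carries 8, “o” carries 0, and the node reached by “do” has a FinalOut of 13, in line with the steps listed above. Because the out values are derived from `low` on demand, lowering “d” when dog arrives automatically pushes the remainder down to “e” and to the FinalOut of do.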
- Use the skip list data structure to compute “and” / “or” across posting lists quickly
- Use the bitwise “and” of the bitsets mentioned above
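Both options can be sketched in a few lines. This is an illustration under assumptions rather than Elasticsearch's implementation: the first function approximates a skip list with fixed-stride skip pointers over a sorted list of distinct IDs (the stride of roughly the square root of the list length is an arbitrary choice), and the second pair of helpers represents a bitset as a Python integer. All function names are hypothetical.

```python
import math


def intersect_with_skips(a, b):
    """AND two sorted posting lists, jumping ahead via skip pointers when safe."""
    skip_a = max(1, int(math.sqrt(len(a))))
    skip_b = max(1, int(math.sqrt(len(b))))
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Take the skip only if it does not jump past the target ID.
            if i + skip_a < len(a) and a[i + skip_a] <= b[j]:
                i += skip_a
            else:
                i += 1
        else:
            if j + skip_b < len(b) and b[j + skip_b] <= a[i]:
                j += skip_b
            else:
                j += 1
    return result


def to_bitset(ids):
    """Set bit n for every document ID n in the posting list."""
    bits = 0
    for doc_id in ids:
        bits |= 1 << doc_id
    return bits


def from_bitset(bits):
    """Recover the sorted document IDs from a bitset."""
    return [n for n in range(bits.bit_length()) if (bits >> n) & 1]
```

With bitsets, an “and” of two filters is a single `to_bitset(x) & to_bitset(y)`, and an “or” is `|`. The skip-pointer version avoids materializing a bit per document, which tends to suit sparse posting lists better.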