I. Background introduction
Xunsearch is a high performance search engine, mainly for word cutting + index search. Tars is the open source micro-service framework of Tencent. The system is complete
Second, the collection of articles
2.1 Xunsearch is a high performance retrieval solution
For full text search engine processing is very important. Full-text search engine mainly serves for article search and specific data processing. This is a very important point for overall document use and lookup.
For full-text search, consider lightweight XunSearch, Coreseek (a Sphinx variant that supports Chinese search), Elasticsearch for small to medium sized apps, and Elasticsearch for large apps
-
Component packages for the PHP SDK. To view
-
Based on Xunsearch + Laravel Scout Laravel College full-text search function (support multi-model search) view
-
Mechanism and principle of Xunsearch
-
SCWS Chinese word segmentation logic, view
- Single word sharding by default. For example, there are the following sentences: “We are having dinner”, which are broken down into [I], [we], [at], [eat], [meal], [tu]. Term segmentation in this way is the least, because we use only a few thousand Chinese characters.
- Binary segmentation, that is, every two words in a sentence as a word. Continuing with the sentence “We are eating”, dichotomy yields the following words: [we], [we are], [eating], [eating], [eating].
- Divide according to meaning. This method to use the dictionary, the common forward maximum segmentation method and reverse maximum segmentation method. Let’s take “We’re eating” as an example. Using forward segmentation, you might end up with the following words: [we], [eating], [meal], [what], whereas using reverse maximum segmentation, you might end up with the following words: [we], [at], [eating], [what].
- Based on statistical probability segmentation. This method, according to a probability model can be obtained from an existing terms of a probability of a word, also with? “this sentence” we’re eating for an example may not be appropriate, assume that already exist [we] the word, then according to probability and statistics model can be established [eat] the word probability are obtained. Of course, models in practice are much more complex, such as the famous Hidden Markov model.
-
SCWS word segmentation framework, view
-
Xapian is a full text search program written in C++, his role is similar to Java lucene, view
2.2 tars-php: PHP builds a high-performance RPC framework,To view
Tencent’s open source Tars product system can better support business expansion and logical processing.
-
Tars PHP environment setup, view
-
For a detailed introduction to Tars PHP, see
-
TARS- Tencent open Source Micro-service architecture technology revealed, [to see] (file:///Users/Macx/Downloads/PPT-TARS-%E8%85%BE%E8%AE%AF%E5%BC%80%E6%BA%90%E5%BE%AE%E6%9C%8D%E5%8A%A1%E6%9E%B6%E6%9 E%84%E6%8A%80%E6%9C%AF%E6%8F%AD%E7%A7%98%20%E9%92%9F%E7%A7%91.pdf)
2.3 ThinkPHP support duoduo module operation,To view
-
The API version needs to support multiple API approaches and operate on the library of data.
! [image-20181229171857064](/Users/Macx/Library/Application Support/typora-user-images/image-20181229171857064.png)
Ix. Problems encountered
- Identification of unique identifier of web mobile phone
- Fingerprinting identifies the operation to view
- Other ways to handle, [view