Background

In recent years, artificial intelligence with deep learning at its core has attracted wide attention, and both academia and industry regard deep learning as a focus of research and application. The rapid development of deep learning is inseparable from the accumulation of massive data, advances in computing power, and improvements in algorithms and models. This article mainly introduces applications of deep learning in the text field, which can be roughly divided into four levels: word, sentence, discourse, and system-level applications.

  • Word. For word segmentation, the field has moved from the classic forward/backward maximum matching, to Conditional Random Field (CRF) sequence labeling, to today's BiLSTM+CRF models, which no longer require hand-designed features and achieve state-of-the-art sequence labeling at the word granularity. The same approach extends to other sequence labeling tasks in text, such as part-of-speech tagging and named entity recognition.
  • Sentence. For parsing, besides the word-granularity sequence labeling introduced above, deep learning models can also be used to improve the classification decisions made at each step of shift-reduce parsing. For sentence generation, sequence-to-sequence (Seq2Seq) models can be trained to generate sentences automatically, which can be used for chat or sentence rewriting.
  • Discourse. For sentiment analysis, convolutional neural networks can model the input text directly to predict sentiment labels. For reading comprehension, recurrent neural networks with memory can be designed for the task, which has been a hot research topic in recent years.
  • System-level applications. In information retrieval, deep learning can be used for similarity computation in text matching: bag-of-words, convolutional, or recurrent neural networks can learn representations and matching relations (such as the DSSM family), and DNNs can be used for ranking models (such as Google's Wide & Deep, introduced in detail later). In machine translation, from the Seq2Seq model to stacked LSTMs with attention and other multi-layer LSTM networks, neural translation models have surpassed word-based statistical machine translation and are deployed in products such as Google Translate, Baidu Translate, and Youdao Translate. In intelligent interaction, deep learning works well for classification, state management (e.g., deep reinforcement learning), and reply generation when building chat, dialogue, and question-answering systems.

All in all, these deep learning applications in the text field are only the tip of the iceberg; there are also knowledge graphs, automatic summarization, speech, image-to-text generation, and so on. The general trend is that every direction of text research and application is trying deep learning and making progress. To achieve breakthroughs in text comparable to those in image and speech, however, many difficulties remain, such as the lack of large-scale labeled data for different tasks, and how to capture the logic of language and the regional and cultural characteristics it carries in modeling. Due to space limitations, this article only introduces the text matching and ranking models used at Meituan.

Text matching based on deep learning

Text matching is useful in many areas, especially information-retrieval-related scenarios, such as Query and Doc in search, Query and ad in advertising, Query prefix and Query in search suggestion (see Figure 1), Query and Query in keyword recommendation, Doc and Doc in document deduplication, and so on.

Text matching is mainly concerned with computing the similarity of two pieces of text. The similarity problem has two layers: first, how to represent the two texts so that a computer can process them conveniently, which requires studying how different representations affect the result; second, how to define similarity as the optimization objective, such as semantic similarity, click-relationship similarity, or user-behavior similarity, which is closely tied to the business scenario.

There are many challenges in solving these two problems, one of which is how to fully account for semantics in the model design. Chinese polysemy and synonymy are very common, and words express different meanings in different contexts. For example, "How much is an Apple?" versus "How much are apples per kilo?": the former refers to Apple-branded electronic devices, while the latter refers to the fruit. Of course, there are many language phenomena that are even harder, such as tone, context, and the varied expressions of spoken language.

Text representation and matching is the main thread of this section, and how to achieve semantic matching is its main theme. Shaped by the evolution of technology as a whole, text matching has gone through its own technological progression, as shown in Figure 2.

1. Vector space

The vector space model, proposed around 1970, uses TF-IDF to weight words and represents a document as a vector over the vocabulary. For example, one standard word list contains the word ID, the word, and its IDF, and another is the stop-word list, as shown in Figure 3.

After segmenting the text "hotel price in Lijiang" and removing stop words, we get "lijiang", "hotel" and "price", each occurring once. Looking up the IDF, we get the representation of this sentence: [0, 1.5, 2.1, 0, 0…, 0, 4.1]. Here the weight is TF×IDF, where TF is the frequency of the term in the text and IDF is the inverse document frequency. Both have many definitions, as shown in Figure 4; the second definition is used here.
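As a minimal illustrative sketch (the vocabulary indices and IDF values below are made up to match the example, not taken from an actual word list), the TF×IDF vector can be computed as follows:

```python
import math

# Hypothetical vocabulary with pre-computed IDF values (illustrative only).
vocab = {"hotel": 1, "price": 2, "lijiang": 6}        # word -> dimension index
idf   = {"hotel": 1.5, "price": 2.1, "lijiang": 4.1}  # word -> inverse document frequency
VOCAB_SIZE = 7

def tfidf_vector(tokens):
    """Represent a segmented, stop-word-free text as a sparse TF*IDF vector."""
    vec = [0.0] * VOCAB_SIZE
    for w in set(tokens):
        if w in vocab:
            tf = tokens.count(w)          # raw term frequency in this text
            vec[vocab[w]] = tf * idf[w]   # weight = TF * IDF
    return vec

print(tfidf_vector(["lijiang", "hotel", "price"]))
# -> [0.0, 1.5, 2.1, 0.0, 0.0, 0.0, 4.1]
```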

The vector space model uses high-dimensional sparse vectors to represent documents, which is simple and clear. Each dimension is weighted by TF-IDF, which, from an information-theoretic perspective, reflects the pointwise mutual information between word and document and the encoding length of the document. Given vector representations of documents, how do we compute similarity? Common metrics include Jaccard, cosine, Euclidean distance, BM25, etc. BM25 is a classic measure of document matching similarity. The formula is as follows:
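The original formula figure is not reproduced here; in its commonly used form, BM25 scores a query Q = {q1, …, qn} against a document D as

score(Q, D) = Σ_i IDF(q_i) · f(q_i, D) · (k1 + 1) / ( f(q_i, D) + k1 · (1 − b + b · |D| / avgdl) )

where f(q_i, D) is the term frequency of q_i in D, |D| is the document length, avgdl is the average document length in the collection, and k1 and b are tuning parameters (typically k1 between 1.2 and 2.0, and b around 0.75).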

Although the vector space model cannot capture synonymy and polysemy, and its dimensionality grows as the dictionary grows, its simplicity and solid effectiveness make it an indispensable feature in every retrieval system.

2. Matrix decomposition

The high dimensionality of the vector space model is not good at describing semantic information, and a document collection becomes a huge high-dimensional sparse matrix. Around 1990, researchers studied decomposing this high-dimensional sparse matrix into two long, narrow matrices via matrix factorization; these two low-dimensional matrices contain semantic information. This process is latent semantic analysis.

Suppose there are N documents containing a total of V distinct words, represented with TF-IDF vector space as an N×V sparse matrix X. Semantic analysis of words and documents operates on this co-occurrence matrix. Through singular value decomposition, the co-occurrence matrix is transformed into three matrices: the long, narrow matrix U of dimension N×K, the matrix V of dimension K×V, and a K×K diagonal matrix in the middle, as shown in Figure 5.
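A minimal NumPy sketch of this truncated decomposition (the matrix here is random and the sizes are illustrative, standing in for the real TF-IDF document-word matrix):

```python
import numpy as np

N, V, K = 200, 1000, 50   # documents, vocabulary size, latent dimensions (illustrative)
X = np.random.rand(N, V)  # stand-in for the N x V TF-IDF document-word matrix

# SVD: X = U * diag(s) * Vt; keep only the top-K singular values for the latent semantic space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, Vt_k = U[:, :K], s[:K], Vt[:K, :]

doc_vectors  = U_k * s_k           # N x K: latent-semantic representation of each document
word_vectors = Vt_k.T              # V x K: latent-semantic representation of each word
X_approx     = (U_k * s_k) @ Vt_k  # low-rank reconstruction of the original matrix
```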

After decomposition, each document is represented by a K-dimensional vector (K<<V) that captures latent semantic information and can be regarded as the document's representation in the semantic space. The decomposition of the co-occurrence matrix also yields the V matrix, the distribution of words in the latent space.

Latent semantic analysis produces low-dimensional semantic representations of documents and words, and matching with them is efficient (for example, when the number of effective words in a document exceeds K). The representation carries semantic information and is more accurate for documents with the same meaning. However, latent semantic analysis is not good at modeling polysemy, and the K-dimensional latent vectors come purely from a mathematical decomposition, so their physical meaning is unclear. Around 2000, topic models emerged to address these problems.

3. Topic model

From 2000 to 2015, topic models based on probabilistic graphical models set off a wave of interest. What advantages make this class of models so attractive?

pLSA (Probabilistic Latent Semantic Analysis)

pLSA introduces the concept of a topic on top of latent semantic analysis. A topic carries semantic meaning: modeling a document's topics is no longer a matrix decomposition but a probability distribution (e.g., a multinomial distribution), so polysemy can be handled and topics have a clear meaning. The basis of the analysis is still the co-occurrence frequency of documents and words, and the goal is to establish relationships between words/documents and these latent topics, which in turn serve as a bridge for semantic relevance. The transition is shown in Figure 6.

Assume each document is composed of several topics, each topic occurring with probability p(z|d), and, given a topic, each word occurring with probability p(w|z). The co-occurrence of documents and words can then be described generatively:
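Written out, this generative description is the standard pLSA decomposition of the document-word co-occurrence probability:

p(d, w) = p(d) · Σ_z p(z|d) · p(w|z)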

Its probabilistic graphical model is shown in Figure 7:

The EM algorithm is used to learn p(z|d) and p(w|z) as parameters; for the specific algorithm, refer to Thomas Hofmann's pLSA paper. The number of p(z|d) parameters to learn is (number of topics) × (number of documents), and the number of p(w|z) parameters is (number of words) × (number of topics), so the parameter space is very large and prone to overfitting. Therefore, the conjugate prior of the multinomial distribution is introduced for Bayesian modeling, which is the approach used by LDA.

LDA (Latent Dirichlet Allocation)

If pLSA represents the frequentist school, LDA represents the Bayesian school. By introducing the Dirichlet distribution as the conjugate prior of the multinomial, LDA gives a complete mathematical account of the document generation process; its probabilistic graphical model is shown in Figure 8.

Unlike the pLSA graphical model, the LDA graphical model introduces two variables, α and β, which govern the distributions of the parameters: the document-topic multinomial distribution is itself generated under a Dirichlet prior. The parameters can be inferred with variational EM or Gibbs sampling, which is not detailed here.
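As a hedged illustration (the tooling and corpus below are assumptions for the example, not the setup described in this article), a topic model can be trained with the gensim library roughly as follows:

```python
from gensim import corpora
from gensim.models import LdaModel

# Toy corpus of segmented, stop-word-free documents (illustrative only).
texts = [
    ["lijiang", "hotel", "price"],
    ["beijing", "hotel", "booking"],
    ["apple", "phone", "price"],
    ["apple", "fruit", "kilo", "price"],
]

dictionary = corpora.Dictionary(texts)                 # word <-> id mapping
bows = [dictionary.doc2bow(t) for t in texts]          # bag-of-words counts per document

lda = LdaModel(corpus=bows, id2word=dictionary,
               num_topics=2, alpha="auto", passes=20)  # Dirichlet priors; alpha learned from data

print(lda.print_topics())                              # p(w|z): word distribution of each topic
print(lda.get_document_topics(bows[0]))                # p(z|d): topic distribution of document 0
```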

In general, topic models introduce the physically meaningful concept of a "topic", and through co-occurrence information the model can learn synonymy, polysemy, semantic relatedness, and so on. The resulting topic probability distributions are more reasonable and meaningful as representations. With these document representations, matching can use not only the previous metrics but also distribution metrics such as KL divergence, which are widely used in text matching. Of course, topic models have their own problems, such as poor inference on short texts, slow training with many parameters, and the need for stochastic-process modeling to avoid setting the number of topics by hand. With further research, these problems have largely been alleviated; for example, for slow training, from LDA to SparseLDA and AliasLDA, and then to LightLDA and WarpLDA, the per-token sampling complexity has been reduced from O(K) to O(1).

4. Deep learning

In 2013, Tomas Mikolov published the Word2Vec papers, proposing two models, CBOW (Continuous Bag of Words) and Skip-Gram, which can train word embeddings very quickly and produce word vectors that support meaningful addition and subtraction; this attracted wide attention. Before this work, neural network language models had gone through a long evolution. We first introduce Yoshua Bengio's 2003 work on neural network language models; Word2Vec is one of its many improvements.

Neural network language model

In 2003, Yoshua Bengio used a neural network to train a language model with much better results than N-gram models. The network structure is shown in Figure 9. The input is the previous n words, and the task is to predict the next word. The dense vectors C(w(t-1)), C(w(t-2)), … of the previous n words are looked up from the word-embedding matrix C, giving an input of dimension n×emb_size. They are then fed to a hidden layer for a nonlinear transformation, and then to the output layer, where a Softmax predicts the probability of the next word; during training, the network weights are adjusted by backpropagating the error from the outermost layer. The training complexity of this model is O(n×emb_size + n×emb_size×hidden_size + hidden_size×output_size), where n ranges from 5 to 10, emb_size from 64 to 1024, hidden_size from 64 to 1024, and output_size is the vocabulary size, e.g., 10^7. Since the Softmax needs the values of all words for probability normalization, the complexity is concentrated in the last layer. Many optimizations have since been proposed, such as Hierarchical Softmax and Noise Contrastive Estimation.
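A compact PyTorch sketch of this architecture (hyperparameters are illustrative; the full-vocabulary Softmax in the last layer is exactly the expensive part discussed above):

```python
import torch
import torch.nn as nn

class NNLM(nn.Module):
    """Bengio-style neural language model: predict the next word from the previous n words."""
    def __init__(self, vocab_size=10000, n=5, emb_size=128, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)     # matrix C: word id -> dense vector
        self.hidden = nn.Linear(n * emb_size, hidden_size)  # nonlinear hidden layer
        self.out = nn.Linear(hidden_size, vocab_size)       # output layer: Softmax over the whole vocabulary

    def forward(self, context):                             # context: (batch, n) word ids
        x = self.embed(context).flatten(1)                  # concatenate the n word vectors
        h = torch.tanh(self.hidden(x))
        return self.out(h)                                  # logits; Softmax is applied inside the loss

model = NNLM()
logits = model(torch.randint(0, 10000, (32, 5)))            # a batch of 32 contexts of 5 words
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10000, (32,)))
```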

Word2Vec

The network structure of Word2Vec includes CBOW and Skip-Gram, as shown in Figure 10. Compared with the NNLM, Word2Vec removes the hidden layer and has only a projection layer. The output layer is a tree-structured (hierarchical) Softmax that assigns each word a Huffman code; when predicting a word, only the 0/1 decisions along its path need to be predicted, reducing the complexity from O(V) to O(log(V)).

Taking CBOW as an example, the algorithm flow is as follows:

(1) The word vectors of the context words (window size Win) are summed dimension-wise and mapped to the projection layer.
(2) From the projection layer, sigmoid transformations predict the coding path of the current word in the Huffman tree.
(3) The cross-entropy loss is back-propagated to update the embedding parameters and the intermediate-layer parameters.
(4) Training uses backpropagation with SGD as the optimizer.
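A quick, hedged example with the gensim library (parameter names follow gensim 4.x; sg=0 selects CBOW and hs=1 enables the Huffman-tree hierarchical Softmax described above; the toy corpus is purely illustrative):

```python
from gensim.models import Word2Vec

# Toy segmented corpus; real training would use a massive corpus.
sentences = [
    ["lijiang", "hotel", "price"],
    ["beijing", "hotel", "booking", "price"],
    ["apple", "phone", "price"],
]

model = Word2Vec(
    sentences,
    vector_size=100,   # embedding dimension (named `size` in gensim < 4.0)
    window=5,          # context window Win
    sg=0,              # 0 = CBOW, 1 = Skip-Gram
    hs=1,              # hierarchical Softmax instead of negative sampling
    min_count=1,
)

vec = model.wv["hotel"]                 # dense word embedding
print(model.wv.most_similar("hotel"))   # nearest words by cosine similarity
```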

As the algorithm flow shows, the prediction complexity of the outermost layer is greatly reduced and the hidden layer is removed, which greatly speeds up computation. The algorithm yields dense word embeddings, which are a very good representation and can be used in text-matching computations. However, since the model's learning objective is to predict word-occurrence probability, i.e., a language model, it learns general word semantics from massive corpora and cannot be directly applied to the matching needs of a specific business. Can representation and matching be modeled jointly for the business scenario to improve matching? The DSSM family takes both representation and matching into account.

DSSM series

This type of approach models representation and matching together; Microsoft's related work is representative. The DSSM family is described below.

(1) DSSM model framework

DSSM network structure is shown in Figure 11:

The semantic model is trained on search click data. The input is a Query (Q) and the list of shown Docs (D) with click labels. Q and D are first mapped to semantic representations, the similarity between Q and each Dk is computed, and a Softmax separates clicked from non-clicked Docs. For the semantic representation, word hashing (e.g., English letter n-grams) is first used to reduce the vocabulary dimension, and several fully connected layers with nonlinear transformations produce 128-dimensional representations of Q and D. Experimentally, the NDCG metric improves significantly, as shown in Figure 12.
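A simplified PyTorch sketch of the DSSM idea (dimensions, layer sizes, and the number of negative Docs are assumptions for illustration; word hashing is assumed to have already produced the sparse letter-n-gram input vectors):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """Maps a word-hashed sparse input to a 128-dimensional semantic vector."""
    def __init__(self, in_dim=30000, hid=300, out=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid), nn.Tanh(),
            nn.Linear(hid, hid), nn.Tanh(),
            nn.Linear(hid, out), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

query_tower, doc_tower = Tower(), Tower()

def dssm_loss(q, docs, gamma=10.0):
    """q: (batch, in_dim); docs: (batch, 1 + n_neg, in_dim), with the clicked Doc at index 0."""
    q_vec = query_tower(q)                                                     # (batch, 128)
    d_vec = doc_tower(docs.flatten(0, 1)).view(docs.size(0), docs.size(1), -1) # (batch, 1+n_neg, 128)
    sim = F.cosine_similarity(q_vec.unsqueeze(1), d_vec, dim=-1)               # (batch, 1+n_neg)
    # Softmax over the candidate Docs: maximize the probability of the clicked Doc.
    return F.cross_entropy(gamma * sim, torch.zeros(q.size(0), dtype=torch.long))

loss = dssm_loss(torch.rand(8, 30000), torch.rand(8, 5, 30000))
```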

(2) CLSM

On the basis of DSSM, CLSM adds one-dimensional convolution and pooling operations to capture global sentence information, as shown in Figure 13. By introducing convolution, the influence of the context within a window is fully considered, preserving the context-dependent semantics of words.

The corresponding effect is shown in Figure 14:

(3) LSTM-DSSM

LSTM-DSSM uses an LSTM to represent Q and D; the rest of the framework is the same as DSSM. Its network structure is shown in Figure 15. Because the LSTM has semantic memory and captures word order, it is better suited to representing sentences. Of course, bidirectional LSTMs and attention models can also be used.

Meituan deep learning text matching algorithm

As a classical problem in natural language processing, semantic matching of text can be used in recall and ranking for search, recommendation, advertising, and other retrieval systems, as well as in text deduplication, normalization, clustering, and extraction. Common techniques and recent advances in semantic matching were described above.

In typical O2O application scenarios like Meituan's, the results shown are strongly related not only to the language-level semantics expressed by the user but also to the user's intent and the user's state. User intent asks: what has the user come for? For example, when a user searches for "Guan Nei Guan Wai" on Baidu, the intent may be to learn the geographical extent of Guan Nei versus Guan Wai, so "Guan Nei" and "Guan Wai" are retrieved as two terms. A search for "Guan Nei Guan Wai" on Meituan is more likely looking for the merchant named "Guan Nei Guan Wai", where the whole phrase is treated as a single term. As for user state, a user in Beijing and a user in Wuhan searching the same term on Baidu or Taobao will get much the same results, but it is a whole different story in location-dependent apps like Meituan: a search for "Yellow Crane Tower" in Wuhan might return scenic-spot tickets, while the same search in Beijing might return a restaurant.

How can language-level information be combined with user intent and user state to do semantic matching?

In the deep learning semantic matching framework we designed, O2O business features are introduced in addition to the short text itself. Click/order data is used to guide the optimization of the semantic matching model, and the trained click-relevance model is then applied to search-related businesses.

ClickNet, the click-similarity framework designed for the Meituan scenario, is a lightweight model that balances effectiveness and performance and generalizes easily to online applications, as shown in Figure 16.

  • Representation layer. The Query and the merchant name are represented by semantic features and business features respectively. The semantic features are the core: the vector representation of the whole short text is obtained with DNN/CNN/RNN/LSTM/GRU methods. In addition, business-related features are introduced, such as information about the user or the merchant, the distance between them, and merchant ratings.
  • Learning layer. After several fully connected layers and nonlinear transformations, the matching score is predicted, and the network is adjusted according to the score and the label to learn the click-matching relationship between the Query and the merchant name.

To train good semantic models within the ClickNet framework, the model also needs to be tuned for the scenario. First, we optimize the training corpus heavily, for example handling sample imbalance and sample importance. Second, for parameter tuning we consider different optimization algorithms, network sizes and depths, and hyperparameter adjustments.

After training and optimization, the semantic matching model has gone live in the recall and ranking systems of Meituan's search, advertising, hotel, travel, and other businesses, significantly improving metrics such as purchase rate, revenue, and click-through rate.

To sum up, applying deep learning to semantic matching requires designing an algorithm framework appropriate to the business scenario. In addition, although deep learning reduces feature engineering work, model tuning becomes harder. Framework design, business corpus processing, and model parameter tuning should therefore be considered together to achieve a model with both good effectiveness and good performance.

Ranking models based on deep learning

Introduction to ranking models

In search, advertising, recommendation, Q&A, and other systems, ranking is very important because a limited number of candidates must be selected from a large recalled candidate set for display. How should this ranking be designed so that the final business outcome is better? This requires a sophisticated ranking model. For example, ranking in the Meituan search system considers multi-dimensional information such as the user's historical behavior, the Query, and merchant information, extracts and designs various features, and trains the ranking model on massive data. Below is a brief review of ranking model types and their evolution, with emphasis on the use of deep learning in ranking models.

Ranking models fall into three main categories: pointwise, pairwise, and listwise, as shown in Figure 17. Pointwise models classify or regress a single sample, i.e., they predict a score for <Query, Doc> as the ranking criterion; representative models include logistic regression and XGBoost. Pairwise models consider the partial-order relation between two samples and convert it into a single classification problem: if <Query, Doc1> ranks higher than <Query, Doc2>, the pair is predicted positive, and vice versa; typical models include RankSVM and LambdaMART. The third category is listwise models, which take the whole ranked list as the optimization target and optimize the model by the gap between the predicted distribution and the true ranking distribution; a typical model is ListNet.

Evolution of deep learning ranking models

In the development of ranking models, neural networks have been used for a long time. For example, RankNet, proposed by Microsoft Research in 2005, uses a neural network for pairwise learning. In 2012, Google introduced deep learning for CTR prediction; around the same time, Baidu began using deep learning for CTR in Phoenix Nest, going live in 2013. As deep learning became popular, companies and research institutions tried applying it to ranking, such as Google's Wide & Deep and YouTube's DNN recommendation model; the DSSM introduced above can also be used for ranking. Below is a brief introduction to RankNet, Wide & Deep, and the YouTube ranking model.

RankNet

RankNet is a pairwise model, handled in a way that reduces to pointwise processing. For example, within one query, Di and Dj have a partial-order relationship, with the former more relevant than the latter. The features of the two are fed into a neural network, and after a nonlinear transformation the outputs are connected to the loss to learn the objective. If Di is more relevant than Dj, the predicted probability is as follows, where si and sj are the scores of the corresponding Docs.
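The formula figure is not reproduced here; in its standard form, RankNet models the probability that Di should rank above Dj as a logistic function of the score difference,

P_ij = 1 / (1 + exp(-(s_i - s_j)))

and training minimizes the cross-entropy between P_ij and the true partial-order label.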

A neural network is used to compute the scores, as shown in Figure 18: the input features of each sample form the first layer, and the score is obtained after nonlinear transformations. Since the probability that RankNet predicts is transitive (the partial-order probability of Di and Dj can be derived from those of Di and Dk and of Dk and Dj), RankNet reduces the computational complexity from O(n²) to O(n). See the literature for a detailed introduction.

Of course, later studies found that RankNet, which aims to reduce the number of mis-ordered pairs, does not perform well on metrics such as NDCG that care about the positions of relevant documents, so improved models such as LambdaRank emerged. RankNet is a classic neural network ranking model, but at the time industry mostly used simple linear models such as logistic regression, which improve results through large amounts of hand-designed features and offer good interpretability and high performance. Hand-designed features eventually hit a bottleneck, whereas deep learning can learn complex relationships from raw features, greatly reducing feature engineering work. Moreover, high-performance accelerators such as GPUs and FPGAs have become widespread, which has encouraged extensive research on deep neural network ranking models.

Wide & Deep

Google published the paper "Wide & Deep Learning", whose ideas can be applied to recommendation; Google's app recommendation makes good use of this idea, and the model was released in TensorFlow. The overall structure of Wide & Deep has two parts, Wide and Deep, which are combined at the outermost layer and learned jointly, as shown in Figure 19. The input is all sparse features, divided into two types: one type suits deep-network transformation, features with generalization power that need deep combination, such as category and type; the other type is connected directly to the outermost layer, features oriented toward timeliness or memorization, such as statistical features or display position. During optimization the two parts are trained jointly, with FTRL for the Wide part and AdaGrad for the Deep part. In this way Wide & Deep distinguishes different types of features, lets each play its role, and offers good interpretability.
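A hedged PyTorch sketch of this two-part structure (feature splits, sizes, and the single joint optimizer are simplifications; the original trains the Wide part with FTRL and the Deep part with AdaGrad):

```python
import torch
import torch.nn as nn

class WideAndDeep(nn.Module):
    def __init__(self, wide_dim=1000, n_cat=500, emb_dim=16, deep_fields=8):
        super().__init__()
        # Wide part: memorization-oriented sparse features connected straight to the output.
        self.wide = nn.Linear(wide_dim, 1)
        # Deep part: embed categorical features, then nonlinear layers for generalization.
        self.embed = nn.Embedding(n_cat, emb_dim)
        self.deep = nn.Sequential(
            nn.Linear(deep_fields * emb_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, wide_x, cat_ids):
        # wide_x: (batch, wide_dim) sparse features; cat_ids: (batch, deep_fields) category ids
        deep_x = self.embed(cat_ids).flatten(1)
        # The two parts are summed at the outermost layer and trained jointly.
        return torch.sigmoid(self.wide(wide_x) + self.deep(deep_x)).squeeze(-1)

model = WideAndDeep()
p_click = model(torch.rand(4, 1000), torch.randint(0, 500, (4, 8)))  # predicted click probability
```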

This idea can be extended further. For example, the Wide connection need not attach to the outermost layer but to some intermediate layer, and some Deep layers can also be connected to the outermost layer to exploit the dense abstractions of different layers. Connections similar to Wide & Deep appeared earlier, such as the direct connections in the 2003 NNLM and the 2010 RNNLM; combining shallow and deep parts can accelerate convergence, and Highway networks in deep learning are similar in spirit. Wide & Deep now has many applications; Alibaba, for example, applies it to good effect.

YouTube DNN ranking model

YouTube's model predicts how long a user will watch a video, framed as a weighted logistic regression problem. The DNN ranking model is similar to the earlier work, and its network structure is a standard feed-forward neural network, as shown in Figure 20. What is distinctive about this DNN ranking model is its input features: although deep learning models place low demands on feature engineering, much data can enter the model only after simple processing. The features in Figure 20 are divided into several fields, such as language, video, the video IDs the user has watched historically, and statistics and normalized values of previous watch time. Discrete values are turned into continuous vectors by embedding, concatenated, and passed through multiple nonlinear layers to predict the final label.
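One way to read this "weighted logistic regression" (a sketch based on the published YouTube paper rather than on details given in this article): positive impressions are weighted by their observed watch time during training, so the learned odds approximate expected watch time, and at serving time the model outputs

E[watch time] ≈ odds = p / (1 − p) ≈ e^(Wx + b)

where Wx + b is the logit of the final layer.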

As the ranking models above show, ranking takes inputs of varied types and meanings, unlike the single input form in image and speech. In ranking models, the selection and representation of input features is therefore very important: handling continuous versus discrete features, distinguishing user history from document features, and so on. In the Meituan scenario, the design of the ranking model needs to account for business characteristics, and many attempts were made at representing the input features.

Meituan's deep learning ranking model attempts

The ClickNet framework introduced for semantic matching can also be used for ranking; the main difference lies in the representation layer, as shown in Figure 21. If ClickNet is used as a CTR model for search, the semantic features of the Query and the Title in the representation layer are only one part; user queries, user behavior, merchant information, and cross/combination features are also included, grouped into different fields by feature type. Further, if a scenario involves no semantic matching, the model's input can consist of business features only. Below is a quick look at our attempts to use ClickNet for ranking at Meituan.

ClickNet-v1

ClickNet was designed as a text matching model, and its output was added as a single semantic feature to the business ranking model to improve results. However, data analysis after launch showed that ClickNet's semantic representation, supplemented by some business features, performed even better in the ranking system. We made the following improvements toward a ranking model.

  • (1) Business feature selection. From the existing hand-crafted features of the business ranking model, select representative O2O features that have not undergone heavy processing, such as user location, merchant location, user history, merchant star rating, and merchant seasonality.
  • (2) Feature discretization. The selected business features are discretized, for example by bucketing feature values into intervals.
  • (3) Sample processing. Positive and negative examples are sampled according to business needs, and clicks, orders, and payments are weighted differently.
  • (4) Information fusion. A gate is introduced to control the fusion of the semantic features and the various business features, rather than simply summing or concatenating them; the gate parameters are learned from the samples (see the sketch below).
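The article does not give the exact gating formula; the sketch below shows one common way such a learned gate can fuse a semantic vector with business-feature vectors (all dimensions and the specific gating form are assumptions for illustration):

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Learned gate deciding how much semantic vs. business information passes through."""
    def __init__(self, sem_dim=128, biz_dim=64, out_dim=128):
        super().__init__()
        self.sem_proj = nn.Linear(sem_dim, out_dim)
        self.biz_proj = nn.Linear(biz_dim, out_dim)
        self.gate = nn.Linear(sem_dim + biz_dim, out_dim)  # gate parameters are learned from samples

    def forward(self, sem_vec, biz_vec):
        g = torch.sigmoid(self.gate(torch.cat([sem_vec, biz_vec], dim=-1)))  # element-wise gate in [0, 1]
        # Gated mix instead of a plain sum or concatenation of the two feature groups.
        return g * self.sem_proj(sem_vec) + (1 - g) * self.biz_proj(biz_vec)

fused = GatedFusion()(torch.rand(4, 128), torch.rand(4, 64))  # (batch, out_dim) fused representation
```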

The improved ClickNet optimizes much better toward the business ranking objective, but the model is still dominated by semantics. Can ClickNet be used directly to model ranking? Yes: strengthen the business features and weaken or remove the semantic representation features. The modified model is ClickNet-v2.

ClickNet-v2

ClickNet-v2 targets the business ranking objective directly, takes business features as the input to the ClickNet representation layer, and feeds each feature into the model after discretization. Unlike ClickNet-v1, ClickNet-v2 uses a wide variety of business features, which require in-depth analysis and careful model design.

For example, how should position bias be handled? Because results are displayed in order, items shown lower are less likely to be seen by users, so their natural click-through rate is lower. One solution is to connect the position information directly to the outermost layer, without feature combination.

Another example: are the features combined sufficiently after multi-layer nonlinear transformation of each business feature? One solution is to use polynomial nonlinear transformations, which combine features across layers well.

Would a combination of models work better? One option is to cascade FM with ClickNet, or to bag the individual models.

There are also many other business-scenario-related considerations, such as model interpretability.

ClickNet is built on Ginger, a self-developed deep learning framework with excellent convergence speed and performance. Figure 22 shows some tests on the ranking task: on the Higgs data, the AUC of ClickNet based on Ginger is 34 micro-points higher than that of XGBoost, and the ClickNet implemented with TensorFlow is 3 micro-points slower than the Ginger version. As Figure 23 shows, ClickNet also improves accuracy over linear models.

Conclusion

Owing to its strong fitting ability and low demands on feature engineering, deep learning has been widely applied in the text field. This article takes semantic matching and ranking models as examples to introduce industry progress and their application in Meituan scenarios.

The first part introduces several stages of semantic matching, including the vector space model, latent semantic analysis, topic models, and deep learning, focusing on the embedding and DSSM models applied to semantic matching as well as Meituan's ClickNet model. The second part introduces progress in deep learning ranking models and some of Meituan's attempts. Beyond these two parts, deep learning has penetrated almost every aspect of text processing, and Meituan has many other efforts, such as sentiment analysis, dialogue systems, summarization, and keyword generation, which are not covered here due to space limitations. In short, cognitive intelligence still has a long way to go: language is the cultural sediment of human history, involving complex problems of semantics, logic, culture, and emotion. We believe deep learning will soon make major breakthroughs in the text field.

References

[1] Thomas Hofmann. “Probabilistic Latent Semantic Analysis”. 1999.

[2] David M. Blei, Andrew Y. Ng, Michael Jordan. “Latent Dirichlet Allocation”. 2002.

[3] Huang, Po-sen et al. “Learning Deep Structured Semantic Models for Web Search using Clickthrough Data”, In CIKM 2013.

[4] Shen, Yelong, He, Xiaodong, Gao, Jianfeng, et al. “A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval”. In CIKM 2014.

[5] H. Palangi et al. “Semantic modeling with long-short-term memory for information retrieval”. 2015.

Team introduction

The Meituan-Dianping algorithm team is the “brain” of the Meituan-Dianping technical team, covering search, recommendation, advertising, intelligent dispatching, natural language processing, computer vision, robotics, autonomous driving, and other technical areas. It helps hundreds of millions of active Meituan-Dianping users improve their experience, and helps millions of merchants in more than 200 categories, including dining, hotels, weddings, beauty, and parenting, improve their operating efficiency. The team is actively exploring and researching artificial intelligence, continuously innovating and practicing, and is committed to applying cutting-edge technology to bring a better life-service experience to consumers.