Author’s brief introduction
Jian Li is the R&D director of the Ctrip Vacation R&D Department. He joined Ctrip at the end of 2013 and has accumulated extensive hands-on experience in data mining, analytics, and artificial intelligence.
With the application of big data becoming more and more widespread, artificial intelligence is finally showing renewed vitality after several ups and downs. Beyond progress at the theoretical level, the most striking aspect of this round of development is the unprecedented data dividend brought by the growth of big data infrastructure, storage, and computing power.
The progress of artificial intelligence is most visible in knowledge engineering, represented by the knowledge graph, and in machine learning, represented by deep learning.
In the future, as the dividend that big data brings to deep learning is exhausted, the ceiling on what deep learning models can achieve will draw ever closer unless there is a new breakthrough in basic theory. On the other hand, a large number of knowledge graphs keep emerging; they contain a great deal of human prior knowledge, but that knowledge has not yet been effectively exploited by deep learning.
The fusion of knowledge graphs and deep learning has therefore become one of the important approaches to further improving the effectiveness of deep learning. Symbolism, represented by the knowledge graph, and connectionism, represented by deep learning, have gradually left their original tracks of independent development and embarked on a new road of collaborative development.
1. Construction of large-scale knowledge graphs
Knowledge graphs have evolved from the semantic networks of the 1960s, through the expert systems of the 1980s, the Bayesian networks of the 1990s, and OWL and the Semantic Web of the 2000s, to Google's Knowledge Graph after 2010. Google's Knowledge Graph already contains hundreds of millions of entries and is widely used in search, recommendation, and more.
Knowledge graph storage and query languages have also gone through a long evolution. RDF, OWL, and SPARQL queries have gradually been abandoned by the industrial mainstream because of their inconvenience and high cost, and graph databases have gradually become the main storage method for knowledge graphs.
Widely used graph databases currently include Neo4j, GraphSQL, Spark GraphX (which includes a graph computing engine), the HBase-based Titan, BlazeGraph, and so on, each with its own storage model and query language. OrientDB and PostgreSQL are also widely used in practical application scenarios, mainly because of their relatively low implementation cost and their performance advantages.
Building a knowledge graph at scale usually means that many entities and relations have to be extracted from raw data (which may be structured or unstructured) and stored in a structured way, and the raw data we rely on often lives in multi-source, heterogeneous environments. Knowledge extraction and fusion is therefore the first and unavoidable problem.
Converting structured data into a graph structure is a relatively easy and low-effort task, so it is recommended that this step be done first.
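As a minimal sketch of this step, the snippet below flattens rows from a hypothetical structured table of attractions into (head, relation, tail) triples; the field names and relation labels are illustrative placeholders, not the schema actually used at Ctrip.

```python
# Minimal sketch: flatten structured records into (head, relation, tail) triples.
# The field names and relation labels below are illustrative placeholders.

records = [
    {"name": "The Bund", "city": "Shanghai", "category": "Scenic Spot"},
    {"name": "West Lake", "city": "Hangzhou", "category": "Scenic Spot"},
]

def records_to_triples(rows):
    triples = []
    for row in rows:
        head = row["name"]
        # Every non-name column becomes a relation from the entity to its value.
        for field, value in row.items():
            if field != "name":
                triples.append((head, field, value))
    return triples

if __name__ == "__main__":
    for t in records_to_triples(records):
        print(t)  # e.g. ('The Bund', 'city', 'Shanghai')
```

The resulting triples can then be bulk-loaded into whichever graph database was chosen above.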
For complex unstructured data, the main approaches to knowledge graph construction are currently traditional NLP pipelines and deep-learning-based models, and more and more practitioners now tend to use deep learning to extract AVPs (attribute-value pairs).
Many deep learning models can be used to complete, end to end, the tasks of named entity recognition (NER), relation extraction, and relation completion that are needed to build and enrich a knowledge graph.
Named entity recognition (NER) identifies the relevant entities (the subjects and objects of triples) in unstructured text and labels their positions and types. It is the basis of several more complex NLP tasks, such as relation extraction and information retrieval.
NER has always been a hot topic in NLP. From the early dictionary- and rule-based methods, through traditional machine learning methods, to the deep-learning-based methods of recent years, the general evolution of NER methods is outlined below.
In machine learning, NER is framed as a sequence labeling problem. Unlike classification, the predicted labels in a sequence labeling problem depend not only on the input features but also on the preceding predicted labels; that is, the predicted labels depend on and influence one another.
The Conditional Random Field (CRF) is the mainstream sequence labeling model. Its objective function considers not only the input state feature functions but also the label transition feature functions. Parameters can be learned with SGD during training, and at prediction time the Viterbi algorithm can be used to find the optimal sequence that maximizes the objective function.
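For reference, the objective being described is the standard linear-chain CRF conditional probability (textbook notation, not taken from the original article):

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{i} \sum_{k} \lambda_k \, t_k(y_{i-1}, y_i, x, i) \;+\; \sum_{i} \sum_{l} \mu_l \, s_l(y_i, x, i) \right)
```

Here the t_k are the label transition feature functions, the s_l are the state feature functions, the weights λ_k and μ_l are learned (for example by SGD), and Z(x) is the normalization term. Decoding means finding the label sequence y that maximizes this probability, which is exactly what the Viterbi algorithm computes.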
A common deep-learning sequence labeling model at present is BiLSTM-CNN-CRF, which mainly consists of an embedding layer (character embeddings, word embeddings, and so on), a BiLSTM, a tanh hidden layer, and a CRF layer (for Chinese the CNN is not needed). Our experiments show that BiLSTM-CRF achieves better results. On the feature side, since it inherits the advantages of deep learning, character and word embeddings alone achieve good results without laborious feature engineering.
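A minimal PyTorch sketch of such a BiLSTM-CRF tagger is shown below. It assumes the third-party pytorch-crf package (imported as torchcrf) for the CRF layer; the vocabulary size, tag set, and hyperparameters are illustrative placeholders rather than the configuration used in our experiments.

```python
# Minimal sketch of a BiLSTM-CRF tagger in PyTorch.
# Assumes the third-party "pytorch-crf" package (torchcrf) for the CRF layer;
# vocabulary size, tag set and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn
from torchcrf import CRF

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim // 2, batch_first=True,
                              bidirectional=True)
        self.hidden2tag = nn.Linear(hidden_dim, num_tags)  # emission scores
        self.crf = CRF(num_tags, batch_first=True)          # transition scores

    def _emissions(self, token_ids):
        embeds = self.embedding(token_ids)
        lstm_out, _ = self.bilstm(embeds)
        return self.hidden2tag(torch.tanh(lstm_out))

    def loss(self, token_ids, tags, mask):
        # The CRF forward pass returns the log-likelihood; negate it for a loss.
        return -self.crf(self._emissions(token_ids), tags, mask=mask)

    def decode(self, token_ids, mask):
        # Viterbi decoding of the most likely tag sequence.
        return self.crf.decode(self._emissions(token_ids), mask=mask)
```

Training then reduces to minimizing `model.loss(batch_ids, batch_tags, batch_mask)` with any standard optimizer.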
In recent months, we have also experimented with the attention mechanism and with semi-supervised approaches that require only a small number of labeled samples.
On top of BiLSTM-CRF, an attention mechanism is used to replace the original concatenation of character and word embeddings with a weighted sum, with two hidden layers learning the attention weights, so that the model can use the information in the character and word embeddings dynamically. We also add NE (named entity) category features and apply attention over the character embeddings so that the model learns to focus on the more informative characters. The experimental results are better than those of the BiLSTM-CRF method.
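A minimal sketch of the weighted-sum idea follows, assuming character-level and word-level representations of the same dimensionality have already been computed; the two-layer attention module below is illustrative and not the exact architecture from our experiments.

```python
import torch
import torch.nn as nn

class EmbeddingAttentionFusion(nn.Module):
    """Fuse character- and word-level representations with a learned weight
    instead of simple concatenation (illustrative sketch)."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        # Two hidden layers map the pair of representations to a gate in (0, 1).
        self.attn = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, char_repr, word_repr):
        # char_repr, word_repr: (batch, seq_len, dim)
        gate = self.attn(torch.cat([char_repr, word_repr], dim=-1))  # (batch, seq_len, 1)
        # Weighted sum: the model decides per token how much to trust each source.
        return gate * char_repr + (1.0 - gate) * word_repr
```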
The reason a large amount of space is devoted to the deep learning model for NER here is that the relation extraction model is implemented with the same architecture; it is also, in essence, a sequence labeling problem, so it will not be repeated here.
Another difficulty in knowledge graph construction is knowledge fusion, that is, fusing multi-source data. Fusion includes entity alignment, attribute alignment, conflict resolution, normalization, and so on. For the open domain this is an almost intractable process, but in our specific travel domain it can be handled with domain-knowledge methods such as alias evidence, alignment, and resolution. From a technical perspective this involves a great deal of logic, so traditional machine learning methods, or even plain business rules, can cover most scenarios.
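As a hedged illustration of how far simple rules plus a similarity measure can go for alias-based entity alignment, the snippet below matches candidate mentions against a small alias table using the standard-library difflib; the names, aliases, and threshold are placeholders, not our production logic.

```python
from difflib import SequenceMatcher

# Hypothetical alias table: canonical entity -> known aliases.
ALIASES = {
    "Shanghai Disneyland": ["Shanghai Disney", "Disneyland Shanghai"],
    "The Bund": ["Waitan", "Bund Shanghai"],
}

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def align(mention: str, threshold: float = 0.8):
    """Return the canonical entity a mention most likely refers to, or None."""
    best_entity, best_score = None, 0.0
    for entity, aliases in ALIASES.items():
        for candidate in [entity] + aliases:
            score = similarity(mention, candidate)
            if score > best_score:
                best_entity, best_score = entity, score
    return best_entity if best_score >= threshold else None

print(align("shanghai disney"))  # -> "Shanghai Disneyland"
```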
The knowledge graph schema is the representation of the knowledge classification system. It can also be used for logical reasoning and conflict detection, thereby improving the quality of the knowledge graph.
In short, there is no single unified way to construct a knowledge graph, because its construction requires a whole set of knowledge engineering methods, including NLP, ML, and DL techniques, graph database technology, knowledge representation and reasoning, and so on. Building a knowledge graph is a systems engineering effort, and knowledge will inevitably need updating, so rapid iteration and rapid delivery for testing are essential.
2. Knowledge graph reasoning
In the process of knowledge graph construction, many relation completion problems remain. Although a common knowledge graph may contain millions of entities and hundreds of millions of relational facts, it is still far from complete. Knowledge graph completion predicts relations between entities from the existing graph and is an important complement to relation extraction. Traditional approaches such as TransE and TransH build entity and relation embeddings by treating a relation as a translation from the head entity to the tail entity, but these models simply assume that entities and relations lie in the same semantic space. In fact, an entity is a composite of many attributes, and different relations focus on different attributes of an entity, so modeling them in a single space is not enough.
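For reference, the translation idea in TransE can be written as follows (standard formulation, not quoted from the original article): for a true triple (h, r, t) the embeddings should satisfy h + r ≈ t, and the score is the distance

```latex
f_r(h, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{L_1/L_2}
```

which is small for facts that hold and large for facts that do not.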
Therefore, we try TransR, which projects entities and relations into separate spaces and builds entity embeddings in entity space and relation embeddings in relation space. For each triple (h, r, t), the entities are first projected from entity space into the space of relation r through the projection matrix Mr, giving hr and tr, and then hr + r ≈ tr is required to hold. The relation-specific projection pulls the two entities that actually hold the relation close together and pushes entities that do not hold the relation away from each other.
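In standard notation (again not taken verbatim from the article), TransR's projection and scoring look like:

```latex
\mathbf{h}_r = \mathbf{h} M_r, \qquad \mathbf{t}_r = \mathbf{t} M_r, \qquad
f_r(h, t) = \lVert \mathbf{h}_r + \mathbf{r} - \mathbf{t}_r \rVert_2^2
```

where M_r is the relation-specific matrix that projects entity embeddings from entity space into the space of relation r.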
In knowledge graph reasoning, the knowledge graph is often expressed in tensor form, and unknown facts are determined through tensor factorization. These methods are commonly used for link prediction (determining whether a particular relation exists between two entities), entity classification (determining the semantic category of an entity), and entity resolution (identifying and merging different names that refer to the same entity).
Common models include the RESCAL model and the TRESCAL model.
The core idea of the RESCAL model is to encode the whole knowledge graph into a three-dimensional tensor and decompose it into a core tensor and a factor matrix. Each two-dimensional slice of the core tensor represents a relation, and each row of the factor matrix represents an entity. The value reconstructed from the core tensor and factor matrix is treated as the probability of the corresponding triple: if the probability exceeds a certain threshold, the triple is considered correct; otherwise, it is not.
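In the usual RESCAL formulation (standard notation, added here for clarity), the slice of the data tensor for relation k is approximated as

```latex
X_k \approx A \, R_k \, A^{\top}, \qquad k = 1, \dots, m
```

where A holds one row per entity (the entity embeddings) and R_k is the relation-specific core matrix; the reconstructed entry (A R_k A^T)_{ij} is read as the score of the triple (entity i, relation k, entity j).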
TRESCAL addresses the overfitting problem that arises when the input tensor is highly sparse.
Path ranking algorithms, such as the PRA algorithm, are also commonly used to determine the possible relations between two entities; this article does not expand on them.
3. Applications of large-scale knowledge graphs
The application scenarios of knowledge graphs are very broad: search, question answering, recommendation systems, anti-fraud, inconsistency verification, anomaly analysis, customer management, and so on. As more and more deep learning models appear in these scenarios, this article focuses on the application of knowledge graphs within deep learning models.
At present, there are mainly two ways to apply a knowledge graph in deep learning. One is to feed the semantic information of the knowledge graph into the deep learning model, representing discrete knowledge as continuous vectors so that the graph's prior knowledge becomes input to the deep model. The other is to use knowledge as a constraint on the optimization objective to guide the learning process, usually by expressing the knowledge in the graph as a posterior regularization term on the objective.
Representation learning over a knowledge graph learns vectorized representations of entities and relations. The key is to define a reasonable loss (scoring) function fr(h, t) for each fact, i.e. each triple (h, r, t), where h and t stand for the vectorized representations of the triple's two entities. In general, when the fact (h, r, t) holds, we expect fr(h, t) to be minimized.
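Training such models commonly uses a margin-based ranking loss over corrupted triples (the standard recipe for the TransE family, stated here for completeness):

```latex
\mathcal{L} = \sum_{(h,r,t) \in S} \; \sum_{(h',r,t') \in S'} \max\bigl(0,\; \gamma + f_r(h, t) - f_r(h', t')\bigr)
```

where S is the set of observed triples, S' the set of corrupted triples obtained by replacing the head or tail with a random entity, and γ the margin.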
Common models are based on distance and translation.
In distance-based models, such as the SE model, the basic idea is that when two entities belong to the same triple, their vector representations should also be close to each other in the projected space. The loss function is therefore defined as the distance between the projected vectors, with the matrices Wr,1 and Wr,2 performing the projections of the head entity h and the tail entity t of the triple.
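In the usual formulation of the SE (Structured Embedding) model, which the description above corresponds to, the score is

```latex
f_r(h, t) = \lVert W_{r,1}\,\mathbf{h} - W_{r,2}\,\mathbf{t} \rVert_{1}
```

where W_{r,1} and W_{r,2} are the relation-specific projection matrices for the head and tail entities.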
Translation-based models include the TransE, TransH, and TransR models mentioned above; they describe the correlation between entities and relations through vector translations in vector space.
Current knowledge graph representation learning methods still have various problems, but the field is developing very rapidly and is well worth watching.
Once the knowledge graph has been converted into such representations, it can be combined with all kinds of deep learning models depending on the application domain. In automatic question answering, for example, it can be combined with an encoder-decoder model to match questions against triples, that is, to compute the vector similarity between a given question and candidate triples so as to find the best-matching triple in the knowledge graph. There are also recommendation-system cases in which a vectorized representation of structured knowledge is obtained through network embedding, a stacked denoising auto-encoder (SDAE) and a stacked convolutional auto-encoder are then used to extract text knowledge features and image knowledge features respectively, and the three kinds of knowledge features are fused in a collaborative ensemble learning framework to realize personalized recommendation.
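A minimal sketch of the question-to-triple matching step, assuming question and triple embeddings have already been produced by some encoder; the helper names, dimensions, and sample triples are hypothetical.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def best_matching_triple(question_vec, triple_vecs, triples):
    """Return the triple whose embedding is most similar to the question embedding."""
    scores = [cosine(question_vec, tv) for tv in triple_vecs]
    best = int(np.argmax(scores))
    return triples[best], scores[best]

# Hypothetical embeddings produced elsewhere by the encoder.
question_vec = np.random.rand(128)
triples = [("The Bund", "locatedIn", "Shanghai"), ("West Lake", "locatedIn", "Hangzhou")]
triple_vecs = [np.random.rand(128) for _ in triples]
print(best_matching_triple(question_vec, triple_vecs, triples))
```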
With the wide application of deep learning, how to effectively use large amounts of prior knowledge to greatly reduce models' dependence on large-scale annotated corpora is gradually becoming one of the main research directions. Integrating common-sense and domain knowledge into deep learning models will be another great opportunity and challenge.