Abstract: From the perspective of models and algorithms, this blog introduces two families of sentiment analysis models: those based on statistical methods and those based on deep learning.

Sentiment analysis is the process of analyzing, processing, and extracting information from subjective, emotionally colored text using natural language processing and text mining techniques. Research on text sentiment analysis currently spans many fields, including natural language processing, text mining, information retrieval, information extraction, machine learning, and ontology, and it has attracted the attention of many scholars and research institutions. In recent years it has remained one of the hot topics in natural language processing and text mining.

In terms of human subjective cognition, the task of sentiment analysis is to answer the question: who expressed what emotion, at what time, about which attribute of which entity? A formal expression of sentiment analysis is therefore the five-element tuple (entity, aspect, opinion, holder, time). For example, the sentence “I find the 2.0T XX car very powerful” can be converted into the tuple (XX car, power, positive emotion, I, /). Note that most current studies do not consider two of the five elements: the opinion holder and the time.
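As a small illustration, the five-element tuple can be written as a record type. This is only a sketch: the class name `Opinion`, its field defaults, and the helper `core_elements` are hypothetical choices, not part of any standard library for sentiment analysis.

```python
from typing import NamedTuple, Optional


class Opinion(NamedTuple):
    """Five-element sentiment tuple: (entity, aspect, opinion, holder, time)."""
    entity: str
    aspect: str
    opinion: str
    holder: Optional[str] = None  # often ignored in current studies
    time: Optional[str] = None    # often ignored in current studies


def core_elements(o: Opinion) -> tuple:
    """Keep only the three elements that most studies actually model."""
    return (o.entity, o.aspect, o.opinion)


# "I find the 2.0T XX car very powerful." The time is not stated, so it stays None.
example = Opinion(entity="XX car", aspect="power", opinion="positive", holder="I")
print(core_elements(example))
```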

Sentiment analysis can be divided into many sub-tasks, which the following mind map shows:

Word-level and sentence-level analysis target the positive or negative polarity of a word or of a whole sentence, respectively. This amounts to ignoring two of the five elements, entity and attribute (aspect): the analysis does not distinguish specific targets within the sentence. Word-level sentiment analysis, also known as sentiment dictionary construction, studies how to attach sentiment information to words. Sentence-level and document-level sentiment analysis study how to assign a sentiment label to an entire sentence or document. Target-level sentiment analysis, in contrast, considers specific targets, which can be entities, attributes of an entity, or combinations of entities and attributes. It can be divided into three types: Target-grounded aspect based sentiment analysis (TG-ABSA), Target no aspect based sentiment analysis (TN-ABSA), and Target aspect based sentiment analysis (T-ABSA). TG-ABSA analyzes the sentiment toward each attribute in a given attribute set of a certain entity. TN-ABSA analyzes the positive or negative sentiment toward entities in the text. T-ABSA analyzes the entity and attribute combinations that appear in the text. The following table provides examples of sentiment analysis tasks with different targets:

Sentiment analysis models based on statistical methods

Sentiment analysis based on statistical methods relies mainly on an established “sentiment dictionary”, whose construction is the premise and basis of sentiment classification. In practice, the words in such a dictionary fall into four categories: general sentiment words, degree adverbs, negation words, and domain words. For English, dictionaries are mainly built by expanding WordNet [1]. Hu and Liu [2] started from a manually built seed list of adjectives and used the synonym and antonym relations between words in WordNet to judge the sentiment tendency of words, and from that the polarity of opinions. For Chinese, dictionaries are mainly built by expanding HowNet [3]. Zhu Yanlan et al. [4] computed the semantic similarity between a word and a benchmark set of sentiment words in order to infer the sentiment tendency of that word. In addition, a specialized domain dictionary can be built to improve classification accuracy. For example, a dictionary of new online vocabulary makes it possible to capture the sentiment tendency of new words more accurately.
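The idea of growing a dictionary from seed words through synonym and antonym links can be sketched as a breadth-first traversal. This is a simplified illustration, not the exact procedure of [2]: the function name `expand_polarity` and the toy `synonyms` and `antonyms` tables are hypothetical, and no real WordNet data is queried.

```python
from collections import deque


def expand_polarity(seeds, synonyms, antonyms):
    """Propagate polarity (+1 or -1) from seed words along lexical links.

    A synonym inherits the polarity of the word that reached it;
    an antonym receives the opposite polarity. Links are followed
    only in the direction given by the lookup tables.
    """
    polarity = dict(seeds)
    queue = deque(seeds)  # iterating a dict yields its keys (the seed words)
    while queue:
        word = queue.popleft()
        for neighbour in synonyms.get(word, ()):
            if neighbour not in polarity:
                polarity[neighbour] = polarity[word]
                queue.append(neighbour)
        for neighbour in antonyms.get(word, ()):
            if neighbour not in polarity:
                polarity[neighbour] = -polarity[word]
                queue.append(neighbour)
    return polarity


# Hypothetical toy relations standing in for a real lexical database.
synonyms = {"good": ["fine", "great"], "great": ["superb"], "bad": ["poor"]}
antonyms = {"good": ["bad"], "fine": ["coarse"]}

lexicon = expand_polarity({"good": 1}, synonyms, antonyms)
print(lexicon)
```

Words that cannot be reached from any seed stay out of the resulting dictionary, which is one reason the seed list matters.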

In the dictionary-based method, the text is first preprocessed with word segmentation and stop-word removal, and the sentiment dictionary is then matched against the strings in the text in order to mine positive and negative information. The general process is shown in the figure:

In addition to the dictionaries mentioned above, [5] lists other existing Chinese dictionaries for reference:

Of course, you can also train your own sentiment dictionary from a corpus. Once the sentiment dictionary is loaded, a dictionary-based text matching algorithm is used for the analysis. The algorithm is relatively simple: the words of the segmented sentence are traversed one by one, and whenever a word matches the dictionary, its weight is processed. The weight of a positive word is added, the weight of a negative word is subtracted, a negation word flips the sign, and the weight of a degree adverb is multiplied by the weight of the word it modifies. The sign of the final score distinguishes positive, negative, and neutral sentiment. A typical algorithm flow for sentiment analysis with dictionary-based text matching is as follows [5]:
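The weighting rules described above can be sketched in a few lines. This is a minimal illustration, not the flow from [5]: the dictionaries, their weights, and the function names `score` and `label` are hypothetical, and the input is assumed to be already segmented into tokens.

```python
# Hypothetical toy dictionaries; real ones are far larger.
POSITIVE = {"good": 1.0, "powerful": 1.0}
NEGATIVE = {"bad": 1.0, "slow": 1.0}
NEGATIONS = {"not", "never"}
DEGREE = {"very": 2.0, "slightly": 0.5}


def score(tokens):
    """Sum signed word weights, applying pending negations and degree adverbs."""
    total = 0.0
    modifier = 1.0  # accumulated effect of modifiers seen since the last sentiment word
    for tok in tokens:
        if tok in NEGATIONS:
            modifier *= -1.0          # a negation word flips the sign
        elif tok in DEGREE:
            modifier *= DEGREE[tok]   # a degree adverb scales the next sentiment word
        elif tok in POSITIVE:
            total += modifier * POSITIVE[tok]
            modifier = 1.0
        elif tok in NEGATIVE:
            total -= modifier * NEGATIVE[tok]
            modifier = 1.0
        # words found in no dictionary are skipped and leave the modifier unchanged
    return total


def label(total):
    """Map the final score to a sentiment class."""
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(label(score("the car is very powerful".split())))
print(label(score("not good".split())))
```

Because the score is a plain sum, the result depends entirely on the chosen weights, which is the source of the accuracy problem discussed next.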

Sentiment analysis models based on statistical methods are simple, easy to operate, and broadly applicable, but they still have three main shortcomings:

1 Accuracy is not high

Language is highly complex, and a simple linear superposition of word weights inevitably loses a significant amount of accuracy. Word weights are also variable and hard to pin down.

2 The dictionary needs to be updated continuously

A dictionary may not cover new sentiment words, such as the Chinese internet slang geili and niubi, so it must be refreshed constantly. Online vocabulary keeps emerging, and if dictionary updates cannot keep up with the pace of new words, sentiment analysis in practice will fall far short of expectations. In Taobao product reviews or Ele.me takeout reviews, for example, the analyzed sentiment will deviate from reality if new words are not captured.

3 Difficulty in building dictionaries

The core of dictionary-based sentiment classification is the sentiment dictionary. Building one, however, requires strong background knowledge and a deep understanding of the language, so the approach is very limited when analyzing foreign languages.

Sentiment analysis models based on deep learning

Having seen the advantages and disadvantages of sentiment analysis models based on statistical methods, let’s look at how deep learning text classification models perform sentiment analysis and classification. One advantage of deep learning is that it works end to end, without human intervention at every step. Because it builds on word vectors generated by pre-trained models, the first important problem deep learning solves is the construction of the sentiment dictionary. Below, we take several typical text classification models as examples to show how deep text classification models have evolved and where they apply.

2.1 FastText [6]

Model running steps:

2.2 TextCNN [7]

2.3 TextRNN [8]

2.4 TextRNN + Attention [9]

HAN is a Hierarchical Attention Network. It divides the text to be classified into a number of sentences and applies an encoder and an attention operation at the word level and at the sentence level respectively, which makes it suitable for classifying long texts. Compared with the models above, the structure of HAN is slightly more complex and can be divided into the following steps.
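The two-level attention idea can be sketched as follows. This is a heavily simplified illustration: it skips the word and sentence encoders as well as the learned projection inside the attention layer, and it assumes the word vectors are already encoded. The names `attention_pool` and `han_document_vector` and the context vectors are hypothetical.

```python
import math


def attention_pool(vectors, context):
    """Weight each vector by softmax(tanh(v) . context) and return the weighted sum."""
    scores = [sum(math.tanh(x) * c for x, c in zip(v, context)) for v in vectors]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def han_document_vector(document, word_context, sentence_context):
    """Pool words into sentence vectors, then sentence vectors into one document vector."""
    sentence_vectors = [attention_pool(sentence, word_context)[0] for sentence in document]
    return attention_pool(sentence_vectors, sentence_context)


# A document of two sentences with two-dimensional, already-encoded word vectors.
document = [
    [[1.0, 0.0], [3.0, 2.0]],
    [[0.5, 0.5], [0.1, -0.2], [0.0, 1.0]],
]
doc_vector, sentence_weights = han_document_vector(document, [0.3, -0.1], [0.2, 0.4])
print(doc_vector, sentence_weights)
```

In a trained model the context vectors are learned parameters, and the document vector is passed to a classification layer.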

2.5 TextRCNN [10]

**RCNN algorithm process:** First, a bidirectional LSTM is used to learn the context of each word; the forward and backward RNNs give the forward and backward representations of each word:

The representation of each word then becomes the word vector concatenated with its forward and backward context vectors:

After that, the same convolution layer as in TextCNN is applied, followed by a pooling layer: max pooling is performed over the seq_length dimension, and a fully connected (fc) layer then performs the classification. The network can be regarded as an improved version of FastText.
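The concatenation and pooling steps can be sketched with plain lists. This is only an illustration under simplifying assumptions: the context vectors are supplied as hypothetical inputs rather than computed by an LSTM, and the convolution and fc layers are omitted. The function names are hypothetical as well.

```python
def concat_context(left_ctx, embeddings, right_ctx):
    """Build [left context; word vector; right context] for every position."""
    return [l + e + r for l, e, r in zip(left_ctx, embeddings, right_ctx)]


def max_over_time(sequence):
    """Element-wise maximum over the sequence-length dimension."""
    return [max(column) for column in zip(*sequence)]


# Two words, with 1-d context vectors and 2-d word vectors (made-up values).
left_ctx = [[0.0], [0.5]]
embeddings = [[1.0, -1.0], [0.2, 0.3]]
right_ctx = [[0.9], [0.0]]

combined = concat_context(left_ctx, embeddings, right_ctx)
features = max_over_time(combined)
print(features)
```

Max pooling yields a fixed-length feature vector regardless of how many words the text contains, which is what allows a fixed-size classification layer to follow.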

Conclusion

From the perspective of models and algorithms, this blog has introduced sentiment analysis models based on statistical methods and sentiment analysis models based on deep learning. Models based on statistical methods are simple and easy to use, but they have serious shortcomings in accuracy, flexibility, and generalization. Deep learning models have evolved toward capturing contextual information with deeper and more complex networks, and they train neural networks for the task on top of the powerful word vectors generated by pre-trained models. The open source repository [13] provides PyTorch implementations of each model and compares them on the same Chinese baseline. The two blog posts [11][12] also give detailed introductions to other deep learning models for sentiment analysis and can serve as a guide for further exploration.

References

[1] wordnet.princeton.edu/

[2] Hu M, Liu B. Mining and summarizing customer reviews [C]. NY, USA: Proceedings of Knowledge Discovery and Data Mining, 2004: 168-177.

[3] languageresources.github.io/2018/03/07/%E9%87%91%E5%A4%A9%E5%8D%8E_Hownet/

[4] Zhu Yanlan, Min Jin, Zhou Yaqian, et al. Semantic Orientation Calculation based on HowNet [J]. Journal of Chinese Information Processing, 2006, 20(1): 14-20.

[5] blog.csdn.net/weixin\_416…

details/93163519

[6] arxiv.org/abs/1612.03…

[7] arxiv.org/abs/1408.58…

[8] www.ijcai.org/Proceedings…

[9] www.aclweb.org/anthology/P…

[10] zhengyima.com/my/pdfs/Tex…

[11] zhuanlan.zhihu.com/p/76003775

[12] zhuanlan.zhihu.com/p/73176084

[13] github.com/649453932/C…

This article is shared from the Huawei Cloud community post “NLP column | Emotion Analysis Method Introduction”; original author: quite suddenly.