NLP: Preprocessing and Sentiment Analysis (simple starter code)

I have only just started learning NLP, so I am posting the code first. Some parts are still unclear to me, and I will improve this post later.

(The code below was run in a Jupyter Notebook.)

dict1:

Ideas:

Let’s start with re, the regular expression library.

1. Define preprocessor functions:

Start by separating each word and punctuation mark in the sentence with a space:

text = re.sub(r"(\w)([.,;:!?'\"”\)])", r"\1 \2", text)

text = re.sub(r"([.,;:!?'\"“\(])(\w)", r"\1 \2", text)

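re.sub searches the text for the pattern given as its first argument and replaces every match with its second argument. In the first line the pattern is a word character (\w, the first group) immediately followed by a punctuation mark (the second group), and the replacement \1 \2 writes the two groups back with a space between them.

For example, in “happy.” the “y” and the “.” match together, and the code turns “y.” into “y .” (with a space in the middle). The word happy and the period are now separated by a space, as is every other element in the sentence. The second line does the same for punctuation that comes directly before a word, such as an opening quote or parenthesis.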
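To see the two substitutions at work, here is a minimal sketch that runs them on short made-up sentences. The helper name space_out_punctuation and the sample sentences are assumptions for illustration only; the patterns are the ones described above.

```python
import re

def space_out_punctuation(text):
    # Hypothetical helper name, used only for this illustration.
    # Word character followed by punctuation: put a space between them.
    text = re.sub(r"(\w)([.,;:!?'\"”\)])", r"\1 \2", text)
    # Punctuation followed by a word character: put a space between them.
    text = re.sub(r"([.,;:!?'\"“\(])(\w)", r"\1 \2", text)
    return text

print(space_out_punctuation("I am happy."))
# I am happy .
print(space_out_punctuation("(sad) days, happy days!"))
# ( sad ) days , happy days !
```

One side effect worth knowing: because the apostrophe is in both character classes, a contraction such as don't comes out as don ' t.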

Next, split the text on whitespace to get the individual elements:

tokens = re.split(r"\s+", text)

re.split splits the string at every occurrence of the pattern. \s matches a whitespace character and \s+ matches one or more consecutive whitespace characters, so each run of spaces produces exactly one split.

And then normalisation:

That completes the text preprocessing.
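Putting the steps together, a preProcess function could look roughly like the sketch below. The normalisation step is not spelled out in this post, so lowercasing is an assumption here, and dropping empty strings is an extra I added because re.split returns them when the text begins or ends with whitespace.

```python
import re

def preProcess(text):
    """Return a list of lowercase tokens from a raw sentence."""
    # Separate punctuation from the neighbouring word characters.
    text = re.sub(r"(\w)([.,;:!?'\"”\)])", r"\1 \2", text)
    text = re.sub(r"([.,;:!?'\"“\(])(\w)", r"\1 \2", text)
    # Split on runs of whitespace.
    tokens = re.split(r"\s+", text)
    # Assumed normalisation: lowercase every token, and drop the empty
    # strings that re.split yields for leading or trailing whitespace.
    return [t.lower() for t in tokens if t]

print(preProcess("I am Happy, not sad!"))
# ['i', 'am', 'happy', ',', 'not', 'sad', '!']
```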

Then we use a score to indicate whether the sentence is positive or negative.

Start with a simple dictionary that assigns +1.0 to happy and -1.0 to sad.

Define a getSentiment function that returns the sentiment score of a single word.

Then use the analyseSentiment function to analyse the whole text:

words = preProcess(text)

scores = [getSentiment(w) for w in words]

This gives a sentiment score for each word in the text.

Summing the scores then shows whether the text as a whole is positive or negative.
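The scoring steps can be sketched end to end as follows. The function names getSentiment and analyseSentiment and the two dictionary entries come from this post; treating words that are not in the dictionary as neutral (0.0) is my assumption, and preProcess is a compact stand-in for the preprocessing described earlier.

```python
import re

# Toy dictionary from the text: +1.0 for "happy", -1.0 for "sad".
sentimentDict = {"happy": 1.0, "sad": -1.0}

def preProcess(text):
    # Compact stand-in for the preprocessing steps described earlier.
    text = re.sub(r"(\w)([.,;:!?'\"”\)])", r"\1 \2", text)
    text = re.sub(r"([.,;:!?'\"“\(])(\w)", r"\1 \2", text)
    return [t.lower() for t in re.split(r"\s+", text) if t]

def getSentiment(word):
    # Assumption: words missing from the dictionary are neutral (0.0).
    return sentimentDict.get(word, 0.0)

def analyseSentiment(text):
    words = preProcess(text)
    scores = [getSentiment(w) for w in words]
    # The overall score is the sum of the per-word scores.
    return sum(scores)

print(analyseSentiment("I am happy."))            # 1.0
print(analyseSentiment("Sad, sad day."))          # -2.0
print(analyseSentiment("Happy and sad at once"))  # 0.0
```

A score above zero reads as positive and a score below zero as negative. Since each word is scored on its own, a phrase like "not happy" still scores +1.0 with this approach.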

![](https://p1-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/c161a9146c6f4ea69fbb3681eedacb03~tplv-k3u1fbpfcp-zoom-1.image)

![](https://p6-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/ad17b993f473495d850b5b5f93670eb8~tplv-k3u1fbpfcp-zoom-1.image)

We can also load an external dictionary:

import unicodecsv

with open('sentiment.csv', 'rb') as f:
    reader = unicodecsv.reader(f, encoding='utf-8')
    for line in reader:
        sentimentDict[line[0]] = float(line[1])

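The sentiment.csv file is opened in binary mode, and the reader decodes it as UTF-8.

Each row returned by the reader adds one entry to sentimentDict: the word in the first column is mapped to the score in the second column, converted to a float.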
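If unicodecsv is not installed, the same loading step can be sketched with Python 3's built-in csv module instead. This is a swapped-in alternative, not the code used in this post. The helper name loadSentimentDict is hypothetical, and the file layout (word,score rows with no header) is an assumption based on how the rows are indexed above. The example writes a tiny made-up file so that it runs on its own.

```python
import csv
import os
import tempfile

def loadSentimentDict(path):
    # Hypothetical helper; assumes rows of the form word,score, no header.
    sentimentDict = {}
    # The built-in csv module reads text in Python 3, so the file is opened
    # in text mode with an explicit encoding rather than in binary mode.
    with open(path, newline="", encoding="utf-8") as f:
        for line in csv.reader(f):
            if len(line) < 2:
                continue  # skip blank or incomplete rows
            sentimentDict[line[0]] = float(line[1])
    return sentimentDict

# Write a small made-up dictionary file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sentiment.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write("happy,1.0\nsad,-1.0\n")
    print(loadSentimentDict(path))
# {'happy': 1.0, 'sad': -1.0}
```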

(To be updated later)
