(The following code is run in a Jupyter Notebook.)
dict1:
Ideas:
Let’s start with re, the regular expression library.
1. Define the preprocessing function:
Start by separating each word and punctuation mark in the sentence with a space:
text = re.sub(r"(\w)([.,;:!?'\"”\)])", r"\1 \2", text)
text = re.sub(r"([.,;:!?'\"“\(])(\w)", r"\1 \2", text)
The re.sub function takes a pattern as its first argument. Here the pattern finds a word character (the first group, (\w)) immediately followed by a punctuation mark (the second group, ([.,;:!?'\"”\)])), and replaces the match with the second argument, \1 \2, which is the two groups with a space between them.
For example, in "happy." the pattern matches "y.", and the code rewrites it as "y ." (with a space in the middle), so that happy and the period are separated by a space. The second line does the same for punctuation that comes directly before a word.
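To make the two groups concrete, here is a minimal sketch that runs the first pattern on a sample sentence. The sample string and the variable names are assumptions made for illustration only.

```python
import re

# Pattern from the first substitution: a word character, then a punctuation mark.
pattern = r"(\w)([.,;:!?'\"”\)])"
sample = "I am happy."  # made-up sample sentence

match = re.search(pattern, sample)
print(match.group(1), match.group(2))  # the two captured groups

# \1 and \2 in the replacement refer back to those groups.
spaced = re.sub(pattern, r"\1 \2", sample)
print(spaced)
```

The search shows what each group captured, and the substitution puts a space between them.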
Next, split the text into its elements at the spaces:
tokens = re.split(r"\s+", text)
The re.split function splits the string at every occurrence of the pattern. \s matches a single whitespace character and \s+ matches a run of one or more, so several consecutive spaces produce only one split.
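The difference between \s and \s+ is easiest to see on input with doubled spaces. A small sketch, using a made-up sample string:

```python
import re

sample = "happy  ,  sad"  # made-up input with doubled spaces

# Splitting at every single whitespace character leaves empty strings behind.
single = re.split(r"\s", sample)
# Treating each run of whitespace as one separator avoids them.
runs = re.split(r"\s+", sample)

print(single)
print(runs)
```

With \s the doubled spaces leave empty strings in the result, which is why the preprocessing code uses \s+.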
And then normalisation:
This completes the text preprocessing.
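Putting the steps together, the preprocessing function might look roughly like the sketch below. The normalisation code is not shown in these notes, so lowercasing is an assumption here, as is stripping the text before splitting.

```python
import re

def preProcess(text):
    # Separate words from trailing and leading punctuation with a space.
    text = re.sub(r"(\w)([.,;:!?'\"”\)])", r"\1 \2", text)
    text = re.sub(r"([.,;:!?'\"“\(])(\w)", r"\1 \2", text)
    # Split on runs of whitespace; strip first so the ends yield no empty tokens.
    tokens = re.split(r"\s+", text.strip())
    # Assumed normalisation step: lowercase every token.
    return [t.lower() for t in tokens]

print(preProcess("I am Happy, not sad!"))
```

Lowercasing means "Happy" and "happy" map to the same dictionary entry later on.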
Then we use a score to indicate whether the sentence is positive or negative.
Start with a simple dictionary that assigns +1.0 to happy and -1.0 to sad.
Define a getSentiment function that returns the sentiment score of a word.
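A minimal sketch of the dictionary and the lookup function follows. Returning 0.0 for words that are not in the dictionary is an assumption, since the notes do not say how unknown words are handled.

```python
# Toy dictionary from the text: +1.0 for "happy", -1.0 for "sad".
sentimentDict = {"happy": 1.0, "sad": -1.0}

def getSentiment(word):
    # Assumption: words missing from the dictionary count as neutral (0.0).
    return sentimentDict.get(word, 0.0)

print(getSentiment("happy"), getSentiment("sad"), getSentiment("table"))
```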
Then use the analyseSentiment function to analyse the text:
words = preProcess(text)
scores = [getSentiment(w) for w in words]
This gives you a sentiment score for each word in the text.
You then sum the scores to see whether the text as a whole is positive or negative.
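The whole scoring step could be sketched as follows. To keep the block short and self-contained, preProcess is replaced here by a simplified stand-in built on re.findall, which is not the preprocessing described earlier. The neutral score of 0.0 for unknown words is also an assumption.

```python
import re

sentimentDict = {"happy": 1.0, "sad": -1.0}

def preProcess(text):
    # Simplified stand-in: lowercase, then pull out words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def getSentiment(word):
    # Assumption: unknown words are neutral.
    return sentimentDict.get(word, 0.0)

def analyseSentiment(text):
    words = preProcess(text)
    scores = [getSentiment(w) for w in words]
    # A positive total suggests a positive text, a negative total a negative one.
    return sum(scores)

print(analyseSentiment("I am happy."))
print(analyseSentiment("Happy but sad, so sad!"))
```

A total of zero means the positive and negative words cancel out, or that no dictionary word occurred at all.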
We can also load an external dictionary:
import unicodecsv

with open('sentiment.csv', 'rb') as f:
    reader = unicodecsv.reader(f, encoding='utf-8')
    for line in reader:
        # column 0 is the word, column 1 is its score
        sentimentDict[line[0]] = float(line[1])
The sentiment.csv file is opened in binary mode, and the reader decodes it as UTF-8.
As the reader iterates over the rows, each word and its score are added to sentimentDict.
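In Python 3 the standard csv module can do the same job on a file opened in text mode. The sketch below is an alternative under stated assumptions, not the method used in these notes. The helper name load_sentiment_dict is hypothetical, the sample rows are made up, and the file is assumed to have no header row.

```python
import csv
import io

def load_sentiment_dict(f):
    # f is any text-mode file-like object yielding "word,score" rows.
    sentiment = {}
    for row in csv.reader(f):
        if len(row) < 2:
            continue  # assumed policy: skip blank or malformed rows
        sentiment[row[0]] = float(row[1])
    return sentiment

# In-memory sample standing in for sentiment.csv; the rows are made up.
sample = io.StringIO("happy,1.0\nsad,-1.0\n")
loaded = load_sentiment_dict(sample)
print(loaded)
```

With a real file, the same function would be called on open('sentiment.csv', newline='', encoding='utf-8').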
(To be updated later)