Hello, I'm K.

Spring has turned to autumn and time keeps slipping away; I wonder how everyone's projects are coming along.

Today, K will share a hands-on case of sentiment analysis on online course comments, to help you find some inspiration.

The data used is a public dataset of online course reviews. My environment is as follows:

  • Language: Python3.6.5
  • Compiler: Jupyter Notebook
  • Gensim: 4.0.1

You can install Gensim with the following command:

pip install gensim==4.0.1 -i https://pypi.mirrors.ustc.edu.cn/simple/
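
To confirm that the install worked, a quick check from Python (a minimal sketch, just printing Gensim's version):

import gensim
print(gensim.__version__)  # should print 4.0.1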

If you're interested in more natural language processing articles, 📚 NLP Example Tutorials may have what you need.

If you're looking for practical cases related to graduation projects, you can find worked examples with source code and data in 📚 "100 Cases of Deep Learning"; deep learning beginners are advised to start with the column 📚 "A Beginner's Introduction to Deep Learning"!

If these are not enough for you, you can add me on WeChat (mTYjkh_) and I will do what I can to help.

🎯 The code and data are at the end of the article. Now, to the point.

1. Import data


Load the corpus files and import the data:

import pandas as pd

neg = pd.read_excel('data/neg.xls', header=None)
pos = pd.read_excel('data/pos.xls', header=None)

pos.head()

0
0 Do parents must have liu Yong such mentality, continue to learn, continue to progress, continue to give their own complement fresh blood, so that they maintain a…
1 The author really has the rigorous English style, puts forward the opinion, carries on the elaboration argument, although I do not know the physics deeply, but still can feel…
2 The authors support their new ideas with lengthy reports of detailed data processing and computational results. Why did Holland once have the highest production in Europe…
3 The author used “Hug” right before the war, which is amazing. If Japan had not been defeated, there would have been an American occupation, no bureaucratic delay…
4 The author was fond of reading when he was young, and it can be seen that he has read countless classics intensively, so he has a huge inner world. His works are the most valuable…

Word segmentation

# Tokenize with jieba
import jieba

word_cut = lambda x: jieba.lcut(x)
pos['words'] = pos[0].apply(word_cut)
neg['words'] = neg[0].apply(word_cut)

pos.head()
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ADMINI~1\AppData\Local\Temp\jieba.cache
Loading model cost 0.502 seconds.
Prefix dict has been built successfully.
0 words
0 Do parents must have liu Yong such mentality, continue to learn, continue to progress, continue to give their own complement fresh blood, so that they maintain a… Do, parents, must, have, liu Yong, such, mentality, constantly, to learn,…
1 The author really has the rigorous English style, puts forward the opinion, carries on the elaboration argument, although I do not know the physics deeply, but still can feel… [Author, true English, rigorous style, advanced, argued…
2 The authors support their new ideas with lengthy reports of detailed data processing and computational results. Why did Holland once have the highest production in Europe… [Author, lengthy, borrowed, detailed, report, data processing, work, and, computational results, support,…
3 The author used “Hug” right before the war, which is amazing. If Japan had not been defeated, there would have been an American occupation, no bureaucratic delay… [The author used “embrace” before the war…
4 The author was fond of reading when he was young, and it can be seen that he has read countless classics intensively, so he has a huge inner world. His works are the most valuable… [When the author was young, he enjoyed reading, could see, he read intensively, innumerable…
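
If you want to see what jieba.lcut returns for a single sentence, here is a minimal sketch (the sentence is made up for illustration, and the exact segmentation depends on jieba's dictionary and version):

import jieba

tokens = jieba.lcut('这门课程的内容很实用')
print(tokens)  # a list of word strings, e.g. ['这门', '课程', '的', '内容', '很', '实用']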
# Use 1 for positive sentiment and 0 for negative, then concatenate the arrays
import numpy as np

x = np.concatenate((pos['words'], neg['words']))
y = np.concatenate((np.ones(len(pos)), np.zeros(len(neg))))
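As a quick sanity check (a sketch, not part of the original pipeline), the corpus and label arrays should have the same length, and the number of ones in y should match the number of positive samples:

print(len(x), len(y))            # the two lengths should be equal
print(int(y.sum()) == len(pos))  # True: one label of 1 per positive review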

2. Word2Vec processing

# Train the Word2Vec shallow neural network model
from gensim.models import Word2Vec

w2v = Word2Vec(vector_size=300,  # dimensionality of the feature vectors; defaults to 100
               min_count=10)     # words with total frequency below min_count are discarded; defaults to 5

w2v.build_vocab(x)
w2v.train(x,
          total_examples=w2v.corpus_count,
          epochs=20)

# Average the word vectors of each sentence
def average_vec(text):
    vec = np.zeros(300).reshape((1, 300))
    for word in text:
        try:
            vec += w2v.wv[word].reshape((1, 300))
        except KeyError:
            # Skip words that did not make it into the vocabulary
            continue
    return vec

# Save the sentence vectors as an ndarray
x_vec = np.concatenate([average_vec(z) for z in x])

# Save the Word2Vec model
w2v.save('data/w2v_model.pkl')
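Before moving on, it can be worth poking at the trained model. A small sketch using the Gensim 4.x API ('好', "good", is just an illustrative query word; most_similar only works if it survived the min_count=10 cutoff in your corpus):

print(len(w2v.wv))                        # vocabulary size after min_count filtering
print(w2v.wv.most_similar('好', topn=5))  # nearest neighbours of '好' in the embedding space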

3. Train the support vector machine sentiment classifier

from sklearn.svm import SVC
import joblib

model = SVC(kernel='rbf', verbose=True)  # Build the support vector machine classifier
model.fit(x_vec, y)                      # Train the model

# Save the trained model
joblib.dump(model, 'data/svm_model.pkl')
[LibSVM]

['data/svm_model.pkl']
# Print the model's cross-validation accuracy
from sklearn.model_selection import cross_val_score

print(cross_val_score(model, x_vec, y))
[LibSVM][LibSVM][LibSVM][LibSVM]
[0.9156598  0.89623312 0.8047856  0.83961147 0.79436153]
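
The five numbers are the per-fold accuracies. A common way to summarize them is the mean and standard deviation across folds (a quick sketch; this re-runs the cross-validation):

scores = cross_val_score(model, x_vec, y)
print(f'accuracy: {scores.mean():.3f} +/- {scores.std():.3f}')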

4. Sentiment prediction

# Load Word2Vec and compute the sentence vector for new input
def average_vec(words):
    # Load the Word2Vec model
    w2v = Word2Vec.load('data/w2v_model.pkl')
    vec = np.zeros(300).reshape((1, 300))
    for word in words:
        try:
            vec += w2v.wv[word].reshape((1, 300))
        except KeyError:
            continue
    return vec

# Judge the sentiment of a comment
def svm_predict(string):

    # Tokenize the comment
    words = jieba.lcut(str(string))
    words_vec = average_vec(words)
    # Load the support vector machine model
    model = joblib.load('data/svm_model.pkl')

    result = model.predict(words_vec)

    # Print the positive/negative result right away
    if int(result[0]) == 1:
        print(string, '[positive]')
    else:
        print(string, '[negative]')
    return result[0]


# Read the course review data
df = pd.read_csv('comments.csv', header=0)
comment_sentiment = []

# Test with 10 rows of data
for string in df['Comment content'][:10]:
    result = svm_predict(string)
    comment_sentiment.append(result)

# Merge the sentiment results with the original data
merged = pd.concat([df, pd.Series(comment_sentiment, name='User sentiment')], axis=1)
# Save to file
merged.to_csv('comment_sentiment.csv', encoding='utf-8-sig')
print('done.')
ABC [negative]
Because the network is not stable, when the network is down, the interface of the laboratory building cannot be clicked and input. [negative]
Very good. [negative]
Connect to remote library [negative]
Browser can not open [negative]
restart [negative]
done.
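
You can also call svm_predict on a single hand-written comment. A minimal usage sketch (the string below is made up for illustration):

svm_predict('课程内容很充实，老师讲得不错')  # prints the comment followed by [positive] or [negative]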

Our final predictions are as follows:


Finally, I'll leave you with a set of data-structure study notes to help you land offers from BAT and other first-tier companies. They were written by engineers from Google and Alibaba and are very useful for students whose algorithm skills are weak or need sharpening (extraction code: 9GO2):

LeetCode study notes from Google and Alibaba engineers

There are also the 7K+ open-source ebooks I have collected; there is bound to be one that can help you 💖 (extraction code: 4eg0):

7K+ open-source ebooks