This article, along with the rest of my Python and deep learning series, is included in my GitHub repo github.com/kzbkzb/Pyth…

Hello, I am K.

Spring has turned to autumn and time keeps slipping by; I wonder how everyone's projects are coming along. Today K will share with you a hands-on case of sentiment analysis on online course comments, to help you find some inspiration.
The data used is a public set of online course reviews. My environment is as follows:
- Language: Python 3.6.5
- Editor: Jupyter Notebook
- Gensim version: 4.0.1
You can install Gensim with the following command:
pip install gensim==4.0.1 -i https://pypi.mirrors.ustc.edu.cn/simple/
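Besides gensim, the snippets below also rely on pandas, numpy, jieba, scikit-learn and joblib. A minimal import block that the code in this article assumes:

import numpy as np
import pandas as pd
import jieba
import joblib

from gensim.models import Word2Vec
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score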
If you are interested in natural language processing, the 📚 NLP Example Tutorial column may have what you need.
If you are looking for practical cases to reference for a graduation project, you can find examples with source code and data in 📚 "100 Cases of Deep Learning"; deep learning beginners are advised to start with the 📚 "Introduction to Deep Learning for Beginners" column.
If that is still not enough, you can add me on WeChat (mTYjkh_) and I will help where I can.
🎯 Code + data: follow the official account (K student ah) and reply: 002
1. Import data
Load the corpus file and import the data
neg = pd.read_excel('data/neg.xls', header=None)  # negative samples
pos = pd.read_excel('data/pos.xls', header=None)  # positive samples
pos.head()
| | 0 |
|---|---|
| 0 | Do parents must have liu Yong such mentality, continue to learn, continue to progress, continue to give their own complement fresh blood, so that they maintain a… |
| 1 | The author really has the rigorous English style, puts forward the opinion, carries on the elaboration argument, although I do not know the physics deeply, but still can feel… |
| 2 | The authors support their new ideas with lengthy reports of detailed data processing and computational results. Why did Holland once have the highest production in Europe… |
| 3 | The author used “Hug” right before the war, which is amazing. If Japan had not been defeated, there would have been an American occupation, no bureaucratic delay… |
| 4 | The author was fond of reading when he was young, and it can be seen that he has read countless classics intensively, so he has a huge inner world. His works are the most valuable… |
Word segmentation
# Word segmentation with jieba
word_cut = lambda x: jieba.lcut(x)
pos['words'] = pos[0].apply(word_cut)
neg['words'] = neg[0].apply(word_cut)
pos.head()
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ADMINI~1\AppData\Local\Temp\jieba.cache
Loading model cost 0.502 seconds.
Prefix dict has been built successfully.
| | 0 | words |
|---|---|---|
| 0 | Do parents must have liu Yong such mentality, continue to learn, continue to progress, continue to give their own complement fresh blood, so that they maintain a… | [Do, parents, must, have, liu Yong, such, mentality, constantly, to learn,… |
| 1 | The author really has the rigorous English style, puts forward the opinion, carries on the elaboration argument, although I do not know the physics deeply, but still can feel… | [Author, true English, rigorous style, advanced, argued… |
| 2 | The authors support their new ideas with lengthy reports of detailed data processing and computational results. Why did Holland once have the highest production in Europe… | [Author, lengthy, borrowed, detailed, report, data processing, work, and, computational results, support,… |
| 3 | The author used “Hug” right before the war, which is amazing. If Japan had not been defeated, there would have been an American occupation, no bureaucratic delay… | [The author used “embrace” before the war… |
| 4 | The author was fond of reading when he was young, and it can be seen that he has read countless classics intensively, so he has a huge inner world. His works are the most valuable… | [When the author was young, he enjoyed reading, could see, he read intensively, innumerable… |
# Use 1 for positive sentiment and 0 for negative sentiment, then concatenate the arrays
x = np.concatenate((pos['words'], neg['words']))
y = np.concatenate((np.ones(len(pos)), np.zeros(len(neg))))
2. Word2Vec processing
# Train the Word2Vec shallow neural network model
w2v = Word2Vec(vector_size=300,  # dimension of the feature vectors; defaults to 100
               min_count=10)     # words with frequency below min_count are discarded; defaults to 5
w2v.build_vocab(x)
w2v.train(x,
          total_examples=w2v.corpus_count,
          epochs=20)
# Average the word vectors of each sentence
def average_vec(text):
    vec = np.zeros(300).reshape((1, 300))
    for word in text:
        try:
            vec += w2v.wv[word].reshape((1, 300))
        except KeyError:
            continue
    return vec
# Store the sentence vectors as an ndarray
x_vec = np.concatenate([average_vec(z) for z in x])

# Save the Word2Vec model
w2v.save('data/w2v_model.pkl')
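Before moving on, you can optionally sanity-check the trained embeddings by querying nearest neighbours. A minimal sketch; the query word below is only an illustration and should be replaced by any word that appears at least min_count times in your corpus:

# Optional sanity check: load the saved model and inspect nearest neighbours
# of a frequent word (the query word here is just an example).
w2v = Word2Vec.load('data/w2v_model.pkl')
print(w2v.wv.most_similar('作者', topn=5))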
3. Train the support vector machine sentiment classification model
model = SVC(kernel='rbf', verbose=True) # Build support vector machine classification model
model.fit(x_vec, y) # Training model
# Save the trained model
joblib.dump(model, 'data/svm_model.pkl')
[LibSVM]
['data/svm_model.pkl']
# Output model cross validation accuracy
print(cross_val_score(model, x_vec, y))
[LibSVM][LibSVM][LibSVM][LibSVM]
[0.9156598  0.89623312 0.8047856  0.83961147 0.79436153]
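Averaging the five folds gives an accuracy of roughly 0.85. A quick way to print the mean directly:

scores = cross_val_score(model, x_vec, y)
print(scores.mean())  # about 0.85 for the five folds shown above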
4. Sentiment prediction
# Load Word2Vec and compute the sentence vector for new input
def average_vec(words):
    # Load the Word2Vec model
    w2v = Word2Vec.load('data/w2v_model.pkl')
    vec = np.zeros(300).reshape((1, 300))
    for word in words:
        try:
            vec += w2v.wv[word].reshape((1, 300))
        except KeyError:
            continue
    return vec
# Predict the sentiment of a comment
def svm_predict(string):
    # Segment the comment into words
    words = jieba.lcut(str(string))
    words_vec = average_vec(words)
    # Load the support vector machine model
    model = joblib.load('data/svm_model.pkl')
    result = model.predict(words_vec)
    # Print the positive/negative result immediately
    if int(result[0]) == 1:
        print(string, '[positive]')
        return result[0]
    else:
        print(string, '[negative]')
        return result[0]
# Read the course review data
df = pd.read_csv("comments.csv", header=0)
comment_sentiment = []
# Test with the first 10 rows of data
for string in df['Comment content'][:10]:
    result = svm_predict(string)
    comment_sentiment.append(result)

# Merge the sentiment results with the original data
merged = pd.concat([df, pd.Series(comment_sentiment, name='User sentiment')], axis=1)
# Save the file
merged.to_csv('comment_sentiment.csv', encoding="utf-8-sig")
print('done.')
[Negative] ABC
[Negative] Because the network is not stable, when the network is down, the interface of the laboratory building cannot be clicked and input.
[Negative] Very good.
[negative] Connect to remote library
[negative] Browser can not open
[negative] restart
[negative] done.
Our final predictions are as follows:
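They are saved in comment_sentiment.csv; a quick way to take a look at the merged result:

print(pd.read_csv('comment_sentiment.csv').head())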