[Big Data Tribe] Text sentiment analysis of Twitter data with R

Original link: tecdat.cn/?p=4012

Original source: Tuoduan (Tecdat) Data Tribe official account

Using Twitter data collected with R as an example, we mine the tweet text and then run sentiment analysis on it, which reveals a number of interesting patterns.

First, we keep only the tweets sent from an iPhone or an Android phone and discard tweets from other sources.

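library(dplyr)
library(tidyr)

# Pull the client name out of the HTML in statusSource, then keep the two clients
tweets <- tweets_df %>%
  select(id, statusSource, text, created) %>%
  extract(statusSource, "source", "Twitter for (.*?)<") %>%
  filter(source %in% c("iPhone", "Android"))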
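To make the extraction step concrete, here is a minimal sketch on three made-up rows. The sample `statusSource` strings are assumptions that mimic the HTML anchor format Twitter returns; they are not data from the original analysis.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample rows (not from the original data set)
tweets_df <- tibble(
  id = c("1", "2", "3"),
  statusSource = c(
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'
  ),
  text = c("first tweet", "second tweet", "third tweet")
)

tweets <- tweets_df %>%
  # capture whatever follows "Twitter for " up to the closing tag
  extract(statusSource, "source", "Twitter for (.*?)<") %>%
  # rows whose source did not match become NA and are dropped here
  filter(source %in% c("iPhone", "Android"))

print(tweets)
```

The web client row does not contain "Twitter for", so its `source` is `NA` and the filter removes it.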

We then visualize the data, plotting the share of each platform's tweets posted at each hour of the day.

We also compare the number of tweets sent from Android and from iPhone.

The comparison chart shows a clear difference in when the two platforms tweet: Android tweets cluster between 5:00 and 10:00, while iPhone tweets cluster between 10:00 and 20:00 (24-hour clock). Android also accounts for a higher share of tweets than iPhone.
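The hourly shares behind such a chart could be computed roughly as follows. This is a sketch under assumptions: the sample rows are invented, and the hour is taken in UTC because the original post does not state which time zone it used.

```r
library(dplyr)

# Hypothetical tweets with a POSIXct `created` column (UTC assumed)
tweets <- tibble(
  source = c("Android", "Android", "Android", "iPhone", "iPhone"),
  created = as.POSIXct(
    c("2016-08-01 06:15:00", "2016-08-01 06:40:00", "2016-08-02 09:05:00",
      "2016-08-01 15:30:00", "2016-08-02 18:45:00"),
    tz = "UTC"
  )
)

hourly <- tweets %>%
  mutate(hour = as.integer(format(created, "%H"))) %>%  # hour of day, 0-23
  count(source, hour) %>%                               # tweets per platform and hour
  group_by(source) %>%
  mutate(percent = n / sum(n)) %>%                      # share within each platform
  ungroup()

print(hourly)
```

Normalizing within each platform makes the two curves comparable even when one platform tweets far more than the other.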

Next, we check whether each tweet is a quotation and compare the counts across the two platforms.

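# tweet_quote_counts is an assumed name for the counted data (columns: source, quoted, n);
# the source snippet omits the data argument
ggplot(tweet_quote_counts, aes(source, n, fill = quoted)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "", y = "Number of tweets", fill = "")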

The share of unquoted tweets is markedly lower on Android than on iPhone, and the number of quoted tweets is markedly higher on Android. We can therefore infer that most tweets from iPhone are original, while tweets from Android are much more often quotations.
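The counts that feed this bar chart could be produced along these lines. The sketch assumes that a quotation is a tweet whose text starts with a double quotation mark, which matches the filter used later in the word analysis; `tweet_quote_counts` and the sample rows are hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical sample tweets
tweets <- tibble(
  source = c("Android", "Android", "iPhone", "iPhone"),
  text = c('"Great speech!" thank you', "My own words",
           "Join us tonight", '"Quoted" again')
)

tweet_quote_counts <- tweets %>%
  # assumption: a leading double quote marks a quoted tweet
  mutate(quoted = ifelse(str_detect(text, '^"'), "Quoted", "Not quoted")) %>%
  count(source, quoted)

print(tweet_quote_counts)
```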

We then check whether tweets contain links or images and compare the two platforms.

ggplot(tweet_picture_counts, aes(source, n, fill = picture)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "")  # the remaining labs() arguments are cut off in the source

From the comparison we can see that Android tweets lack images or links more often than iPhone tweets. In other words, tweets sent from an iPhone tend to include a photo or link, while tweets sent from Android generally do not. The ratio between the two platforms can be quantified:

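# Convert counts to within-platform proportions, then take the iPhone/Android ratio
spr <- tweet_picture_counts %>%
  spread(source, n) %>%
  mutate(across(c(Android, iPhone), ~ .x / sum(.x)))  # replaces the deprecated mutate_each(funs(...))
rr <- spr$iPhone[2] / spr$Android[2]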
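As a worked example of this ratio, the sketch below builds the counts from invented tweets and computes how many times more likely an iPhone tweet is to carry a picture or link. It assumes that the presence of "t.co" in the text marks a picture or link; the original post does not show how `tweet_picture_counts` was built.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical sample tweets: 1 of 4 Android and 3 of 4 iPhone tweets carry a link
tweets <- tibble(
  source = rep(c("Android", "iPhone"), each = 4),
  text = c("no link", "no link either", "still none", "see https://t.co/abc123",
           "photo https://t.co/def456", "event https://t.co/ghi789",
           "more https://t.co/jkl012", "plain text")
)

tweet_picture_counts <- tweets %>%
  # assumption: a t.co short link means the tweet has a picture or link
  mutate(picture = ifelse(str_detect(text, fixed("t.co")),
                          "Picture/link", "No picture/link")) %>%
  count(source, picture)

spr <- tweet_picture_counts %>%
  pivot_wider(names_from = source, values_from = n) %>%
  mutate(across(c(Android, iPhone), ~ .x / sum(.x)))

# select the row by label rather than by position
has_pic <- spr$picture == "Picture/link"
rr <- spr$iPhone[has_pic] / spr$Android[has_pic]
print(rr)
```

Selecting the row by its label avoids depending on the row order, which the positional index `[2]` relies on.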

Next, we strip links and other non-word characters from the tweets, split the text into words, and rank the words by how often they occur.

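library(stringr)
library(tidytext)
library(ggplot2)

# Split on anything that is not a letter, digit, #, @ or an in-word apostrophe
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
tweet_words <- tweets %>%
  filter(!str_detect(text, '^"')) %>%  # drop quoted tweets
  # the source shows a bare "&" here, most likely a decoded "&amp;" entity (assumption)
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))

# Plot the 20 most frequent words
tweet_words %>%
  count(word, sort = TRUE) %>%
  head(20) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_bar(stat = "identity")  # the source is cut off at "geom_b"; completed to match the other charts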

We then run sentiment analysis on the data and compute, for each word, its relative usage ratio between Android and iPhone.

From the sentiment of these characteristic words, we compute the sentiment ratio for each platform and visualize it.
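One way to obtain such a per-word ratio is a log ratio of smoothed word frequencies, sketched below. The add-one smoothing and the base-2 logarithm are assumptions, as are the sample words; only the column name `logratio` comes from the code in this post.

```r
library(dplyr)
library(tidyr)

# Hypothetical tokenized tweets, one row per word occurrence
tweet_words <- tibble(
  source = c("Android", "Android", "Android", "iPhone", "iPhone", "iPhone"),
  word = c("badly", "badly", "crazy", "join", "join", "badly")
)

android_iphone_ratios <- tweet_words %>%
  count(word, source) %>%
  pivot_wider(names_from = source, values_from = n, values_fill = 0) %>%
  # add-one smoothing (an assumption) so words absent on one platform do not give 0 or Inf
  mutate(across(c(Android, iPhone), ~ (.x + 1) / sum(.x + 1))) %>%
  mutate(logratio = log2(Android / iPhone)) %>%
  arrange(desc(logratio))

print(android_iphone_ratios)
```

A positive `logratio` means the word is relatively more common on Android, a negative one means it is more common on iPhone.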

After counting the words in each sentiment category, we plot the differences together with their confidence intervals. The chart shows that, compared with iPhone, Android tweets express negative sentiment the most, followed by disgust and then sadness, and show little tendency toward positive sentiment.
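The post does not say how the confidence intervals were obtained. One common option, shown here purely as an assumption, is a two-sample Poisson rate comparison with base R's `poisson.test()`. The counts below are invented for illustration and are not results from the analysis.

```r
# Hypothetical counts: words tagged with one sentiment, and total words, per platform
sentiment_words <- c(Android = 320, iPhone = 180)
total_words <- c(Android = 4900, iPhone = 3850)

# Compare the two rates; the estimate is (Android rate) / (iPhone rate)
res <- poisson.test(x = unname(sentiment_words), T = unname(total_words))

rate_ratio <- unname(res$estimate)
ci <- res$conf.int  # 95% confidence interval for the rate ratio

print(rate_ratio)
print(ci)
```

If the interval lies entirely above 1, the sentiment is used at a higher rate on Android than on iPhone under this model.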

We then count the keywords that appear in each sentiment category.

android_iphone_ratios %>%
  inner_join(nrc, by = "word") %>%
  filter(!sentiment %in% c("positive", "negative")) %>%
  mutate(sentiment = reorder(sentiment, -logratio),
         word = reorder(word, -logratio))
# the rest of the pipeline is cut off in the source

The results show that most of the negative words come from Android, while far fewer negative words appear in tweets sent from iPhone.
