Author: Xu Lin, currently working at the Vipshop Product Technology Center in Shanghai. A Columbia University statistics “data dog” engaged in data mining and analysis, who likes exploring different kinds of data with R and Python.

Preface

With the Spring Festival approaching, you have probably started planning how to spend the holiday. Family reunions, visits to relatives and friends, and catching up and joking around with close friends you have not seen for a long time are all happy things to look forward to.

The annual Spring Festival films will also arrive as promised. Many classics were born in the Spring Festival season, and the 2019 lineup is packed with strong titles, earning it the name “the strongest Spring Festival season in history”. Today we use data to work out which films are most worth watching.

Data acquisition

Our data comes mainly from Maoyan. One part is Maoyan’s real-time pre-sale box office data.

This data can be obtained through Selenium with the following code:

from selenium import webdriver

driver = webdriver.Chrome()
driver.maximize_window()
url = 'https://piaofang.maoyan.com/dashboard?date=2019-02-05'
# open the box office dashboard in a new window via JavaScript
js = 'window.open("' + url + '")'
driver.execute_script(js)
# close the original blank window, then switch to the one that remains
driver.close()
driver.switch_to.window(driver.window_handles[0])

The other part of the data comes from Maoyan’s audience comments. Since the films have not been released yet, the scores attached to these comments reflect how much viewers are looking forward to each film. Note that many viewers leave a comment without giving a score; such comments are recorded with a score of 0 and must be excluded from the later calculations.

For how this part of the data was collected, see our earlier article “3 Days, 900 Million! Crawling Tens of Thousands of Reviews to Interpret ‘The Richest Man in Hong Kong’ and Predict the Box Office”. We skip the crawler code here.

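The zero-score exclusion described above can be sketched as follows. This is a minimal illustration rather than the code used for the article: the function name, the `(film, score)` row layout, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict


def mean_expectation_scores(comments):
    """Average score per film, ignoring comments whose score is 0 (unrated)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for film, score in comments:
        if score == 0:  # 0 means the viewer left no rating, not a real score
            continue
        totals[film] += score
        counts[film] += 1
    # films with no rated comments at all are simply left out
    return {film: totals[film] / counts[film] for film in totals}


# Hypothetical sample rows, not real Maoyan data
sample = [
    ("Film A", 5), ("Film A", 0), ("Film A", 4),
    ("Film B", 0), ("Film B", 3),
]
print(mean_expectation_scores(sample))
```

Leaving the zeros in would pull every average down in proportion to how many viewers skipped the rating, which would distort the comparison between films.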
Pre-sales

An important way to measure the attention a film is getting is to look at its first-day pre-sales. We selected the eight most important Spring Festival films for comparison, with the following code:

library(ggplot2)
library(ggimage)   # geom_image
library(ggthemes)  # theme_economist, scale_fill_tableau

# keep the eight films with the highest pre-sales
p <- ggplot(data[order(data$sale, decreasing = T), ][1:8, ],
            aes(x = reorder(name, sale), y = sale, fill = name)) +
  geom_bar(stat = 'identity', width = 0.5) +
  geom_image(aes(x = name, y = 0, image = image), size = 0.08) +
  geom_text(aes(x = name, y = 2500, label = label_sale),
            size = 3, col = 'black', fontface = 'bold') +
  theme_economist() +
  scale_fill_tableau() +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        plot.title = element_text(hjust = 0.5, size = 30),
        panel.grid = element_blank(),
        legend.position = 'none',
        panel.background = element_blank(),
        axis.title = element_blank(),
        axis.line = element_blank(),
        axis.ticks = element_blank()) +
  coord_flip() +
  ylim(0, 6500)
# 'presale.png' is a placeholder: the original file name and any size
# arguments are not recoverable from the source
ggsave('presale.png', p)

In the resulting chart, the top three films by pre-sales are all comedies. It seems that during the Spring Festival people still want to relax and watch something light-hearted. Pre-sales cannot fully predict the final box office, however: both “Earth’s Last Night” and “iPartment” flopped after their release.

Both of the top two films feature Shen Teng, so his box office appeal is evidently still strong. I hope both films end up doing well.

Judging from the pre-sale figures, “Honest Against Corruption” and “Detective Pu Songling” are both in danger of being squeezed out of the market. Given the somewhat lackluster recent performance of Hong Kong films, we hope these two can bring some surprises.

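The row selection used in the plotting code above (sort by pre-sales, keep the top eight) can be sketched in Python as well. The function name, the dictionary keys, and the figures below are hypothetical and only illustrate the idea.

```python
def top_n_by_sale(rows, n=8):
    """Return the n rows with the largest 'sale' value, largest first."""
    return sorted(rows, key=lambda row: row["sale"], reverse=True)[:n]


# Hypothetical figures, not real pre-sale data
rows = [
    {"name": "Film A", "sale": 1200},
    {"name": "Film B", "sale": 5400},
    {"name": "Film C", "sale": 300},
]
for row in top_n_by_sale(rows, n=2):
    print(row["name"], row["sale"])
```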
Pre-release buzz

We also look at audiences’ overall ratings of each film before release, which reflect their expectations to some extent.

The code is as follows:

library(ggplot2)
library(ggimage)   # geom_image
library(ggthemes)  # theme_wsj, scale_fill_tableau

# keep the eight films with the highest pre-release score
p <- ggplot(data[order(data$score, decreasing = T), ][1:8, ],
            aes(x = reorder(name, score), y = score, fill = name)) +
  geom_bar(stat = 'identity', width = 0.5) +
  geom_image(aes(x = name, y = 0, image = image), size = 0.08) +
  geom_text(aes(x = name, y = 2, label = label_score),
            size = 7, col = 'black', fontface = 'bold') +
  ggtitle('Spring Festival movie preview') +
  theme_wsj() +
  scale_fill_tableau() +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        plot.title = element_text(hjust = 0.5, size = 30),
        panel.grid = element_blank(),
        legend.position = 'none',
        panel.background = element_blank(),
        axis.title = element_blank(),
        axis.line = element_blank(),
        axis.ticks = element_blank()) +
  coord_flip() +
  ylim(0, 5)
# 'score.png' is a placeholder: the original file name is garbled in the source
ggsave('score.png', p, width = 8, height = 12)

In the resulting chart, “Boonie Bears” is a surprise No. 1 in the ratings, which raises my own expectations for it somewhat, even though I am probably past the right age for this film. “Peppa Pig”, helped by its earlier high-profile marketing, has successfully caught everyone’s attention. How it finally performs, however, still has to be tested by audiences.

Meanwhile, “Detective Pu Songling”, starring Jackie Chan, trails the other films in score. People seem to have reservations about this kind of film, and we hope its word of mouth can turn around after release.

Film highlights

Finally, we mine the pre-release comments for the points people care about most, using jieba for word segmentation. Note that some custom dictionary entries need to be added before segmentation. Without a custom entry, “Huang Jingyu”, for example, gets cut as “Huang Jing”. We then extract the important keywords by word frequency:

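import jieba
from collections import Counter

def key_words(df, stopwords_path='stopwords.txt'):
    # stopwords_path is a placeholder: the original stop-word file name is not shown
    comment_str = ' '.join(df)
    jieba.load_userdict('spring_film_dict.txt')  # custom dictionary, e.g. actor names
    word_generator = jieba.cut(comment_str)  # returns a generator of tokens
    with open(stopwords_path, encoding='utf-8') as f:
        stop_words = set(line.strip() for line in f)
    # drop stop words and whitespace-only tokens
    words_list = [w for w in word_generator if w.strip() and w not in stop_words]
    # keep tokens longer than one character, then take the 30 most frequent
    counts = Counter(k for k in words_list if len(k) > 1)
    return [word for word, _ in counts.most_common(30)]
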
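To see why the custom dictionary matters, here is a toy sketch of dictionary-based segmentation. It uses plain forward maximum matching as a stand-in, which is not how jieba works internally, and the two small vocabularies are assumptions made up for the example. The point is only that a segmenter cannot emit a name as one token unless that name is in its dictionary.

```python
def forward_max_match(text, vocab):
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    max_len = max((len(word) for word in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # try the longest candidate first, then shrink down to one character
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy vocabularies (assumptions): "期待" = "look forward to",
# "黄景瑜" = the actor's name "Huang Jingyu", "黄景" = the truncated "Huang Jing"
base_vocab = {"期待", "黄景"}
custom_vocab = base_vocab | {"黄景瑜"}

text = "期待黄景瑜"
print(forward_max_match(text, base_vocab))    # ['期待', '黄景', '瑜']
print(forward_max_match(text, custom_vocab))  # ['期待', '黄景瑜']
```

With the truncated token, the word-frequency count would credit “Huang Jing” instead of the actor, which is why the name has to be added before counting keywords.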
Finally, for each film we selected five words that reflect what people are talking about and visualized them.

We picked some of the more interesting keyword combinations for an in-depth (read: freewheeling) interpretation:

“Detective Pu Songling”: everyone is looking forward to Jackie Chan’s performance. Although many people have already labeled it a “bad film” in advance, they still hope its reputation will turn around. The special effects are also getting a lot of attention, and I wonder whether there will be any homage to the 50-cent “Duang” special effects.

“Fast Life” and “Crazy Aliens”: it feels as if Shen Teng has taken over this year’s Spring Festival season. For the top two films by pre-sales, he is what audiences care about most, and we hope that while dominating the holiday screens he also earns good word of mouth. With Shen Teng both dealing with aliens and living the fast life, his fans are in for a feast during the Spring Festival.

“Peppa Pig’s Chinese New Year”: a kid-friendly film with a widely discussed promo. Let’s hope it does not follow in the footsteps of “Earth’s Last Night”, whose reputation collapsed after release.

“The King of Comedy”: most of the audience’s expectations for this film clearly come from Stephen Chow. With the classic “King of Comedy” preceding it, this film will inevitably be compared with the original, and we look forward to another classic.

Finally, I wish you all a happy New Year, a joyful family reunion, and some wonderful films! You are also welcome to join us in the comments and share which films you plan to watch during the Spring Festival.