Data Science Club
Chinese data scientist community
♚
Author: Xu Lin, currently working at the Vipshop Product Technology Center in Shanghai. A self-described "data dog" with a statistics background from Columbia University, he works in data mining and analysis and likes to play with unusual datasets in R and Python.
\
Foreword: \
As society develops, more and more TV dramas are appearing on TV screens and video sites. Many of them are excellent, such as Journey to the West and A Dream of Red Mansions, or Nirvana in Fire and Chasing the Dead in the Night, which have won great reviews in recent years. However, some dramas are, for various reasons, far less satisfying.
Today, we collected nearly 5,000 rated Chinese TV dramas (excluding those from Hong Kong and Taiwan) from Douban to compare the ratings of the shows and their actors.
\
01
Data sources
The data we crawl this time comes in three parts: the list of rated series, series ratings and other information, and actor information. They correspond to the following three pages:
PART1: List of series
PART2: Series information
PART3: Actor information
By crawling these three pages, we obtain the complete series rating and actor data, which we then use for comparison and visualization. We take the code for the first part as an example to show the overall crawling approach:

    import time

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    # Douban tag page; the encoded tags decode to 电视剧 (TV series) and 中国大陆 (mainland China)
    url = ('https://movie.douban.com/tag/#/?sort=U&range=2,10'
           '&tags=%E7%94%B5%E8%A7%86%E5%89%A7,%E4%B8%AD%E5%9B%BD%E5%A4%A7%E9%99%86')

    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get(url)

    # Scroll to the bottom and click "load more" until the button can no longer be found
    while True:
        try:
            driver.execute_script('document.documentElement.scrollTop = 10000000')
            driver.find_element(By.CLASS_NAME, 'more').click()
            time.sleep(2)
        except Exception:
            break

    # Collect the title, rating and detail-page link of every listed series
    name = [k.text for k in driver.find_elements(By.CLASS_NAME, 'title')]
    score = [k.text for k in driver.find_elements(By.CLASS_NAME, 'rate')]
    link = [k.get_attribute('href') for k in driver.find_elements(By.CLASS_NAME, 'item')]
    driver.quit()

    # The output filename is a placeholder; the original name did not survive
    pd.DataFrame({'name': name, 'score': score, 'url': link}).to_excel('douban_tv_list.xlsx')
\
02
Series comparison
The series comparison covers two things. The first is the rating and production year of the TOP15 and BOTTOM15 series:
\
\
\
The contrast is very sharp. Most of the highly rated dramas were shot years ago, and they show more and more of their unique charm after the test of time. The lower-rated shows, by contrast, tend to have been made in recent years, which may have something to do with the sheer number of shows now being produced. One reminder: the A Step into the Past (Xun Qin Ji) in the list is not the Louis Koo version, which is a classic among classics. As for the version in the list, you can see for yourself if you get the chance.
Just as there is no love or hate without a reason, we have also picked some interesting Douban comments about these shows, to help explain the ratings:
TOP comments:
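As a rough illustration of how the TOP15 and BOTTOM15 lists can be derived from the crawled list, here is a minimal sketch in plain Python. The field names `name`, `score` and `year`, the helper `top_and_bottom`, and the placeholder rows are our assumptions for illustration; they are not the code or data behind the charts.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def top_and_bottom(rows: List[Row], n: int = 15) -> Tuple[List[Row], List[Row]]:
    """Return the n highest-rated rows and the n lowest-rated rows."""
    # Skip rows whose rating is missing or was not parsed into a number
    rated = [r for r in rows if isinstance(r.get('score'), (int, float))]
    ranked = sorted(rated, key=lambda r: r['score'], reverse=True)
    # Highest first for the top list, lowest first for the bottom list
    return ranked[:n], ranked[-n:][::-1]


# Placeholder rows (hypothetical, not real Douban data)
sample = [
    {'name': 'Series A', 'score': 9.0, 'year': 1990},
    {'name': 'Series B', 'score': 3.0, 'year': 2018},
    {'name': 'Series C', 'score': 7.5, 'year': 2005},
    {'name': 'Series D', 'score': None, 'year': 2010},
]
top, bottom = top_and_bottom(sample, n=2)
```

Keeping the year in each row means the same two lists can feed a chart of rating against production year.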
1
Watching it again many years later, I found the styling and characterization perfect and the grasp of the original work accurate. Its influence on the era was huge and the impression it left runs deep. -- Journey to the West
\
2
The older generation of film artists treated A Dream of Red Mansions with reverence. After the 1987 version, there has been no real Red Mansions in the world. -- A Dream of Red Mansions
\
3
This is my Bible, my enlightenment. -- I Love My Family
\
4
"Shoes worn out, hat worn out, the cassock on my back worn out; you laugh at me, he laughs at me, my fan is worn out too." A classic. -- Living Buddha
\
5
Absolutely the unsurpassable peak of domestic sitcoms! Every one of these characters is impossible to replicate! -- My Own Swordsman
\
BOTTOM comments:
1
I didn't see anything sweet, but every second of this acting is a violent hit. -- Sweet Violent Hit
\
2
There are still so many crude idol dramas, recycling the same old tropes that have been played for decades. -- Aurora Love
\
3
I glanced at half an episode and was scared to death; the acting looks like dementia = = -- Road Run Sweetheart
\
4
The plot is weird, the acting is over the top, the special effects are crude. For a drama that fuses all of the above, the director is simply the hotpot restaurant owner of show business. -- From the Stars of the Successors
\
5
The version Yu Ma made... was actually pretty good! -- New Swordsman
\
03
Actor comparison
We rate each actor from the ratings of the series he or she appeared in: a weighted average that takes into account the importance of the role played and the number of reviews each series received. First up, our top-rated actors and their years of birth:
Most young readers may not be familiar with many of the actors on this list, so we suggest watching some of these old dramas to feel their charm. We also compared the scores of actors born in the 1980s and in the 1990s (the latter including those born in the 2000s) to find the best:
\
Liu Haoran leads the rest of the post-90s generation, and considering he is only in his early 20s, we expect him to bring us more classic works in the future.
Many readers may not know the TOP20 actors, but never mind, since most of the names on the following list will be familiar to you:
\
These names should feel familiar. We believe the actors on this list are actually the ones with the most potential: as long as they work hard, they will win the audience's recognition in the future. We also compared male and female actors:
\
It should be noted that Shawn Yue and Mark Chao made the list not because their work as a whole is poorly rated, but because the mainland productions they appeared in rated low, and we did not count Hong Kong and Taiwan dramas this time. We also wish all the actors on this list more exciting performances in the future.
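The actor rating described at the start of this section can be sketched roughly as follows. This is only an illustration under stated assumptions: we assume the weight of each appearance is the number of reviews multiplied by a role weight (for example 1.0 for a lead and 0.5 for a supporting role), and the tuple layout and the name `actor_scores` are hypothetical. The exact weights used for the charts are not given in this article.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (actor, series rating, number of reviews, role weight) -- hypothetical layout
Credit = Tuple[str, float, int, float]


def actor_scores(credits: Iterable[Credit]) -> Dict[str, float]:
    """Weighted average of series ratings for each actor."""
    weighted_sum: Dict[str, float] = defaultdict(float)
    weight_total: Dict[str, float] = defaultdict(float)
    for actor, score, votes, role_weight in credits:
        # Assumed weighting: more reviews and bigger roles count for more
        weight = votes * role_weight
        weighted_sum[actor] += score * weight
        weight_total[actor] += weight
    # Actors whose total weight is zero cannot be rated and are left out
    return {
        actor: weighted_sum[actor] / weight_total[actor]
        for actor in weighted_sum
        if weight_total[actor] > 0
    }
```

Weighting by review count keeps an obscure series with a handful of ratings from dominating an actor's score.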
\
04
Zodiac sign distribution
Thanks to Douban for providing the actors' zodiac sign data. Many of my friends are very interested in zodiac signs, so we might as well take a look at how the signs are distributed:
\
\
The overall distribution looks fairly even, though Libra and Scorpio are slightly more common than the other signs. Whether you believe in zodiac signs is up to you; I, for one, don't really believe in them.
To draw the treemap, you can refer to the following code:
\
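
    from pyecharts import TreeMap  # pyecharts 0.5.x API; 1.x moved TreeMap to pyecharts.charts

    # actor_data: pandas DataFrame of crawled actor information, one row per actor
    # Count actors per zodiac sign ('xingzuo') and keep the 12 most common signs
    star_stat = (actor_data.groupby('xingzuo')
                 .agg({'name': 'count'})
                 .reset_index()
                 .sort_values('name', ascending=False)[0:12]
                 .reset_index(drop=True))

    # One treemap tile per sign, labelled "<sign> <count>"
    data = [{'value': int(star_stat['name'][i]),
             'name': star_stat['xingzuo'][i] + ' ' + str(star_stat['name'][i])}
            for i in range(star_stat.shape[0])]

    treemap = TreeMap(width=1200, height=600)
    treemap.add('zodiac sign', data, is_label_show=True, label_pos='inside')
    treemap.render('zodiac sign distribution.html')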
\
05
City distribution
After the zodiac signs, let's look at how the actors are distributed across cities, and see how many TV actors come from your hometown:
\
\
Unsurprisingly, the two central cities of Beijing and Shanghai have the most actors, while my hometown Qingdao comes in third. Whenever I mention Qingdao I always bring up its stars, and this data gives me even more confidence for future conversations (that is, bragging).
Let’s take a look at the TOP5 stars in each city:
\
Beijing
\
Shanghai
\
Qingdao
Harbin
\
Xi'an
\
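The city ranking and the per-city TOP5 lists above can be reproduced with a simple group-and-sort. The sketch below uses plain Python; the tuple layout `(name, city, score)` and the helpers `city_counts` and `top_by_city` are our assumptions for illustration, not the code used for the charts.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Actor = Tuple[str, str, float]  # (name, hometown city, actor score) -- hypothetical layout


def city_counts(actors: Iterable[Actor]) -> List[Tuple[str, int]]:
    """Cities ordered by how many actors come from each, most first."""
    return Counter(city for _, city, _ in actors).most_common()


def top_by_city(actors: Iterable[Actor], n: int = 5) -> Dict[str, List[Tuple[str, float]]]:
    """The n highest-scoring actors of each city, best first."""
    by_city: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for name, city, score in actors:
        by_city[city].append((name, score))
    return {
        city: sorted(members, key=lambda m: m[1], reverse=True)[:n]
        for city, members in by_city.items()
    }
```

Calling `city_counts` first gives the cities to show, and `top_by_city` then supplies the names for each of them.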
That's all for this article. Please leave a comment and share your thoughts about the TV series or the actors. We look forward to your replies.