Preface

Most readers will be familiar with “Da Qin Fu”, the historical costume drama directed by Yan Yi and starring Zhang Luyi.

The drama tells the story of Ying Zheng, the First Emperor of Qin, who with the help of Lu Buwei, Li Si and Wang Jian destroyed the six states, unified the realm, and established the first unified centralized state in Chinese history. In the late Warring States period, China had been in turmoil for more than 500 years and was still suffering from war. The six states were weak and Qin was strong. The merchant Lu Buwei fled to Qin with Ying Yiren, the Qin hostage in Zhao, while the young Ying Zheng was left behind in Handan. There he lived on the edge of life and death and witnessed the pain and despair that war brought to ordinary people, and the ambition to unify the world took root in his heart. After returning to Qin, Ying Zheng endured the political whirlpool of Xianyang and grew into a true king. To seize his moment and realize his dream of unification, he planned and acted in secret, finally putting down Lao Ai's rebellion, dismissing the prime minister, and regaining royal power. He then uprooted the restorationist forces and cleared the way for Qin's eastward campaign against the other states. With the assistance of Li Si, Wang Jian, Meng Tian and other ministers and generals, the six states were wiped out and the first unified centralized state in Chinese history was established.

My topic today, of course, is not history but web crawlers.

Now, on to the main topic.

Requirements analysis

“Da Qin Fu” has been one of the more popular TV series recently. As someone who likes writing crawlers, I figured it would be worth collecting audience comments and analyzing them, so today we will go to Douban and fetch viewers' short comments.

The trend of comments over time

Generally speaking, a TV series is most popular in its first few days, so that is when most audience comments appear. But if the show turns out worse than expected, viewers stop watching, and if nobody is watching, who is going to write reviews?

Trends in the number of comments in 24 hours

This mainly analyzes what time of day viewers watch the show. Most people comment while watching or right after, so the comment time is close to the viewing time. Very few people go back to comment on episodes from several days earlier; who is going to dig through the old episodes?

Audience comments on the play

It is like shopping on Taobao: if a product offers great value for money, I am sure you will not be stingy with the five stars. Likewise, if you think this article is good, I hope you will not be stingy with the likes.

Major comments from the audience

The aim here is to see which people or topics viewers mention most, which makes further analysis easier.

Implementation of data acquisition

One cannot make bricks without straw.

Likewise, data is the foundation of data analysis; without data, any analysis is just empty talk.

What data to get

From the requirements analysis above, we need to obtain three fields: star rating, review text, and review time.

Page analysis

Open the browser's developer tools and use the element selector to quickly locate the data.

The core code for extracting the reviews, star ratings, and times is shown below.

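    import pandas as pd
    from lxml import etree

    def get_info(self, url):
        # Fetch one page of comments; self.session is assumed to already carry
        # the User-Agent header and the login cookies.
        html = etree.HTML(self.session.get(url).text)
        # Review time and star rating are both stored in title attributes
        comment_time = html.xpath('//div[@class="comment"]/h3/span/span[3]/@title')
        star = html.xpath('//div[@class="comment"]/h3/span/span[2]/@title')
        # contains() matches the class whether or not it has surrounding spaces
        content = html.xpath('//p[contains(@class, "comment-content")]/span/text()')
        content = [i.replace('\n', ' ') for i in content]
        df = pd.DataFrame(
            {'content_time': comment_time,
             'star': star,
             'comment-content': content}
        )
        return df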

About anti-crawling

Testing shows that the only anti-crawling checks here are on the User-Agent and the cookies, so you only need to add these two values to the request headers.
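As a minimal sketch of what adding these two values looks like, the snippet below builds a request that carries a User-Agent and a Cookie header using only the standard library, without sending anything. The header values are placeholders (assumptions), not real credentials; with `requests`, the same dictionary can be passed as `headers=`.

```python
from urllib.request import Request

# Placeholder values (assumptions): copy the real ones from your browser's developer tools.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Cookie": "PLACEHOLDER_NAME=placeholder_value",
}

URL = "https://movie.douban.com/subject/26413293/comments?start=0&limit=20&status=P&sort=new_score"


def build_request(url: str, headers: dict) -> Request:
    """Return a Request object carrying the given headers (nothing is sent)."""
    return Request(url, headers=headers)


req = build_request(URL, HEADERS)
# urllib stores header names in capitalized form, hence "User-agent" here.
print(req.get_header("User-agent"))
print(req.get_header("Cookie"))
```

Building the request separately from sending it makes it easy to check exactly which headers will go out before any page is fetched.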

This kind of anti-crawling measure is fairly easy to deal with. We can change the code slightly and use a session to keep the cookies. The core code is as follows:

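    def login(self):
        # Log in once; self.session keeps the returned cookies for later requests.
        # Replace the placeholders with your own account details.
        data = {
            'ck': '',
            'remember': 'true',
            'name': 'YOUR_ACCOUNT',
            'password': 'YOUR_PASSWORD',
        }
        self.session.post(self.login_url, data=data)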

Paging through the comments

If you browse the short comments on the site, you will find that there is not just one page of comments but many. Running the code above, however, only fetches the first page.

    # Page 1
    https://movie.douban.com/subject/26413293/comments?start=0&limit=20&status=P&sort=new_score
    # Page 2
    https://movie.douban.com/subject/26413293/comments?start=20&limit=20&status=P&sort=new_score
    # Page 3
    https://movie.douban.com/subject/26413293/comments?start=40&limit=20&status=P&sort=new_score

Comparing the links above, you can see that only the start parameter differs, and it is always a multiple of 20. Incrementing start by 20 each time is all it takes to turn the page.

There is one problem, though: once start reaches 480, you can no longer page any further. This is another of Douban's anti-crawling measures; interested readers can try to get around it themselves.
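The paging rule can be sketched on its own with the standard library. This is only an illustration: the function name `page_urls` and the constants are hypothetical, and 480 is taken as the last offset worth requesting, which matches a loop over 25 pages.

```python
from urllib.parse import urlencode

BASE = "https://movie.douban.com/subject/26413293/comments"
PAGE_SIZE = 20
MAX_START = 480  # assumed to be the last offset worth requesting


def page_urls(max_start: int = MAX_START, page_size: int = PAGE_SIZE) -> list:
    """Build one URL per page by stepping the start offset in multiples of page_size."""
    urls = []
    for start in range(0, max_start + 1, page_size):
        query = urlencode(
            {"start": start, "limit": page_size, "status": "P", "sort": "new_score"}
        )
        urls.append(f"{BASE}?{query}")
    return urls


urls = page_urls()
print(len(urls))
print(urls[1])
```

Generating the whole list up front makes the page cap explicit and keeps URL building separate from fetching.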

The core code is as follows:

    # Method of the Douban class; i is the start offset (0, 20, 40, ...)
    def get_content_url(self, i):
        url = f'https://movie.douban.com/subject/26413293/comments?start={i}&limit=20&status=P&sort=new_score'
        return url

    # Script entry point (needs: import time, import pandas as pd)
    if __name__ == '__main__':
        douban = Douban()
        douban.login()
        df = pd.DataFrame(columns=['content_time', 'star', 'comment-content'])
        for i in range(25):
            print(f'Fetching page {i + 1}')
            url = douban.get_content_url(i * 20)
            df1 = douban.get_info(url)
            df = pd.concat([df, df1])
            time.sleep(3)  # pause between requests
        df = df.reset_index(drop=True)
        df.to_csv('../data/conment-content_all.csv', encoding='utf-8-sig')
        print('Obtain success')

After running the code above, you will have all the available data saved locally.

Now it’s time to get into the data analysis.

Data analysis implementation

Number of comments over time

The chart shows that during the first five days after the premiere the number of comments was relatively high, but on the sixth day it plummeted, which suggests that the show's popularity did not last.
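A daily trend like this comes down to counting comments per calendar day. The sketch below uses only the standard library and made-up timestamps; it assumes the content_time column holds strings in "YYYY-MM-DD HH:MM:SS" form, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Made-up timestamps for illustration only, in the format assumed for content_time.
sample_times = [
    "2020-12-01 20:15:00",
    "2020-12-01 22:40:10",
    "2020-12-02 19:05:33",
    "2020-12-06 08:30:00",
]


def comments_per_day(timestamps):
    """Count comments for each calendar day, returned in date order."""
    days = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S").date() for t in timestamps]
    return dict(sorted(Counter(days).items()))


for day, n in comments_per_day(sample_times).items():
    print(day, n)
```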

Change in number of comments over 24 hours

An analysis of comment counts by hour of day shows that activity peaks between 19:00 and 24:00. Being able to sit down and watch a drama at 19:00 makes me rather envious.
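The hourly view can be sketched the same way: bucket each comment by its hour, then check what share falls in the evening window. The timestamps are made up and the function names are hypothetical; the timestamp format is the same assumption as for the daily counts.

```python
from collections import Counter
from datetime import datetime


def comments_per_hour(timestamps):
    """Return a 24-element list: index h holds the number of comments posted in hour h."""
    hours = Counter(datetime.strptime(t, "%Y-%m-%d %H:%M:%S").hour for t in timestamps)
    return [hours.get(h, 0) for h in range(24)]


def evening_share(per_hour, start=19, end=24):
    """Fraction of all comments posted from start:00 up to (not including) end:00."""
    total = sum(per_hour)
    return sum(per_hour[start:end]) / total if total else 0.0


# Made-up timestamps for illustration only.
sample_times = [
    "2020-12-01 20:15:00",
    "2020-12-01 22:40:10",
    "2020-12-02 19:05:33",
    "2020-12-06 08:30:00",
]
per_hour = comments_per_hour(sample_times)
print(per_hour[19:24], evening_share(per_hour))
```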

Ratings

As you can see, the ratings are terrible.
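To turn the scraped star values into numbers, one option is a simple lookup table. The sketch below assumes the star column holds Douban's rating labels as found in the title attribute (力荐, 推荐, 还行, 较差, 很差 for five stars down to one); the sample input is made up and the function names are hypothetical.

```python
from collections import Counter

# Assumption: the star column holds these rating labels from the title attribute.
STAR_LABELS = {"力荐": 5, "推荐": 4, "还行": 3, "较差": 2, "很差": 1}


def star_distribution(labels):
    """Map rating labels to 1-5 stars and count each; unknown labels are skipped."""
    counts = Counter(STAR_LABELS[label] for label in labels if label in STAR_LABELS)
    return {stars: counts.get(stars, 0) for stars in range(5, 0, -1)}


def average_stars(distribution):
    """Mean star value over all counted ratings (0.0 if there are none)."""
    total = sum(distribution.values())
    return sum(s * n for s, n in distribution.items()) / total if total else 0.0


sample = ["很差", "较差", "力荐", "很差", "还行"]  # made-up input
dist = star_distribution(sample)
print(dist, average_stars(dist))
```

Skipping unknown labels keeps the count robust when a comment carries no rating and a different value ends up in the column.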

Since the premiere, the reputation of Da Qin Fu has plummeted: its score has dropped from 9.3 to 8.5, then 7.6, and now 6.1. The high expectations before release now seem like ancient history.

Personally, I don't like Da Qin Fu. After the earlier Daqin Empire trilogy gradually raised the taste of its long-time viewers, it is only natural that they are disappointed by Da Qin Fu. The lines have been made more plain-spoken to please the younger generation; a large amount of “GongDou” (palace intrigue) and romance has been added to attract a wider audience; and a great deal of effort has gone into reworking the plot and characters so that Ying Zheng follows a path no earlier portrayal has taken. To the original audience, these attempts to adapt to the market are tantamount to betrayal. For me in particular, a fan who came to the TV series from Sun Haohui's original work, this dissatisfaction feels especially well founded.

Comment on the content
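
One simple way to see which people or topics viewers mention most is to count how many comments contain each name. The sketch below is only an illustration: the keyword list is hand-picked from the plot summary, the sample comments are made up, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical keyword list, taken from names in the plot summary (not from real results).
KEYWORDS = ["嬴政", "吕不韦", "李斯", "嫪毐"]


def mention_counts(comments, keywords=KEYWORDS):
    """Count how many comments mention each keyword at least once, most frequent first."""
    counts = Counter()
    for text in comments:
        for word in keywords:
            if word in text:
                counts[word] += 1
    return counts.most_common()


sample_comments = ["嬴政的戏份太少", "吕不韦演得好，嬴政一般", "剧情拖沓"]  # made-up input
print(mention_counts(sample_comments))
```

Plain substring matching needs no word segmentation, but it only finds the names you already thought to look for.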

Finally

That is the end of this post. If you have read this far, I hope the article was helpful to you, and I would appreciate a like, a comment, or a share.

The way ahead is so long without ending, yet high and low I’ll search with my will unbending.

I am book-learning, a person who concentrates on learning. The more you know, the more you don’t know.

See you next time for more exciting content!