(1)

This summer the weather in Chongqing has been unusual: it is almost June and still quite cool. Chongqing used to reach highs of 40°C, when walking on the street for ten minutes felt like running ten laps around the playground. Now I have to put on an extra coat when I go out, for fear of catching a cold while alone in a strange land.

The cool weather has not calmed anyone down, though. Chongqing, a young city, lives up to its nickname “the furnace” and is always radiating energy.

(2)

That day Xiao Chi got up early as usual to catch the light rail at Lianglukou. He found a seat and was about to take out his phone to see what interesting things had happened that day, when a commotion broke out not far away. Three girls and a boy were arguing. The boy kept spitting out “MMP” and “bao pi long”, which was unpleasant to hear. The girls were not about to back down either, and fired back in fluent Chongqing dialect.

(3)

He had no idea why they were fighting. But the scene reminded him of a film he had seen recently, Blood Guanyin. It is also about three women, but far more brutal.

(4)

“Blood Guanyin” is set in the Tang household, a family made up entirely of women: three women of different generations, each shrewd about human nature. Madam Tang (played by Hui Yinghong) runs things, moving among the powerful and relying on her deft hand and supple manner to profit and survive within a tangle of political and business ties. The elder daughter, Tang Ning (played by Wu Ke Xi), prickly as a hedgehog, craves her mother's approval and tries to cooperate. The clever younger daughter, Tang Zhen (played by Wen Qi), mostly watches in silence and does whatever her mother says. Then one day the killing of a family of close friends comes to light, and all three women are drawn into it. Madam Tang, who has always put the bigger picture first, does everything she can to protect what she has, yet the three are pushed toward different fates.

That is the plot synopsis given on Douban. The short reviews of Blood Guanyin can be scraped with a Python crawler, and the same program works for other films if you change the URL.

Douban film review crawler

The crawler uses Requests plus regular expressions. Yesterday a reader asked me through the public account's backend whether I had a program such as a Douban review crawler, because the ones she found online did not work. I happened to be reviewing regular expressions, so I wrote this program along the way. (Mostly because a girl asked.)

Program output

After the program runs, the User (user name), Time (release date), and Content (comment text) fields are saved to a CSV file.

Program structure

The program has three main functions: get_one(), parse_page(), and write_to_file().

Program logic

First, analyze the target site: open the Douban detail page for Blood Guanyin and press F12 in Chrome to inspect the page.

As the figure shows, the request uses GET, and the panel also lists the request headers. In the program I set the User-Agent (UA) request header. Setting the UA keeps the target site from identifying our program as a crawler and refusing access.

The request is made by the get_one() function, which returns the page source of the response. The next step is to extract the data.

Analyzing the structure of the page makes it easy to find the data we need. For example, in the figure above, the user name is stored in an a tag, so we can pull it out with a matching regular expression. Of course, you can also extract the data in other ways, such as XPath or BeautifulSoup.

Once you have the data, save it to a file in a suitable format. In this program I save it to a CSV file.

Data request
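As a rough illustration of the request step, here is a minimal sketch of what a get_one() function could look like with Requests. It is not the original code from this post: the User-Agent string, the timeout value, and the choice to return None on failure are all assumptions made for the example.

```python
import requests

# Assumed browser-like User-Agent; copy a current one from your own browser.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    )
}


def get_one(url, timeout=10):
    """Send a GET request and return the page source, or None on failure."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
    except requests.RequestException:
        # Network errors, timeouts, invalid URLs and so on.
        return None
    if response.status_code == 200:
        return response.text
    # Any other status (for example 403 when the site blocks the client).
    return None
```

Pass in the address of the comments page you opened in the browser. A None result usually means the request was blocked or the network failed, so the caller can simply skip that page.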

Field to extract
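The extraction step might be sketched as below. This is only an illustration: the HTML in SAMPLE is a made-up, simplified layout (the class names comment-info, comment-time, and short are assumptions), so the pattern has to be adapted to whatever F12 actually shows on the page you scrape.

```python
import re

# ASSUMED markup: the pattern targets the simplified SAMPLE below,
# not necessarily the real page. Adjust it after inspecting the site.
COMMENT_PATTERN = re.compile(
    r'<span class="comment-info">.*?<a[^>]*>(.*?)</a>'        # user name in the a tag
    r'.*?<span class="comment-time"[^>]*>\s*(.*?)\s*</span>'  # release date
    r'.*?<span class="short">(.*?)</span>',                   # comment text
    re.S,  # let "." match newlines, since each comment spans several lines
)


def parse_page(html):
    """Yield one dict per comment found in the page source."""
    for user, when, content in COMMENT_PATTERN.findall(html):
        yield {
            "User": user.strip(),
            "Time": when.strip(),
            "Content": content.strip(),
        }


# Made-up sample data, used only to exercise the pattern.
SAMPLE = '''
<span class="comment-info">
  <a href="#">reader_a</a>
  <span class="comment-time" title="2018-05-20 10:00:00">
    2018-05-20
  </span>
</span>
<p><span class="short">First sample comment.</span></p>
<span class="comment-info">
  <a href="#">reader_b</a>
  <span class="comment-time" title="2018-05-21 09:30:00">
    2018-05-21
  </span>
</span>
<p><span class="short">Second sample comment.</span></p>
'''

for row in parse_page(SAMPLE):
    print(row)
```

The lazy .*? pieces keep each match inside a single comment, and the dictionary keys match the three columns that end up in the CSV file.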

Data storage
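For the storage step, a minimal sketch using the standard csv module could look like this. The default file name comments.csv and the utf-8-sig encoding are assumptions for the example, not details taken from the original program.

```python
import csv

# Column names follow the three fields described in this post.
FIELDS = ["User", "Time", "Content"]


def write_to_file(rows, path="comments.csv"):
    """Write comment dicts to a CSV file with a header row."""
    # utf-8-sig adds a BOM so that Excel recognizes the file as UTF-8,
    # which matters for Chinese text; newline="" is what the csv module
    # expects for files it writes.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Any iterable of dicts with the User, Time, and Content keys can be passed as rows. Opening the file in "w" mode overwrites it on every run, so use "a" (and write the header only once) if you want to collect several pages into one file.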

I have uploaded the complete code to the public account “Crazy Sea”. Readers who need it can reply “Douban” in the account's chat to download it.

The public account shares practical Python content every day.