
The following article comes from Tencent Cloud; author: user 7760819


Python crawler: a word cloud for The Eight Hundred

Overview

Scrape the Douban short comments for the film The Eight Hundred.

Approach

Use re to parse the web pages and extract the comments, then draw a word cloud from them with wordcloud.
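As a rough illustration of the parsing step, the sketch below applies a regular expression to a small HTML fragment. The fragment and the function name `extract_short_comments` are made up for this example. The `<span class="short">` pattern is the one used in this article's code, while the `re.DOTALL` flag is an added assumption so that comments containing line breaks are matched as well.

```python
import re

# Pattern taken from the article's code; re.DOTALL is an assumption added here
# so that "." also matches newlines inside a comment.
SHORT_COMMENT = re.compile(r'<span class="short">(.*?)</span>', re.DOTALL)


def extract_short_comments(html):
    """Return the stripped text of every <span class="short"> element in html."""
    return [text.strip() for text in SHORT_COMMENT.findall(html)]


# Hypothetical fragment for illustration, not real Douban markup.
sample_html = (
    '<p><span class="short">first comment</span></p>'
    '<p><span class="short">second\ncomment</span></p>'
)

comments = extract_short_comments(sample_html)
print(comments)
```

The non-greedy `(.*?)` stops at the first closing `</span>`, so each comment is captured separately rather than as one long match.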

Code

# Import the libraries
import requests
import re
import csv
import jieba
import wordcloud

# Implement a multi-page crawler with a loop
# Observe the pattern in the page links:
# https://movie.douban.com/subject/26754233/comments?start=0&limit=20&sort=new_score&status=P
# https://movie.douban.com/subject/26754233/comments?start=20&limit=20&sort=new_score&status=P
# https://movie.douban.com/subject/26754233/comments?start=40&limit=20&sort=new_score&status=P
# https://movie.douban.com/subject/26754233/comments?start=60&limit=20&sort=new_score&status=P
# start begins at 0 and grows by 20 per page, so set the step to 20
page = []
for i in range(0, 80, 20):
    page.append(i)

# The file name at the end of this path is missing from the source text
csv_path = r'D:\360MoveData\Users\cmusunqi\Documents\GitHub\R_and_python\python\...'

with open(csv_path, 'a', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for i in page:
        url = 'https://movie.douban.com/subject/26754233/comments?start=' + str(i) + '&limit=20&sort=new_score&status=P'
        headers = {
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0'}
        resp = requests.get(url, headers=headers)
        html = resp.text
        res = re.compile('<span class="short">(.*?)</span>')
        duanpin = re.findall(res, html)
        # Save the data, one short comment per row
        for duan in duanpin:
            writer.writerow([duan])

# Read the saved comments back and draw the word cloud
with open(csv_path, encoding='utf-8') as f:
    txt = f.read()
txt_list = jieba.lcut(txt)
string = ' '.join(txt_list)
w = wordcloud.WordCloud(
    width=1000,
    height=700,
    background_color='white',
    font_path="msyh.ttc",
    scale=15,
    stopwords={" "},
    contour_width=5,
    contour_color='red'
)
w.generate(string)
w.to_file(r'D:\360MoveData\Users\cmusunqi\Documents\GitHub\R_and_python\python\doupe200.png')

Results
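The link pattern noted in the code comments (start goes 0, 20, 40, 60 while the other parameters stay fixed) can also be expressed by building the query string from a dictionary. This is only a sketch: the helper name `build_page_urls` and its parameters are hypothetical, and it uses `urllib.parse.urlencode` from the standard library instead of the string concatenation used in the article.

```python
from urllib.parse import urlencode

# Base address copied from the links listed in the article.
BASE_URL = 'https://movie.douban.com/subject/26754233/comments'


def build_page_urls(total=80, page_size=20):
    """Build one comment-page URL per block of page_size comments (hypothetical helper)."""
    urls = []
    for start in range(0, total, page_size):
        query = urlencode({
            'start': start,
            'limit': page_size,
            'sort': 'new_score',
            'status': 'P',
        })
        urls.append(BASE_URL + '?' + query)
    return urls


page_urls = build_page_urls()
for url in page_urls:
    print(url)
```

Keeping the parameters in a dictionary makes it easier to change the page size or the sort order without editing a long string by hand.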



Only a few comments appear in the page source, which left me scratching my head. Something seems to be wrong; it may be necessary to request the mobile version of the page instead.

Judging from the word cloud, The Eight Hundred is still spreading propaganda in the name of history, so don't watch this kind of historically nihilistic film; Guan Hu's standpoint is skewed.
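One simple way to check this suspicion is to count how many comments actually reached the saved CSV file and compare that with the 4 pages of 20 comments the loop asks for. The sketch below is an assumption-laden illustration: the helper `count_comments` is hypothetical, and an in-memory `io.StringIO` with invented rows stands in for the real file.

```python
import csv
import io


def count_comments(csv_file):
    """Count the non-empty rows in an open CSV file of comments (hypothetical helper)."""
    return sum(1 for row in csv.reader(csv_file) if row and row[0].strip())


# Invented rows standing in for the saved CSV; the real file is not reproduced here.
sample = io.StringIO(
    'comment one\r\n"comment, two"\r\n\r\ncomment three\r\n',
    newline='',
)

found = count_comments(sample)
expected = 4 * 20  # range(0, 80, 20) gives 4 pages, each requested with limit=20
print(f'{found} of {expected} expected comments captured')
```

If the count falls well short of the expected number, the missing comments were never in the downloaded HTML, which points at the request rather than at the regular expression.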