Python Crawler - Word Cloud of Douban Short Comments on the Movie The Eight Hundred

The text and images in this article come from the internet and are intended only for learning and exchange, not for any commercial purpose. Copyright belongs to the original author; if you have any questions, please contact us.

The following article comes from Tencent Cloud. Author: user 7760819



Overview

Crawl the short comments on Douban for the movie The Eight Hundred.

Approach

Fetch the comment pages with requests, extract the comments from the HTML with re, segment the text with jieba, and draw the word cloud with wordcloud.
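The parsing step can be tried without touching the network. The sketch below applies the same pattern the article uses to a made-up HTML fragment. The helper name `extract_comments`, the `re.DOTALL` flag, and the `html.unescape` call are additions for illustration, not part of the original script: without `re.DOTALL`, `.` does not match a newline, so a comment containing a line break would be skipped.

```python
import html
import re

# Same pattern as in the article. re.DOTALL is an addition so that
# comments containing line breaks are matched as well.
SHORT_COMMENT = re.compile(r'<span class="short">(.*?)</span>', re.DOTALL)


def extract_comments(page_html):
    """Return the text of every <span class="short"> element (hypothetical helper)."""
    # html.unescape turns entities such as &amp; back into plain characters
    return [html.unescape(match).strip() for match in SHORT_COMMENT.findall(page_html)]


# Made-up fragment that imitates the structure the pattern expects
sample = (
    '<div class="comment"><span class="short">First comment</span></div>\n'
    '<div class="comment"><span class="short">Second line one\n'
    'line two &amp; more</span></div>'
)

comments = extract_comments(sample)
print(comments)
```

A regular expression is enough here because only one kind of element is needed; it would break if the site changed the class name or the attribute order.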

Code

# Import the required libraries
import csv
import re

import jieba
import requests
import wordcloud

# Crawl several pages in a loop. Observe the pattern in the page links:
# https://movie.douban.com/subject/26754233/comments?start=0&limit=20&sort=new_score&status=P
# https://movie.douban.com/subject/26754233/comments?start=20&limit=20&sort=new_score&status=P
# https://movie.douban.com/subject/26754233/comments?start=40&limit=20&sort=new_score&status=P
# https://movie.douban.com/subject/26754233/comments?start=60&limit=20&sort=new_score&status=P
# start begins at 0 and grows by 20 per page, so use a step of 20
page = []
for i in range(0, 80, 20):
    page.append(i)

# The file name at the end of this path is missing in the source;
# 'comments.csv' is a placeholder.
csv_path = r'D:\360MoveData\Users\cmusunqi\Documents\GitHub\R_and_python\python\comments.csv'

# Mode 'a' appends, so running the script again adds the same comments a second time
with open(csv_path, 'a', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for i in page:
        url = 'https://movie.douban.com/subject/26754233/comments?start=' + str(i) + '&limit=20&sort=new_score&status=P'
        headers = {
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0'}
        resp = requests.get(url, headers=headers)
        html = resp.text
        res = re.compile('<span class="short">(.*?)</span>')
        duanpin = re.findall(res, html)
        # Save the data, one comment per row
        for duan in duanpin:
            writer.writerow([duan])

# Read the saved comments back and segment them into words
with open(csv_path, encoding='utf-8') as f:
    txt = f.read()
txt_list = jieba.lcut(txt)
string = ' '.join(txt_list)

# Draw the word cloud; msyh.ttc is a Windows font that can render Chinese
w = wordcloud.WordCloud(
    width=1000,
    height=700,
    background_color='white',
    font_path="msyh.ttc",
    scale=15,
    stopwords={" "},
    contour_width=5,
    contour_color='red'
)
w.generate(string)
w.to_file(r'D:\360MoveData\Users\cmusunqi\Documents\GitHub\R_and_python\python\doupe200.png')
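The script above passes a stopword set containing only a space, so particles and punctuation produced by the segmentation step can still end up in the image. As a rough sketch of one alternative, the tokens can be counted and filtered before drawing. The helper `word_frequencies`, its `min_length` parameter, and the token list are all made up for illustration; in the real script the tokens would come from `jieba.lcut(txt)`.

```python
from collections import Counter


def word_frequencies(tokens, stopwords=frozenset(), min_length=2):
    """Count tokens, skipping stopwords and very short tokens (hypothetical helper)."""
    counts = Counter()
    for token in tokens:
        word = token.strip()
        # Single characters are mostly particles and punctuation in segmented Chinese text
        if len(word) < min_length or word in stopwords:
            continue
        counts[word] += 1
    return counts


# Made-up token list standing in for the output of jieba.lcut
tokens = ['历史', ' ', '电影', '的', '历史', '，', '战争', '电影', '历史']
freq = word_frequencies(tokens, stopwords={'电影'})
print(freq.most_common(2))
```

The resulting mapping could then be handed to `WordCloud.generate_from_frequencies` instead of calling `generate` on the joined string, which also makes it easy to inspect the most common words directly.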

Results

The page source contains only a few comments, which left me scratching my head. Something seems to be wrong; it may be necessary to request the mobile version of the page to get the data.
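One way to narrow down where the comments go missing is to record, for each page, the HTTP status and the number of comments the pattern finds. The sketch below is an assumption-heavy illustration: `page_url`, `count_comments_per_page`, and `fake_fetch` are hypothetical names, and the fake responses (a 403 with a login page) are invented for the demonstration, not observed behavior of the site. In real use, `fetch` would wrap `requests.get` and return `(resp.status_code, resp.text)`.

```python
import re
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/subject/26754233/comments'
SHORT_COMMENT = re.compile(r'<span class="short">(.*?)</span>', re.DOTALL)


def page_url(start, limit=20):
    """Build a comment page URL following the link pattern from the article."""
    query = urlencode({'start': start, 'limit': limit, 'sort': 'new_score', 'status': 'P'})
    return BASE_URL + '?' + query


def count_comments_per_page(fetch, starts):
    """Return (start, status, comment_count) for each page.

    fetch(url) must return a (status_code, html_text) tuple.
    """
    report = []
    for start in starts:
        status, text = fetch(page_url(start))
        report.append((start, status, len(SHORT_COMMENT.findall(text))))
    return report


def fake_fetch(url):
    # Stand-in for a real HTTP call: only the first page returns comments
    if 'start=0&' in url:
        return 200, '<span class="short">a</span><span class="short">b</span>'
    return 403, '<html>please log in</html>'


report = count_comments_per_page(fake_fetch, range(0, 60, 20))
for start, status, count in report:
    print(start, status, count)
```

A page that returns a non-200 status or zero matches points to the request being rejected or the markup differing from what the pattern expects, rather than to a bug in the word cloud step.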

Judging from the word cloud, The Eight Hundred is still promoting itself in the name of history. In my view it is a historically nihilistic movie that is not worth watching, and Guan Hu's stance is not sound.
