Preface
I recently found a very interesting website (dog-head emoji, please spare me) that turns the hot-blooded scenes from movies and TV dramas into GIFs. It is full of love. As a self-respecting crawler writer, how could I not add them to my 'homework' folder?
Crawl target
URL: GIF source
You can check the results for yourself; I won't show them here ~
Tools used
Development tool: PyCharm
Development environment: Python 3.7, Windows 10
Libraries: requests, lxml
Key learning points
- Using requests
- Parsing data with XPath
- Getting the GIF data
Project approach
The first step is to decide which data you want to collect from the site. Send web requests with the requests library; changing the page number in the URL moves through the list pages.
http://gifcc.com/forum-38-{}.html
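As a small illustration of the URL pattern above, the list-page addresses can be generated up front. This is only a sketch: the helper name `list_page_urls` is made up for this example, and the default range of pages 1 to 10 is an assumption about how many list pages you want to crawl.

```python
# Template taken from the post; {} is replaced by the page number.
LIST_URL = 'http://gifcc.com/forum-38-{}.html'


def list_page_urls(first=1, last=10):
    """Return the list-page URLs for pages first..last (inclusive)."""
    return [LIST_URL.format(page) for page in range(first, last + 1)]


if __name__ == '__main__':
    # Print the first three list-page URLs without sending any request.
    for url in list_page_urls(1, 3):
        print(url)
```

Building the URLs separately from the requests makes it easy to check the pattern before any traffic is sent to the site.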
Parse the HTML of the current page, then extract data from it with XPath.
The extracted values are the href attributes of the a tags.
What we want are the GIFs, and the GIFs are on the detail pages.
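url = 'http://gifcc.com/forum-38-{}.html'.format(page)
response = RequestTools(url).text
html = etree.HTML(response)  # parse the list page for XPath queries
atarget = html.xpath('//div[@class="c cl"]/a/@href')  # links to the detail pages
for i in atarget:
    urls = 'http://gifcc.com/' + i  # the hrefs are relative, so prepend the site root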
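To try the XPath step without touching the network, here is a minimal sketch that runs the same expression against a hand-written HTML snippet. The sample markup and the helper name `extract_detail_urls` are assumptions made for illustration (the real page structure may differ), and `urljoin` is swapped in for the string concatenation used in the post.

```python
from urllib.parse import urljoin

from lxml import etree

BASE_URL = 'http://gifcc.com/'

# Hypothetical markup that imitates the structure the XPath expects.
SAMPLE_HTML = '''
<html><body>
  <div class="c cl"><a href="thread-1-1-1.html">first</a></div>
  <div class="c cl"><a href="thread-2-1-1.html">second</a></div>
  <div class="other"><a href="ignored.html">not a thread</a></div>
</body></html>
'''


def extract_detail_urls(page_source, base_url=BASE_URL):
    """Return absolute detail-page URLs found in a list page."""
    html = etree.HTML(page_source)
    # Same expression as in the post: href of each <a> directly under div.c.cl
    hrefs = html.xpath('//div[@class="c cl"]/a/@href')
    # urljoin handles relative links as well as links that are already absolute.
    return [urljoin(base_url, href) for href in hrefs]


if __name__ == '__main__':
    for link in extract_detail_urls(SAMPLE_HTML):
        print(link)
```

Testing the expression on a saved or sample page first makes it easier to tell an XPath mistake apart from a failed request.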
Send another request to each detail page, then use XPath to extract the address of the GIF on that page and derive a title from it.
You can also choose your own file name for the image.
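response = RequestTools(url).text
html = etree.HTML(response)  # parse the detail page for XPath queries
try:
    # address of the GIF on the detail page
    gifurl = html.xpath('//td[@class="t_f"]/div[1]/div/div/div/div/div/div[1]/img/@src')[0]
    # request the image address to get the GIF data
    gifcontent = RequestTools(gifurl)
    # use the last part of the address as the file name
    title = gifurl.split('/')[-1]
    Save(gifcontent, title)
except Exception as e:
    print(e)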
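Since the file name is taken from the image address, a slightly more defensive way to derive it can be sketched as follows. The helper name `filename_from_url` and the fallback name are assumptions for this example; the post itself simply splits the address on '/'.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default='unnamed.gif'):
    """Derive a file name from the path part of a URL."""
    # urlsplit separates the path from any query string or fragment.
    path = urlsplit(url).path
    # Decode percent-escapes, then keep only the last path segment.
    name = posixpath.basename(unquote(path))
    # Fall back to a default when the URL ends with a slash.
    return name or default


if __name__ == '__main__':
    print(filename_from_url('http://example.com/a/b/cat.gif?size=large'))
```

Compared with `split('/')[-1]`, this drops a trailing query string, so the saved file keeps a clean `.gif` name.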
Request the image address
Get the GIF image data
Save the image data
def Save(gifcontent, title):
    # the ./GIF directory must already exist
    with open('./GIF/' + title, 'wb') as f:
        f.write(gifcontent.content)
    print('{} saved'.format(title))
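The save step fails if the ./GIF folder does not exist yet. A minimal sketch of a variant that creates the folder first is shown below; the name `save_bytes` and its `folder` parameter are hypothetical, and it takes raw bytes (for example `response.content`) rather than the response object used in the post.

```python
import os
import tempfile


def save_bytes(data, title, folder='GIF'):
    """Write raw image bytes to folder/title and return the file path."""
    # Create the target folder on first use instead of failing.
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, title)
    with open(path, 'wb') as f:
        f.write(data)
    return path


if __name__ == '__main__':
    # Demo in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_bytes(b'GIF89a', 'demo.gif', folder=os.path.join(tmp, 'GIF'))
        print(os.path.basename(saved), os.path.getsize(saved))
```

With the functions from the post, the call would look like `save_bytes(gifcontent.content, title)`.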
Full source code
import requests
from lxml import etree  # XPath parsing

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36'
}


def RequestTools(url):
    # send a GET request with a browser-like User-Agent
    response = requests.get(url, headers=headers)
    return response


def Save(gifcontent, title):
    # the ./GIF directory must already exist
    with open('./GIF/' + title, 'wb') as f:
        f.write(gifcontent.content)
    print('{} saved'.format(title))


def DateilsPage(url):
    response = RequestTools(url).text
    html = etree.HTML(response)  # parse the detail page for XPath queries
    try:
        # address of the GIF on the detail page
        gifurl = html.xpath('//td[@class="t_f"]/div[1]/div/div/div/div/div/div[1]/img/@src')[0]
        # request the image address to get the GIF data
        gifcontent = RequestTools(gifurl)
        # use the last part of the address as the file name
        title = gifurl.split('/')[-1]
        Save(gifcontent, title)
    except Exception as e:
        print(e)


def main(page):
    url = 'http://gifcc.com/forum-38-{}.html'.format(page)
    response = RequestTools(url).text
    html = etree.HTML(response)
    atarget = html.xpath('//div[@class="c cl"]/a/@href')
    for i in atarget:
        urls = 'http://gifcc.com/' + i
        DateilsPage(urls)


if __name__ == '__main__':
    for page in range(1, 11):
        main(page)
I'm "White and White I", a programmer who loves sharing knowledge ❤️
If you are new to programming and, after reading this post, find something you don't understand or want to learn Python, feel free to leave a comment or send me a private message.