Preface

I recently found a very interesting website (doge emoji, for self-protection) that turns the most thrilling shots from movies and TV dramas into GIFs, and it is full of love. As a self-respecting crawler writer, how could I not save them to my "homework" folder?

Crawl target

URL: GIF source

You can look at the results yourself; I won't show them here ~

Tools used

Development tool: PyCharm
Development environment: Python 3.7, Windows 10
Toolkits used: requests, lxml

Key learning points

  • Using requests

  • Parsing data with XPath

  • Getting GIF data

Project approach analysis

The first step is to work out which target data you want to collect from the site. The listing pages are fetched by sending web requests with the requests toolkit, changing the page number in the URL:

http://gifcc.com/forum-38-{}.html
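As a small illustration of the URL pattern above, the listing-page addresses can be generated up front. This is only a sketch: the names `LIST_URL` and `listing_urls` are made up for this example and are not part of the original script.

```python
# Listing URL template taken from the article; the page number fills the {} slot.
LIST_URL = 'http://gifcc.com/forum-38-{}.html'


def listing_urls(first_page, last_page):
    """Return the listing-page URLs for first_page..last_page inclusive."""
    return [LIST_URL.format(page) for page in range(first_page, last_page + 1)]


if __name__ == '__main__':
    # Print the first three listing pages.
    for url in listing_urls(1, 3):
        print(url)
```

Building the list first makes it easy to check the addresses before any request is sent.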

Convert the current page's text into an HTML object that can be parsed

Extract data from the page with XPath

The extracted data is the href value of each a tag

What we actually need is the GIF itself

The GIF is on the detail page

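    url = 'http://gifcc.com/forum-38-{}.html'.format(page)  # listing page for this page number
    response = RequestTools(url).text
    html = etree.HTML(response)  # parse the page text into an HTML element tree
    atarget = html.xpath('//div[@class="c cl"]/a/@href')  # href of each thread link
    for i in atarget:
        urls = 'http://gifcc.com/' + i  # absolute URL of the detail page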
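The snippet above builds each detail-page URL by string concatenation, which works as long as every href is relative and has no leading slash. A slightly more forgiving alternative, shown here only as a sketch, is `urllib.parse.urljoin` from the standard library. The helper name `absolute_urls` and the sample href values are hypothetical and not taken from the site.

```python
from urllib.parse import urljoin

BASE_URL = 'http://gifcc.com/'


def absolute_urls(hrefs, base=BASE_URL):
    """Resolve each href against the site root, whether it is relative or absolute."""
    return [urljoin(base, href) for href in hrefs]


if __name__ == '__main__':
    # Hypothetical href values, standing in for the XPath result `atarget`.
    sample = ['thread-1-1-1.html', '/thread-2-1-1.html']
    for url in absolute_urls(sample):
        print(url)
```

With plain concatenation, an href that starts with a slash would produce a double slash in the address; `urljoin` resolves both forms to the same kind of URL.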

Send another network request to the detail page, then use XPath to extract the address of the GIF image and derive its title

The image's file name can also be customized; here it is taken from the last segment of the GIF URL

    response = RequestTools(url).text
    html = etree.HTML(response)  # build the HTML object for XPath queries
    try:
        # address of the GIF image on the detail page
        gifurl = html.xpath('//td[@class="t_f"]/div[1]/div/div/div/div/div/div[1]/img/@src')[0]
        title = gifurl.split('/')[-1]  # file name: last segment of the URL
        gifcontent = RequestTools(gifurl)  # request the image address; Save() reads .content from this response
        Save(gifcontent, title)
    except Exception as e:
        print(e)
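Taking the last `/`-separated segment works when the image address ends in a plain file name. If an address happened to carry a query string, that suffix would end up in the file name. The following sketch, with a hypothetical helper `filename_from_url` and example.com addresses used purely for illustration, parses the URL first and keeps only the path part.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url, default='unnamed.gif'):
    """Return the last path segment of url, ignoring any query string or fragment."""
    name = posixpath.basename(urlsplit(url).path)
    # Fall back to a default name when the URL has no file segment at all.
    return name or default


if __name__ == '__main__':
    print(filename_from_url('http://example.com/images/demo.gif?size=large'))
```

The `default` argument is an assumption of this sketch; the original script simply lets the `try`/`except` block report any failure.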

Request the corresponding image address

Get GIF image data

Save the image data to disk

    def Save(gifcontent, title):
        # gifcontent is the response for the GIF address; .content holds the raw bytes
        with open('./GIF/' + title, 'wb') as f:
            f.write(gifcontent.content)
        print('{} ...'.format(title))
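Note that `Save` writes into `./GIF/`, so that folder has to exist before the script runs, otherwise `open` raises an error. A minimal sketch of a variant that creates the folder on demand is shown below. The name `save_bytes` and its `folder` parameter are hypothetical, and the function takes raw bytes rather than a response object so it can be tried without any network access.

```python
import os


def save_bytes(data, title, folder='./GIF'):
    """Write data to folder/title, creating the folder first if it is missing."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, title)
    with open(path, 'wb') as f:
        f.write(data)
    return path


if __name__ == '__main__':
    # b'GIF89a' is just the GIF file signature, used here as placeholder bytes.
    print(save_bytes(b'GIF89a', 'demo.gif'))
```

In the script from this article, the equivalent call would pass `gifcontent.content` as `data`.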

Simple source code sharing

    import requests
    from lxml import etree  # XPath parsing

    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36'
    }


    def RequestTools(url):
        # send a GET request with a browser-like user-agent and return the response
        response = requests.get(url, headers=headers)
        return response


    def Save(gifcontent, title):
        # the ./GIF/ folder must already exist
        with open('./GIF/' + title, 'wb') as f:
            f.write(gifcontent.content)
        print('{} ...'.format(title))


    def DateilsPage(url):
        response = RequestTools(url).text
        html = etree.HTML(response)  # build the HTML object for XPath queries
        try:
            # address of the GIF image on the detail page
            gifurl = html.xpath('//td[@class="t_f"]/div[1]/div/div/div/div/div/div[1]/img/@src')[0]
            title = gifurl.split('/')[-1]  # file name: last segment of the URL
            gifcontent = RequestTools(gifurl)  # request the image address
            Save(gifcontent, title)
        except Exception as e:
            print(e)


    def main(page):
        url = 'http://gifcc.com/forum-38-{}.html'.format(page)
        response = RequestTools(url).text
        html = etree.HTML(response)
        atarget = html.xpath('//div[@class="c cl"]/a/@href')
        for i in atarget:
            urls = 'http://gifcc.com/' + i  # absolute URL of the detail page
            DateilsPage(urls)


    if __name__ == '__main__':
        for page in range(1, 11):
            main(page)

I'm "White and White", a programmer who loves sharing knowledge ❤️

If you have no programming experience, or if after reading this blog you find something you don't understand or want to learn Python, feel free to leave a comment or message me privately.