Python Crawler in Action: the tutorial you want for capturing GIF images

Preface

I recently found a very interesting website (doge emoji for self-protection) that turns the most heart-pounding shots from movies and TV dramas into GIF pictures, every one of them full of love. As a self-respecting crawler writer, how could I not save them into my 'homework' folder?

Crawl target

URL: GIF source

You can look at the results for yourself; I won't show them here.

Tools used

Development tool: PyCharm
Development environment: Python 3.7, Windows 10
Libraries: requests, lxml

Key points covered

Using requests
Parsing data with XPath
Getting the GIF data

Project approach

First, work out which data you want to collect from the site. The list pages are fetched with the requests library, and changing the page number in the URL moves from one list page to the next:

http://gifcc.com/forum-38-{}.html
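As a small sketch of the paging step, the placeholder in the URL template can be filled in for each page number. The helper name list_page_urls is made up for illustration; the range 1 to 10 matches the loop in the full source at the end of this post.

```python
# URL template taken from the article; {} is replaced by the page number.
BASE = 'http://gifcc.com/forum-38-{}.html'


def list_page_urls(first=1, last=10):
    """Return the list-page URLs for pages first..last (inclusive)."""
    return [BASE.format(page) for page in range(first, last + 1)]


if __name__ == '__main__':
    # Print the first three list pages to check the pattern.
    for page_url in list_page_urls(1, 3):
        print(page_url)
```

Building the URLs up front makes it easy to check the pattern before any request is sent.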

Convert the page text into an HTML tree that can be queried

Extract data from the page with XPath

The extracted values are the href attributes of the a tags

What we actually need are the GIFs

Each GIF sits on its own detail page

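    url = 'http://gifcc.com/forum-38-{}.html'.format(page)
    response = RequestTools(url).text  # HTML text of the list page
    html = etree.HTML(response)  # parse the page into an element tree
    atarget = html.xpath('//div[@class="c cl"]/a/@href')  # detail-page links
    for i in atarget:
        urls = 'http://gifcc.com/' + i  # build the full detail-page URL
        DateilsPage(urls)  # fetch the detail page and save its GIF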
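The link-extraction step can be tried offline with a rough sketch like the one below. Note the assumptions: the markup in SAMPLE is invented, and the standard library's xml.etree.ElementTree is swapped in for lxml so the example runs without extra packages (it only understands well-formed markup and a subset of XPath, so the attribute is read with .get('href') rather than /@href). urljoin is used in place of plain string concatenation so that hrefs with or without a leading slash both resolve correctly.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, well-formed stand-in for a list page; the real markup may differ.
SAMPLE = """
<html><body>
  <div class="c cl"><a href="thread-101-1-1.html">first</a></div>
  <div class="c cl"><a href="/thread-102-1-1.html">second</a></div>
  <div class="other"><a href="ignored.html">skip</a></div>
</body></html>
"""


def detail_urls(markup, base='http://gifcc.com/'):
    """Return absolute detail-page URLs found under div[@class='c cl']."""
    root = ET.fromstring(markup.strip())
    # ElementTree supports this predicate form, but not a trailing /@href.
    links = root.findall(".//div[@class='c cl']/a")
    return [urljoin(base, a.get('href')) for a in links if a.get('href')]


if __name__ == '__main__':
    for link in detail_urls(SAMPLE):
        print(link)
```

With real pages, lxml's etree.HTML remains the better fit because it tolerates the broken markup that browsers accept.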

Send another request, this time to the detail page, and use XPath to extract the address of the GIF picture; the title is taken from that address

You can also choose your own name for the picture

    response = RequestTools(url).text
    html = etree.HTML(response)  # parse the detail page into an element tree
    try:
        # address of the GIF picture
        gifurl = html.xpath('//td[@class="t_f"]/div[1]/div/div/div/div/div/div[1]/img/@src')[0]
        title = gifurl.split('/')[-1]  # file name taken from the address
        gifcontent = RequestTools(gifurl)  # request the GIF itself
        Save(gifcontent, title)
    except Exception as e:
        print(e)
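The file name is taken with gifurl.split('/')[-1], which works when the address ends in a plain file name. If an address carried a query string or ended in a slash, the result would not be a usable name. A hedged alternative, using only the standard library, is sketched below; the function name filename_from_url, the fallback name, and the example addresses are all made up for illustration.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url, default='unnamed.gif'):
    """Return the last path segment of url, ignoring any query string."""
    path = urlparse(url).path  # drops '?query' and '#fragment'
    name = posixpath.basename(unquote(path))
    return name or default  # fall back when the path ends in '/'


if __name__ == '__main__':
    print(filename_from_url('http://example.com/data/cat.gif?size=big'))
    print(filename_from_url('http://example.com/gifs/'))
```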

Request the corresponding image address

Get GIF image data

Save the image information

    def Save(gifcontent, title):
        # gifcontent is the response for the GIF; .content holds its raw bytes
        with open('./GIF/' + title, 'wb') as f:
            f.write(gifcontent.content)
        print('{} ...'.format(title))
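The save step writes into ./GIF/, and open() raises FileNotFoundError if that folder does not exist yet. The sketch below creates the folder first and also checks that the downloaded bytes start with a GIF signature (GIF87a or GIF89a) so that an error page is not saved as a picture. The names looks_like_gif and save_gif are hypothetical; with the article's code, the data argument would be gifcontent.content.

```python
import os

# Every GIF file begins with one of these two 6-byte signatures.
GIF_SIGNATURES = (b'GIF87a', b'GIF89a')


def looks_like_gif(data):
    """Return True if data starts with a GIF signature."""
    return data[:6] in GIF_SIGNATURES


def save_gif(data, title, folder='./GIF'):
    """Write data to folder/title, creating the folder if needed."""
    if not looks_like_gif(data):
        raise ValueError('{} does not look like a GIF'.format(title))
    os.makedirs(folder, exist_ok=True)  # no error if it already exists
    path = os.path.join(folder, title)
    with open(path, 'wb') as f:
        f.write(data)
    return path
```

Raising on non-GIF data fits the existing try/except in the detail-page code, which would print the message and move on to the next link.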

Full source code

    import requests
    from lxml import etree  # XPath parsing

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36'
    }


    def RequestTools(url):
        # Send a GET request with a browser-like User-Agent
        response = requests.get(url, headers=headers)
        return response


    def Save(gifcontent, title):
        # gifcontent is the response for the GIF; .content holds its raw bytes
        with open('./GIF/' + title, 'wb') as f:
            f.write(gifcontent.content)
        print('{} ...'.format(title))


    def DateilsPage(url):
        response = RequestTools(url).text
        html = etree.HTML(response)  # parse the detail page into an element tree
        try:
            # address of the GIF picture
            gifurl = html.xpath('//td[@class="t_f"]/div[1]/div/div/div/div/div/div[1]/img/@src')[0]
            title = gifurl.split('/')[-1]  # file name taken from the address
            gifcontent = RequestTools(gifurl)  # request the GIF itself
            Save(gifcontent, title)
        except Exception as e:
            print(e)


    def main(page):
        url = 'http://gifcc.com/forum-38-{}.html'.format(page)
        response = RequestTools(url).text
        html = etree.HTML(response)
        atarget = html.xpath('//div[@class="c cl"]/a/@href')  # detail-page links
        for i in atarget:
            urls = 'http://gifcc.com/' + i
            DateilsPage(urls)


    if __name__ == '__main__':
        for page in range(1, 11):
            main(page)

I'm White and White I, a programmer who loves sharing knowledge ❤️

If you have no programming experience and, after reading this post, find something you don't understand or want to learn Python, leave a comment or send me a private message.
