Requests Library Introduction:

Requests is the only Non-GMO HTTP library for Python, safe for human consumption. This tagline directly and defiantly declares that Requests is Python's best HTTP library.

Simple usage of Requests

The seven main methods of the Requests library
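For reference, here is a minimal sketch of all seven methods, using httpbin.org/anything as a stand-in test URL (an assumption; any test endpoint works):

import requests

url = 'https://httpbin.org/anything'  # stand-in test endpoint (assumption)
r = requests.request('GET', url)  # requests.request: the base method that the six below wrap
r = requests.get(url)             # requests.get: fetch a resource (HTTP GET)
r = requests.head(url)            # requests.head: fetch only the response headers (HTTP HEAD)
r = requests.post(url)            # requests.post: submit data to a resource (HTTP POST)
r = requests.put(url)             # requests.put: store or replace a resource (HTTP PUT)
r = requests.patch(url)           # requests.patch: partially modify a resource (HTTP PATCH)
r = requests.delete(url)          # requests.delete: delete a resource (HTTP DELETE)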


1. requests.get()

import requests  # import the Requests library
r = requests.get(url)  # send the request with the get method; the returned Response object, which contains the page data, is stored in r

Properties of the Response object:

  • r.status_code: the HTTP status code of the request; 200 indicates a successful connection
  • r.text: the text content of the response
  • r.content: the response body in binary (bytes) form
  • r.encoding: the response encoding guessed from the HTTP headers
  • r.apparent_encoding: the response encoding inferred from the content itself (an alternative encoding)

Take Zhihu as an example to demonstrate the properties above:

>>> import requests
>>> r = requests.get('https://www.zhihu.com/')
>>> r.status_code
500
>>> r.text   # output omitted
>>> r.content   # output omitted
>>> r.encoding
'ISO-8859-1'
>>> r.apparent_encoding
'ascii'
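The 500 above is most likely Zhihu rejecting the default Requests User-Agent rather than a server fault. A minimal sketch of retrying with a browser-like User-Agent header (the header value below is just an example):

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}  # example browser-like UA
r = requests.get('https://www.zhihu.com/', headers=headers, timeout=20)
print(r.status_code)  # typically 200 once a browser-like User-Agent is sent
r.encoding = r.apparent_encoding  # switch to the encoding inferred from the content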

Hands-on practice

Analyzing the Douban short-comments page

First, use the browser's developer tools to analyze how the page loads. Only synchronously loaded data can be viewed directly in the page source; asynchronously loaded data cannot.



In the browser settings, change JavaScript from "allow" to "block" and refresh the page. If the page still loads normally, it is loaded synchronously; if it does not, it is loaded asynchronously.
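Alongside the browser method, the same distinction can be checked programmatically: if a piece of text visible on the rendered page also appears in the raw HTML fetched by Requests, that part of the page is loaded synchronously. A minimal sketch, assuming a marker string copied from a visible comment (the marker below is hypothetical):

import requests

url = 'https://book.douban.com/subject/27147922/'  # the Douban page analyzed below
marker = 'some text copied from a visible comment'  # hypothetical marker string
r = requests.get(url, timeout=20)
r.encoding = r.apparent_encoding
print(marker in r.text)  # True suggests synchronous loading; False suggests asynchronous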

Steps to download data using Requests

  • Import the Requests library
  • Specify the URL
  • Send the request with the get method
  • Print the returned text
  • Raise an exception on failure
import requests  # import the Requests library

url = 'https://book.douban.com/subject/27147922/?icn=index-editionrecommend'  # specify the URL
r = requests.get(url, timeout=20)  # send the request with the get method
# print(r.text)  # print the returned text
print(r.raise_for_status())  # raise an exception on failure; prints None on success
None


A general framework for crawling web pages

  • Define a function
  • Set a timeout
  • Handle exceptions
  • Call the function
import requests

# define the function
def getHTMLText(url):
    try:
        r = requests.get(url, timeout=20)  # set a timeout
        r.raise_for_status()  # raise an HTTPError for bad status codes
        r.encoding = r.apparent_encoding  # use the encoding inferred from the content
        return r.text
    except:  # exception handling
        return "An exception occurred"

if __name__ == '__main__':
    url = ""
    print(getHTMLText(url))  # call the function

The crawler protocol

What is the crawler protocol?

The crawler protocol, also known as robots, is designed to tell web crawlers which pages may be crawled and which may not.

How to view a site's crawler protocol

Append robots.txt to the site's domain name. For example, to view Baidu's crawler protocol: www.baidu.com/robots.txt
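Python's standard library can also parse a robots.txt file and answer whether a given URL may be fetched. A minimal sketch using urllib.robotparser against Baidu's file:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('https://www.baidu.com/robots.txt')
rp.read()  # download and parse the robots.txt file
print(rp.can_fetch('*', 'https://www.baidu.com/s'))  # may a generic bot fetch this URL?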

Crawler protocol directives

Block all robots:
User-agent: *
Disallow: /

Allow all robots:
User-agent: *
Disallow:

Crawler advice

  • Crawl only publicly available data
  • Keep your request rate low (see the sketch after this list)
  • Follow the robots protocol as much as possible
  • Do not use crawled data for commercial purposes
  • Do not publish your crawler code or the data it collects
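As a concrete reading of the speed advice above, a minimal sketch of a rate-limited crawl, assuming a hypothetical list of page URLs:

import time
import requests

urls = ['https://example.com/page1', 'https://example.com/page2']  # hypothetical URLs
for url in urls:
    r = requests.get(url, timeout=20)
    print(url, r.status_code)
    time.sleep(2)  # pause between requests to keep the crawl rate low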