While learning to write a web crawler with Python 3's urllib module, I kept getting an error whenever a link contained Chinese characters. I assumed it was a Python encoding problem, so I tried every combination of encoding and decoding I could think of, but nothing worked and I couldn't find anything helpful in the documentation. In the end, the solution turned out to be very simple.
- I always got this error when requesting a URL that contains Chinese characters:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-11: ordinal not in range(128)
At first I thought it was an encoding problem, but encoding tricks didn't solve it. The original code:

```python
import urllib.request

rooturl = "https://baike.baidu.com/item/"
item = "爬虫"  # Chinese for "crawler" -- the non-ASCII part of the URL
url = rooturl + item
print(url)
request = urllib.request.Request(url=url)
response = urllib.request.urlopen(request)
result = response.read()
result = str(result, encoding="utf-8")
print(result)
```
- The solution: the problem was the Chinese word 爬虫 ("crawler") in the URL, not the encoding of the response. A URL may only contain ASCII characters, so the Chinese part must be percent-encoded with urllib.parse.quote() before it is passed to urllib.request.Request():
item = urllib.parse.quote(item)
However, only the Chinese part of the link should be converted. If you quote the entire URL, the "://" after the scheme gets escaped too, and you get an error:
ValueError: unknown url type: 'https%3A//baike.baidu.com/item/%E7%88%AC%E8%99%AB'
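To see the difference concretely, here is a minimal sketch using only the standard-library urllib.parse module. By default quote() keeps "/" but escapes ":", which produces exactly the invalid "https%3A//..." URL from the error above; passing safe=":/" leaves the scheme and path separators intact and encodes only the non-ASCII characters:

```python
from urllib.parse import quote

url = "https://baike.baidu.com/item/爬虫"

# Quoting the whole URL escapes ":" as well, breaking the scheme.
print(quote(url))
# https%3A//baike.baidu.com/item/%E7%88%AC%E8%99%AB

# With safe=":/", only the Chinese characters are percent-encoded.
print(quote(url, safe=":/"))
# https://baike.baidu.com/item/%E7%88%AC%E8%99%AB
```

So an alternative to quoting just the item is to quote the full URL with an appropriate safe parameter.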
Complete code:
```python
import urllib.parse
import urllib.request

rooturl = "https://baike.baidu.com/item/"
item = "爬虫"  # Chinese for "crawler"
item = urllib.parse.quote(item)  # percent-encode only the Chinese part
url = rooturl + item
print(url)
request = urllib.request.Request(url=url)
response = urllib.request.urlopen(request)
result = response.read()
result = str(result, encoding="utf-8")
print(result)
```
I stepped into this pit again today, so I've written it up to fill it in. I hope it makes the road smoother for others. Thanks for reading!