While learning to write a web crawler with Python 3's urllib module, I kept getting an error whenever a link contained Chinese characters. I assumed it was a Python encoding problem, so I tried every combination of encoding and decoding I could think of, but nothing worked and I couldn't find anything helpful in the documentation. In the end, the solution turned out to be very simple.
- I always got this error when requesting a URL that contains Chinese characters:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-11: ordinal not in range(128)
At first I thought it was an encoding problem, but encoding tricks didn't solve it. The original code:

```python
import urllib.request

rooturl = "https://baike.baidu.com/item/"
item = "爬虫"  # Chinese for "crawler" -- the non-ASCII part of the URL
url = rooturl + item
print(url)
request = urllib.request.Request(url=url)
response = urllib.request.urlopen(request)
result = response.read()
result = str(result, encoding="utf-8")
print(result)
```
- The solution: the problem was the Chinese word 爬虫 ("crawler") in the URL, not the encoding of the response. A URL may only contain ASCII characters, so the Chinese part must be percent-encoded with urllib.parse.quote() before it is passed to urllib.request.Request():
item = urllib.parse.quote(item)
However, only the Chinese part of the link should be converted. If you quote the entire URL, the "://" after the scheme gets escaped too, and you get an error:
ValueError: unknown url type: 'https%3A//baike.baidu.com/item/%E7%88%AC%E8%99%AB'
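To see the difference concretely, here is a minimal sketch using only the standard-library urllib.parse module. By default quote() keeps "/" but escapes ":", which produces exactly the invalid "https%3A//..." URL from the error above; passing safe=":/" leaves the scheme and path separators intact and encodes only the non-ASCII characters:

```python
from urllib.parse import quote

url = "https://baike.baidu.com/item/爬虫"

# Quoting the whole URL escapes ":" as well, breaking the scheme.
print(quote(url))
# https%3A//baike.baidu.com/item/%E7%88%AC%E8%99%AB

# With safe=":/", only the Chinese characters are percent-encoded.
print(quote(url, safe=":/"))
# https://baike.baidu.com/item/%E7%88%AC%E8%99%AB
```

So an alternative to quoting just the item is to quote the full URL with an appropriate safe parameter.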
Complete code:
```python
import urllib.parse
import urllib.request

rooturl = "https://baike.baidu.com/item/"
item = "爬虫"  # Chinese for "crawler"
item = urllib.parse.quote(item)  # percent-encode only the Chinese part
url = rooturl + item
print(url)
request = urllib.request.Request(url=url)
response = urllib.request.urlopen(request)
result = response.read()
result = str(result, encoding="utf-8")
print(result)
```
I stepped into this pit again today, so I've written it up to fill it in. I hope it makes the road smoother for others. Thanks for reading!