When a Python crawler gets an Error 403 Forbidden, it is usually safe to assume that the target website has detected the crawler and is refusing to serve it.

    urllib2.HTTPError: HTTP Error 403: Forbidden
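In Python 3 the same failure surfaces as urllib.error.HTTPError. A minimal sketch of how it appears, assuming a page that rejects the default Python user agent (the URL is only illustrative):

    import urllib.request
    import urllib.error

    try:
        # urllib sends a default User-Agent such as "Python-urllib/3.x",
        # which many sites reject outright.
        html = urllib.request.urlopen("http://en.wikipedia.org/wiki/Main_Page")
    except urllib.error.HTTPError as e:
        print(e)  # e.g. "HTTP Error 403: Forbidden" if the site blocks the crawler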

Q: So how can it be resolved?

A: Suppose the failing request is built like this:

    req = urllib.request.Request(url="http://en.wikipedia.org" + pageUrl)
    html = urlopen(req)

Add a headers argument to the request so that it becomes:

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    req = urllib.request.Request(url="http://en.wikipedia.org" + pageUrl, headers=headers)
    # req = urllib.request.Request(url="http://en.wikipedia.org" + pageUrl)  # the original header-less request
    html = urlopen(req)
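If requests are built in several places, an equivalent fix is to attach the header afterwards with Request.add_header, or to install a global opener so that every urlopen() call sends it. A sketch reusing the same Firefox User-Agent string (any real browser's string works):

    import urllib.request

    ua = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'

    # Alternative 1: add the header to an existing Request object.
    req = urllib.request.Request("http://en.wikipedia.org/wiki/Main_Page")
    req.add_header('User-Agent', ua)
    html = urllib.request.urlopen(req)

    # Alternative 2: install a global opener so every later urlopen() sends the header.
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-Agent', ua)]
    urllib.request.install_opener(opener)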

Q: How do I find headers?

A: Open the browser's developer tools (in Firefox, for example), reload the page, and inspect the request headers of any request in the Network panel.
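The Network panel shows every header the browser sends, not just User-Agent; copying a few more can help when User-Agent alone is not enough. The values below are illustrative, taken from a typical Firefox request, so copy what your own browser actually sends:

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
    }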

Q: Are there other issues with masquerading as a browser?

A: Yes. For example, the target website may block an IP address that issues too many requests in a short time.
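The post does not give a workaround for IP blocking, but one common courtesy measure is simply to space requests out. A minimal sketch using time.sleep with random jitter; the page list and delay range are arbitrary assumptions:

    import random
    import time
    from urllib.request import Request, urlopen

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    pages = ["/wiki/Python_(programming_language)", "/wiki/Web_crawler"]  # hypothetical crawl list

    for pageUrl in pages:
        req = Request(url="http://en.wikipedia.org" + pageUrl, headers=headers)
        html = urlopen(req).read()
        time.sleep(random.uniform(1, 3))  # wait 1-3 seconds between requests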