Python socket.error: [Errno 10054] An existing connection was forcibly closed by the remote host. Problem and solution:
I was reading web pages with Python the other day, and the site treated my repeated urlopen() calls as an attack. Sometimes downloads were refused outright; sometimes request.read() hung after urlopen() and Errno 10054 was thrown.
This error means Connection reset by peer: the famous case of the remote host resetting the connection. One possible cause is that no socket timeout is set (or it is too long), so a connection opened with request = urllib.request.urlopen(url) and closed with request.close() can hang indefinitely in between. Another is that failing to sleep for a few seconds between requests leads the site to identify the behavior as an attack.
The specific solution is as follows:
[python]
import socket
import time
import urllib.request

timeout = 20
socket.setdefaulttimeout(timeout)  # set a timeout for the whole socket layer; later socket use in this file needs no further setting
sleep_download_time = 10
time.sleep(sleep_download_time)  # pause so the site does not mistake the requests for an attack
request = urllib.request.urlopen(url)  # url is the address of the content to read
content = request.read()  # read the page content
request.close()  # remember to close the connection
Because the read() that follows urlopen() ultimately calls socket-layer functions, setting the default socket timeout lets the network connection be dropped automatically, so you no longer wait forever inside read().
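Incidentally, if you would rather not change the process-wide default, urlopen() also accepts a per-call timeout argument (a standard urllib.request feature). A minimal sketch, reusing the url variable from above:
[python]
import urllib.request

# limit only this request to 20 seconds instead of calling socket.setdefaulttimeout()
request = urllib.request.urlopen(url, timeout=20)
content = request.read()
request.close()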
You can also wrap the calls in try/except, for example:
[python]
try:
    time.sleep(self.sleep_download_time)
    request = urllib.request.urlopen(url)
    content = request.read()
    request.close()
except UnicodeDecodeError as e:
    print('-----UnicodeDecodeError url:', url)
except urllib.error.URLError as e:
    print('-----urlError url:', url)
except socket.timeout as e:
    print('-----socket timeout:', url)
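As an aside, the reason both of the last two handlers are useful is that a timeout while connecting is wrapped by urllib in URLError, while a timeout during read() is raised directly as socket.timeout. A minimal sketch distinguishing the two, assuming the same url as above:
[python]
import socket
import urllib.error
import urllib.request

try:
    request = urllib.request.urlopen(url, timeout=20)
    content = request.read()
    request.close()
except urllib.error.URLError as e:
    if isinstance(e.reason, socket.timeout):  # timed out while connecting
        print('connect timeout:', url)
    else:
        raise  # some other URL error
except socket.timeout:  # timed out while reading
    print('read timeout:', url)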
Generally speaking this is enough; I tested thousands of page downloads before saying so. However, when downloading on the order of tens of thousands of files, the exception still seems to slip out now and then. The cause may be that time.sleep() is too short, or that the network drops suddenly. Testing with urllib.request.urlretrieve(), I found that downloads kept failing partway through, with the same data being downloaded over and over again.
The simple remedy: first see my earlier article, Simple Implementation of Python Checkpoints, and set up a checkpoint. Then wrap the code above in a while True loop that catches the exception and retries. See the pseudocode below:
[python]
def Download_auto(downloadlist, fun, sleep_time=15):
    while True:
        try:  # wrap the download in an outer try
            value = fun(downloadlist, sleep_time)  # only a normal run can exit the loop
            if value == Util.SUCCESS:
                break
        except Exception:  # if 10054, IOError, or any other error occurs
            sleep_time += 5  # sleep 5 seconds longer and retry the download
            print('enlarge sleep time:', sleep_time)
Because of the checkpoint, the retried code resumes from where the exception was thrown, so an unstable network connection no longer interrupts the program.
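To make the pseudocode concrete, here is a minimal sketch of what fun could look like; the checkpoint.txt file, the download_list() name, and the Util.SUCCESS constant are hypothetical stand-ins for the machinery from the checkpoint article:
[python]
import time
import urllib.request

class Util:
    SUCCESS = 0  # hypothetical status constant

def download_list(downloadlist, sleep_time):
    # hypothetical checkpoint: a plain-text file listing finished URLs
    try:
        with open('checkpoint.txt') as f:
            done = set(f.read().split())
    except FileNotFoundError:
        done = set()
    for url in downloadlist:
        if url in done:
            continue  # resume where the previous run stopped
        time.sleep(sleep_time)
        request = urllib.request.urlopen(url)  # fetch the page (content handling omitted)
        content = request.read()
        request.close()
        with open('checkpoint.txt', 'a') as f:
            f.write(url + '\n')  # record progress only after success
    return Util.SUCCESS
Called as Download_auto(downloadlist, download_list), the loop keeps retrying until every URL has been recorded in the checkpoint.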
However, the case where the corresponding page cannot be found at all needs separate handling:
[python]
# print download progress information
def reporthook(blocks_read, block_size, total_size):
    if not blocks_read:
        print('Connection opened')
    if total_size < 0:
        # total size is unknown
        print('Read %d blocks' % blocks_read)
    else:
        # if the page is not found, total_size may be 0, i.e. the page does not exist
        print('downloading: %d MB, totalsize: %d MB' % (blocks_read * block_size / 1048576.0, total_size / 1048576.0))

def Download(path, url):
    # url = 'downloads.sourceforge.net/sourceforge…
    # filename = url.rsplit("/")[-1]
    try:
        # Python's built-in download function
        urllib.request.urlretrieve(url, path, reporthook)
    except IOError as e:
        print("download", url, "\nerror:", e)
        return  # skip the Done message on failure
    print("Done: %s\nCopy to: %s" % (url, path))
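For the page-not-found case specifically, urlretrieve() raises urllib.error.HTTPError (a subclass of IOError) when the server answers 404, so it can be told apart from other I/O failures. A sketch, reusing url, path, and reporthook from above:
[python]
import urllib.error
import urllib.request

try:
    urllib.request.urlretrieve(url, path, reporthook)
except urllib.error.HTTPError as e:  # more specific, so it must come before IOError
    if e.code == 404:
        print('page does not exist:', url)
    else:
        raise
except IOError as e:
    print('download', url, '\nerror:', e)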
If you still have problems, please share other solutions in the comments.