BUG record – every pit stepped in is worth remembering; try not to fall into the same one twice
1. Background
During project development, we needed to fetch image information for a given set of tags from an image server. The following code was used:
```python
import json
import urllib.request

url = 'http://127.0.0.1:8080/images/query/?type=%s&tags=%s' % ('yuv', '4,3,6')
print("url: " + str(url))
response = urllib.request.urlopen(url)
download_list = json.loads(response.read())
print(download_list)
```
This worked fine while the responses were small, but once the payload grew large – 192708 bytes in this case – the following error occurred:
```
Traceback (most recent call last):
  File "/Users/min/Desktop/workspace/python/Demo/fuck.py", line 12, in <module>
    download_list = json.loads(response.read())
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 464, in read
    s = self._safe_read(self.length)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 618, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(144192 bytes read, 48516 more expected)
```
2. Analysis and conclusions
Digging deeper, we see that in this scenario read() ends up in the following method of http/client.py (the print statements are our own debugging additions):
```python
def _safe_read(self, amt):
    """Read the number of bytes requested, compensating for partial reads.

    Normally, we have a blocking socket, but a read() can be interrupted
    by a signal (resulting in a partial read).

    Note that we cannot distinguish between EOF and an interrupt when zero
    bytes have been read. IncompleteRead() will be raised in this situation.

    This function should be used when <amt> bytes "should" be present for
    reading. If the bytes are truly not available (due to EOF), then the
    IncompleteRead exception can be used to detect the problem.
    """
    s = []
    while amt > 0:
        print("1", amt)                            # debug: bytes still expected
        chunk = self.fp.read(min(amt, MAXAMOUNT))
        print(chunk)                               # debug: raw chunk received
        if not chunk:
            raise IncompleteRead(b''.join(s), amt)
        s.append(chunk)
        print('2', len(chunk))                     # debug: size of this chunk
        amt -= len(chunk)
        print('3', amt)                            # debug: bytes remaining
    return b"".join(s)
```
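If switching libraries is not immediately possible, the truncated read can at least be salvaged: the IncompleteRead exception carries the bytes received so far in its `partial` attribute. A minimal sketch (the `read_body` helper name is our own):

```python
import http.client


def read_body(response):
    """Read a response body, keeping whatever arrived if the
    server closes the connection before sending all bytes."""
    try:
        return response.read()
    except http.client.IncompleteRead as e:
        # e.partial holds the bytes read before the connection dropped;
        # the payload may still be truncated, so callers must validate it
        # (e.g. json.loads will fail on a half-received document)
        return e.partial
```

Whether the salvaged bytes actually parse as JSON depends on where the connection was cut, so this is damage control, not a fix.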
As the docstring itself admits, the method cannot distinguish between EOF and an interrupt when zero bytes have been read, so a connection closed early surfaces as IncompleteRead. Our debug prints produced:
```
1 192708
b'[{"title": "\\u5ba4\\u5185\\u767d\\u8272\\u80cc\\u666f\\u5899+\\u6b63\\u5e38\\u5149+\\u8fd1\\u8ddd(\\u5927\\u8138)+\\u65e0\\u9762\\u90e8\\u7a7f\\u623...'
```
Therefore, for large transfers it is advisable to avoid the urllib library and use requests instead.
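The same query with requests might look like the sketch below. The endpoint is the one from the snippet above; the `fetch_download_list` wrapper and its injectable `http` parameter are our own additions, made so the call can be exercised without a live server:

```python
import requests


def fetch_download_list(base_url, img_type, tags, http=requests):
    """Query the image server and return the decoded JSON list."""
    params = {'type': img_type, 'tags': ','.join(tags)}
    # requests transparently handles chunked transfer encoding,
    # so large bodies do not trip over a premature EOF the way
    # the raw urllib.request read did
    resp = http.get(base_url, params=params)
    resp.raise_for_status()
    return resp.json()


# Example call against the server from the original snippet:
# download_list = fetch_download_list(
#     'http://127.0.0.1:8080/images/query/', 'yuv', ['4', '3', '6'])
```

`params` also takes care of URL encoding, which the manual `'?type=%s&tags=%s'` formatting did not.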
In addition, the headers seen through urllib.request are noticeably sparser than those seen by requests; most critically, the Transfer-Encoding header is missing. Details:

urllib.request:

```
Server: nginx/1.14.0 (Ubuntu)
Date: Mon, 17 Sep 2018 10:02:51 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 192708
Connection: close
X-Frame-Options: SAMEORIGIN
```

requests:

```
{'Server': 'nginx/1.14.0 (Ubuntu)', 'Date': 'Mon, 17 Sep 2018 09:55:15 GMT',
 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked',
 'Connection': 'keep-alive', 'X-Frame-Options': 'SAMEORIGIN',
 'Content-Encoding': 'gzip'}
```
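One small convenience worth noting when comparing headers like this: requests exposes them as a case-insensitive mapping, so lookups do not depend on how the server capitalizes header names. A sketch with constructed values mirroring the requests output above:

```python
from requests.structures import CaseInsensitiveDict

# constructed sample values, mirroring the requests headers shown above
headers = CaseInsensitiveDict({'Transfer-Encoding': 'chunked',
                               'Content-Encoding': 'gzip'})

# lookups succeed regardless of casing
print(headers.get('transfer-encoding'))  # chunked
print('CONTENT-ENCODING' in headers)     # True
```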