BUG record – every pit stepped in is worth remembering; try not to fall into the same one twice
1. Background
During project development, we needed to fetch image information for a given set of tags from an image server. The following code was used:
```python
import json
import urllib.request

url = 'http://127.0.0.1:8080/images/query/?type=%s&tags=%s' % ('yuv', '4,3,6')
print("url: " + str(url))
response = urllib.request.urlopen(url)
download_list = json.loads(response.read())
print(download_list)
```
This worked fine while the responses were small, but once the payload grew large – 192708 bytes in this case – the following error occurred:
```
Traceback (most recent call last):
  File "/Users/min/Desktop/workspace/python/Demo/fuck.py", line 12, in <module>
    download_list = json.loads(response.read())
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 464, in read
    s = self._safe_read(self.length)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 618, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(144192 bytes read, 48516 more expected)
```
2. Analysis and conclusions
Digging deeper, we see that in this scenario read() ends up in the following method of http/client.py (the print statements are our own debugging additions):
```python
def _safe_read(self, amt):
    """Read the number of bytes requested, compensating for partial reads.

    Normally, we have a blocking socket, but a read() can be interrupted
    by a signal (resulting in a partial read).

    Note that we cannot distinguish between EOF and an interrupt when zero
    bytes have been read. IncompleteRead() will be raised in this situation.

    This function should be used when <amt> bytes "should" be present for
    reading. If the bytes are truly not available (due to EOF), then the
    IncompleteRead exception can be used to detect the problem.
    """
    s = []
    while amt > 0:
        print("1", amt)                            # debug: bytes still expected
        chunk = self.fp.read(min(amt, MAXAMOUNT))
        print(chunk)                               # debug: raw chunk received
        if not chunk:
            raise IncompleteRead(b''.join(s), amt)
        s.append(chunk)
        print('2', len(chunk))                     # debug: size of this chunk
        amt -= len(chunk)
        print('3', amt)                            # debug: bytes remaining
    return b"".join(s)
```
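If switching libraries is not immediately possible, the truncated read can at least be salvaged: the IncompleteRead exception carries the bytes received so far in its `partial` attribute. A minimal sketch (the `read_body` helper name is our own):

```python
import http.client


def read_body(response):
    """Read a response body, keeping whatever arrived if the
    server closes the connection before sending all bytes."""
    try:
        return response.read()
    except http.client.IncompleteRead as e:
        # e.partial holds the bytes read before the connection dropped;
        # the payload may still be truncated, so callers must validate it
        # (e.g. json.loads will fail on a half-received document)
        return e.partial
```

Whether the salvaged bytes actually parse as JSON depends on where the connection was cut, so this is damage control, not a fix.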
As the docstring itself admits, the method cannot distinguish between EOF and an interrupt when zero bytes have been read, so a connection closed early surfaces as IncompleteRead. Our debug prints produced:
```
1 192708
b'[{"title": "\\u5ba4\\u5185\\u767d\\u8272\\u80cc\\u666f\\u5899+\\u6b63\\u5e38\\u5149+\\u8fd1\\u8ddd(\\u5927\\u8138)+\\u65e0\\u9762\\u90e8\\u7a7f\\u623...'
```
Therefore, for large transfers it is advisable to avoid the urllib library and use requests instead.
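The same query with requests might look like the sketch below. The endpoint is the one from the snippet above; the `fetch_download_list` wrapper and its injectable `http` parameter are our own additions, made so the call can be exercised without a live server:

```python
import requests


def fetch_download_list(base_url, img_type, tags, http=requests):
    """Query the image server and return the decoded JSON list."""
    params = {'type': img_type, 'tags': ','.join(tags)}
    # requests transparently handles chunked transfer encoding,
    # so large bodies do not trip over a premature EOF the way
    # the raw urllib.request read did
    resp = http.get(base_url, params=params)
    resp.raise_for_status()
    return resp.json()


# Example call against the server from the original snippet:
# download_list = fetch_download_list(
#     'http://127.0.0.1:8080/images/query/', 'yuv', ['4', '3', '6'])
```

`params` also takes care of URL encoding, which the manual `'?type=%s&tags=%s'` formatting did not.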
In addition, the headers seen through urllib.request are noticeably sparser than those seen by requests; most critically, the Transfer-Encoding header is missing. Details:

urllib.request:

```
Server: nginx/1.14.0 (Ubuntu)
Date: Mon, 17 Sep 2018 10:02:51 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 192708
Connection: close
X-Frame-Options: SAMEORIGIN
```

requests:

```
{'Server': 'nginx/1.14.0 (Ubuntu)', 'Date': 'Mon, 17 Sep 2018 09:55:15 GMT',
 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked',
 'Connection': 'keep-alive', 'X-Frame-Options': 'SAMEORIGIN',
 'Content-Encoding': 'gzip'}
```
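One small convenience worth noting when comparing headers like this: requests exposes them as a case-insensitive mapping, so lookups do not depend on how the server capitalizes header names. A sketch with constructed values mirroring the requests output above:

```python
from requests.structures import CaseInsensitiveDict

# constructed sample values, mirroring the requests headers shown above
headers = CaseInsensitiveDict({'Transfer-Encoding': 'chunked',
                               'Content-Encoding': 'gzip'})

# lookups succeed regardless of casing
print(headers.get('transfer-encoding'))  # chunked
print('CONTENT-ENCODING' in headers)     # True
```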