Thread pool concept:

A thread pool is a pool that holds a fixed number of threads. When more tasks are submitted than the pool has threads, the extra tasks wait in a queue until a running task finishes, and a freed thread then picks up the next queued task. The advantages of a thread pool are that it can run several tasks at the same time and that it reuses threads, which reduces thread creation and destruction and saves system resources.
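
The queueing and thread reuse described above can be observed directly. The following is a minimal sketch, not part of the original article: the helper name record_worker, the 'demo' thread-name prefix, and the short sleep are assumptions chosen for illustration. Six tasks are handed to a pool of two workers, and each task reports which thread ran it.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def record_worker(task_id):
    """Pretend to work briefly, then report which thread ran the task."""
    time.sleep(0.05)
    return task_id, threading.current_thread().name


# Two workers, six tasks: the extra tasks wait in the pool's queue.
with ThreadPoolExecutor(max_workers=2, thread_name_prefix='demo') as pool:
    results = list(pool.map(record_worker, range(6)))

# Collect the distinct thread names that actually ran the tasks.
worker_names = {name for _, name in results}
print(f'{len(results)} tasks ran on {len(worker_names)} thread(s): {sorted(worker_names)}')
```

However many tasks are submitted, the set of thread names never grows beyond max_workers, because the same threads are reused for the queued tasks.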

1. Ordinary code: in theory, all tasks are executed in a single thread. As in other languages, the code runs from top to bottom.

import time


def test_data(index):
    time.sleep(5)  # simulate a slow task
    if index % 2 == 0:
        print(f'{index} Execution error.')
        raise Exception('I made a mistake.')
    print(f'{index} The execution is complete.')


# Run the tasks one after another in the main thread.
for i in range(1, 50):
    test_data(i)
  • Output:

1 The execution is complete.
Traceback (most recent call last):
  File "/Users/peakchao/Code/Py/ReptileForPython/pool_test.py", line 15, in <module>
    test_data(i)
  File "/Users/peakchao/Code/Py/ReptileForPython/pool_test.py", line 11, in test_data
    raise Exception('I made a mistake.')
Exception: I made a mistake.
2 Execution error.

Analysis: the loop calls test_data repeatedly. test_data sleeps for 5 seconds and then takes the incoming value modulo 2; if the remainder is 0, it raises an exception. When i == 1, the program runs normally. In the second iteration, i == 2, the remainder is 0, the exception is raised, and the program crashes and exits.

Note: in this version the lines are printed in order, one every 5 seconds.
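
If the single-threaded loop should keep going after a failure instead of crashing, each call can be wrapped in try/except. The sketch below is an illustration under assumptions: process_item is a hypothetical stand-in for test_data, the sleep is shortened so it runs quickly, and only five indexes are used.

```python
import time


def process_item(index):
    """Hypothetical stand-in for test_data with a much shorter sleep."""
    time.sleep(0.1)
    if index % 2 == 0:
        raise Exception('I made a mistake.')
    return f'{index} The execution is complete.'


failed = []
for i in range(1, 6):
    try:
        print(process_item(i))
    except Exception as err:
        # Record the failure and continue with the next index.
        failed.append(i)
        print(f'{i} Execution error: {err}')

print(f'Failed indexes: {failed}')
```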

2. Multi-threaded code: in theory, multiple threads execute tasks at the same time. Thread pools are implemented in a similar way in other languages.

import time
from concurrent.futures import ThreadPoolExecutor

# A pool with two worker threads.
pool = ThreadPoolExecutor(max_workers=2)


def test_data(index):
    time.sleep(5)  # simulate a slow task
    if index % 2 == 0:
        print(f'{index} Execution error.')
        raise Exception('I made a mistake.')
    print(f'{index} The execution is complete.')


# Submit all tasks; the pool runs at most two of them at a time.
for i in range(0, 50):
    pool.submit(test_data, i)
  • Output:

1 The execution is complete.
0 Execution error.
3 The execution is complete.
2 Execution error.
4 Execution error.
5 The execution is complete.
6 Execution error.
7 The execution is complete.

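Analysis: two lines are printed at a time, which shows that two tasks are running concurrently. If we raise the pool size to N, tasks like this one, which spend their time waiting (here in time.sleep), finish roughly N times faster than on a single thread, as long as there are at least N tasks to run. CPU-bound Python code does not get this speedup, because of CPython's global interpreter lock. Note also that no traceback appears in the output: an exception raised inside a submitted task is stored on the Future returned by pool.submit, and it is only re-raised when the result of that Future is requested.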
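
Both points in the analysis can be checked with a small experiment. This is a sketch under assumptions rather than the article's own code: process_item and run_in_pool are hypothetical names, the sleep is shortened to 0.2 seconds, and only eight tasks are used. It collects each task's outcome through its Future and compares the elapsed time for one worker and for four.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_item(index):
    """Hypothetical stand-in for test_data: wait, then fail on even indexes."""
    time.sleep(0.2)
    if index % 2 == 0:
        raise Exception('I made a mistake.')
    return index


def run_in_pool(indexes, max_workers):
    """Run process_item for each index; return (successes, failures, seconds)."""
    successes, failures = [], {}
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_item, i): i for i in indexes}
        for future in as_completed(futures):
            index = futures[future]
            try:
                # result() re-raises the exception stored on the Future.
                successes.append(future.result())
            except Exception as err:
                failures[index] = str(err)
    return sorted(successes), failures, time.perf_counter() - start


ok, bad, one_worker_time = run_in_pool(range(8), max_workers=1)
ok, bad, four_worker_time = run_in_pool(range(8), max_workers=4)
print(f'succeeded: {ok}, failed: {sorted(bad)}')
print(f'1 worker: {one_worker_time:.2f}s, 4 workers: {four_worker_time:.2f}s')
```

Keeping a dictionary from Future to index is one simple way to know which task an exception belongs to; the exact timings depend on the machine, so no figures are quoted here.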

Request disguise:

Sometimes when we scrape data from a website, the server returns an error, even though the same page opens normally in a browser. This happens because the server analyzes our request data and identifies us as a crawler, so it refuses to send a normal response. Disguising the request as a browser request helps, but when we scrape a site frequently, even a disguised request will occasionally fail: because the request information is fixed, the server can detect the pattern and block it. Choosing the User-Agent at random for each request, as in the code below, makes the requests less uniform.

import random

USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15',
    'Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1',
    'Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1',
    'Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Mobile Safari/537.36',
]


def get_request_headers():
    # Pick a different User-Agent at random on every call.
    headers = {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
    }
    return headers
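
The article does not show how the headers are attached to a request, so the following is only one possible usage, sketched with the standard library's urllib.request. The shortened USER_AGENTS list, the reduced header set, the helper build_request, and the URL https://example.com/ are all assumptions for illustration. The request is built but not sent.

```python
import random
import urllib.request

# Shortened stand-in for the full USER_AGENTS list.
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15',
]


def get_request_headers():
    """Return headers with a randomly chosen User-Agent (reduced header set)."""
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept-Language': 'zh-CN,zh;q=0.9',
    }


def build_request(url):
    """Build (but do not send) a request that carries freshly chosen headers."""
    return urllib.request.Request(url, headers=get_request_headers())


# 'https://example.com/' is a placeholder URL.
request = build_request('https://example.com/')
# urllib stores header names in capitalized form, hence 'User-agent'.
print(request.get_header('User-agent'))
# To actually fetch the page: urllib.request.urlopen(request, timeout=10)
```

Calling get_request_headers once per request, rather than once per program run, is what lets the User-Agent vary between requests.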

Custom exceptions:

Often, the built-in exceptions do not meet our requirements. To raise more explicit errors, and to catch and handle exactly the errors we care about later, we need to define custom exceptions.

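# Note: this class name shadows Python's built-in BaseException.
class BaseException(Exception):
    def __init__(self, msg):
        self.msg = msg

    def __str__(self):
        # Bug: __str__ must return a string, but print() returns None.
        # This is what causes the TypeError in the output below.
        print(self.msg)


try:
    input_data = input('Please enter:')
    if len(input_data) > 0:
        raise BaseException('Ha ha, no text allowed.')
    print('Execution complete.')
except BaseException as err:
    print(f'Custom exception caught: {err}')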
  • Output:

Traceback (most recent call last):
  File "/Users/peakchao/Code/Py/ReptileForPython/pool_test.py", line 30, in <module>
    raise BaseException('Ha ha, no text allowed.')
__main__.BaseException: <exception str() failed>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/peakchao/Code/Py/ReptileForPython/pool_test.py", line 33, in <module>
    print(f'Custom exception caught: {err}')
TypeError: __str__ returned non-string (type NoneType)

Process finished with exit code 1

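
The TypeError in the output above comes from __str__ printing the message instead of returning it. A minimal corrected sketch might look like the following. The class name InputNotAllowedError and the helper check_input are hypothetical, chosen here so that the built-in BaseException is not shadowed and so that the example runs without keyboard input.

```python
class InputNotAllowedError(Exception):
    """Hypothetical custom exception that carries a message."""

    def __init__(self, msg):
        super().__init__(msg)
        self.msg = msg

    def __str__(self):
        # __str__ must return a string rather than print one.
        return self.msg


def check_input(input_data):
    """Raise the custom exception for any non-empty input."""
    if len(input_data) > 0:
        raise InputNotAllowedError('Ha ha, no text allowed.')
    return 'Execution complete.'


try:
    print(check_input('hello'))
except InputNotAllowedError as err:
    message = f'Custom exception caught: {err}'
    print(message)  # prints: Custom exception caught: Ha ha, no text allowed.
```

Because the handler names the specific exception class, unrelated errors are not swallowed by the except clause.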