“This is the second day of my participation in the November Gwen Challenge. See details of the event: The last Gwen Challenge 2021”.

Are your Python crawlers slow? Let’s take a look at concurrent programming

Daemon thread

In Python multithreading, the program does not exit when the code in the main thread is finished: the interpreter first waits for every non-daemon child thread to finish executing. This creates a problem. If one child thread is set to loop indefinitely (or simply runs for a very long time), the whole Python program cannot terminate. Let me give you an example.

import threading
import time

# non-daemon thread
def normal_thread():
    for i in range(10000):
        time.sleep(1)
        print(f'normal thread {i}')

print(threading.current_thread().name, 'Thread start')
thread1 = threading.Thread(target=normal_thread)
thread1.start()
print(threading.current_thread().name, 'Thread terminated')

If you run the code above, you will see that although the MainThread has reached the end of its code, the child thread is still running. Only when the child thread finishes does the whole program really end. If you want unfinished threads to be stopped when the main thread ends, you can make them daemon threads. When the only threads still running are daemon threads and the main thread has finished, the Python program exits without waiting for them. The threading module provides two ways to set up a daemon thread.

threading.Thread(target=daemon_thread, daemon=True)

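thread.setDaemon(True) (deprecated since Python 3.10 in favor of thread.daemon = True; either form must be applied before start() is called)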
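
Whichever way you choose, the flag has to be set before the thread is started. The following is a minimal sketch of that rule, not part of the original example: the function name worker and the variable changed_after_start are placeholders introduced only for illustration. It sets the flag through the daemon attribute and then shows that changing it on an already started thread raises RuntimeError.

```python
import threading
import time

def worker():
    # Placeholder task: sleep briefly so the thread is still alive below.
    time.sleep(0.2)

t = threading.Thread(target=worker)
t.daemon = True            # attribute form; must be set before start()
t.start()

try:
    t.daemon = False       # too late: the thread has already been started
    changed_after_start = True
except RuntimeError:
    changed_after_start = False

t.join()
print('daemon flag:', t.daemon)
print('changed after start:', changed_after_start)
```

In other words, decide whether a thread is a daemon at creation time, or at the latest right before calling start().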

import threading
import time

# daemon thread (sleeps 1s on every iteration)
def daemon_thread():
    for i in range(5):
        time.sleep(1)
        print(f'daemon thread {i}')

# non-daemon thread (no sleep)
def normal_thread():
    for i in range(5):
        print(f'normal thread {i}')

print(threading.current_thread().name, 'Thread start')
thread1 = threading.Thread(target=daemon_thread, daemon=True)
thread2 = threading.Thread(target=normal_thread)
# Alternative to daemon=True; the flag must be set before start():
# thread1.daemon = True
thread1.start()
thread2.start()
print(threading.current_thread().name, 'Thread terminated')

Because thread1 is a daemon thread, the program terminates as soon as the non-daemon thread running normal_thread() and the MainThread have finished, so the print statements in daemon_thread() never get a chance to execute. If some output still appears after the MainThread prints its "Thread terminated" line, that is because the main thread reaching the end of its code is not the same moment as the process exiting: the interpreter first waits for the remaining non-daemon threads, and only then are the daemon threads stopped.

Inheritance of daemon threads

A new thread inherits the daemon property of the thread that creates it. The main thread is a non-daemon thread, so any thread created in the main thread is a non-daemon thread by default. When a new thread is created inside a daemon thread, it inherits that setting and is a daemon thread too, unless the daemon argument is passed explicitly.
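
The inheritance rule can be checked with a short sketch. The names parent, child and inherited are made up for this illustration; the code creates one thread from inside a daemon thread and one from the current thread, and records the daemon flag each of them received.

```python
import threading

inherited = {}

def child():
    # Placeholder task: does nothing.
    pass

def parent():
    # This thread is created inside a daemon thread,
    # so it picks up daemon=True without being told to.
    t = threading.Thread(target=child)
    inherited['created_in_daemon_thread'] = t.daemon
    t.start()
    t.join()

p = threading.Thread(target=parent, daemon=True)
p.start()
p.join()

# Created in the current thread: inherits the current thread's flag
# (False when this script runs in the main thread).
inherited['created_in_current_thread'] = threading.Thread(target=child).daemon

print(inherited)
```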

Blocking with join()

In a multithreaded crawler, different pages are usually crawled at the same time by several threads, and the results are then analyzed and stored together. The main thread therefore has to wait for all child threads to finish before it continues with the processing, and that is what the join() method is for.
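
As a rough sketch of that crawl-then-process pattern: the code below does not make real HTTP requests. fetch_page is a hypothetical stand-in that sleeps instead of downloading, and the example.com URLs are placeholders. Each thread writes its result under its own key, and the main thread only touches the results after every thread has been joined.

```python
import threading
import time

def fetch_page(url, results):
    # Hypothetical stand-in for a real download: sleep instead of
    # hitting the network, then store some fake page content.
    time.sleep(0.1)
    results[url] = f'<html>content of {url}</html>'

# Placeholder URLs, used only as dictionary keys here.
urls = [f'https://example.com/page/{n}' for n in range(1, 4)]
results = {}

threads = [threading.Thread(target=fetch_page, args=(url, results))
           for url in urls]

for t in threads:      # start every crawler thread first
    t.start()
for t in threads:      # then wait for all of them
    t.join()

# All child threads are done, so the results can be processed together.
for url in urls:
    print(url, len(results[url]))
```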

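The join() method blocks the thread that calls it (in these examples, the main thread) until the thread it is called on has finished running. It does not suspend any other running thread; threads that have not been started yet are simply delayed because the blocked caller has not reached their start() call. Let’s look at an example.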

import threading
import time

def block(second):
    print(threading.current_thread().name, 'Thread running')
    time.sleep(second)
    print(threading.current_thread().name, 'Thread terminated')

print(threading.current_thread().name, 'Thread running')

thread1 = threading.Thread(target=block, name='thread test 1', args=(3,))
thread2 = threading.Thread(target=block, name='thread test 2', args=(1,))

thread1.start()
thread1.join()

thread2.start()

print(threading.current_thread().name, 'Thread terminated')

Here join() is called only on thread1. Note where it is called: before thread2.start(). The main thread blocks at thread1.join(), so thread2 is not even started until thread1 has finished; only then does the main thread start thread2 and carry on. Since thread2 is not a daemon thread, it keeps running after the MainThread has finished its own code.

Do you see the problem? Written this way, the program effectively runs as a single thread, one task after another, because join() is called in the wrong place. Let’s change the code a little.

import threading
import time

def block(second):
    print(threading.current_thread().name, 'Thread running')
    time.sleep(second)
    print(threading.current_thread().name, 'Thread terminated')

print(threading.current_thread().name, 'Thread running')

thread1 = threading.Thread(target=block, name='thread test 1', args=(3,))
thread2 = threading.Thread(target=block, name='thread test 2', args=(1,))

thread1.start()
thread2.start()

thread1.join()
print(threading.current_thread().name, 'Thread terminated')

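The program is now truly multithreaded: thread1 and thread2 run at the same time, and the main thread blocks at thread1.join() and resumes only when thread1 completes. Because thread2 sleeps for 1 second and thread1 for 3, thread2 has already finished by then.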

Finally, join() blocks whichever thread calls it, and it behaves the same whether the thread being joined is a daemon thread or not. When using join(), start all the child threads first and only then call join() on them. Otherwise the threads run one after another, as if the program were single-threaded.
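
To make the difference measurable, here is a small timing sketch. The helper names run_sequential and run_concurrent are made up for this illustration, and the delays are shortened to 0.3 seconds so it runs quickly. The first helper joins each thread right after starting it, the second starts every thread before joining any of them.

```python
import threading
import time

def block(second):
    time.sleep(second)

def run_sequential(delays):
    # Wrong placement: join() right after start() runs the threads one by one.
    start = time.perf_counter()
    for d in delays:
        t = threading.Thread(target=block, args=(d,))
        t.start()
        t.join()
    return time.perf_counter() - start

def run_concurrent(delays):
    # Start every thread first, then join them all.
    start = time.perf_counter()
    threads = [threading.Thread(target=block, args=(d,)) for d in delays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

delays = [0.3, 0.3]
sequential = run_sequential(delays)
concurrent = run_concurrent(delays)
print(f'sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s')
```

The sequential version takes roughly the sum of the delays, while the concurrent version takes roughly the longest single delay. Exact timings depend on the machine.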


