Selected from PyImageSearch

Compiled by Machine Heart

Contributors: Lu Xue, Li Zannan


Performing deep learning object detection on live video streams with OpenCV and Python is very simple. We just need to combine the appropriate code, plug in a live video source, and add the original object detection functionality.


In this article, we will learn how to extend the original object detection project to live video streams and video files using deep learning and OpenCV. This is done through the VideoStream class.


  • Deep learning target detection tutorial: http://www.pyimagesearch.com/2017/09/11/object-detection-with-deep-learning-and-opencv/

  • VideoStream class tutorial: http://www.pyimagesearch.com/2016/01/04/unifying-picamera-and-cv2-videocapture-into-a-single-class-with-opencv/


  • Now, we’ll start applying the deep learning + object detection code to video streams while measuring the FPS processing speed.


    Object detection in video with deep learning and OpenCV


    To build a real-time object detector with OpenCV’s deep learning module, we need to efficiently access the camera/video stream and apply object detection to every frame.


    First, we open a new file named real_time_object_detection.py and add the following code:
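    As a rough sketch of what this block contains (the exact code is available from the original post’s downloads), the imports pull in the threaded video-stream and FPS helpers from imutils along with NumPy, argparse, time, and OpenCV:

    # import the necessary packages
    from imutils.video import VideoStream
    from imutils.video import FPS
    import numpy as np
    import argparse
    import imutils
    import time
    import cv2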



    We import the required packages on lines 2-8. To run this code you need imutils and OpenCV 3.3. To get your system set up, you only need to install OpenCV with the default settings (and make sure you follow all the Python virtual environment commands, if you are using one).


    Note: Please ensure that you have downloaded and installed OpenCV 3.3 (or later) and opencv-contrib (for OpenCV 3.3) to ensure that the deep neural network (dnn) module is included.


    Below, we parse these command-line arguments:
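    A sketch of the argument parsing, assuming the three arguments described below (the short -p/-m/-c flags are illustrative):

    # construct the argument parser and parse the arguments
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--prototxt", required=True,
        help="path to the Caffe 'deploy' prototxt file")
    ap.add_argument("-m", "--model", required=True,
        help="path to the pre-trained Caffe model")
    ap.add_argument("-c", "--confidence", type=float, default=0.2,
        help="minimum probability to filter weak detections")
    args = vars(ap.parse_args())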



    Compared to the previous object detection project, we don’t need an image argument because here we are dealing with video streams and videos; the following arguments remain the same:


    • --prototxt: path to the Caffe prototxt file.

    • --model: path to the pre-trained model.

    • --confidence: minimum probability threshold used to filter out weak detections. The default value is 20%.


    Next, we initialize the class list and color set:
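    A sketch of this step; the label list below assumes the 20 PASCAL VOC classes (plus background) that this MobileNet SSD model was trained on:

    # initialize the list of class labels MobileNet SSD was trained to
    # detect, then generate a random bounding-box color for each class
    CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
        "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
        "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
        "sofa", "train", "tvmonitor"]
    COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))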



    On lines 22-26, we initialize the CLASSES labels and the corresponding random COLORS. For detailed information about these classes (as well as how the network was trained), please refer to: http://www.pyimagesearch.com/2017/09/11/object-detection-with-deep-learning-and-opencv/


    Now we load our model and set up our own video stream:
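    Roughly, this step looks as follows; the camera index of 0 and the two-second warm-up are assumed values:

    # load our serialized Caffe model from disk
    print("[INFO] loading model...")
    net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

    # initialize the video stream, let the camera sensor warm up,
    # and start the FPS counter
    print("[INFO] starting video stream...")
    vs = VideoStream(src=0).start()  # src=0 selects the default webcam
    time.sleep(2.0)
    fps = FPS().start()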



    We load our serialized model, providing references to our prototxt and model files (line 30). As you can see, in OpenCV 3.3 this is very simple.


    Next, we initialize the video stream (the source can be a video file or a camera). First we start the VideoStream (line 35), then we wait for the camera to warm up (line 36), and finally we start the frames-per-second calculation (line 37). The VideoStream and FPS classes are part of the imutils package.


    Now, let’s loop over each frame (for speed, you could skip some frames):
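    A sketch of the top of the frame loop; the 400-pixel resize width and the blob parameters (300x300 input size, scale factor, and mean) are the values typically used with this MobileNet SSD model, not taken from this article:

    # loop over the frames from the video stream
    while True:
        # grab a frame from the threaded video stream and resize it
        # (the 400-pixel maximum width is an assumed value)
        frame = vs.read()
        frame = imutils.resize(frame, width=400)

        # grab the frame dimensions and convert the frame to a blob
        (h, w) = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
            0.007843, (300, 300), 127.5)

        # set the blob as input and run a forward pass to obtain detections
        net.setInput(blob)
        detections = net.forward()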



    First, we read a frame from the video stream (line 43) and then resize it (line 44). Since we’ll need the width and height later, we grab them on line 47. The frame is then converted to a blob with the dnn module (line 48).


    Now, we set the blob as the input to the neural network (line 52) and pass the input through the network (line 53), which gives us the detections.


    Now that we have detected objects in the input frame, it is time to look at the confidence values and decide whether we can draw bounding boxes and labels around the objects:
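    A sketch of the detections loop described below; the indexing assumes OpenCV’s SSD output layout, where each detection row holds the class index, the confidence, and the normalized bounding-box coordinates:

        # loop over the detections
        for i in np.arange(0, detections.shape[2]):
            # extract the confidence (i.e. probability) of the prediction
            confidence = detections[0, 0, i, 2]

            # filter out weak detections below the minimum confidence
            if confidence > args["confidence"]:
                # extract the class label index and compute the
                # (x, y)-coordinates of the object's bounding box
                idx = int(detections[0, 0, i, 1])
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")

                # build the text label and draw the colored rectangle
                label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
                cv2.rectangle(frame, (startX, startY), (endX, endY),
                    COLORS[idx], 2)

                # put the label above the box, or just inside it
                # if there is no room above
                y = startY - 15 if startY - 15 > 15 else startY + 15
                cv2.putText(frame, label, (startX, y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)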



    We first loop over the detections, remembering that multiple objects can be detected in a single image. We also check the confidence (i.e. probability) of each detection. If the confidence is high enough (above the threshold), the prediction is displayed in the terminal and drawn on the image as text and a colored bounding box. Let’s go through it line by line:


    In the detections loop, we first extract the confidence value (line 59).


    If the confidence is above the minimum threshold (line 63), we extract the class label index (line 67) and compute the coordinates of the detected object (line 68).


    We then extract the (x, y) coordinates of the bounding box (line 69), which will be used to draw the rectangle and text.


    We build a text label with the CLASS name and confidence (lines 72, 73).


    We also draw a colored rectangle around the object using the class colors and the previously extracted (x, y) coordinates (lines 74, 75).


    Normally, we want the label to appear above the rectangle, but if there is no space, we will display the label slightly below the top of the rectangle (line 76).


    Finally, we place the colored text on the frame using the y value we just calculated (lines 77, 78).


    The remaining steps of the frame capture loop are: (1) display the frame; (2) check for the quit key; (3) update the FPS counter.
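    A sketch of these three steps, still inside the frame loop (the window name "Frame" is an assumption):

        # show the output frame and grab any key press
        cv2.imshow("Frame", frame)
        key = cv2.waitKey(1) & 0xFF

        # if the 'q' key was pressed, break from the loop
        if key == ord("q"):
            break

        # update the FPS counter
        fps.update()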



    The code block above is straightforward. First we display the frame (line 81), then we capture a key press (line 82) and check whether the “q” key (which stands for “quit”) was pressed. If so, we exit the frame capture loop (lines 85, 86). Finally, we update the FPS counter (line 89).


    If we exit the loop (via the “q” key or the end of the video stream), there is some cleanup to handle:
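    A sketch of the cleanup, back at the top level of the script:

    # stop the timer and print the FPS information to the terminal
    fps.stop()
    print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
    print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

    # close the display window and stop the video stream
    cv2.destroyAllWindows()
    vs.stop()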



    When we exit the loop, the FPS counter is stopped (line 92) and the frames-per-second figure is printed to the terminal (lines 93, 94).


    We close the window (line 97) and then stop the video stream (line 98).


    If you’re at this point, you’re ready to try it out with your own webcam. Let’s move on to the next part.


    Results of real-time deep learning object detection


    To get the real-time deep learning object detector up and running, make sure you use the sample code and the pre-trained convolutional neural network from the “Downloads” section of this guide. (Go to the “Downloads” section of the original post, enter your email address, and you will receive the code and the other files you need.)


    Open the terminal and execute the following command:
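    Assuming the prototxt and model files from the downloads are named MobileNetSSD_deploy.prototxt.txt and MobileNetSSD_deploy.caffemodel, the command looks like this:

    $ python real_time_object_detection.py \
        --prototxt MobileNetSSD_deploy.prototxt.txt \
        --model MobileNetSSD_deploy.caffemodel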



    If OpenCV can access your camera, you will see output video frames with the detected objects. I ran the deep learning object detector on a sample video, and the results are as follows:




    Figure 1: Short video of real-time object detection using deep learning and OpenCV + Python.


    Note that the deep learning object detector is not only able to detect the person, but also the couch the person is sitting on and the chair next to them, all in real time!



    Conclusion


    In today’s blog, we learned how to use deep learning + OpenCV + video streaming to perform real-time object detection. We accomplished this with the following two tutorials:


    1. Object detection with deep learning and OpenCV (http://www.pyimagesearch.com/2017/09/11/object-detection-with-deep-learning-and-opencv/)

    2. Efficient, threaded video streams with OpenCV (http://www.pyimagesearch.com/2016/01/04/unifying-picamera-and-cv2-videocapture-into-a-single-class-with-opencv/)


    The end result is a deep learning-based object detector that can process video at 6-8 FPS (depending on your system speed, of course).


    You can speed things up even further by:


    1. Skipping frames.

    2. Using a different variant of MobileNet (faster, but less accurate).

    3. Using the quantized variant of SqueezeNet (I haven’t tested this yet, but I suspect it would be faster because of its smaller network footprint).


    The original link: http://www.pyimagesearch.com/2017/09/18/real-time-object-detection-with-deep-learning-and-opencv/



    This article was compiled by Machine Heart. For reprints, please contact this official account for authorization.