Hardware and software environment
- Ubuntu 18.04 64-bit
- NVIDIA GTX 1070Ti 8G
- Anaconda with Python 3.6
- OpenCV 3.4.3
- CUDA 9.0
- YOLO v3
Preface
The figure below shows the evolution of object detection algorithms in recent years. YOLO is currently regarded as one of the more accurate object detection algorithms, and it is now in its third version. For background on darknet (the open source project that implements YOLO detection), see the earlier post xugaoxiang.com/2019/12/16/… , which covers it in more detail.
Preparation
Download the configuration file, the weights file, and the class names file required for YOLO detection:
wget https://pjreddie.com/media/files/yolov3.weights
wget https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg?raw=true -O ./yolov3.cfg
wget https://github.com/pjreddie/darknet/blob/master/data/coco.names?raw=true -O ./coco.names
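A quick check after downloading can save some confusion later. The sketch below is not part of the original workflow; the helper names `looks_like_html` and `check_yolo_files` are made up for illustration. It assumes the three files sit in one directory and flags a common failure, where a GitHub URL without `?raw=true` saves an HTML page instead of the raw file.

```python
import os


def looks_like_html(path, probe_bytes=512):
    """Return True if the file starts like an HTML page rather than a raw text file."""
    with open(path, 'rb') as f:
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith(b'<!doctype') or head.startswith(b'<html')


def check_yolo_files(directory='.'):
    """Return a dict mapping each expected file name to a short status string."""
    status = {}
    for name in ('yolov3.weights', 'yolov3.cfg', 'coco.names'):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            status[name] = 'missing'
        elif os.path.getsize(path) == 0:
            status[name] = 'empty'
        elif name != 'yolov3.weights' and looks_like_html(path):
            # Only the two text files are probed; the weights file is binary.
            status[name] = 'html page, not the raw file'
        else:
            status[name] = 'ok'
    return status


if __name__ == '__main__':
    for name, state in check_yolo_files().items():
        print(name, state)
```

Running it in the download directory prints one status line per file; anything other than `ok` suggests repeating the corresponding wget command.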
Fundamentals of YOLO
Generally speaking, object detection consists of two parts: an object locator and an object recognizer. The following example shows how YOLOv3 works:
- Divide the image into a grid of 13x13 cells. For a 416x416 pixel image, for example, that gives 169 cells of 32x32 (1024) pixels each, and the model predicts bounding boxes for each cell.
- Each cell may predict more than one bounding box, and most of those bounding boxes are eliminated because they overlap too much with one another. The algorithm that does this is called Non-Maximum Suppression (NMS); see the paper Efficient Non-Maximum Suppression. NMS keeps the most likely bounding box from each group of overlapping boxes.
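The two ideas above can be sketched in a few lines of plain Python. This is a simplified illustration, not the implementation used by darknet or OpenCV: the function names `grid_cell`, `iou`, and `greedy_nms` are hypothetical, boxes are assumed to be `[left, top, width, height]` lists, and the greedy strategy shown is the textbook form of NMS. For a 416x416 image and a 13x13 grid, each cell covers 416 / 13 = 32 pixels per side.

```python
def grid_cell(x, y, image_size=416, grid_size=13):
    """Return (row, col) of the grid cell that contains pixel (x, y)."""
    cell = image_size // grid_size  # 32 pixels per side for 416 / 13
    return int(y // cell), int(x // cell)


def iou(a, b):
    """Intersection over union of two boxes given as [left, top, width, height]."""
    a_right, a_bottom = a[0] + a[2], a[1] + a[3]
    b_right, b_bottom = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(a_right, b_right) - max(a[0], b[0]))
    inter_h = max(0, min(a_bottom, b_bottom) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.4):
    """Keep the best-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == '__main__':
    boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
    scores = [0.9, 0.8, 0.7]
    print(grid_cell(208, 208))        # cell that contains the image centre
    print(greedy_nms(boxes, scores))  # indices of the boxes that survive
```

In the example, the second box overlaps the first heavily and has a lower score, so only the first and third boxes are kept.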
Python code
OpenCV 3.4.2 and later support Darknet models, as well as models from other common deep learning frameworks such as Torch, TensorFlow, and Caffe.
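As a rough illustration of that framework support, the sketch below picks a `cv2.dnn` reader from the model file's extension. The readers named here (`readNetFromDarknet`, `readNetFromCaffe`, `readNetFromTensorflow`, `readNetFromTorch`) are part of OpenCV's dnn module; the extension mapping and the helpers `pick_loader` and `load_net` are assumptions made for this example, not an OpenCV API.

```python
import os

# Assumed mapping from conventional file extensions to cv2.dnn readers,
# together with the order in which each reader takes its files.
LOADERS = {
    '.weights': ('readNetFromDarknet', 'config_first'),
    '.caffemodel': ('readNetFromCaffe', 'config_first'),
    '.pb': ('readNetFromTensorflow', 'model_first'),
    '.t7': ('readNetFromTorch', 'model_only'),
}


def pick_loader(model_path):
    """Return (reader name, argument order) for a model file."""
    ext = os.path.splitext(model_path)[1].lower()
    if ext not in LOADERS:
        raise ValueError('unsupported model file: %s' % model_path)
    return LOADERS[ext]


def load_net(model_path, config_path=None):
    """Load a network with the reader that matches the model file."""
    import cv2  # imported here so pick_loader works without OpenCV installed
    name, order = pick_loader(model_path)
    reader = getattr(cv2.dnn, name)
    if order == 'model_only' or (order == 'model_first' and config_path is None):
        return reader(model_path)
    if order == 'config_first':
        return reader(config_path, model_path)
    return reader(model_path, config_path)
```

For the YOLOv3 files downloaded earlier, `load_net('yolov3.weights', 'yolov3.cfg')` resolves to `cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')`.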
# -*- coding: UTF-8 -*-
# @time    : 18-10-26 4:47 PM
# @author  : xugaoxiang
# @email   : [email protected]
# @website : https://xugaoxiang.com
# @file    : opencv_yolov3.py
# @software: PyCharm

# Usage example: python3 opencv_yolov3.py --image=test.png

import sys
import cv2
import argparse
import numpy as np
import os.path

# Confidence threshold (0.5 is assumed here; the value is missing from the original listing)
confThreshold = 0.5
# Non-maximum suppression threshold
nmsThreshold = 0.4
# Width and height of the network input image
inpWidth = 416
inpHeight = 416

parser = argparse.ArgumentParser(description='Object detection using YOLOv3 in opencv')
parser.add_argument('--image', help='Path to image file.')
args = parser.parse_args()

# Load the class names
classesFile = "coco.names"
classes = None
with open(classesFile, 'rt') as f:
    classes = f.read().rstrip('\n').split('\n')

# YOLOv3 configuration and weights files
modelConfiguration = "yolov3.cfg"
modelWeights = "yolov3.weights"

# Load the external Darknet model with OpenCV
net = cv2.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
# The target can also be DNN_TARGET_OPENCL, but the current version only
# supports Intel GPUs; with other GPUs, fall back to the CPU
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)


# Get the names of the output layers
def getOutputsNames(net):
    # Get the names of all the layers in the network
    layersNames = net.getLayerNames()
    # Output layers are the layers with unconnected outputs. flatten() handles
    # both the Nx1 array of OpenCV 3.4.x and the 1-D array of newer releases.
    outLayers = np.array(net.getUnconnectedOutLayers()).flatten()
    return [layersNames[i - 1] for i in outLayers]


# Draw the predicted bounding box on the global frame
def drawPred(classId, conf, left, top, right, bottom):
    # Draw a bounding box.
    cv2.rectangle(frame, (left, top), (right, bottom), (255, 178, 50), 3)

    label = '%.2f' % conf

    # Get the label for the class name and its confidence
    if classes:
        assert (classId < len(classes))
        label = '%s:%s' % (classes[classId], label)

    # Display the label at the top of the bounding box
    labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
    top = max(top, labelSize[1])
    cv2.rectangle(frame, (left, top - round(1.5 * labelSize[1])),
                  (left + round(1.5 * labelSize[0]), top + baseLine),
                  (255, 255, 255), cv2.FILLED)
    cv2.putText(frame, label, (left, top), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 0), 1)


# Remove low-confidence bounding boxes using non-maximum suppression
def postprocess(frame, outs):
    frameHeight = frame.shape[0]
    frameWidth = frame.shape[1]

    # Scan through all the bounding boxes output from the network and keep only
    # the ones with high confidence scores. Assign the box's class label as the
    # class with the highest score.
    classIds = []
    confidences = []
    boxes = []
    for out in outs:
        for detection in out:
            scores = detection[5:]
            classId = np.argmax(scores)
            confidence = scores[classId]
            if confidence > confThreshold:
                center_x = int(detection[0] * frameWidth)
                center_y = int(detection[1] * frameHeight)
                width = int(detection[2] * frameWidth)
                height = int(detection[3] * frameHeight)
                left = int(center_x - width / 2)
                top = int(center_y - height / 2)
                classIds.append(classId)
                confidences.append(float(confidence))
                boxes.append([left, top, width, height])

    # Perform non-maximum suppression to eliminate redundant overlapping boxes
    # with lower confidences.
    indices = cv2.dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold)
    for i in np.array(indices).flatten():
        i = int(i)
        left, top, width, height = boxes[i]
        drawPred(classIds[i], confidences[i], left, top, left + width, top + height)


# Process inputs
if not args.image:
    parser.print_help()
    sys.exit(1)
if not os.path.isfile(args.image):
    print('Input image file {} does not exist.'.format(args.image))
    sys.exit(1)

winName = 'Deep learning object detection in OpenCV'
cv2.namedWindow(winName, cv2.WINDOW_NORMAL)

frame = cv2.imread(args.image, cv2.IMREAD_ANYCOLOR)
outputFile = args.image[:-4] + '_yolov3_out.png'

# Create a 4D blob from a frame.
blob = cv2.dnn.blobFromImage(frame, 1 / 255, (inpWidth, inpHeight), [0, 0, 0], 1, crop=False)

# Set the input to the network
net.setInput(blob)

# Run the forward pass to get the output of the output layers
outs = net.forward(getOutputsNames(net))

# Remove the bounding boxes with low confidence
postprocess(frame, outs)

cv2.imshow(winName, frame)
cv2.imwrite(outputFile, frame)
# Wait for a key press so the window stays visible before it is closed
cv2.waitKey(0)
cv2.destroyAllWindows()
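The postprocess step in the script reads each output row as four normalized box values (center x, center y, width, height), one objectness value, and then one score per class, which gives 85 values per row for the 80 COCO classes. The sketch below reproduces that decoding for a single row in plain Python so the arithmetic can be checked by hand. The function name `decode_detection` and the sample numbers are made up for illustration.

```python
def decode_detection(detection, frame_width, frame_height, conf_threshold=0.5):
    """Turn one output row into (class_id, confidence, [left, top, width, height]).

    Returns None when the best class score does not exceed conf_threshold.
    """
    scores = detection[5:]
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[class_id]
    if confidence <= conf_threshold:
        return None
    # Box values are relative to the frame size, so scale them to pixels.
    center_x = int(detection[0] * frame_width)
    center_y = int(detection[1] * frame_height)
    width = int(detection[2] * frame_width)
    height = int(detection[3] * frame_height)
    # Convert from a centre point to the top-left corner.
    left = int(center_x - width / 2)
    top = int(center_y - height / 2)
    return class_id, float(confidence), [left, top, width, height]


if __name__ == '__main__':
    # A made-up row: box centred in the frame, class 2 scoring 0.8.
    row = [0.5, 0.5, 0.25, 0.5, 0.9] + [0.0] * 80
    row[5 + 2] = 0.8
    print(decode_detection(row, 640, 480))
```

For a 640x480 frame the sample row decodes to a 160x240 box whose top-left corner is at (240, 120).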
Test program output
References
- github.com/pjreddie/da…
- docs.opencv.org/3.4.3/da/d9…
- xugaoxiang.com/2019/12/16/…