Preface
In many scenarios people are required to wear masks, and manual inspection wastes a great deal of manpower. In many such settings, automated mask identification can replace manual checks.
1. Creativity and function of works
- Although the epidemic situation in China is relatively stable, the possibility of cases imported from abroad cannot be ruled out, so masks should be worn in public places. However, some people enter public places without masks, trusting to luck. Hence the idea of designing, and implementing, an algorithm that automatically detects whether pedestrians are wearing masks.
- The finished algorithm takes video stream data captured from a video interface as input, or takes an image directly, and outputs the processed video or picture, complete with face detection and mask identification marks.
2. Design ideas and concepts
The algorithm is mainly divided into three parts.
2.1 Mask identification
- Wearing a mask versus not wearing one is a very clear binary classification problem: image features are extracted from two datasets, faces with masks and faces without, and a classifier is trained on them. Because masks are affected by many real-world factors, the classification probability threshold is adjusted (above roughly 70% is considered wearing a mask). After model evaluation and optimization, the model is saved for use by the next two modules.
# Author: 30500    Date: February 26, 2021
import argparse
import os
import numpy as np
from imutils import paths
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.layers import AveragePooling2D, Flatten, Dense, Dropout, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

# Construct an argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d"."--dataset", default = "dataset".help="path to input dataset")
ap.add_argument("-p"."--plot".type=str, default="plot.png".help="path to output loss/accuracy plot")
ap.add_argument("-m"."--model".type=str, default="mask_detector.model".help="path to output face mask detector model")
args = vars(ap.parse_args())
# Set the initial learning rate, the number of training epochs, and the batch size for reading images
INIT_LR = 1e-4
EPOCHS = 20
BS = 32
# Grab the list of images in the dataset directory, then initialize the data list (the images) and the list of image labels
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []
# Loop over the image paths to read the training data and label data
for imagePath in imagePaths:
    # Extract the class label from the file name using the path separator
    label = imagePath.split(os.path.sep)[-2]
    # Test code
    # label1 = imagePath.split(os.path.sep)[1]
    # print(label, label1)
    # Load the input image and preprocess it to the (224, 224) input size of the MobileNetV2 model
    image = load_img(imagePath, target_size=(224, 224))
    image = img_to_array(image)
    image = preprocess_input(image)
    # Add the image data and the corresponding label to the lists
    data.append(image)
    labels.append(label)
# Convert the data and labels to NumPy arrays for easier processing
data = np.array(data, dtype="float32")
labels = np.array(labels)
# Convert the with_mask / without_mask labels to binary and then to one-hot format
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)
# Split the data: 80% for training, the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels, test_size=0.20, stratify=labels, random_state=42)
# Build a training image generator for data augmentation
aug = ImageDataGenerator(
    rotation_range=20,       # rotation range
    zoom_range=0.15,         # zoom range
    width_shift_range=0.2,   # horizontal translation range
    height_shift_range=0.2,  # vertical translation range
    shear_range=0.15,        # shear transform range
    horizontal_flip=True,    # horizontal flip
    fill_mode="nearest")     # fill mode
# Load the MobileNetV2 network with its top (output) layers removed, so it serves
# only as a feature extractor beneath our own classification head
baseModel = MobileNetV2(weights="imagenet", include_top=False, input_tensor=Input(shape=(224, 224, 3)))
# Construct the classification head placed on top of the base model:
# base model -> average pooling -> flatten -> fully connected hidden layer -> dropout -> output layer
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="sigmoid")(headModel)
# Model integration
model = Model(inputs=baseModel.input, outputs=headModel)
# Freeze the layers loaded from MobileNetV2 so their weights are not updated during training
for layer in baseModel.layers:
    layer.trainable = False
# Compile the model: binary cross-entropy loss, Adam optimizer, with learning-rate decay added
print("[INFO] compiling model...")
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])
# Training network
print("[INFO] training head...")
H = model.fit(
    aug.flow(trainX, trainY, batch_size=BS),  # shuffled, augmented training data
    steps_per_epoch=len(trainX) // BS,
    validation_data=(testX, testY),           # validation data for the model
    validation_steps=len(testX) // BS,
    epochs=EPOCHS)
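The bullet above notes that, after evaluation, the model is saved for use by the next two modules; that step is not shown in the listing. A minimal sketch of how the script might end, reusing the names defined above (the classification_report import is an assumed addition):

from sklearn.metrics import classification_report

# Evaluate on the test set and report per-class precision and recall
predIdxs = model.predict(testX, batch_size=BS)
predIdxs = np.argmax(predIdxs, axis=1)
print(classification_report(testY.argmax(axis=1), predIdxs, target_names=lb.classes_))

# Save the trained detector so the image and video modules can load it
print("[INFO] saving mask detector model...")
model.save(args["model"])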
2.2 Image Detection
- The algorithm is first evaluated and optimized by testing mask wearing on single images, drawing the identification labels and confidence scores on the output image.
import argparse
import cv2
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

# Argument parser (assumed defaults, mirroring the training script above)
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", default="images/test.jpg", help="path to input image")
ap.add_argument("-m", "--model", type=str, default="mask_detector.model", help="path to trained face mask detector model")
ap.add_argument("-c", "--confidence", type=float, default=0.5, help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

# Load the mask detector model; the Caffe face detector `net` used below is
# assumed to be loaded beforehand, e.g. net = cv2.dnn.readNet(prototxtPath, weightsPath)
print("[INFO] loading face mask detector model...")
model = load_model(args["model"])
# Load the input image from disk, copy it, and grab its spatial dimensions
image = cv2.imread(args["image"])
orig = image.copy()
(h, w) = image.shape[:2]
# Construct a blob from the image (preprocessing: scaling and mean subtraction)
blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300),
                             (104.0, 177.0, 123.0))
# Pass the blob forward through the network to obtain the face detection results
print("[INFO] computing face detections...")
net.setInput(blob)
detections = net.forward()
# Loop over the detections
for i in range(0, detections.shape[2]):
    # Extract the confidence (probability) associated with the detection
    confidence = detections[0, 0, i, 2]
    # Filter out weak detections by ensuring the confidence is
    # greater than the minimum confidence (default 0.5)
    if confidence > args["confidence"]:
        # Compute the (x, y) coordinates of the object's bounding box
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
        # Make sure the bounding box falls within the dimensions of the frame
        (startX, startY) = (max(0, startX), max(0, startY))
        (endX, endY) = (min(w - 1, endX), min(h - 1, endY))
        # Extract the face ROI, convert it from BGR to RGB channel ordering,
        # resize it to 224x224, and preprocess it for the mask model
        face = image[startY:endY, startX:endX]
        face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
        face = cv2.resize(face, (224, 224))
        face = img_to_array(face)
        face = preprocess_input(face)
        face = np.expand_dims(face, axis=0)
        # Use the model to determine whether the face has a mask
        (mask, withoutMask) = model.predict(face)[0]
        # Determine the class label and the color used to draw the box and text
        label = "Mask" if mask > 0.75 else "No Mask"
        color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
        # Include the probability in the label
        label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)
        # Draw the label and bounding box on the output image
        cv2.putText(image, label, (startX, startY - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
        cv2.rectangle(image, (startX, startY), (endX, endY), color, 2)
# Save and display the output image
cv2.imwrite("result/people_mask_result.jpg", image)
cv2.imshow("Output", image)
cv2.waitKey(0)
2.3 Detection of video stream data
- Video is processed at 20 frames per second: the processed frames are merged and output again, achieving detection on video. By capturing and processing the video stream as it arrives, the same pipeline gives real-time detection.
- The implementation transfers not just one pre-trained model but several, training each and comparing accuracy on the final test set to decide which model to adopt.
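The framing and merging code is not shown in the listing below; a minimal OpenCV sketch of what it could look like, assuming frames are named 0.jpg, 1.jpg, ... as the GetFace function below expects (file names and the 20 fps figure follow the bullet above):

import cv2
import os

def video_to_frames(video_path, frame_dir):
    # Split the input video into numbered frames for GetFace to process
    cap = cv2.VideoCapture(video_path)
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(frame_dir, '%d.jpg' % i), frame)
        i += 1
    cap.release()
    return i

def frames_to_video(frame_dir, video_path, fps=20):
    # Merge the processed frames back into a video at 20 fps
    first = cv2.imread(os.path.join(frame_dir, '0.jpg'))
    h, w = first.shape[:2]
    writer = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    for i in range(len(os.listdir(frame_dir))):
        writer.write(cv2.imread(os.path.join(frame_dir, '%d.jpg' % i)))
    writer.release()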
import os
import cv2
import numpy as np
import paddlehub as hub
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

masknet = load_model('mask_detector.model')

def Iou(bbox1, bbox2):
    # Compute the IoU of two boxes; bbox1 and bbox2 are in xyxy format
    area1 = (bbox1[2] - bbox1[0]) * (bbox1[3] - bbox1[1])
    area2 = (bbox2[2] - bbox2[0]) * (bbox2[3] - bbox2[1])
    w = min(bbox1[3], bbox2[3]) - max(bbox1[1], bbox2[1])
    h = min(bbox1[2], bbox2[2]) - max(bbox1[0], bbox2[0])
    if w <= 0 or h <= 0:
        return 0
    area_mid = w * h
    return area_mid / (area1 + area2 - area_mid)

def GetFace(in_path, out_path, maskNet):
    # in_path is the path of the input image folder
    # out_path is the path of the output image folder
    files = os.listdir(in_path)
    face_detector = hub.Module(name="pyramidbox_lite_server")
    bbox_buffer = []  # boxes buffered from the previous frame (initialized to avoid use before assignment)
    for i in range(len(files)):
        faces = []
        preds = []
        # Process every image in the folder
        img = cv2.imread(in_path + '/%d.jpg' % i)
        result = face_detector.face_detection(images=[img])
        img = img_to_array(img)
        data = result[0]['data']
        bbox_upgrade = []
        index = []
        for j in range(len(data)):
            # Each bbox in the picture
            left, right = int(data[j]['left']), int(data[j]['right'])
            top, bottom = int(data[j]['top']), int(data[j]['bottom'])
            bbox = (left, top, right, bottom)
            if right > 1600 and bottom > 1600:
                for k in range(len(bbox_buffer)):
                    if Iou(bbox, bbox_buffer[k]) > 0.1 and k not in index:
                        index.append(k)
                        break
                bbox_upgrade.append((left, top, right, bottom))
            else:
                preds.append([left, top, right, bottom])
                faces.append(img[top:bottom, left:right])
        bbox_buffer = bbox_upgrade.copy()
        if len(faces) > 0:
            count = 0
            for face in faces:
                face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
                face = cv2.resize(face, (224, 224))
                face = img_to_array(face)
                face = preprocess_input(face)
                face = np.expand_dims(face, axis=0)
                (mask, withoutMask) = maskNet.predict(face)[0]
                label = "Mask" if mask > withoutMask else "No Mask"
                color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
                label = "{}:{:.2f}%".format(label, max(mask, withoutMask) * 100)
                cv2.putText(img, label, (preds[count][0], preds[count][1] - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
                cv2.rectangle(img, (preds[count][0], preds[count][1]),
                              (preds[count][2], preds[count][3]), color, 2)
                count += 1
        cv2.imwrite(out_path + '/%d.jpg' % i, img)
        print('Processing image {}'.format(i))

GetFace('img', 'demo', masknet)
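Putting the pieces together, the offline pipeline could be driven like this (hypothetical file names; video_to_frames and frames_to_video are the helper sketches shown earlier in this section):

video_to_frames('input.mp4', 'img')            # split the test video into img/0.jpg, img/1.jpg, ...
GetFace('img', 'demo', masknet)                # detect faces and annotate mask predictions
frames_to_video('demo', 'output.mp4', fps=20)  # merge the processed frames at 20 fps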
3. Implementation path
[Figure: algorithm flow chart]
3.1 Data set and evaluation indicators
- The dataset contains a total of 5,000 images of faces with and without masks, and the number of samples is further increased by image-augmentation preprocessing. The test and training sets are split at a ratio of 2:8 so the model can learn its updated weights well. To prevent the classifier from overfitting, the model is additionally tested on 1,000 images never seen during training. Model evaluation method: optimization is judged jointly by accuracy and precision (see the sketch after the figures below).
[Figure: training data]
[Figure: model evaluation]
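To make the dual criterion concrete: accuracy measures overall correctness, while precision penalizes false "with mask" calls. A toy sketch with hypothetical label arrays (1 = with_mask, 0 = without_mask):

from sklearn.metrics import accuracy_score, precision_score

# Hypothetical ground-truth and predicted labels
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print("accuracy: ", accuracy_score(y_true, y_pred))   # (TP + TN) / total
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)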
3.2 Mask identification part
- First, a five-layer fully connected network was used to extract image features. Training was slow, the extracted features were poor, and the final test accuracy was only 0.62.
- After transferring VGG16 and MobileNetV2 for feature extraction, both achieved good training results within 20 epochs and reached 0.95 accuracy on the test set. However, VGG is a large model and may place heavy demands on hardware in actual deployment, while MobileNetV2 is small and fast and can be deployed on mobile devices, so the latter was chosen as the feature-extraction model.
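To illustrate the size difference that drove this choice, both backbones can be instantiated without their tops and their parameter counts compared (a sketch; the counts in the comments are approximate):

from tensorflow.keras.applications import VGG16, MobileNetV2
from tensorflow.keras.layers import Input

# The same classification head can sit on either backbone; only the base changes
base_vgg = VGG16(weights="imagenet", include_top=False, input_tensor=Input(shape=(224, 224, 3)))
base_mnv2 = MobileNetV2(weights="imagenet", include_top=False, input_tensor=Input(shape=(224, 224, 3)))
print("VGG16 params:      ", base_vgg.count_params())   # roughly 14.7M; heavy for deployment
print("MobileNetV2 params:", base_mnv2.count_params())  # roughly 2.3M; mobile-friendly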
3.3 Image detection
- A Caffe face-detection model trained in advance is used to extract the face regions of the image, with a confidence threshold of 0.5. The faces found in this first stage are passed into the trained mask-identification model; its predictions are then drawn onto the output image as boxes annotated with the specific label and probability, so remaining problems can be analyzed and the model further optimized.
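The listing in section 2.2 uses a face-detection network net without showing how it is loaded; a minimal sketch of loading the pre-trained Caffe detector through OpenCV's DNN module (the file names are assumptions):

import cv2

# Assumed file names for the pre-trained Caffe face detector:
# the .prototxt defines the network architecture, the .caffemodel holds the weights
prototxtPath = "face_detector/deploy.prototxt"
weightsPath = "face_detector/res10_300x300_ssd_iter_140000.caffemodel"
net = cv2.dnn.readNet(prototxtPath, weightsPath)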
[Figure: image detection results]
3.4 Video detection
- Frames are obtained from the camera for detection; the OpenCV API splits the stream into frames, processes them, and merges the output. Limited by the computer's compute power, real-time detection stuttered, so a recorded video was used in place of the camera, and the final detection results were good. For complex video scenes, however, the Caffe detector could not detect faces reliably, so the video module's detector was replaced with YOLOv3, which achieved good detection results.
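The YOLOv3 replacement is not shown in the listings; one way to run a YOLOv3 face detector through the same OpenCV DNN interface, with hypothetical config and weight files (box decoding and non-maximum suppression omitted):

import cv2

# Hypothetical config/weight files for a YOLOv3 face detector
net = cv2.dnn.readNetFromDarknet("yolov3-face.cfg", "yolov3-face.weights")
frame = cv2.imread("demo/0.jpg")  # one frame of the test video
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())
# Each output row holds [cx, cy, w, h, objectness, class scores...];
# boxes above a confidence threshold are kept after non-maximum suppression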
[Figure: video processing]
4. Application value
- Basic application in public health prevention and control. To avoid contact and airborne cross-infection, wearing a mask is indispensable, but hospitals see heavy daily foot traffic, and security staff and other inspectors are often overwhelmed. Mask identification can replace manual inspection, checking whether people entering through hospital access control are wearing masks, which reduces labor costs.
- By extension, it can be applied to anomaly detection in many safety domains, such as fire-safety detection in factories and anomaly detection in mines, discovering abnormal events in time and raising warnings, effectively preventing some accidents. Retrained, it can also support criminal investigation: combined with real-time camera feeds it can detect suspects on the run, raising detection rates and helping solve cases.
- Especially during the epidemic, the country has invested heavily in manpower and material resources for health prevention and control. Replacing manual checks with machine mask identification can greatly reduce human-resource investment, improve the efficiency of epidemic work, cut epidemic-related economic expenditure, and reduce the spread of the epidemic.
Conclusion
The essence of the algorithm is feature detection in local regions; by capturing and processing video stream data in real time, it outputs feedback and achieves real-time detection of the target. For data processing, a variety of mask-wearing datasets are used for training, so masks of different shapes and colors can be detected and recognized. Faces with different appearances are also handled, so irregular views such as profile faces, as well as large numbers of faces within the frame, can be recognized.