Open source
Since YOLOv5 was open-sourced it has (for whatever reason) gained a lot of attention, and I've recently implemented its major parts in TensorFlow. It is possibly the first pure TensorFlow 2 release; you're welcome to try it out and star it:
Github.com/LongxingTan…
I had come into contact with YOLOv3 in my work before (running the demo probably counts as contact), and the results were impressive. I am a rookie in the computer-vision field (sadly, a middle-aged man and a rookie everywhere), so my skills are limited and mistakes are inevitable, but implementing it from scratch and working out some of the details was a good experience.
As noted in the README, the main features are as follows:
- Pure TensorFlow 2 implementation
- YAML files to configure and control the model size
- Support for training on custom data
- Mosaic data augmentation (see the sketch after this list)
- Anchor matching by IoU or aspect ratio
- Augmentation of adjacent grid cells as positive samples
- Multi-GPU training support
- Relatively detailed code comments
- Many shortcomings and huge room for improvement
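Mosaic augmentation stitches four training images around a random center point and shifts the boxes accordingly. As a rough standalone illustration (my own sketch, not the repo's implementation, which also resizes and jitters the images):

```python
import numpy as np

def mosaic4(images, boxes_list, out_size=640):
    # images: four HxWx3 uint8 arrays; boxes_list: four (n, 4) xyxy arrays in pixel coords
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # gray background
    cx = np.random.randint(out_size // 4, 3 * out_size // 4)  # random mosaic center
    cy = np.random.randint(out_size // 4, 3 * out_size // 4)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),                 # top-left, top-right
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]  # bottom-left, bottom-right
    all_boxes = []
    for img, boxes, (x1, y1, x2, y2) in zip(images, boxes_list, regions):
        h = min(img.shape[0], y2 - y1)
        w = min(img.shape[1], x2 - x1)
        canvas[y1:y1 + h, x1:x1 + w] = img[:h, :w]  # naive top-left crop; the real thing resizes
        if len(boxes):
            b = boxes.astype(np.float32).copy()
            b[:, [0, 2]] = np.clip(b[:, [0, 2]] + x1, x1, x1 + w)  # shift and clip to the pasted region
            b[:, [1, 3]] = np.clip(b[:, [1, 3]] + y1, y1, y1 + h)
            keep = (b[:, 2] - b[:, 0] > 2) & (b[:, 3] - b[:, 1] > 2)  # drop degenerate boxes
            all_boxes.append(b[keep])
    boxes_out = np.concatenate(all_boxes, axis=0) if all_boxes else np.zeros((0, 4), np.float32)
    return canvas, boxes_out
```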
Principles
Let me briefly review the main principles and improvements alongside the code. There are many very good analyses on Zhihu worth referring to, especially the following; if possible, reading the code directly should be even clearer and more detailed.
- Jiang Dabai: a complete explanation of the core fundamentals of YOLOv5 in the YOLO series
- Deep Eye: an in-depth visualization analysis of YOLOv5
- Object detection: YOLOv5, a collection of many ideas
(Model architecture diagram from @Jiang Dabai)
Matching anchors by aspect ratio or IoU
In v3, anchors are assigned by IoU; in v4/v5, they are assigned by aspect ratio. The new matching scheme alleviates the sensitivity problem when an object's center lies close to a grid-cell boundary, and it brings a measurable accuracy gain.
```python
def assign_criterion_wh(self, gt_wh, anchors, anchor_threshold):
    # note: the v5 default anchor_threshold is 4.0, related to the positive-sample augmentation
    gt_wh = tf.expand_dims(gt_wh, 0)      # => 1 * n_gt * 2
    anchors = tf.expand_dims(anchors, 1)  # => n_anchor * 1 * 2
    ratio = gt_wh / anchors               # => n_anchor * n_gt * 2
    matched_matrix = tf.reduce_max(tf.math.maximum(ratio, 1 / ratio),
                                   axis=2) < anchor_threshold  # => n_anchor * n_gt
    return matched_matrix
```
```python
def assign_criterion_iou(self, gt_wh, anchors, anchor_threshold):
    # by IoU; here anchor_threshold < 1
    box_wh = tf.expand_dims(gt_wh, 0)                 # => 1 * n_gt * 2
    box_area = box_wh[..., 0] * box_wh[..., 1]        # => 1 * n_gt
    anchors = tf.cast(anchors, tf.float32)            # => n_anchor * 2
    anchors = tf.expand_dims(anchors, 1)              # => n_anchor * 1 * 2
    anchors_area = anchors[..., 0] * anchors[..., 1]  # => n_anchor * 1
    inter = tf.math.minimum(anchors[..., 0], box_wh[..., 0]) * \
            tf.math.minimum(anchors[..., 1], box_wh[..., 1])  # => n_anchor * n_gt
    iou = inter / (anchors_area + box_area - inter + 1e-9)
    iou = iou > anchor_threshold
    return iou
```
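For intuition, here is a toy run of the aspect-ratio criterion (a standalone sketch that mirrors `assign_criterion_wh` above; the anchor values are just the usual P3 defaults, used purely for illustration):

```python
import tensorflow as tf

anchors = tf.constant([[10., 13.], [16., 30.], [33., 23.]])  # n_anchor * 2
gt_wh = tf.constant([[12., 20.], [40., 90.]])                # n_gt * 2

# a box matches an anchor if the larger of w-ratio, h-ratio and their
# inverses stays below the threshold (4.0 by default in v5)
ratio = tf.expand_dims(gt_wh, 0) / tf.expand_dims(anchors, 1)  # n_anchor * n_gt * 2
wh_matched = tf.reduce_max(tf.maximum(ratio, 1 / ratio), axis=2) < 4.0
print(wh_matched.numpy())
# the 12x20 box matches all three anchors, while the tall 40x90 box only
# matches the anchors whose width/height stay within a factor of 4 of it
```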
Positive sample augmentation
The balance of positive and negative samples is a perennial problem in object detection. In v5, after an anchor is matched by aspect ratio, the nearest neighboring grid cells above/below or left/right of the matched cell are additionally taken as positive samples; they reuse the same matched anchor and target coordinates as the originally matched cell.
```python
def enrich_pos_by_position(self, assigned_label, assigned_anchor, gain, matched_matrix, rect_style='rect4'):
    # use the offset of the box center inside its cell to extend more positive samples
    assigned_xy = assigned_label[..., 0:2]  # n_matched * 2
    offset = tf.constant([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]], tf.float32)
    grid_offset = tf.zeros_like(assigned_xy)

    if rect_style == 'rect2':
        g = 0.2  # offset
    elif rect_style == 'rect4':
        g = 0.5  # v5 uses 0.5 here
        # centers in the first half of a cell (and not on the grid border)
        # also activate the left/upper neighbor; mirrored for the second half
        matched = (assigned_xy % 1. < g) & (assigned_xy > 1.)
        matched_left = matched[:, 0]
        matched_up = matched[:, 1]
        matched = (assigned_xy % 1. > (1 - g)) & (assigned_xy < tf.expand_dims(gain[0:2], 0) - 1.)
        matched_right = matched[:, 0]
        matched_down = matched[:, 1]

        assigned_anchor = tf.concat([assigned_anchor,
                                     assigned_anchor[matched_left],
                                     assigned_anchor[matched_up],
                                     assigned_anchor[matched_right],
                                     assigned_anchor[matched_down]], axis=0)
        assigned_label = tf.concat([assigned_label,
                                    assigned_label[matched_left],
                                    assigned_label[matched_up],
                                    assigned_label[matched_right],
                                    assigned_label[matched_down]], axis=0)
        grid_offset = g * tf.concat([grid_offset,
                                     grid_offset[matched_left] + offset[1],
                                     grid_offset[matched_up] + offset[2],
                                     grid_offset[matched_right] + offset[3],
                                     grid_offset[matched_down] + offset[4]], axis=0)

    return assigned_label, assigned_anchor, grid_offset
```
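To make the 'rect4' rule concrete, here is a toy check of which neighbors one center activates (my own illustration, independent of the function above):

```python
import tensorflow as tf

g = 0.5
xy = tf.constant([[3.2, 5.8]])   # box center in grid units, e.g. on a 20x20 grid
gain = tf.constant([20., 20.])   # grid size in x and y

left_up = (xy % 1. < g) & (xy > 1.)  # x fraction 0.2 < 0.5 -> horizontal neighbor added
right_down = (xy % 1. > (1 - g)) & (xy < tf.expand_dims(gain, 0) - 1.)  # y fraction 0.8 -> vertical neighbor added
print(left_up.numpy(), right_down.numpy())  # [[ True False]] [[False  True]]
# so cell (3, 5) is joined by two neighbors: three positive cells instead of one
```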
Controlling the model size through YAML files
Similar to the EfficientDet approach, the model size is controlled by two coefficients: a depth multiple and a width multiple.
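For example, here is the scaling math in isolation (a sketch; the coefficient values mirror the upstream YOLOv5 configs and are assumed here for illustration):

```python
import math

# depth_multiple / width_multiple, as in the upstream YOLOv5 configs:
# s: 0.33/0.50, m: 0.67/0.75, l: 1.0/1.0, x: 1.33/1.25

def scale_depth(number, depth_multiple):
    # how many times a block (e.g. BottleneckCSP) is repeated
    return max(round(number * depth_multiple), 1) if number > 1 else number

def scale_width(channels, width_multiple):
    # output channels, rounded up to a multiple of 8
    return math.ceil(channels * width_multiple / 8) * 8

print(scale_depth(9, 0.33))    # 3 repeats for the "s" model
print(scale_width(128, 0.50))  # 64 channels for the "s" model
```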
```python
def parse_model(self, yaml_dict):
    anchors, nc = yaml_dict['anchors'], yaml_dict['nc']
    depth_multiple, width_multiple = yaml_dict['depth_multiple'], yaml_dict['width_multiple']
    num_anchors = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors
    output_dims = num_anchors * (nc + 5)

    layers = []
    # from, number, module, args
    for i, (f, number, module, args) in enumerate(yaml_dict['backbone'] + yaml_dict['head']):
        # every component is a class: initialize here, call in self.forward
        module = eval(module) if isinstance(module, str) else module
        for j, arg in enumerate(args):
            try:
                args[j] = eval(arg) if isinstance(arg, str) else arg  # eval strings, like Detect(nc, anchors)
            except:
                pass

        number = max(round(number * depth_multiple), 1) if number > 1 else number  # control the model scale: s/m/l/x
        if module in [Conv2D, Conv, Bottleneck, SPP, DWConv, Focus, BottleneckCSP,
                      BottleneckCSP2, SPPCSP, VoVCSP]:
            c2 = args[0]
            c2 = math.ceil(c2 * width_multiple / 8) * 8 if c2 != output_dims else c2
            args = [c2, *args[1:]]
        if module in [BottleneckCSP, BottleneckCSP2, SPPCSP, VoVCSP]:
            args.insert(1, number)
            number = 1

        modules = tf.keras.Sequential([module(*args) for _ in range(number)]) if number > 1 else module(*args)
        modules.i, modules.f = i, f
        layers.append(modules)
    return layers
```
Loss function
```python
class YoloLoss(object):
    def __init__(self, anchors, ignore_iou_threshold, num_classes, img_size, label_smoothing=0):
        self.anchors = anchors
        self.strides = [8, 16, 32]
        self.ignore_iou_threshold = ignore_iou_threshold
        self.num_classes = num_classes
        self.img_size = img_size
        self.bce_conf = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.NONE)
        self.bce_class = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.NONE,
                                                            label_smoothing=label_smoothing)

    def __call__(self, y_true, y_pred):
        iou_loss_all = obj_loss_all = class_loss_all = 0
        balance = [1.0, 1.0, 1.0] if len(y_pred) == 3 else [4.0, 1.0, 0.4, 0.1]  # P3-P5 or P3-P6

        for i, (pred, true) in enumerate(zip(y_pred, y_true)):
            # true: batch_size * grid * grid * 3 * 6, pred: batch_size * grid * grid * 3 * (num_classes + 5)
            true_box, true_obj, true_class = tf.split(true, (4, 1, -1), axis=-1)
            pred_box, pred_obj, pred_class = tf.split(pred, (4, 1, -1), axis=-1)

            if tf.shape(true_class)[-1] == 1 and self.num_classes > 1:
                true_class = tf.squeeze(tf.one_hot(tf.cast(true_class, tf.dtypes.int32),
                                                   depth=self.num_classes, axis=-1), -2)

            # prepare: higher weights for smaller boxes; true_wh should be normalized to (0, 1)
            box_scale = 2. - 1.0 * true_box[..., 2] * true_box[..., 3] / (self.img_size ** 2)
            obj_mask = tf.squeeze(true_obj, -1)  # obj or noobj: batch_size * grid * grid * anchors_per_grid
            background_mask = 1.0 - obj_mask
            conf_focal = tf.squeeze(tf.math.pow(true_obj - pred_obj, 2), -1)

            # iou/giou/ciou/diou loss
            iou = bbox_iou(pred_box, true_box, xyxy=False, giou=True)
            iou_loss = (1 - iou) * obj_mask * box_scale  # batch_size * grid * grid * 3

            # confidence loss; todo: multiply by the iou
            conf_loss = self.bce_conf(true_obj, pred_obj)
            conf_loss = conf_focal * (obj_mask * conf_loss + background_mask * conf_loss)  # batch * grid * grid * 3

            # class loss: binary cross entropy for multi-class, so every value is independent and sigmoid;
            # note that tf.keras.losses BCE outputs the original dims minus the last one
            class_loss = obj_mask * self.bce_class(true_class, pred_class)

            iou_loss = tf.reduce_mean(tf.reduce_sum(iou_loss, axis=[1, 2, 3]))
            conf_loss = tf.reduce_mean(tf.reduce_sum(conf_loss, axis=[1, 2, 3]))
            class_loss = tf.reduce_mean(tf.reduce_sum(class_loss, axis=[1, 2, 3]))

            iou_loss_all += iou_loss * balance[i]
            obj_loss_all += conf_loss * balance[i]
            class_loss_all += class_loss * self.num_classes * balance[i]  # to balance the three losses

        return iou_loss_all, obj_loss_all, class_loss_all
```
The loss function here is not exactly the same as v5's setup. v5 makes some further optimizations, such as balancing the losses across scales and weighting the objectness (confidence) target.
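As a rough sketch of those two tricks (my reconstruction of the upstream idea, not code from either repo):

```python
import tensorflow as tf

# per-scale objectness weights: the small-object layer P3 is weighted higher
balance = [4.0, 1.0, 0.4]  # P3, P4, P5

def obj_target_with_iou(obj_mask, iou, gr=1.0):
    # v5 blends the IoU of the matched prediction into the objectness target,
    # so a well-localized box also gets a high confidence target
    return obj_mask * ((1.0 - gr) + gr * tf.clip_by_value(iou, 0.0, 1.0))
```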
Results
For the best results, I recommend the original PyTorch version, as it is still being updated and the v4 and v5 authors are still working on it. If you are interested in TensorFlow pitfalls, or want to get to know YOLOv5 through code, I think my version is a little clearer (at the cost of missing details or even getting some things wrong), and you are welcome to try it out.
Results on the MNIST detection data:
Results on the VOC2012 dataset (still to be improved):
I could not run any larger dataset, since only a 1080Ti is available.
Github.com/LongxingTan…