This article explains how to train the DeepLabv3+ segmentation algorithm on your own dataset, using the official source code.
1. Introduction to code
The official TensorFlow implementation is used for the current version. It was chosen because the code is comprehensive: in addition to the model implementation, there is a lot of documentation to help you understand and use it, as well as code for model conversion.
models/research/deeplab at master · tensorflow/models
First, a brief introduction to the code repository. When I started, I only cared about the training code and ignored the rest of the repository, which led to many detours; only later did I discover that the content I needed had been in the repository all along.
In the current implementation, the following network backbones are supported:
- MobileNetv2 and MobileNetv3: fast network architectures for mobile devices.
- Xception: a powerful network structure for server-side deployment.
- ResNet-v1-{50, 101}: the original ResNet-v1 and its "beta" variant, whose "stem" has been modified for semantic segmentation.
- PNASNet: a powerful network structure discovered through neural architecture search.
- Auto-DeepLab (called HNASNet in the code): a segmentation-specific network backbone found through neural architecture search.
This directory contains the TensorFlow implementation. The provided code allows users to train models, evaluate results in terms of mIOU (mean intersection-over-union), and visualize segmentation results, taking the PASCAL VOC 2012 and Cityscapes semantic segmentation benchmarks as examples.
Several important files in the code:
- datasets/: processing code for the training datasets, mainly for PASCAL VOC 2012 and Cityscapes.
- g3doc/: multiple Markdown files, very useful: installation instructions, FAQ, etc.
- deeplab_demo.ipynb: a demo showing how to run semantic segmentation on an image and display the result.
- export_model.py: code to export a trained checkpoint model to a .pb file.
- train.py: the training code; the training parameters need to be specified when running it.
- eval.py: the validation code; outputs mIOU to evaluate the model.
- vis.py: the visualization code.
2. Installation
Deeplab relies on the following libraries:
- Numpy
- Pillow 1.0
- tf Slim (which is included in the “tensorflow/models/research/” checkout)
- Jupyter notebook
- Matplotlib
- Tensorflow
2.1 Adding the libraries to PYTHONPATH
When running locally, the tensorflow/models/research/ directory should be appended to PYTHONPATH, as follows:
# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
# [Optional] for panoptic evaluation, you might need panopticapi:
# https://github.com/cocodataset/panopticapi
# Please clone it to a local directory ${PANOPTICAPI_DIR}
touch ${PANOPTICAPI_DIR}/panopticapi/__init__.py
export PYTHONPATH=$PYTHONPATH:${PANOPTICAPI_DIR}/panopticapi
Note: this command needs to be run in every new terminal you start. If you want to avoid running it manually, add it as a new line at the end of the ~/.bashrc file.
2.2 Checking whether the installation is successful
Quick test by running model_test.py:
# From tensorflow/models/research/
python deeplab/model_test.py
Quickly run all code on PASCAL VOC 2012 dataset:
# From tensorflow/models/research/deeplab
sh local_test.sh
3. Data set preparation
Final goal: generate data in TFRecord format
The data set directory structure is as follows:
+ dataset        # dataset name
  + image
  + mask
  + index
    - train.txt
    - trainval.txt
    - val.txt
  + tfrecord
- image: the original images, RGB color images.
- mask: mask images whose pixel values are the category labels, single channel, with the same name as the corresponding original image; the suffix can be .jpg or .png, as long as it is read consistently in the code. In the VOC dataset the originals default to .jpg and the mask images to .png.
- index: txt files that store the image file names (without suffix).
- tfrecord: the image data converted to tfrecord format.
Dataset preparation process:
- Annotate the data and produce the required mask images
- Split the dataset into training, validation, and test sets
- Generate the TFRecord-format dataset
3.1 Annotating the data
The training data consists of two parts: the original images and the corresponding class annotations (called mask images in this article).
How are the values of the mask image set? A mask image is made for each original image according to the number of segmentation classes. If there are N categories in total (with the background counted as one), the pixel values of the mask image lie in [0, N-1]: 0 is used for the background, and the other segmentation categories take the values 1, 2, ..., N-1 in turn.
Note:
- ignore_label: literally, an ignored label. ignore_label marks unlabeled pixels, i.e., pixels that do not need to be predicted; they do not take part in the loss computation, and in the mask image they are denoted by the value 255.
- The mask image is a single-channel grayscale image.
- There is no restriction on the mask image format, but all mask images should use the same format to simplify data reading.
The values in a mask image fall into three categories:
- Background: denoted by 0
- Object categories: denoted by 1, 2, ..., N-1
- ignore_label: denoted by 255
If there are only a few categories, the generated mask images will look almost black, because the category values are small and hard to distinguish within the 0~255 display range.
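Since the raw category values are hard to see, a quick way to confirm that a mask is well formed is to inspect its values directly. Here is a minimal sketch (the file name below is hypothetical; point it at one of your own masks):

import numpy as np
from PIL import Image

# './dataset/mask/example.png' is a hypothetical path -- substitute one of your own masks.
mask = np.array(Image.open('./dataset/mask/example.png'))

print(mask.shape)  # expected: (height, width), i.e. a single channel
values, counts = np.unique(mask, return_counts=True)
# Expected keys: 0 (background), 1..N-1 (categories), and possibly 255 (ignore_label).
print(dict(zip(values.tolist(), counts.tolist())))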
3.2 Splitting the dataset
This step divides the prepared dataset into training, validation, and test sets. There is no need to copy the image files into three separate folders; you only need to create index files of the image names, and the actual image can then be located by prepending the corresponding path to each name.
Assume that the storage path of the original image and mask image is as follows:
- Original images: ./dataset/image
- Mask images: ./dataset/mask (stored in the format required in Section 3.1)
The original images and mask images correspond one to one, in both image size and file name (the suffix can differ).
The index files are stored in the ./dataset/index directory:
- train.txt
- trainval.txt
- val.txt
Each index file records only the file names (without suffix); this matches how the dataset is loaded in the code.
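As a reference, here is a minimal sketch for generating these index files, assuming the directory layout above and a 9:1 train/val split (adjust the ratio and shuffling to your needs):

import os
import random

image_dir = './dataset/image'
index_dir = './dataset/index'
os.makedirs(index_dir, exist_ok=True)

# File names without suffix, shuffled for a random split.
names = [os.path.splitext(f)[0] for f in os.listdir(image_dir)]
random.shuffle(names)

split_point = int(len(names) * 0.9)
splits = {'train': names[:split_point], 'val': names[split_point:], 'trainval': names}

for split_name, split_names in splits.items():
    with open(os.path.join(index_dir, split_name + '.txt'), 'w') as f:
        f.write('\n'.join(split_names) + '\n')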
3.3 Packaging the data in TFRecord format
TFRecord is a binary file format recommended by Google that can, in principle, store information of any kind. Internally, TFRecord uses the Protocol Buffer binary encoding scheme; it occupies a single block of memory and only needs to load one binary file at a time, which makes it simple, fast, and especially friendly to large training sets. When the amount of training data is large, the data can also be split across multiple TFRecord files to improve processing efficiency.
So how do we convert the data into TFRecord format? Use ./datasets/build_voc2012_data.py from the project code. This file is the processing code for the VOC2012 dataset; we only need to modify its input parameters.
Parameters:
- image_folder: folder of original images, ./dataset/image
- semantic_segmentation_folder: folder of mask (segmentation) images, ./dataset/mask
- list_folder: folder of index files, ./dataset/index
- output_dir: output path where the generated tfrecord files are stored, ./dataset/tfrecord
Run the command:
python ./datasets/build_voc2012_data.py --image_folder=./dataset/image \
    --semantic_segmentation_folder=./dataset/mask \
    --list_folder=./dataset/index \
    --output_dir=./dataset/tfrecord
The generated files are as follows: the data is split into _NUM_SHARDS shards, where _NUM_SHARDS defaults to 4.
The core code of this file is as follows:
# dataset_split refers to train.txt, val.txt, etc.
dataset = os.path.basename(dataset_split)[:-4]
filenames = [x.strip('\n') for x in open(dataset_split, 'r')]  # list of file names

# Output tfrecord file name
output_filename = os.path.join(
    FLAGS.output_dir,
    '%s-%05d-of-%05d.tfrecord' % (dataset, shard_id, _NUM_SHARDS))
with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
  for i in range(start_idx, end_idx):
    image_filename = os.path.join(
        image_folder, filenames[i] + '.' + image_format)  # original image path
    image_data = tf.gfile.GFile(image_filename, 'rb').read()  # read the original image
    height, width = image_reader.read_image_dims(image_data)
    seg_filename = os.path.join(
        semantic_segmentation_folder,
        filenames[i] + '.' + label_format)  # mask image path
    seg_data = tf.gfile.GFile(seg_filename, 'rb').read()  # read the mask image
    seg_height, seg_width = label_reader.read_image_dims(seg_data)
    # Check that the original image and mask image sizes match
    if height != seg_height or width != seg_width:
      raise RuntimeError('Shape mismatched between image and label.')
    # Convert to tf example.
    example = build_data.image_seg_to_tfexample(
        image_data, filenames[i], height, width, seg_data)
    tfrecord_writer.write(example.SerializeToString())
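Before moving on, it can help to read one record back from a generated shard to confirm the conversion worked. A minimal sketch follows; the shard name is illustrative, and the feature keys shown are the ones build_data.py uses for VOC-style data, so adjust them if yours differ:

import tensorflow as tf

# Use any shard actually generated in ./dataset/tfrecord; this name is illustrative.
record_path = './dataset/tfrecord/train-00000-of-00004.tfrecord'

for record in tf.python_io.tf_record_iterator(record_path):
    example = tf.train.Example.FromString(record)
    feature = example.features.feature
    print(feature['image/filename'].bytes_list.value[0])
    print(feature['image/height'].int64_list.value[0],
          feature['image/width'].int64_list.value[0])
    break  # inspect only the first record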
At this point, the dataset preparation is complete!
4. Training
4.1 Code Modification
To train your own data set, you need to modify the following files:
1. datasets/data_generator.py: add the registration of the dataset
This file provides data wrappers for the semantic segmentation datasets.
In this file, you can see the data descriptions of PASCAL_VOC, CITYSCAPES, and ADE20K datasets as follows:
_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1464,
        'train_aug': 10582,
        'trainval': 2913,
        'val': 1449,
    },
    num_classes=21,
    ignore_label=255,
)
Then add the description for our own dataset in the same way, for example:
_PORTRAIT_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 17116,
        'trainval': 21395,
        'val': 4279,
    },
    num_classes=2,     # number of categories, including the background
    ignore_label=255,  # ignored pixel value
)
Taking the portrait segmentation task as an example, there are only two categories: foreground (person) and background (everything else).
After adding the description, you need to register the data set as follows:
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'portrait_seg': _PORTRAIT_INFORMATION,  # add this entry
}
Note: the dataset name must match the one registered above!
2. Modify ./utils/train_utils.py
In get_model_init_fn, change the code as follows so that the logits layer does not load the pre-trained model weights:
# Variables that will not be restored.
exclude_list = ['global_step', 'logits']
if not initialize_last_layer:
  exclude_list.extend(last_layers)
4.2 Main training parameters
The files train.py and common.py contain all the parameters needed to train the segmentation network.
- model_variant: the DeepLab model variant; the available values can be found in core/feature_extractor.py.
  - When using mobilenet_v2, set atrous_rates=decoder_output_stride=None;
  - When using xception_65 or resnet_v1, set atrous_rates=[6,12,18] (output stride 16) and decoder_output_stride=4.
- label_weights: sets the weight of each label. When the dataset is class-imbalanced, this variable can specify the weight of each category's label, e.g., label_weights=[0.1, 0.5] means label 0 has weight 0.1 and label 1 has weight 0.5. If the value is None, all labels have the same weight 1.0. (A sketch for estimating reasonable weights from the mask images follows after this list.)
- train_logdir: path where the checkpoints and logs are stored.
- log_steps: interval (in steps) at which log information is output.
- save_interval_secs: how often, in seconds, model files are saved to disk.
- optimizer: the optimizer, one of ['momentum', 'adam'].
- learning_policy: the learning rate policy, one of ['poly', 'step'].
- base_learning_rate: the base learning rate, default 0.0001.
- training_number_of_steps: the number of training iterations.
- train_batch_size: the number of images per training batch.
- train_crop_size: the image size used for training, default '513, 513'.
- tf_initial_checkpoint: the pre-trained model.
- initialize_last_layer: whether to initialize the last layer.
- last_layers_contain_logits_only: whether only the logits layer is considered as the last layer.
- fine_tune_batch_norm: whether to fine-tune the batch norm parameters.
- atrous_rates: default [6, 12, 18].
- output_stride: default 16, the ratio of input to output spatial resolution.
  - For xception_65, if output_stride=8, use atrous_rates=[12, 24, 36];
  - if output_stride=16, use atrous_rates=[6, 12, 18];
  - for mobilenet_v2, use None.
  - Note: different atrous_rates and output_stride values can be used in the training and validation phases.
- dataset: the segmentation dataset to use, with the same name as used when the dataset was registered.
- train_split: which split to use for training; the available values are the splits defined when the dataset was registered, e.g., train, trainval.
- dataset_dir: the path where the dataset is stored.
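As mentioned for label_weights, it helps to look at the pixel frequency of each class before choosing weights. A minimal sketch, assuming the ./dataset/mask layout from Section 3 and the two-class portrait task; inverse-frequency weighting is just one common heuristic, not a method prescribed by the repository:

import os

import numpy as np
from PIL import Image

mask_dir = './dataset/mask'
num_classes = 2      # portrait task: background + person
ignore_label = 255   # ignored pixels are simply not counted below

counts = np.zeros(num_classes, dtype=np.int64)
for name in os.listdir(mask_dir):
    mask = np.array(Image.open(os.path.join(mask_dir, name)))
    for c in range(num_classes):
        counts[c] += np.sum(mask == c)

freq = counts / counts.sum()
weights = 1.0 / np.maximum(freq, 1e-8)
weights /= weights.max()  # normalize so the largest weight is 1.0
print('pixel frequency per class:', freq)
print('inverse-frequency weights (candidate --label_weights):', weights)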
For the training parameters, pay attention to the following points:
- Regarding whether to load the pre-trained network weights, pay attention to the following parameters when fine-tuning the network on another dataset:
  - To use all the pre-trained network weights, set initialize_last_layer=True
  - To use only the network backbone, set initialize_last_layer=False and last_layers_contain_logits_only=False
  - To use all pre-trained weights except the logits, set initialize_last_layer=False and last_layers_contain_logits_only=True

  Since the number of classes in my dataset differs from the default, the parameter values used are:

  --initialize_last_layer=false --last_layers_contain_logits_only=true
- A few tips for training your own dataset if your resources are limited:
  - Set output_stride=16 or even 32 (the atrous_rates variable also needs to be modified accordingly, e.g., atrous_rates=[3, 6, 9] for output_stride=32).
  - Use as many GPUs as possible (change the num_clones flag) and set train_batch_size as large as possible.
  - Reduce train_crop_size, e.g., to 513x513 (or even 321x321), so that a larger batch_size can be used.
  - Use a smaller network backbone, such as mobilenet_v2.
- About fine-tuning batch norm: set fine_tune_batch_norm=True when the training batch size (train_batch_size) is greater than 12 (preferably greater than 16); otherwise, set fine_tune_batch_norm=False.
4.3 Pre-trained models
models/model_zoo.md at master · tensorflow/models
Pre-trained models are provided for several datasets, including (1) PASCAL VOC 2012, (2) Cityscapes, and (3) ADE20K.
After decompression, each archive contains:
- A frozen inference graph (frozen_inference_graph.pb). By default, all frozen inference graphs use an output stride of 8, a single eval scale of 1.0, and no left/right flips, unless otherwise specified. Models based on MobileNet-v2 do not include the decoder module.
- A checkpoint (model.ckpt.data-00000-of-00001, model.ckpt.index).
Checkpoints pre-trained on ImageNet are also provided. After decompression, these contain a model checkpoint (model.ckpt.data-00000-of-00001, model.ckpt.index).
Download whichever model suits your needs!
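For reference, here is a minimal sketch for downloading and unpacking one of the checkpoints into the ./checkpoint directory used by the training command below. The URL is the PASCAL VOC trainval model listed in model_zoo.md at the time of writing; verify it against the current page before use:

import os
import tarfile
import urllib.request

# URL taken from model_zoo.md at the time of writing; check the page for current links.
MODEL_URL = 'http://download.tensorflow.org/models/deeplabv3_pascal_trainval_2018_01_04.tar.gz'
TAR_PATH = './checkpoint/deeplabv3_pascal_trainval.tar.gz'

os.makedirs('./checkpoint', exist_ok=True)
urllib.request.urlretrieve(MODEL_URL, TAR_PATH)
with tarfile.open(TAR_PATH) as tar:
    tar.extractall('./checkpoint/')  # yields model.ckpt.* and frozen_inference_graph.pb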
4.4 Training the model
python train.py \
--logtostderr \
--training_number_of_steps=20000 \
--train_split="train" \
--model_variant="xception_65" \
--train_crop_size="513,513" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_batch_size=2 \
--save_interval_secs=240 \
--optimizer="momentum" \
--learning_policy="poly" \
--fine_tune_batch_norm=false \
--initialize_last_layer=false \
--last_layers_contain_logits_only=true \
--dataset="portrait_seg" \
--tf_initial_checkpoint="./checkpoint/deeplabv3_pascal_trainval/model.ckpt" \
--train_logdir="./train_logs" \
--dataset_dir="./dataset/tfrecord"
4.5 Verifying the Model
The validation code is ./eval.py:
# From tensorflow/models/research/
python deeplab/eval.py \
--logtostderr \
--eval_split="val" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--eval_crop_size="513,513" \
--dataset="portrait_seg" \ Data set name
--checkpoint_dir=${PATH_TO_CHECKPOINT} \ # Pre-training model
--eval_logdir=${PATH_TO_EVAL_DIR} \
--dataset_dir="./dataset/tfrecord" Data set path
The results are as follows:
4.6 Visualization of training process
You can use TensorBoard to monitor the progress of training and evaluation. With the recommended directory structure, TensorBoard can be run using the following command:
tensorboard --logdir=${PATH_TO_LOG_DIRECTORY}
# For this article, the log directory is ./train_logs
tensorboard --logdir="./train_logs"
5. Inference
5.1 Model Export
During training, the model is saved to disk as TensorFlow checkpoint files. export_model.py converts a trained checkpoint into a .pb file.
The main parameters of export_model.py:
- checkpoint_path: the checkpoint file saved during training
- export_path: the path where the model is exported
- num_classes: the number of classes
- crop_size: the image size, [513, 513]
- atrous_rates: 12, 24, 36
- output_stride: 8
This produces the .pb file in the export path.
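To confirm that the exported graph exposes the input and output tensors used in Section 5.2 (ImageTensor and SemanticPredictions), here is a small sketch, assuming the exported file is ./train_logs/frozen_inference_graph_20000.pb:

import tensorflow as tf

pb_path = './train_logs/frozen_inference_graph_20000.pb'  # assumed export path

graph_def = tf.GraphDef()
with tf.gfile.GFile(pb_path, 'rb') as f:
    graph_def.ParseFromString(f.read())

node_names = [node.name for node in graph_def.node]
print('total nodes:', len(node_names))
# Should print the ImageTensor input and SemanticPredictions output nodes.
print([n for n in node_names if 'ImageTensor' in n or 'SemanticPredictions' in n])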
5.2 Inference on a single image
import os
import tarfile

import numpy as np
import tensorflow as tf
from PIL import Image


class DeepLabModel(object):
  """Class to load a DeepLab model and run inference."""

  INPUT_TENSOR_NAME = 'ImageTensor:0'
  OUTPUT_TENSOR_NAME = 'SemanticPredictions:0'
  INPUT_SIZE = 513
  FROZEN_GRAPH_NAME = 'frozen_inference_graph'

  def __init__(self, pretrained_weights):
    """Creates and loads the pretrained DeepLab model."""
    self.graph = tf.Graph()
    graph_def = None
    # Extract the frozen graph from a tar archive, or read a .pb file directly.
    if pretrained_weights.endswith('.tar.gz'):
      tar_file = tarfile.open(pretrained_weights)
      for tar_info in tar_file.getmembers():
        if self.FROZEN_GRAPH_NAME in os.path.basename(tar_info.name):
          file_handle = tar_file.extractfile(tar_info)
          graph_def = tf.GraphDef.FromString(file_handle.read())
          break
      tar_file.close()
    else:
      with open(pretrained_weights, 'rb') as fd:
        graph_def = tf.GraphDef.FromString(fd.read())
    if graph_def is None:
      raise RuntimeError('Cannot find inference graph in tar archive.')
    with self.graph.as_default():
      tf.import_graph_def(graph_def, name='')
    gpu_options = tf.GPUOptions(allow_growth=True)
    config = tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False)
    self.sess = tf.Session(graph=self.graph, config=config)

  def run(self, image):
    """Runs inference on a single image.

    Args:
      image: A PIL.Image object, raw input image.

    Returns:
      resized_image: RGB image resized from the original input image.
      seg_map: Segmentation map of `resized_image`.
    """
    width, height = image.size
    resize_ratio = 1.0 * self.INPUT_SIZE / max(width, height)
    target_size = (int(resize_ratio * width), int(resize_ratio * height))
    resized_image = image.convert('RGB').resize(target_size, Image.ANTIALIAS)
    batch_seg_map = self.sess.run(
        self.OUTPUT_TENSOR_NAME,
        feed_dict={self.INPUT_TENSOR_NAME: [np.asarray(resized_image)]})
    seg_map = batch_seg_map[0]
    return resized_image, seg_map


if __name__ == '__main__':
  pretrained_weights = './train_logs/frozen_inference_graph_20000.pb'
  MODEL = DeepLabModel(pretrained_weights)  # load the model
  img = Image.open('test.jpg')
  resized_im, seg_map = MODEL.run(img)  # get the result
  seg_map[seg_map == 1] = 255  # set the portrait pixel value to 255
  Image.fromarray(seg_map.astype(np.uint8)).save('output.jpg')  # save the mask result image
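To visually check the result, here is a small optional sketch that overlays the predicted mask on the resized image returned by DeepLabModel.run(); it is a simple red blend and not part of the official code:

import numpy as np
from PIL import Image

def overlay_mask(resized_im, seg_map, alpha=0.5):
    """Blend a red overlay onto the predicted foreground region."""
    overlay = np.array(resized_im).copy()
    fg = seg_map > 0
    overlay[fg] = (alpha * np.array([255, 0, 0]) + (1 - alpha) * overlay[fg]).astype(np.uint8)
    return Image.fromarray(overlay)

# Example usage with the objects returned above:
# overlay_mask(resized_im, seg_map).save('overlay.jpg')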
At this point, the whole process, from training to inference, is complete!