Hello everyone, I am Yufeng. Today I want to share with you the third talk of [Hands-On Teaching]: image segmentation. I am not a professional, so if there are any mistakes, your criticism and corrections are welcome.

As always: I am Yufeng, and I hope the articles I share can help you and many more friends. You are welcome to forward or repost them!

Welcome to “Yufeng Code Word”

1 Understanding image segmentation

Image segmentation refers to dividing an image into several non-overlapping regions according to features such as gray level, color, spatial texture and geometric shape, so that these features are consistent or similar within the same region but differ noticeably between regions. Put simply, it separates the objects in an image from the background. For grayscale images, pixels inside a region generally have similar gray levels, while pixels at region boundaries generally show gray-level discontinuities.
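To make the last point concrete, here is a tiny illustration of my own (plain NumPy, not from the tutorial): along one row of a grayscale image, neighbouring pixels inside a region differ very little, while the difference jumps at the region boundary.

import numpy as np

# Hypothetical row of a grayscale image: a dark region followed by a bright region
row = np.array([12, 13, 12, 11, 200, 201, 199, 200], dtype=np.int32)

# Neighbouring-pixel differences: small inside each region, large at the boundary
diff = np.abs(np.diff(row))
print(diff)                 # [  1   1   1 189   1   2   1]
print(np.argmax(diff) + 1)  # 4, the index where the new region starts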

1.1 How is segmentation expressed in computers

Images exist in a computer as a collection of numbers, and the computer perceives each object in an image through these numbers. As shown in the figure below, when annotating the image, people are marked with the number 1, bags with 2, leaves with 3, pavement with 4 and buildings with 5; the computer identifies each category by these numbers. Image segmentation then amounts to detecting the boundaries between these differently numbered regions.
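As a minimal sketch (my own toy example, not the original figure), such a label map is just an integer array, and "segmenting" one category simply means selecting the pixels that carry its number:

import numpy as np

# Hypothetical 4x6 label map: 1 = person, 2 = bag, 3 = leaves, 4 = pavement, 5 = building
label_map = np.array([
    [5, 5, 5, 3, 3, 3],
    [5, 1, 1, 3, 3, 3],
    [4, 1, 2, 4, 4, 4],
    [4, 4, 4, 4, 4, 4],
])

# The mask of the "person" class is simply the set of pixels whose value is 1
person_mask = (label_map == 1).astype(np.uint8)
print(person_mask)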

1.2 Several sub-fields of image segmentation

Semantic segmentation: every object in the image, including the background, is segmented, but different individuals of the same category cannot be distinguished.

Instance segmentation: all objects in the image except the background are segmented, and different individuals of the same category are distinguished (for example, in the third picture each person is shown in a different color).

Panoptic segmentation: building on instance segmentation, the background ("stuff") regions are also segmented. Its relation to semantic segmentation: if all categories are stuff, panoptic segmentation is the same as semantic segmentation apart from the evaluation metric; compared with semantic segmentation, the difficulty of panoptic segmentation lies in making the network distinguish different instances. Its relation to instance segmentation: panoptic segmentation does not allow overlapping masks, while instance segmentation does; in addition, instance segmentation requires a confidence score for each segment, whereas panoptic segmentation does not.
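To make the distinction concrete, here is a hand-made toy example (my own, not from any dataset): the semantic mask stores only class IDs, the instance mask additionally separates the two people, and one possible panoptic encoding keeps both pieces of information in a single map.

import numpy as np

# Toy 3x6 scene: two people (class 1) standing on pavement (class 4)
semantic = np.array([
    [4, 1, 1, 4, 1, 1],
    [4, 1, 1, 4, 1, 1],
    [4, 4, 4, 4, 4, 4],
])

# Instance segmentation tells the two individuals of class 1 apart (0 = background/stuff)
instance = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# One possible panoptic encoding: class_id * 1000 + instance_id keeps both
panoptic = semantic * 1000 + instance
print(panoptic)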

1.3 Traditional segmentation methods

  1. Threshold-based segmentation methods (see the sketch after this list)
  2. Region-based segmentation methods
  3. Edge-detection-based segmentation methods
  4. Wavelet-transform-based segmentation methods
  5. Genetic-algorithm-based segmentation methods
  6. Active-contour-model-based segmentation methods
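As an illustration of the first method in the list, here is a minimal thresholding sketch. It assumes OpenCV (cv2) is installed and uses a hypothetical local file pet.jpg; Otsu's method picks the threshold automatically from the grayscale histogram.

import cv2

# Read a hypothetical image and convert it to grayscale
img = cv2.imread('pet.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Otsu's method chooses the threshold that best separates foreground from background
thresh_value, binary_mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print('Otsu threshold:', thresh_value)

# Save the resulting binary segmentation mask
cv2.imwrite('pet_mask.png', binary_mask)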

1.4 Segmentation method based on deep learning

FCN, U-Net, DeepLab, DeconvNet, SegNet, PSPNet and Mask R-CNN are some of the more important networks.

Figure from: Zhang Jikai, Zhao Jun, Zhang Ran, Lv Xiaoqi, Nie Junlan. A review of image instance segmentation methods based on deep learning [J]. Journal of Chinese Computer Systems, 2021, 42(01): 161-171.

Table cited from: Xu Hui, Zhu Yuhua, Zhen Tong, Li Zhihui. A review of deep neural network image semantic segmentation methods [J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(01): 47-59.

2 Image segmentation based on deep learning

There are many deep-learning-based approaches to image segmentation. Here we walk through a simple, commonly used piece of code to explain the main steps of image segmentation with deep learning, which I hope will be helpful to readers.

2.1 Introduction to the Oxford-IIIT Pet dataset

The dataset used in this tutorial is the Oxford-IIIT Pet dataset, created by Parkhi et al. It consists of images, labels corresponding to the images, and pixel-wise masks. A mask is simply a label for each pixel, and each pixel belongs to one of three categories:

Category 1: the pixel is part of a pet.

Category 2: the pixel is on the border of a pet.

Category 3: none of the above / a surrounding pixel.

pip install -q git+https://github.com/tensorflow/examples.git
import tensorflow as tf
from tensorflow_examples.models.pix2pix import pix2pix
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
from IPython.display import clear_output
import matplotlib.pyplot as plt

2.2 Downloading the Oxford-IIIT Pet dataset

This dataset is already integrated into TensorFlow Datasets and only needs to be downloaded. The image segmentation masks were only added in version 3.0.0, so we specifically use this version.

dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)
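Before building the input pipeline, it can help to check what tfds.load returned. This small sanity check is my own addition, not part of the original tutorial:

# dataset is a dict with 'train' and 'test' splits; info describes the features,
# which include 'image' and 'segmentation_mask'
print(info.features)
print(info.splits['train'].num_examples)
print(info.splits['test'].num_examples)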

The following code performs a simple augmentation by randomly flipping the images left-right. The images are then normalized to [0, 1]. Finally, as mentioned above, each pixel in the segmentation mask is labeled with one of {1, 2, 3}; for convenience, we subtract 1 from the segmentation masks so that the labels become {0, 1, 2}.

def normalize(input_image, input_mask):
  # Scale the image to [0, 1] and shift the mask labels from {1, 2, 3} to {0, 1, 2}
  input_image = tf.cast(input_image, tf.float32) / 255.0
  input_mask -= 1
  return input_image, input_mask

@tf.function
def load_image_train(datapoint):
  input_image = tf.image.resize(datapoint['image'], (128, 128))
  input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))

  # Random horizontal flip, applied identically to the image and its mask
  if tf.random.uniform(()) > 0.5:
    input_image = tf.image.flip_left_right(input_image)
    input_mask = tf.image.flip_left_right(input_mask)

  input_image, input_mask = normalize(input_image, input_mask)

  return input_image, input_mask

def load_image_test(datapoint):
  input_image = tf.image.resize(datapoint['image'], (128, 128))
  input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))

  input_image, input_mask = normalize(input_image, input_mask)

  return input_image, input_mask

The dataset already contains the required test and training set partitions, so we continue with the same partitions.

TRAIN_LENGTH = info.splits['train'].num_examples
BATCH_SIZE = 64
BUFFER_SIZE = 1000
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE
​
train = dataset['train'].map(load_image_train, num_parallel_calls=tf.data.experimental.AUTOTUNE)
test = dataset['test'].map(load_image_test)
​
train_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
test_dataset = test.batch(BATCH_SIZE)

Let’s take a look at an image in the dataset and its corresponding mask.

def display(display_list):
  plt.figure(figsize=(15, 15))
​
  title = ['Input Image', 'True Mask', 'Predicted Mask']
​
  for i in range(len(display_list)):
    plt.subplot(1, len(display_list), i+1)
    plt.title(title[i])
    plt.imshow(tf.keras.preprocessing.image.array_to_img(display_list[i]))
    plt.axis('off')
  plt.show()
​
for image, mask in train.take(1):
  sample_image, sample_mask = image, mask
display([sample_image, sample_mask])

2.3 Defining the Model

The model used here is a modified U-Net. A U-Net consists of an encoder (downsampler) and a decoder (upsampler). To learn robust features and reduce the number of trainable parameters, a pre-trained model can be used as the encoder. The encoder in this task is therefore a pre-trained MobileNetV2 model whose intermediate outputs will be used. The decoder uses the upsampling blocks implemented in the pix2pix example of TensorFlow Examples. The number of output channels is 3 because each pixel can take one of three labels; think of this as multi-class classification in which every pixel is assigned to one of three categories.

OUTPUT_CHANNELS = 3

As mentioned earlier, the encoder is a pre-trained MobileNetV2 model that can be used directly from tf.keras.applications. The encoder consists of specific outputs from intermediate layers of this model. Note that the encoder is not trained during the training process.

base_model = tf.keras.applications.MobileNetV2(input_shape=[128, 128, 3], include_top=False)

# Use the activations of these intermediate layers
layer_names = [
    'block_1_expand_relu',   # 64x64
    'block_3_expand_relu',   # 32x32
    'block_6_expand_relu',   # 16x16
    'block_13_expand_relu',  # 8x8
    'block_16_project',      # 4x4
]
layers = [base_model.get_layer(name).output for name in layer_names]

# Create the feature-extraction model; it is frozen and not trained
down_stack = tf.keras.Model(inputs=base_model.input, outputs=layers)

down_stack.trainable = False

Downloading data from storage.googleapis.com/tensorflow/…
9412608/9406464 [==============================] - 0s 0us/step

The decoder (upsampler) is simply a series of upsampling blocks that have already been implemented in the TensorFlow examples.

up_stack = [
    pix2pix.upsample(512, 3),  # 4x4 -> 8x8
    pix2pix.upsample(256, 3),  # 8x8 -> 16x16
    pix2pix.upsample(128, 3),  # 16x16 -> 32x32
    pix2pix.upsample(64, 3),   # 32x32 -> 64x64
]
def unet_model(output_channels):
  inputs = tf.keras.layers.Input(shape=[128, 128, 3])
  x = inputs

  # Downsampling through the model
  skips = down_stack(x)
  x = skips[-1]
  skips = reversed(skips[:-1])

  # Upsampling and establishing the skip connections
  for up, skip in zip(up_stack, skips):
    x = up(x)
    concat = tf.keras.layers.Concatenate()
    x = concat([x, skip])

  # This is the last layer of the model
  last = tf.keras.layers.Conv2DTranspose(
      output_channels, 3, strides=2,
      padding='same')  # 64x64 -> 128x128

  x = last(x)

  return tf.keras.Model(inputs=inputs, outputs=x)

2.4 Training the model

Now all that is left is to compile and train the model. The loss function used here is tf.keras.losses.SparseCategoricalCrossentropy (with from_logits=True). This loss is used because the network tries to assign a label to each pixel, just as in multi-class classification. In the correct segmentation mask, each pixel has a value in {0, 1, 2}, and the network likewise outputs three channels. Essentially, each channel learns to predict one class, and SparseCategoricalCrossentropy is the recommended loss in this situation. Based on the network output, the label assigned to each pixel is the class of the channel with the highest value, which is exactly what the create_mask function does.
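As a tiny self-contained check (my own illustration, not from the tutorial), this is how the loss behaves for a handful of pixels: the labels are plain integers in {0, 1, 2}, the model output carries one logit per class, and the predicted label is the argmax over the channels.

import tensorflow as tf

# Four example "pixels": integer labels and 3-channel logits
labels = tf.constant([0, 1, 2, 1])
logits = tf.constant([[2.0, 0.1, 0.3],    # most confident in class 0
                      [0.2, 1.5, 0.1],    # most confident in class 1
                      [0.1, 0.3, 2.2],    # most confident in class 2
                      [0.0, 0.2, 0.1]])   # nearly uniform

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(loss_fn(labels, logits).numpy())     # mean cross-entropy over the four pixels

# The predicted label per pixel is the channel with the highest value (what create_mask does)
print(tf.argmax(logits, axis=-1).numpy())  # [0 1 2 1]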

model = unet_model(OUTPUT_CHANNELS)
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

Let’s try to run the model and see what it predicts before training.

def create_mask(pred_mask):
  pred_mask = tf.argmax(pred_mask, axis=-1)
  pred_mask = pred_mask[..., tf.newaxis]
  return pred_mask[0]
​
def show_predictions(dataset=None, num=1):
  if dataset:
    for image, mask in dataset.take(num):
      pred_mask = model.predict(image)
      display([image[0], mask[0], create_mask(pred_mask)])
  else:
    display([sample_image, sample_mask,
             create_mask(model.predict(sample_image[tf.newaxis, ...]))])
​
show_predictions()

Let’s look at how the model improves with training. To do this, we define a callback function.

class DisplayCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    clear_output(wait=True)
    show_predictions()
    print ('\nSample Prediction after epoch {}\n'.format(epoch+1))
​
EPOCHS = 20
VAL_SUBSPLITS = 5
VALIDATION_STEPS = info.splits['test'].num_examples//BATCH_SIZE//VAL_SUBSPLITS
​
model_history = model.fit(train_dataset, epochs=EPOCHS,
                          steps_per_epoch=STEPS_PER_EPOCH,
                          validation_steps=VALIDATION_STEPS,
                          validation_data=test_dataset,
                          callbacks=[DisplayCallback()])

Sample Prediction after epoch 20
57/57 [==============================] - 3s 54ms/step - loss: 0.1308 - accuracy: 0.9401 - val_loss: 0.3246 - val_accuracy: 0.8903

loss = model_history.history['loss']
val_loss = model_history.history['val_loss']
​
epochs = range(EPOCHS)
​
plt.figure()
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'bo', label='Validation loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss Value')
plt.ylim([0, 1])
plt.legend()
plt.show()

2.5 Model prediction

Let’s make a few predictions. To save time, the number of epochs was kept small here, but you can set it higher to get more accurate results.

show_predictions(test_dataset, 2)
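If you would also like a single number rather than pictures, the compiled metrics can be evaluated on the test pipeline. This is a small addition of mine, not part of the original tutorial:

# Per-pixel loss and accuracy over the whole test set
test_loss, test_accuracy = model.evaluate(test_dataset)
print('test loss:', test_loss, 'test accuracy:', test_accuracy)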

That brings today's sharing to an end. I hope that, through the above, you have learned the basic workflow of image segmentation. I strongly recommend that beginners follow the steps above and practice them one by one; you will gain a lot from it.

Today’s article comes from:

tensorflow.google.cn/tutorials/k…

Newcomers can spend some time going through this site; it is very basic and very suitable for beginners.

Of course, these two sites are also recommended:

tensorflow.google.cn/tutorials/k…

keras.io/en/

I'm Yufeng, official account: Yufeng Code Word. Feel free to come and say hello.

Hello, my friends, I am Yufeng, a programmer without a formal computer-science background who dreamed of going from a "shuangfei" (non-985/211) university into a big tech company and finally got his wish; I now work as an algorithm engineer at a large company. During my master's I threw myself into all kinds of research competitions, going from complete beginner to national champion, spent time as a visiting student at Tsinghua University, and wrote seven invention patents and three research papers. Welcome to follow my official account "Yufeng Code Word", where we learn and make progress together; it shares articles on programming and research, so please look forward to them. I am also an uploader on Bilibili under the same name, Yufeng Code Word, posting technical videos every day. You are welcome to drop by.