Abstract

It’s been a long time since I updated my technical blog, so I’ve dug a new hole for myself: the semantic segmentation series, which starts with the simplest semantic segmentation basics and development environment setup.

1. Foreword

It’s been a while since I updated my technical blog, so I’ve dug myself a new hole: the semantic segmentation series.

This series of articles covers:

  • Basic use of PyTorch
  • Explanations of semantic segmentation algorithms

We start with the simplest part: semantic segmentation basics and development environment setup.

2. Semantic Segmentation

What is semantic segmentation?

Semantic segmentation: label every pixel in the image with its target category according to its “semantics”, so that different kinds of things can be distinguished in the image. It can be understood as a pixel-level classification task; frankly speaking, we classify each individual pixel.

In short, our goal is to take an RGB color image (height × width × 3) or a grayscale image (height × width × 1) and output a segmentation map containing a category label for each pixel (height × width × 1). The details are shown in the figure below:

Note: for visual clarity, the illustration above uses a low-resolution annotation. In practice, the segmentation annotation should have the same resolution as the original image.
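To make the shapes concrete, here is a minimal sketch in PyTorch (toy sizes and random values, purely for illustration):

```python
import torch

# A toy RGB image: height 4, width 6, 3 channels (height x width x 3)
image = torch.rand(4, 6, 3)

# The segmentation map: one class label per pixel (height x width x 1)
seg_map = torch.randint(0, 5, (4, 6, 1))  # labels drawn from 5 categories

print(image.shape)    # torch.Size([4, 6, 3])
print(seg_map.shape)  # torch.Size([4, 6, 1])
```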

The image is divided into five categories: Person, Purse, Plants/Grass, Sidewalk and Building/Structures.

As with standard categorical values, this creates a one-hot encoded target annotation: essentially one output channel per category. Because there are 5 categories in the figure above, the network outputs 5 channels as well, as shown in the figure below:
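A minimal sketch of this one-hot encoding in PyTorch (toy sizes; the five category names are the ones above):

```python
import torch
import torch.nn.functional as F

num_classes = 5                                  # Person, Purse, Plants, Sidewalk, Building
target = torch.randint(0, num_classes, (4, 6))   # (H, W) map of class indices

one_hot = F.one_hot(target, num_classes)         # (H, W, 5): one channel per category
one_hot = one_hot.permute(2, 0, 1)               # (5, H, W): channels-first, as PyTorch expects
print(one_hot.shape)                             # torch.Size([5, 4, 6])
```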

As shown in the figure above, the prediction can be collapsed into a segmentation map by taking the argmax over the channel (depth) dimension for each pixel. The result can then be overlaid on the original image to highlight each object.

The argmax step is easy to understand. As shown in the figure above, each channel contains only 0s and 1s. In the Person channel, the red 1s mark the pixels belonging to Person, and all other pixels are 0. The same holds for the other channels, and no pixel is 1 in two or more channels at once. Argmax therefore finds, for each pixel, the index of the channel with the maximum value. The final result is:

When a single channel is overlaid on the original image, we call it a mask: it indicates only the region where a particular category exists.
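A minimal sketch of both steps, assuming a network output of shape (batch, 5, height, width) and a hypothetical Person class index of 0:

```python
import torch

logits = torch.randn(1, 5, 4, 6)      # hypothetical network output: (N, C, H, W)
pred = torch.argmax(logits, dim=1)    # (N, H, W): index of the winning channel per pixel

person_id = 0                         # hypothetical index of the Person category
person_mask = (pred == person_id)     # boolean mask marking only Person pixels
```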

The high-resolution results are shown below, with different colors representing different categories:

3. Datasets

Common semantic segmentation algorithms belong to supervised learning, so well-annotated data sets are essential.

There are many publicly available semantic segmentation datasets. Currently, three benchmarks dominate model training and testing.

The first commonly used dataset is the Pascal VOC series. Within this series, VOC2012 is currently the most popular, and similar datasets such as Pascal Context are also useful.

The second commonly used dataset is Microsoft COCO. COCO has 80 categories in total, and although it provides very detailed pixel-level annotations, there is no official evaluation track specifically for semantic segmentation. The dataset is mainly used for instance-level segmentation and image captioning, so COCO is often used as an additional training set.

The third dataset is Cityscapes, built for assisted-driving (autonomous driving) environments; it evaluates on 19 common categories.

There are many data sets that can be used for semantic segmentation training:

  • PASCAL VOC 2012: common object classes, 21 categories;
  • MS COCO: sponsored by Microsoft, it has almost become the “standard” dataset for evaluating image semantic understanding algorithms, with 80 categories;
  • Cityscapes: 33 classes of labeled objects from 50 European cities, across different scenes, backgrounds and seasons;
  • Pascal Context: an extension of the PASCAL VOC 2010 recognition challenge, with 59 categories;
  • KITTI: one of the most popular datasets for mobile robotics and autonomous driving research, with 11 categories;
  • NYUDv2: a 2.5D dataset containing 1449 indoor RGB-D images captured with a Microsoft Kinect device;
  • SUN RGB-D: captured by four RGB-D sensors, containing 10,000 RGB-D images at sizes consistent with PASCAL VOC;
  • ADE20K_MIT: a newer dataset for scene understanding, freely downloadable, with 151 categories.

There are many datasets, and this tutorial series is not tied to any specific one; it may also use datasets from Kaggle competitions. How to handle each dataset, what format it comes in, and which datasets will be used will be explained in subsequent articles.
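As a taste of what that handling looks like, here is a minimal PyTorch Dataset sketch. The class name SegDataset, the file-list arguments, and the convention that masks store class indices as pixel values are all assumptions for illustration, not the format of any particular dataset:

```python
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class SegDataset(Dataset):
    """Minimal segmentation dataset: parallel lists of image and mask file paths."""

    def __init__(self, image_paths, mask_paths):
        self.image_paths = image_paths
        self.mask_paths = mask_paths

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        mask = Image.open(self.mask_paths[idx])  # assumed: class indices as pixel values

        image = torch.from_numpy(np.array(image)).permute(2, 0, 1).float() / 255.0  # (3, H, W)
        mask = torch.from_numpy(np.array(mask)).long()                              # (H, W)
        return image, mask
```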

4. GPU Machine

Semantic segmentation tasks require a machine with a high-end GPU; otherwise training converges slowly.

The best development environment is Linux. Daily model development work in industry is basically done on Linux cloud servers, so it pays to get used to the Linux operating system in advance.

For students: if your laboratory does deep learning research and is well resourced, a GPU server should be available and there is nothing to worry about.

However, if conditions are limited and the lab has no GPU server, there are three ways to study deep learning anyway:

1. Free cloud server: Google Colab

Google Colab is a free GPU service provided by Google. Its GPU compute power is acceptable, but it has two main problems: it requires getting past the Great Firewall (i.e. a proxy or VPN from within China), and it offers little storage. Colab gets its storage by mounting Google Drive, which provides only 15 GB for free; you have to pay to expand it.

If you want to use Google Colab, you can find tutorials online.

2. Alibaba Cloud paid GPU servers

Alibaba Cloud provides GPU cloud servers under two payment modes: monthly subscription and pay-as-you-go. There are P4 servers and even V100 servers. The performance is strong and so is the price: in a word, expensive, and not recommended for individual users. Besides Alibaba Cloud, Tencent, Baidu and Huawei offer corresponding services, but they are all very expensive.

3. Build your own desktop

You can build a desktop PC, which is also an investment in yourself. A machine good enough for deep learning training costs about 6000 yuan.

Deep learning training depends heavily on GPU performance, so it is necessary to pick a good N card, i.e. an NVIDIA graphics card. A useful trick when choosing one is to consult a GPU ranking ladder (click to view):

The ladder mainly ranks the graphics cards most commonly found on the market, excluding cards like the V100 that cost on the order of 100,000 yuan.

Do not choose the AMD cards on the right side of the ladder. Although their performance is good, A cards do not support CUDA.

Choose a card according to your budget, and pick one with more than 8 GB of video memory if at all possible: deep learning training devours VRAM.

I bought an MSI RTX 2060 Super for 3399 yuan. Graphics cards do not hold their value, so the price will keep dropping over time.

A lot more could be written about building a PC, such as choosing the CPU, motherboard, power supply, memory and cooler, but I will not expand on it here. If you do not have the energy to assemble a desktop yourself, you can buy a pre-built desktop with the corresponding graphics card, though it will cost somewhat more than assembling it yourself.

5. Development Environment Setup

If possible, it is recommended to set up the development environment on Ubuntu. Ubuntu is a Linux distribution that suits beginners, with a friendly interface and simple operation.

Since the motherboard I bought does not support installing a Linux system, Windows will be used as the development environment from here on; this does not affect the explanation of algorithm principles and code.

My desktop configuration:

CPU: Intel i7 9700K

Graphics: RTX 2060 Super

System: Windows 10

After installing the Windows OS and required drivers, install CUDA, Anaconda3, cuDNN, PyTorch (GPU version), and Fluent Terminal (optional).

1. CUDA

CUDA is a computing platform launched by the graphics card manufacturer NVIDIA. For example, the RTX 2060 Super supports CUDA 10.

Installation is foolproof: just keep clicking Next.

After installation, you need to configure the system environment variables: This PC -> right-click -> Properties -> Advanced system settings -> Environment Variables -> Path:

Add the NVSMI path to the Path variable. I used the default installation location:
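For reference, on my machine the default location was the following (verify against your own installation; the exact path varies with the driver version):

```
C:\Program Files\NVIDIA Corporation\NVSMI
```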

Once configured, you can inspect the graphics card in CMD with the nvidia-smi command.

2. Anaconda3

Anaconda is a package manager and environment manager for Python, which makes it easy to install third-party Python libraries.

Download address: Click to see

Choose the Python 3.7 version; installation is again a foolproof matter of clicking Next.

Once installed, you need to add system environment variables in the same way you did when installing CUDA:

```
D:\Anaconda
D:\Anaconda\Scripts
```

Change the path to the Anaconda path you installed.

After the configuration is complete, run conda -V in CMD. If no error is reported and the version information is displayed, the configuration succeeded.

3. Install cuDNN and PyTorch

cuDNN is a GPU-accelerated library for deep neural networks. It emphasizes performance, ease of use, and low memory overhead.

With Anaconda installed, cuDNN and PyTorch can be installed using conda.

Start Anaconda Prompt. In Anaconda Prompt, enter:

```
conda create -n your_name jupyter notebook
```

This creates a virtual environment named your_name and additionally installs the Jupyter Notebook package. You can change your_name to any name you prefer; it is the name of your virtual environment, for example jack.

Then type y to confirm the installation:

Once installed, you can view the existing environment by using the conda info -e command.

As can be seen from the figure above, there are two environments: base, the environment that ships with Anaconda, and our newly created environment named jack. The reason for creating a new environment is that it lets us manage our configuration separately.

With the environment created, we can activate the jack environment and install cuDNN and the GPU version of PyTorch. Activate the environment named jack:
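Assuming the environment is named jack (older conda versions on Windows used `activate jack` instead):

```
conda activate jack
```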

As you can see, the prompt has changed from base to jack. Install cuDNN in the jack environment:
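A minimal sketch of the command; conda resolves a cuDNN build compatible with the CUDA toolkit in the environment:

```
conda install cudnn
```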

After installing cuDNN, install PyTorch: go to PyTorch’s official website.

Select the options matching your own environment, and the page will automatically generate the command to run. You may need to distinguish between Python and CUDA versions.

To check the Python version, enter python on the command line.

To view the CUDA version, enter nvidia-smi in the command line interface.

Once you’ve determined the versions, install the GPU version of PyTorch using the command provided on the PyTorch website.
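As an example only (the authoritative command is whatever the website generates for your selection), at the time of writing the conda command for a CUDA 10.x setup looked something like:

```
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
```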

At this point, the basic environment setup is complete. Congratulations.
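As a quick sanity check that the GPU build actually sees your card, a minimal snippet:

```python
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # should print True on a working GPU setup
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the name of your graphics card
```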

4. Fluent Terminal

The base environment is ready for normal use.

But for those who care about looks, both the Windows and Anaconda command-line tools are ugly.

Is there a good-looking Terminal that also works well? The answer is yes, but you need to configure it yourself, and there are a few pits to step through.

For example, Fluent Terminal, a modern terminal and my preferred tool. It is dedicated to the Windows platform and uses UWP technology to create a great-looking terminal emulator. Take a look:

For those who like to tinker, check out these articles:

Say goodbye to the ugly Windows terminal and start by revamping the look of PowerShell

Use Windows 10 Terminal like a Mac

There are many such beautification tools, which you will need to explore yourself. Since this article is not specifically about Terminal beautification, I will not spend more space introducing them. If you like to tinker, search online according to your needs.

6. Summary

This article introduces the basic knowledge of semantic segmentation and the construction of the development environment. The next article in this series will specifically explain the algorithm principle and training code of UNet.
