Learning-based coding (II): CNN-based loop filtering for intra frames

The algorithm in this post comes from JVET-O0157.

Introduction

To replace the deblocking filter (DBF) in the VVC in-loop filter, a down-sampled CNN filter is proposed. The down-sampling reduces the complexity of the neural network while preserving coding efficiency. The result is a lightweight CNN whose filtering strength can be controlled effectively to cope with different quantization parameters (QPs). It is applied only to intra frames.

Filtering process

The following figure shows the loop filtering process after the CNN is added for intra frames: the DBF is replaced by the CNN, while inter-frame loop filtering is the same as in VTM.

[Figure: intra-frame loop filtering pipeline with the CNN in place of DBF]
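As a rough illustration (assumed control flow, not actual VTM code), the modified chain can be summarized as follows, with `cnn`, `dbf`, `sao`, and `alf` standing for the respective filter stages:

```python
# Sketch of the modified loop-filter chain (assumption: the remaining
# stages keep their usual VTM order of SAO followed by ALF).
def loop_filter(frame, is_intra, cnn, dbf, sao, alf):
    # Intra frames: the CNN replaces DBF; inter frames: unchanged VTM chain.
    x = cnn(frame) if is_intra else dbf(frame)
    return alf(sao(x))
```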

The network structure

The following figure shows the network structure of the CNN, where N is the down-sampling stride and M is the number of channels of the convolution kernels. The core idea is to reduce the amount of computation through down-sampling and to control the filtering strength so that a single network can be used for different QPs.

[Figure: network structure of the CNN filter]

The processing steps of the CNN (a sketch follows the list):

  1. An (N, N, M) convolution layer (N-fold down-sampling) extracts features.
  2. The features pass through multiple residual blocks to enhance the filtering.
  3. A (3, 3, N×N) convolution layer outputs N×N feature maps, and a DepthToSpace operation rearranges them to match the size of the input image.
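A minimal PyTorch sketch of this structure follows. It is an illustration under assumptions, not the proposal's code: the residual-block count, activation, and single-channel input (one model for Y, one for UV) are assumptions; H and W are assumed divisible by N.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """A plain residual block (exact block design is an assumption)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))

class DownsampledCNNFilter(nn.Module):
    def __init__(self, n: int = 4, m: int = 32, num_blocks: int = 8):
        super().__init__()
        # Step 1: (N, N, M) strided convolution = N-fold down-sampling.
        self.down = nn.Conv2d(1, m, kernel_size=n, stride=n)
        # Step 2: multiple residual blocks (the count is an assumption).
        self.blocks = nn.Sequential(*(ResBlock(m) for _ in range(num_blocks)))
        # Step 3: (3, 3, NxN) convolution producing N*N feature maps ...
        self.out = nn.Conv2d(m, n * n, kernel_size=3, padding=1)
        # ... followed by DepthToSpace (PixelShuffle) back to the input size.
        self.depth_to_space = nn.PixelShuffle(n)

    def forward(self, x):
        # x: (batch, 1, H, W), with H and W divisible by N.
        return self.depth_to_space(self.out(self.blocks(self.down(x))))

if __name__ == "__main__":
    y = DownsampledCNNFilter(n=4, m=32)(torch.randn(1, 1, 64, 64))
    print(y.shape)  # torch.Size([1, 1, 64, 64]) -- same size as the input
```

The strided convolution and the final PixelShuffle are exact inverses in terms of spatial size, which is why the output matches the input resolution.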

Strength of the filter

Dividing by and multiplying by the Qstep inside the network serves to control the filtering strength.


The rounding operation at different Qsteps produces different levels of distortion. Therefore, before being passed to the CNN, the reconstructed image is divided by the normalized Qstep to bring the distortion to a common level, and after CNN processing it is multiplied by the normalized Qstep again.
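In code, this strength control is a simple wrapper around the network (a sketch, assuming the model operates directly on the scaled reconstruction):

```python
# Sketch of the Qstep-based strength control described above.
def filter_with_strength_control(model, rec, qstep_norm):
    # Scale the reconstruction so distortion reaches a common level,
    # run the single QP-independent network, then scale back.
    return model(rec / qstep_norm) * qstep_norm
```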

The normalized Qstep is derived as follows:

[Equation: normalized Qstep as a function of QP]
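The QP-to-Qstep mapping below is the standard HEVC/VVC relation; the normalization against a reference QP is an assumed scheme for illustration, and the proposal's exact normalization may differ:

```python
# Qstep = 2 ** ((QP - 4) / 6) is the standard HEVC/VVC QP-to-Qstep relation.
def qstep(qp: int) -> float:
    return 2.0 ** ((qp - 4) / 6.0)

# Normalizing against a reference QP (qp_ref = 37 is an assumption) so that
# the reference point maps to 1.0.
def normalized_qstep(qp: int, qp_ref: int = 37) -> float:
    return qstep(qp) / qstep(qp_ref)
```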

Training

The DIV2K dataset (800 images) was used to generate training data. First, each image is converted from RGB to YUV. Then, with VTM5.0 loop filtering disabled, reconstructed images are generated in the All Intra (AI) configuration. Finally, 800,000 patches are randomly cropped from the Y component and 800,000 from the UV components to train two sets of parameters (a sketch of the patch extraction follows).

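A sketch of the random patch extraction for one plane; the patch size and data layout are assumptions, as the proposal only states the patch counts:

```python
import random
import numpy as np

def sample_patches(plane: np.ndarray, num_patches: int, size: int = 64) -> list:
    """Randomly crop num_patches patches of size x size from one plane."""
    h, w = plane.shape
    patches = []
    for _ in range(num_patches):
        top = random.randrange(h - size + 1)
        left = random.randrange(w - size + 1)
        patches.append(plane[top:top + size, left:left + size])
    return patches

# Example: 800,000 luma patches, as described above (patch size assumed).
# y_patches = sample_patches(reconstructed_y, 800_000)
```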

The experimental results

Here are the test results for N=4 and M=32:

[Table: BD-rate and runtime results for N=4, M=32]

Here is a subjective comparison, where you can see that the blocking artifacts at the neck are removed.

[Figure: subjective comparison before and after CNN filtering]

This method replaces the DBF with a pre-trained CNN model in VTM5.0. Under the AI configuration, the BD-rates for Y, U, and V are -1.44%, -2.51%, and -3.39%, respectively. The decoding time is 1040% of the anchor's.

If you are interested, please follow the WeChat official account Video Coding.