Machine learning 021 - Vector quantization of images using K-means

(Python libraries and versions used in this article: Python 3.5, Numpy 1.14, Scikit-learn 0.19, matplotlib 2.2)

In the previous article, we explained how the K-means algorithm works and used it to perform a simple clustering analysis on a data set. Here we explain how to use K-means to perform vector quantization on images.


1. Introduction to vector quantization

Vector Quantization (VQ) is an important signal compression method that plays a significant role in image processing, speech signal processing, and other fields.

Vector quantization is a lossy data compression method based on block coding; it appears as a step in formats such as JPEG image compression and MPEG-4 video compression. The basic idea is to group several scalar values into a vector and then quantize the whole vector in vector space, compressing the data while losing relatively little information.

Vector quantization is essentially a kind of approximation. Its core idea is the same as rounding: replace a number, or a set of numbers, with a single representative value. For example, given many values such as (6.235, 6.241, 6.238, 6.238954, 6.24205, ...), rounding them all gives a single value, 6.24, so one number (6.24) can represent many numbers.

With this basic idea in mind, we can look at the following example of one-dimensional vector quantization:

On the number axis there are infinitely many data points. We can use -3 to represent all data less than -2, -1 to represent the data between -2 and 0, 1 to represent the data between 0 and 2, and 3 to represent the data greater than 2. In this way, the infinitely many data points on the axis can be represented by just four values (-3, -1, 1, 3). These four numbers can be encoded with only two bits (-3=00, -1=01, 1=10, 3=11), so this is a 1-dimensional, 2-bit VQ whose quantization rate is 2 bits/dimension.
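As a quick illustration, here is a minimal NumPy sketch of this 1-dimensional, 2-bit quantizer; the code values (-3, -1, 1, 3) and the interval boundaries (-2, 0, 2) come from the example above, while the sample data is made up purely for illustration:

import numpy as np

# The codebook: four representative values, selectable by a 2-bit code (0..3)
codebook = np.array([-3.0, -1.0, 1.0, 3.0])    # codes 00, 01, 10, 11
boundaries = np.array([-2.0, 0.0, 2.0])        # edges between the four intervals

data = np.array([-2.7, -0.4, 0.3, 1.8, 2.5])   # some 1-D samples (illustrative)
codes = np.digitize(data, boundaries)          # which interval each sample falls into
quantized = codebook[codes]                    # replace every sample with its representative

print(codes)      # [0 1 2 2 3] -> each sample now needs only 2 bits
print(quantized)  # [-3. -1.  1.  1.  3.]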

Let’s look at a slightly more complicated example of two-dimensional vector quantization:

Since it is two-dimensional, any point on the plane can be expressed in the coordinate form (x, y). In the figure, the whole two-dimensional plane is divided into 16 regions by solid blue lines, so any data point falls into one of these 16 regions. We can represent each region by a single point on the plane, which gives us 16 red points; the coordinates of each red point stand for all the two-dimensional points in its region.

Furthermore, we use a 4-bit binary code to encode these 16 representatives, so this problem is a 2-dimensional, 4-bit VQ, and its quantization rate is also 2 bits/dimension.

The 16 representatives, shown here as red stars, are called code vectors, and the blue bounded regions are called encoding regions. The set of all code vectors is called the codebook, and the set of all encoding regions is called the partition of the space.
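In fact K-means builds exactly this kind of codebook for us: the cluster centers play the role of the code vectors, and the cluster assignments define the encoding regions. Below is a minimal sketch on randomly generated 2-D points (the data is purely illustrative):

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D data: 1000 random points on the plane
points = np.random.rand(1000, 2)

# Learn a codebook of 16 code vectors (4-bit codes, i.e. 2 bits/dimension)
kmeans = KMeans(n_clusters=16, n_init=10).fit(points)
codebook = kmeans.cluster_centers_   # the 16 "red stars"
codes = kmeans.labels_               # which encoding region each point belongs to

# Quantize: replace every point with its code vector
quantized = codebook[codes]
print(codebook.shape, codes.shape, quantized.shape)   # (16, 2) (1000,) (1000, 2)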

For an image, each pixel can be treated as a data point, and K-means can be used to cluster these data points. By clustering the whole image into K categories, we obtain K different centroids (for an intuitive understanding of centroids, you can refer to my previous article [Stove AI] Machine learning 020 - Clustering analysis of data using the K-means algorithm), or, more generally, K different data representatives that can stand in for the pixel values of all points in the image. We then only need to store these K representatives (think of how a people's congress works and you will understand the idea), which can greatly reduce the storage space of the image. For example, a BMP image of 2-3 MB may shrink to only a few hundred KB after being compressed to JPG; of course, JPG compression also uses other techniques besides vector quantization, but the general idea is the same. This process inevitably introduces a certain amount of distortion in the image pixels, and the degree of distortion depends on the value of K.

(Part of the above content is adapted from a blog post on vector quantization.)
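To make the saving concrete, here is a rough back-of-the-envelope calculation (ignoring the small codebook itself): an ordinary 8-bit grayscale image needs 8 bits per pixel. If we keep only K = 2^4 = 16 representatives, each pixel can be stored as a 4-bit index into the codebook, so the storage drops by roughly (8 - 4) / 8 = 50%. This is exactly how the compression rate is calculated in the code further below.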


2. Use K-means to perform vector quantization operation on the image

Based on the introduction to vector quantization in the first part above, we can perform vector quantization compression on an image: extract K pixel representatives from the image, and then use these representatives to represent the whole image. The specific code is:

import numpy as np
from sklearn.cluster import KMeans

# Construct a function to perform vector quantization of the image
def image_VQ(image, K_nums): # Seems to take a lot of time.
    # Construct a KMeans object
    kmeans = KMeans(n_clusters=K_nums, n_init=4)
    # Use this KMeans object to train the data set, which in this case is the image pixels
    img_data = image.reshape((-1, 1)) # one sample (pixel value) per row
    kmeans.fit(img_data)
    centroids = kmeans.cluster_centers_.squeeze() # centroid of each category
    labels = kmeans.labels_ # category label of each pixel
    # Replace every pixel with the centroid of its category and restore the image shape
    return centroids[labels].reshape(image.shape)

We first define a function to complete the image vector quantization compression described above: it creates a KMeans object, uses it to fit the image data, retrieves the centroid and label of each category after clustering, and then replaces the original image pixels directly with the centroids to obtain the compressed image.
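Before applying the function to a real picture, it can be sanity-checked on a small random array (the values below are purely illustrative):

import numpy as np

fake_img = np.random.randint(0, 256, size=(64, 64)).astype(np.float64)
compressed = image_VQ(fake_img, K_nums=16)
print(compressed.shape)              # (64, 64), same shape as the input
print(len(np.unique(compressed)))    # at most 16 distinct pixel values remain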

To compare the original image with the compressed image, we draw the two images side by side. The plotting function is as follows:

import matplotlib.pyplot as plt

# Draw both the original image and the compressed image for easy comparison and viewing
def plot_imgs(raw_img, VQ_img, compress_rate):
    assert raw_img.ndim == 2 and VQ_img.ndim == 2, "only plot gray scale images"
    plt.figure(12, figsize=(25, 50))
    plt.subplot(121)
    plt.imshow(raw_img, cmap='gray')
    plt.title('raw_img')

    plt.subplot(122)
    plt.imshow(VQ_img, cmap='gray')
    plt.title('VQ_img compress_rate={:.2f}%'.format(compress_rate))
    plt.show()

For convenience, we can wrap the compression function and the display function into a single higher-level function, which makes it easy to call and run directly, as shown below:

import cv2

def compress_plot_img(img_path, num_bits):
    assert 1 <= num_bits <= 8, 'num_bits must be between 1 and 8'
    K_nums = np.power(2, num_bits) # number of clusters: 2**num_bits

    # Calculate the compression rate
    compression_rate = round(100 * (8 - num_bits) / 8, 2)
    # print('compression rate is {:.2f}%'.format(compression_rate))

    image = cv2.imread(img_path, 0) # Read as grayscale image
    VQ_img = image_VQ(image, K_nums)
    plot_imgs(image, VQ_img, compression_rate)

With these helper functions ready, we can call compress_plot_img() directly to compress and display an image. Below we try three different numbers of compression bits to see the effect.
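For example, assuming a grayscale-readable image file on your machine (the file name below is only a placeholder, and the specific bit values are just one reasonable choice), the three results can be produced like this:

# Hypothetical image path: replace it with a picture of your own
img_path = 'flower_image.jpg'

for num_bits in (4, 2, 1):   # K = 16, 4 and 2 centroids respectively
    compress_plot_img(img_path, num_bits)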

######################## Summary ########################

1. The essence of vector quantization compression of an image is to divide the image data into K different categories. This is exactly the idea behind K-means, so vector quantization of images is an important application of the K-means algorithm.

2. After obtaining the centroids of the K categories of the image with the K-means algorithm, the K different centroids can be used to replace the image pixels, which yields a compressed image with a small amount of distortion.

3. From the comparison of the three images above, it can be seen that the higher the compression rate, the more severe the image distortion. The image compressed with num_bits=1 is essentially a binarized image: its pixels take only two values, the two cluster centroids.

##########################################################


Note: The code for this article has been uploaded to (my GitHub); you are welcome to download it.

References:

1. Classic Examples of Python Machine Learning, by Prateek Joshi, translated by Tao Junjie and Chen Xiaoli