Goal

  • Learn to use the cv.kmeans() function for data clustering in OpenCV

Understand the parameters

Input parameters

  1. samples: It should be of np.float32 data type, and each feature should be put in a single column.

  2. nclusters(K): Number of clusters required at the end.

  3. criteria: This is the iteration termination criteria. When this criteria is satisfied, the algorithm iteration stops. It should be a tuple of 3 parameters, (type, max_iter, epsilon): a. type of termination criteria. It has 3 flags, as below:

    • cv.TERM_CRITERIA_EPS – stops the algorithm iteration if the specified accuracy, epsilon, is reached.

    • cv.TERM_CRITERIA_MAX_ITER – stops the algorithm after the specified number of iterations, max_iter.

    • cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER – stops the iteration when either of the above conditions is met.

      b. max_iter – an integer specifying the maximum number of iterations. c. epsilon – required accuracy

  4. attempts: Flag to specify the number of times the algorithm is executed using different initial labellings. The algorithm returns the labels that yield the best compactness. This compactness is returned as output.

  5. flags: This flag is used to specify how the initial centers are taken. Normally two flags are used for this: cv.KMEANS_PP_CENTERS and cv.KMEANS_RANDOM_CENTERS.

Output parameters

  1. compactness: It is the sum of squared distances from each point to its corresponding center.
  2. labels: This is the label array (same as “code” in the previous article), where each element is marked “0”, “1”, …
  3. centers: This is the array of centers of clusters.

Now we will look at three examples to see how the K-Means algorithm can be applied.
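To make the compactness output concrete, the sketch below recomputes it by hand from a label array and a center array using NumPy only. The sample values, labels, and centers here are made up for illustration, standing in for what cv.kmeans() would return:

```python
import numpy as np

# Hypothetical 1-D samples already assigned to two clusters
# (labels and centers are made-up stand-ins for cv.kmeans() output)
z = np.float32([10, 12, 14, 90, 92, 94]).reshape(-1, 1)
labels = np.array([0, 0, 0, 1, 1, 1])
centers = np.float32([[12], [92]])

# Compactness = sum of squared distances from each point to its center
compactness = float(np.sum((z - centers[labels]) ** 2))
print(compactness)  # (4 + 0 + 4) + (4 + 0 + 4) = 16.0
```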

1. Single feature data

Consider that you have a set of data with only one feature, i.e. one-dimensional. For example, we can take our T-shirt problem, where you use only the height of people to decide the size of the T-shirt. So we start by creating the data and plotting it in Matplotlib.

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

x = np.random.randint(25,100,25)
y = np.random.randint(175,255,25)
z = np.hstack((x,y))
z = z.reshape((50,1))
z = np.float32(z)
plt.hist(z,256,[0,256]),plt.show()

So we have “z”, an array of size 50 with values ranging from 0 to 255. I have reshaped “z” into a column vector. It will be more useful when more than one feature is present. Then I made the data of type np.float32. We get the following image:

Now we apply the KMeans function. Before that, we need to specify the criteria. My criteria is such that whenever 10 iterations of the algorithm are run, or an accuracy of epsilon = 1.0 is reached, the algorithm stops and returns the answer.

# Define criteria = ( type, max_iter = 10, epsilon = 1.0 )
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
# Set flags
flags = cv.KMEANS_RANDOM_CENTERS
# Apply KMeans
compactness,labels,centers = cv.kmeans(z,2,None,criteria,10,flags)

This gives us the compactness, labels and centers. In this case, I got centers of 60 and 207. Labels will have the same size as the test data, where each data point is labelled “0”, “1”, “2”, etc. depending on its centroid. Now we split the data into different clusters depending on their labels.

A = z[labels==0]
B = z[labels==1]

Now we plot A in red, B in blue and their centroids in yellow.

# Now plot 'A' in red, 'B' in blue, 'centers' in yellow
plt.hist(A,256,[0,256],color = 'r')
plt.hist(B,256,[0,256],color = 'b')
plt.hist(centers,32,[0,256],color = 'y')
plt.show()

The following results were obtained:

2. Multi-feature data

In the previous example, we took only height for the T-shirt problem. Here, we will take both height and weight, i.e. two features. Remember, in the previous case, we made our data a single column vector. Each feature is arranged in a column, while each row corresponds to an input test sample. For example, in this case, we set a test data of size 50x2, which is the heights and weights of 50 people. The first column corresponds to the heights of all 50 people and the second column corresponds to their weights. The first row contains two elements, where the first is the height of the first person and the second is his weight. Similarly, the remaining rows correspond to the heights and weights of the other people. Check the image below:
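The layout described above can be sketched with NumPy alone. The height and weight ranges below are made up; the point is only the shape of the array that cv.kmeans() expects, with one row per sample and one column per feature:

```python
import numpy as np

# Hypothetical height/weight data for 50 people
rng = np.random.default_rng(42)
heights = rng.integers(150, 200, 50)   # cm, one per person
weights = rng.integers(50, 100, 50)    # kg, one per person

# One row per sample, one column per feature, as cv.kmeans() expects
Z = np.float32(np.column_stack((heights, weights)))

print(Z.shape)  # (50, 2)
print(Z[0])     # height and weight of the first person
```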

Now, I’ll go straight to the code:

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

X = np.random.randint(25,50,(25,2))
Y = np.random.randint(60,85,(25,2))
Z = np.vstack((X,Y))
# Convert to np.float32
Z = np.float32(Z)
# Define criteria and apply kmeans()
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret,label,center = cv.kmeans(Z,2,None,criteria,10,cv.KMEANS_RANDOM_CENTERS)
# Now separate the data, note the ravel()
A = Z[label.ravel()==0]
B = Z[label.ravel()==1]
# Plot the data
plt.scatter(A[:,0],A[:,1])
plt.scatter(B[:,0],B[:,1],c = 'r')
plt.scatter(center[:,0],center[:,1],s = 80,c = 'y', marker = 's')
plt.xlabel('Height'),plt.ylabel('Weight')
plt.show()

We get the following results:

3. Color quantization

Color quantization is the process of reducing the number of colors in an image. One reason to do so is to reduce memory. Sometimes, some devices may have a limitation such that they can produce only a limited number of colors. In those cases as well, color quantization is performed. Here we use k-means clustering for color quantization.
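The memory-saving idea can be illustrated without OpenCV: snapping every pixel to one of a few fixed colors collapses the number of distinct colors in the image. The crude two-levels-per-channel rule below is only a stand-in for the k-means centers computed later, and the tiny random "image" is made up:

```python
import numpy as np

# A tiny random "image" to illustrate the idea (values are arbitrary)
rng = np.random.default_rng(0)
img = rng.integers(0, 256, (32, 32, 3), dtype=np.uint8)
pixels = img.reshape(-1, 3)

# Stand-in for k-means with K=8: snap each channel to 2 fixed levels,
# so at most 2*2*2 = 8 distinct colors survive
quantized = np.where(pixels < 128, 64, 192).astype(np.uint8)

unique_before = len(np.unique(pixels, axis=0))
unique_after = len(np.unique(quantized, axis=0))
print(unique_before, unique_after)  # many colors before, at most 8 after
```

With real k-means, the 8 surviving colors would be the cluster centers instead of fixed levels, so they adapt to the image content.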

There is nothing new to explain here. There are 3 features, say R, G, B. So we need to reshape the image to an array of size Mx3 (M is the number of pixels in the image). After the clustering, we apply the centroid values (they are also R, G, B) to all the pixels, so that the resulting image will have the specified number of colors. Finally, we need to reshape it back to the shape of the original image. Below is the code:

import numpy as np
import cv2 as cv

img = cv.imread('home.jpg')
Z = img.reshape((-1,3))
# Convert to np.float32
Z = np.float32(Z)
# Define criteria, number of clusters(K) and apply kmeans()
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 8
ret,label,center = cv.kmeans(Z,K,None,criteria,10,cv.KMEANS_RANDOM_CENTERS)
# Now convert the centers back into uint8, and rebuild the image
center = np.uint8(center)
res = center[label.flatten()]
res2 = res.reshape((img.shape))
cv.imshow('res2',res2)
cv.waitKey(0)
cv.destroyAllWindows()

See the result below for K = 8:

Welcome to panchuangai blog: panchuang.net/

OpenCV: woshicver.com/

Welcome to docs.panchuang.net/