Python, OpenCV: Handwritten Digit OCR Based on a Support Vector Machine (SVM)

The previous section introduced handwritten digit and letter OCR based on KNN. This section introduces handwritten digit OCR based on a support vector machine (SVM).

1. Result images

The training result of a simple linear SVM is shown below: there are four points in the figure, three trending white and one gray-black, and the dividing line (the decision boundary) is very clear.

The training result of a nonlinear SVM is shown below:

In the figure below, the green and blue points are mixed together. The decision boundary in the middle is nonlinear, though it can be approximated linearly. The points circled in gray are the support vectors; the decision boundary depends only on this small subset of the data.

2. SVM and its principles

Support Vector Machines

To understand SVM, you first need to understand linearly separable and linearly non-separable data. Put simply: given a set of points to classify, in a plane or in a higher-dimensional space, can a straight line (or hyperplane) separate the classes from each other?

  • Linearly separable data

    KNN needs to compute the distance between the test data and every training point, which requires a lot of memory when the dataset is large. Another way to think about it is to find a line f(x)=ax_1+bx_2+c that divides the data into two regions. When you get a new test sample X, just substitute it into f: if f(X) is greater than 0, X belongs to the blue group, otherwise to the red group (see the sketch after this list).

    Call this line the decision boundary. It is simple and memory-saving: data that can be split in two by a straight line (or a higher-dimensional hyperplane) is called linearly separable data.

  • Data that is not linearly separable in a low-dimensional space is more likely to become linearly separable in a higher-dimensional space.

As the figure above shows, many such lines are possible. Which one should we take? Intuitively, the line should be as far away from all the points as possible.

Staying as far from the points as possible provides more resistance to noise. So what SVM does is find the straight line (or hyperplane) with the maximum distance to the training samples.

  • To find this decision boundary, you don't need all the data, only the points closest to the opposing group.

In this image they are one solid blue circle and two solid red squares. We call them support vectors, and the lines passing through them are called support planes. The support vectors alone are enough to determine the decision boundary.

  • The weight vector determines the direction of the decision boundary, and the bias determines its position.
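As a minimal sketch of the linear decision function described above, the snippet below classifies two points by the sign of f(x) = w·x + b. The weight vector w and bias b here are made-up illustration values, not the output of any training run:

import numpy as np

# Hypothetical linear decision function f(x) = w . x + b
w = np.array([0.8, -0.6])  # weight vector: direction of the decision boundary
b = -20.0                  # bias: position of the decision boundary

def classify(X):
    # f(X) > 0 -> 'blue' group, otherwise 'red' group
    return np.where(X @ w + b > 0, 'blue', 'red')

points = np.array([[50, 10], [10, 60]], dtype=np.float32)
print(classify(points))  # ['blue' 'red']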

3. The source code

3.1 Handwritten digit OCR with SVM

# Use SVM for handwritten data OCR

# Directly use pixel intensity as feature vector in KNN.
# HOG Histogram of Oriented Gradients is used as feature vector in SVM.
# Here, second-order moments are used to deskew the image.
import cv2
import numpy as np

SZ = 20
bin_n = 16  # Number of bins

svm_params = dict(kernel_type=cv2.ml.SVM_LINEAR,
                  svm_type=cv2.ml.SVM_C_SVC,
                  C=2.67, gamma=5.383)
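# Note: of these parameters, only C and gamma are read below;
# the kernel and SVM type are set explicitly via the setter calls.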

affine_flags = cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR


# Deskew a digit image using its second-order central moments
def deskew(img):
    m = cv2.moments(img)
    if abs(m['mu02']) < 1e-2:
        return img.copy()
    skew = m['mu11'] / m['mu02']
    # Affine transform that removes the skew, keeping the digit centered
    M = np.float32([[1, skew, -0.5 * SZ * skew], [0, 1, 0]])
    img = cv2.warpAffine(img, M, (SZ, SZ), flags=affine_flags)
    return img


# HOG (Histogram of Oriented Gradients) feature descriptor
def hog(img):
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)

    # Quantize gradient directions into bin_n (16) bins
    bins = np.int32(bin_n * ang / (2 * np.pi))

    # Divide the 20x20 cell into four 10x10 sub-cells
    bin_cells = bins[:10, :10], bins[10:, :10], bins[:10, 10:], bins[10:, 10:]
    mag_cells = mag[:10, :10], mag[10:, :10], mag[:10, 10:], mag[10:, 10:]
    # Magnitude-weighted direction histogram per sub-cell, stacked into a 4*16 = 64-dim vector
    hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells, mag_cells)]
    hist = np.hstack(hists)
    return hist


img = cv2.imread('images/digits.png', 0)
print(img.shape)  # (1000, 2000)

cells = [np.hsplit(row, 100) for row in np.vsplit(img, 50)]
print(len(cells))  # 50 rows of 100 cells each

# Half data for training, half for testing (first 50 columns, last 50 columns)
train_cells = [i[:50] for i in cells]
test_cells = [i[50:] for i in cells]

# cv2.imshow("img", train_cells[0][0])
# cv2.imshow("deskew", deskew(train_cells[0][0]))
# cv2.waitKey(0)

# Training data
deskewed = [list(map(deskew, row)) for row in train_cells]
hogdata = [list(map(hog, row)) for row in deskewed]

trainData = np.float32(hogdata).reshape(-1, 64)  # 64 = bin_n * 4
responses = np.repeat(np.arange(10), 250)[:, np.newaxis].astype(np.int32)  # int32 labels (np.arange may default to int64)
print('trainData: ', type(trainData), len(trainData))
print('responses: ', type(responses), responses.shape, len(responses))

print(responses[0])

svm = cv2.ml.SVM_create()
svm.setGamma(svm_params['gamma'])
svm.setC(svm_params['C'])
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.setType(cv2.ml.SVM_C_SVC)
svm.train(trainData, cv2.ml.ROW_SAMPLE, responses)

# Save the trained SVM model
svm.save('images/svm_data.dat')

# Test data
deskewed = [list(map(deskew, row)) for row in test_cells]
hogdata = [list(map(hog, row)) for row in deskewed]
testData = np.float32(hogdata).reshape(-1, bin_n * 4)
result = svm.predict(testData)[1]

print('result: ', type(result))
print('responses: ', type(responses))

# Check accuracy
mask = result == responses
correct = np.count_nonzero(mask)
print('correct: ', correct)

# The accuracy of SVM is 93.8%, higher than KNN's 91.76%
print(correct * 100.0 / len(list(result)))
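Since the model is saved with svm.save(), it can be reloaded later without retraining. A minimal sketch, assuming an OpenCV build that provides cv2.ml.SVM_load (available since 3.1) and reusing the deskew/hog helpers and test_cells from above:

# Reload the saved model and classify a single test cell
svm2 = cv2.ml.SVM_load('images/svm_data.dat')
sample = np.float32(hog(deskew(test_cells[0][0]))).reshape(1, 64)
print(svm2.predict(sample)[1])  # predicted digit label, e.g. [[0.]]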

3.2 Nonlinear SVM

from __future__ import print_function

import cv2 as cv
import numpy as np

NTRAINING_SAMPLES = 100  # Number of training samples per class
FRAC_LINEAR_SEP = 0.9  # Fraction of samples which compose the linear separable part

# Visualize window size
WIDTH = 512
HEIGHT = 512
I = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)

# Generate the training data randomly
trainData = np.empty((2 * NTRAINING_SAMPLES, 2), dtype=np.float32)
labels = np.empty((2 * NTRAINING_SAMPLES, 1), dtype=np.int32)

np.random.seed(100)  # Seed NumPy's RNG so the generated training data is reproducible

# Set up the linearly separable part of the training data
nLinearSamples = int(FRAC_LINEAR_SEP * NTRAINING_SAMPLES)

# Generate random points for class 1
trainClass = trainData[0:nLinearSamples, :]
# x in [0, 0.4*WIDTH)
c = trainClass[:, 0:1]
c[:] = np.random.uniform(0.0, 0.4 * WIDTH, c.shape)
# y in [0, HEIGHT)
c = trainClass[:, 1:2]
c[:] = np.random.uniform(0.0, HEIGHT, c.shape)

# Generate random points for class 2
trainClass = trainData[2 * NTRAINING_SAMPLES - nLinearSamples:2 * NTRAINING_SAMPLES, :]
# x in [0.6*WIDTH, WIDTH)
c = trainClass[:, 0:1]
c[:] = np.random.uniform(0.6 * WIDTH, WIDTH, c.shape)
# y in [0, HEIGHT)
c = trainClass[:, 1:2]
c[:] = np.random.uniform(0.0, HEIGHT, c.shape)

# Set up the non-linearly separable part of the training data (overlapping points of classes 1 and 2)
trainClass = trainData[nLinearSamples:2 * NTRAINING_SAMPLES - nLinearSamples, :]
# x in [0.4*WIDTH, 0.6*WIDTH)
c = trainClass[:, 0:1]
c[:] = np.random.uniform(0.4 * WIDTH, 0.6 * WIDTH, c.shape)
# y in [0, HEIGHT)
c = trainClass[:, 1:2]
c[:] = np.random.uniform(0.0, HEIGHT, c.shape)

# Set the labels for classes 1 and 2
labels[0:NTRAINING_SAMPLES, :] = 1  # Class 1
labels[NTRAINING_SAMPLES:2 * NTRAINING_SAMPLES, :] = 2  # Class 2

# Start training, first set support vector machine SVM parameters
print('Starting training process')
# initialization
svm = cv.ml.SVM_create()
svm.setType(cv.ml.SVM_C_SVC)
svm.setC(0.1)
svm.setKernel(cv.ml.SVM_LINEAR)
svm.setTermCriteria((cv.TERM_CRITERIA_MAX_ITER, int(1e7), 1e-6))

# training SVM
svm.train(trainData, cv.ml.ROW_SAMPLE, labels)

# End training
print('Finished training process')

# Show the decision regions: class 1 in green, class 2 in blue
green = (0, 100, 0)
blue = (100, 0, 0)
for i in range(I.shape[0]):
    for j in range(I.shape[1]):
        sampleMat = np.matrix([[j, i]], dtype=np.float32)
        response = svm.predict(sampleMat)[1]
        if response == 1:
            I[i, j] = green
        elif response == 2:
            I[i, j] = blue

# Show the training data points
thick = -1

# Class 1: green
for i in range(NTRAINING_SAMPLES):
    px = trainData[i, 0]
    py = trainData[i, 1]
    cv.circle(I, (int(px), int(py)), 3, (0, 255, 0), thick)

# Class 2: blue
for i in range(NTRAINING_SAMPLES, 2 * NTRAINING_SAMPLES):
    px = trainData[i, 0]
    py = trainData[i, 1]
    cv.circle(I, (int(px), int(py)), 3, (255, 0, 0), thick)

# Show the support vectors
thick = 2
# For a linear kernel, getUncompressedSupportVectors() returns the original
# support vectors (getSupportVectors() would return the compressed form)
sv = svm.getUncompressedSupportVectors()

for i in range(sv.shape[0]):
    cv.circle(I, (int(sv[i, 0]), int(sv[i, 1])), 6, (128, 128, 128), thick)

cv.imwrite('non_linear_svms_result.png', I)  # Save images
cv.imshow('SVM for Non-Linear Training Data', I)  # Show the results
cv.waitKey()
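As a quick usage check (a sketch; the two coordinates are arbitrary points taken from each side of the middle band), the trained model classifies individual samples the same way the region-coloring loop does:

# Classify two hypothetical points, one from each class's region
samples = np.float32([[0.2 * WIDTH, 0.5 * HEIGHT], [0.8 * WIDTH, 0.5 * HEIGHT]])
print(svm.predict(samples)[1].ravel())  # expected: [1. 2.]

Because the classes overlap in the middle band, the data is not linearly separable; the small C value (0.1) lets the optimizer tolerate some misclassified training points in exchange for a wider margin.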

Reference

  • Docs.opencv.org/3.0-beta/do…