K nearest neighbor algorithm


  • K-nearest neighbor (k-NN): given a training data set and a new input instance, find the k instances in the training set that are closest to the new instance; the new instance is then assigned to the class to which the majority of those k instances belong. The model used by k-NN corresponds to a partition of the feature space, and there is no explicit training process.
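To make this concrete, here is a minimal brute-force k-NN classifier sketch (not the implementation given later in this post; the function name, the toy data, and the use of Euclidean distance are illustrative choices):

```python
import torch

def knn_classify(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train_x: (N, d) tensor of training instances
    train_y: (N,)  tensor of integer class labels
    query:   (d,)  tensor, the new input instance
    """
    # Euclidean distance from the query to every training instance
    dists = ((train_x - query) ** 2).sum(dim=1).sqrt()
    # Indices of the k closest training instances
    _, idx = torch.topk(dists, k, largest=False)
    # Majority vote over their labels
    return torch.mode(train_y[idx]).values.item()

# Tiny usage example with made-up data
train_x = torch.tensor([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.5, 7.9]])
train_y = torch.tensor([0, 0, 1, 1])
print(knn_classify(train_x, train_y, torch.tensor([1.1, 0.9]), k=3))  # -> 0
```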

K nearest neighbor model


model


  • The model has three basic elements: the distance metric, the choice of the k value, and the classification decision rule. Once these three elements are fixed, the classification of any new input instance is uniquely determined. Intuitively, for each sample in the training set, all points that are closer to it than to any other sample form a region called a cell. Each sample has a cell, and the cells of all samples together partition the whole feature space; the class label of every point inside a cell is the label of that cell's sample. In this way, the class of each cell is uniquely determined.

The three elements


  1. Distance metric

    • $L_p$ distance: the distance between two instance points in feature space reflects how similar the two points are. For input instances $x_i, x_j \in \mathbb{R}^n$, the $L_p$ distance is defined as (a code sketch of these distances follows this list):
      $$L_p(x_i, x_j) = \left(\sum_{l=1}^{n}\left|x_i^{(l)} - x_j^{(l)}\right|^p\right)^{\frac{1}{p}}$$
    • Euclidean distance: the special case $p = 2$.
      $$L_2(x_i, x_j) = \left(\sum_{l=1}^{n}\left|x_i^{(l)} - x_j^{(l)}\right|^2\right)^{\frac{1}{2}}$$
    • Manhattan distance: the special case $p = 1$.
      $$L_1(x_i, x_j) = \sum_{l=1}^{n}\left|x_i^{(l)} - x_j^{(l)}\right|$$
    • Chebyshev distance: the special case $p = \infty$.
      $$L_\infty(x_i, x_j) = \max_{l}\left|x_i^{(l)} - x_j^{(l)}\right|$$

  2. The choice of k values

    • A smaller k value makes the overall model more complex: the classification result is easily affected by noise points and overfitting is likely to occur. A larger k value makes the overall model simpler and prone to underfitting. In practice, k is usually taken to be a relatively small value, and the optimal k is selected by cross-validation (see the cross-validation sketch after this list).
  3. Classification decision rule

    • The classification decision rule of the k-nearest neighbor method is usually majority voting: the class of the input instance is determined by the majority class among its k nearest training instances.
    • Majority voting rule: if the loss function for classification is the 0-1 loss, the classification function is $f: \mathbb{R}^n \rightarrow \{c_1, c_2, \ldots, c_K\}$ and the probability of misclassification is $P(Y \neq f(X)) = 1 - P(Y = f(X))$. For a given instance $x \in \mathcal{X}$, its $k$ nearest training instances form the set $N_k(x)$. If the class covering the region of $N_k(x)$ is $c_j$, the misclassification rate is
      $$\frac{1}{k} \sum_{x_i \in N_k(x)} I\left(y_i \neq c_j\right) = 1 - \frac{1}{k} \sum_{x_i \in N_k(x)} I\left(y_i = c_j\right)$$
      To minimize the empirical risk (the misclassification rate), $\sum_{x_i \in N_k(x)} I\left(y_i = c_j\right)$ must be maximized, so the majority voting rule is equivalent to empirical risk minimization.
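The three special cases of the $L_p$ distance can be checked directly; the following is a small sketch (the function name and test points are illustrative):

```python
import torch

def lp_distance(xi, xj, p=2):
    """L_p (Minkowski) distance between two points given as 1-D tensors."""
    diff = (xi - xj).abs()
    if p == float('inf'):                 # Chebyshev: largest coordinate difference
        return diff.max()
    return (diff ** p).sum() ** (1.0 / p)

xi = torch.tensor([1.0, 1.0])
xj = torch.tensor([4.0, 5.0])
print(lp_distance(xi, xj, p=2))             # Euclidean distance: 5.0
print(lp_distance(xi, xj, p=1))             # Manhattan distance: 7.0
print(lp_distance(xi, xj, p=float('inf')))  # Chebyshev distance: 4.0
```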
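For choosing k by cross-validation, the sketch below leans on scikit-learn, which is not used anywhere else in this post; the dataset and the candidate range of k are arbitrary illustrative choices:

```python
# Selecting k by 5-fold cross-validation (assumes scikit-learn is available).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = {}
for k in range(1, 16):
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()  # mean CV accuracy for this k

best_k = max(scores, key=scores.get)
print("best k:", best_k, "accuracy:", scores[best_k])
```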

K nearest neighbor algorithm implementation: KD tree


Construct the kd tree


  • Input: a data set in $k$-dimensional space, $T = \left\{x_1, x_2, \cdots, x_N\right\}$, where $x_i = \left(x_i^{(1)}, x_i^{(2)}, \cdots, x_i^{(k)}\right)^{T}$. Output: a kd tree.

    1. Start: construct the root node. Select $x^{(1)}$ as the splitting axis and take the median of the $x^{(1)}$ coordinates of all instances in the training set as the splitting point, cutting the hyperrectangular region into two sub-regions; the splitting point is stored in the root node. Generate left and right child nodes of depth 1: the left child corresponds to the sub-region whose $x^{(1)}$ coordinates are smaller than the splitting point, the right child to the sub-region whose $x^{(1)}$ coordinates are larger.
    2. Repeat: for a node of depth $j$, select $x^{(l)}$ as the splitting axis, where $l = j\,(\bmod\ k) + 1$. Take the median of the $x^{(l)}$ coordinates of all instances in the node's region as the splitting point and divide the region into two sub-regions, generating left and right children of depth $j + 1$: the left child's $x^{(l)}$ coordinates are smaller than the splitting point, the right child's are larger.
    3. Stop when the two sub-regions contain no more instances. (A code sketch of this construction is given after the worked example below.)
  • Example. Input: training set $T = \{(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)\}$. Output: kd tree.

    Start: select $x^{(1)}$ as the splitting axis; the $x^{(1)}$ coordinates are 2, 4, 5, 7, 8, 9, so the median is taken as 7 (the larger of the two middle values) and $(7, 2)$ becomes the splitting point that divides the whole region.

    Divide each sub-region again: with $x^{(2)}$ as the splitting axis, take the median; it is 4 in the left region and 6 in the right region. The splitting point of the left region is therefore $(5, 4)$ and that of the right region is $(9, 6)$.

    Divide the left region: with $x^{(1)}$ as the splitting axis, take the median; it is 4 in the upper sub-region and 2 in the lower sub-region. The splitting point of the upper sub-region is therefore $(4, 7)$ and that of the lower sub-region is $(2, 3)$.

    Divide the right region: with $x^{(1)}$ as the splitting axis, the upper sub-region contains no instance point and the median of the lower sub-region is 8, so the splitting point of the lower sub-region is $(8, 1)$.

    Final partition of the feature space and the resulting kd tree (figures omitted). The construction is now complete.
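The construction above can be reproduced with a short stand-alone sketch (a simplified, dictionary-based version; the full class-based implementation appears in the code section at the end of the post). It splits on the upper median along alternating axes and assumes 2-D data:

```python
# Minimal kd-tree construction sketch for the worked example above
# (a simplified stand-alone version, not the full implementation given below).

def build_kd_tree(points, depth=0):
    """Recursively split on the median along axis depth % 2 (2-D data)."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                      # upper median, as in the example
    return {
        'point': points[mid],
        'left':  build_kd_tree(points[:mid], depth + 1),
        'right': build_kd_tree(points[mid + 1:], depth + 1),
    }

T = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd_tree(T)
print(tree['point'])                   # (7, 2)  root, split on x(1)
print(tree['left']['point'])           # (5, 4)  left subtree, split on x(2)
print(tree['right']['point'])          # (9, 6)  right subtree, split on x(2)
print(tree['left']['left']['point'])   # (2, 3)
print(tree['left']['right']['point'])  # (4, 7)
print(tree['right']['left']['point'])  # (8, 1)
```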

Search the kd tree


  • Nearest neighbor search with the kd tree

    • Find the “current nearest point”: descend the tree toward the target point until a leaf node is reached; this gives the “current nearest point”.
    • Backtrack: move back up toward the root, using the distance between the target point and the “current nearest point” to decide which other regions still need to be examined.
    • Input: the constructed kd tree and a target point x. Output: the nearest neighbor of x.
      • Find the “current nearest point”
        • Starting from the root node, recursively visit the kd tree until the leaf node containing x is reached; this leaf node is the “current nearest point”.
      • Backtrack
        • If a node is closer to the target point than the current nearest point, update the current nearest point.
        • The current nearest point must lie in the region corresponding to one child node of this node. Check whether the region corresponding to the other child node contains a closer point, i.e., whether it intersects the hypersphere centered at x with radius equal to the current nearest distance.
      • When the search falls back to the root node, it ends; the last “current nearest point” is the nearest neighbor of x.
  • Example. Input: the kd tree constructed above and the target point $x = (2, 4.5)$. Output: the nearest neighbor of $x$.

    After the first and the second backtracking step, the nearest neighbor is found to be $(2, 3)$.

If the instance points are randomly distributed, the average computational complexity of kd tree search is $O(\log N)$, where $N$ is the number of training instances. The kd tree is best suited to k-nearest neighbor search when the number of training instances is much larger than the dimension of the space; when the spatial dimension approaches the number of training instances, its efficiency drops rapidly and it degenerates almost to a linear scan.
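One informal way to observe this is to time kd tree queries against a brute-force linear scan as the dimension grows. The sketch below is an assumption-laden illustration: it uses NumPy and SciPy (neither is used elsewhere in this post), and the data sizes are arbitrary.

```python
# Informal timing comparison: kd-tree search vs. brute-force linear scan
# (illustrative sketch; assumes NumPy and SciPy are installed).
import time
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist

def compare(dim, n_train=20000, n_query=500):
    rng = np.random.default_rng(0)
    data = rng.random((n_train, dim))
    queries = rng.random((n_query, dim))

    t0 = time.perf_counter()
    cKDTree(data).query(queries, k=1)        # build the kd tree and query it
    t_tree = time.perf_counter() - t0

    t0 = time.perf_counter()
    cdist(queries, data).argmin(axis=1)      # brute force: all pairwise distances
    t_brute = time.perf_counter() - t0

    print(f"dim={dim:3d}  kd tree: {t_tree:.3f}s  linear scan: {t_brute:.3f}s")

for d in (2, 10, 50):
    compare(d)  # the kd tree's relative advantage shrinks as the dimension grows
```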

code


import torch
import random
import matplotlib.pyplot as plt


class DrawTool:
    """ drawing class """

    # draw points [data set, point X, the point closest to point X]
    def drawPoint(self, points, x, nearestPoint):
        XMax = points[:, 0].max().item()  # x-range
        YMax = points[:, 1].max().item()  # y-range
        precision = max(XMax, YMax) // 10 + 1  # tick step for the coordinate axes
        # plt.rcParams['font.sans-serif'] = ['SimHei']  # uncomment to display Chinese labels
        plt.scatter(points[:, 0], points[:, 1], label="data")
        plt.scatter(x[0], x[1], c='c', marker='*', s=100, label="x (input)")
        plt.scatter(nearestPoint[0], nearestPoint[1], c='r', label="nearest")
        plt.xticks(torch.arange(0, XMax, precision))  # set the X-axis ticks
        plt.yticks(torch.arange(0, YMax, precision))  # set the Y-axis ticks
        plt.legend(loc='upper left')
        plt.show()


class DataLoader:
    """Data loading class"""

    # [creat: artificial dataset, random: random dataset]
    def __init__(self, kind="creat"):
        self.points = None
        if kind == "creat":
            self.x = [2, 5, 9, 4, 8, 7]
            self.y = [3, 4, 6, 7, 1, 2]
        elif kind == "random":
            nums = random.randint(20, 40)
            self.x = random.sample(range(0, 40), nums)
            self.y = random.sample(range(0, 40), nums)

    # Assemble the data into a list of [x, y] points
    def getData(self):
        self.points = [[self.x[i], self.y[i]] for i in range(len(self.x))]
        return self.points

    # Get a point that does not appear in the dataset, to use as point X
    def getRandomPoint(self):
        points = torch.tensor(self.points)
        x, y, i = -1, -1, 0
        while x == -1 or y == -1:
            if x == -1 and i not in points[:, 0]:
                x = i
            if y == -1 and i not in points[:, 1]:
                y = i
            i += 2
        return x, y


class KDNode:  # binary tree node
    """Node class"""

    def __init__(self, point):
        self.point = point
        self.left = None   # left child
        self.right = None  # right child


class KDTree:
    """kd tree class"""

    def __init__(self):
        self.root = None
        self.nearestPoint = None        # nearest point found so far
        self.nearestDis = float('inf')  # distance to the nearest point found so far

    # Create and search the kd tree [dataset, target point x]
    def creatAndSearch(self, points, x):
        self.root = self.creatTree(points)
        self.searchTree(self.root, x)

    # Create the kd tree [data set, splitting dimension]
    def creatTree(self, points, col=0):
        if len(points) == 0:
            return None
        # Sort by the current splitting dimension and take the median point as the node
        points = sorted(points, key=lambda point: point[col])
        mid = len(points) >> 1
        node = KDNode(points[mid])
        # col ^ 1 alternates between dimensions 0 and 1 (the data here is 2-D)
        node.left = self.creatTree(points[0:mid], col ^ 1)
        node.right = self.creatTree(points[mid + 1:len(points)], col ^ 1)
        return node

    # Search the kd tree [kd-tree node, target point x, splitting dimension]
    def searchTree(self, tree, x, col=0):
        if tree is None:
            return

        # Step 1 of the search algorithm: descend to the subtree containing x
        if x[col] < tree.point[col]:
            self.searchTree(tree.left, x, col ^ 1)
        else:
            self.searchTree(tree.right, x, col ^ 1)

        # Backtracking: update the current nearest point if this node is closer
        disCurAndX = self.dis(tree.point, x)
        if disCurAndX < self.nearestDis:
            self.nearestDis = disCurAndX
            self.nearestPoint = tree.point

        # Check whether the hypersphere centered at x with radius nearestDis crosses
        # the splitting plane, i.e. whether |x[col] - node[col]| < nearestDis;
        # if so, the other subtree may contain a closer point (step 3(b) of the algorithm)
        if abs(tree.point[col] - x[col]) < self.nearestDis:
            if x[col] < tree.point[col]:
                self.searchTree(tree.right, x, col ^ 1)  # went left first, so check the right subtree
            else:
                self.searchTree(tree.left, x, col ^ 1)   # went right first, so check the left subtree

    # Distance between two points [point A, point B]
    def dis(self, a, b):
        return sum([(a[i] - b[i]) ** 2 for i in range(len(a))]) ** 0.5  # Euclidean distance

    # Print the kd tree in preorder [root node]
    def printTree(self, root):
        if root is not None:
            print(root.point)
            self.printTree(root.left)
            self.printTree(root.right)


if __name__ == '__main__':
    drawTool = DrawTool()
    dataLoader = DataLoader("random")
    kdTree = KDTree()

    points = dataLoader.getData()
    x = dataLoader.getRandomPoint()

    kdTree.creatAndSearch(points, x)
    drawTool.drawPoint(torch.tensor(points), x, kdTree.nearestPoint)
