Abstract: The k-nearest neighbor (k-NN) algorithm, proposed by Cover and Hart in 1967, is a supervised machine learning algorithm that can be used to solve both classification and regression problems.
#1. Why learn the k-nearest neighbor algorithm
The k-nearest neighbor algorithm, also known as the KNN algorithm, is a very suitable algorithm to start with.
It has the following features:
● The underlying idea is extremely simple
● It requires very little applied mathematics (close to zero)
○ Many developers are not strong at math, and the KNN algorithm uses almost no specialized mathematical knowledge
● Good results: the algorithm is simple but surprisingly effective
○ It does have drawbacks, which will be explained later
● It illustrates many of the details involved in using machine learning algorithms
○ We will use the KNN algorithm to walk through the full process of applying a machine learning algorithm and study the details along the way
● It gives a more complete picture of the machine learning application process
○ Including how it differs from other classical algorithms
○ We will learn the KNN algorithm using pandas and NumPy
#2. What is the k-nearest neighbor algorithm
The data points in the figure above are distributed in a feature space; a two-dimensional space is usually used for demonstration.
The horizontal axis shows tumor size and the vertical axis shows time of discovery.
Malignant tumors are shown in blue and benign tumors are shown in red.
Now a new patient arrives.
How do we tell whether the new patient (the green dot) has a benign tumor or a malignant one?
The k-nearest neighbor algorithm works as follows:
Take a value k = 3.
To classify the green point, the algorithm finds the three nearest points among all samples and lets them vote on the category. Here, the three nearest points are all blue, so the new patient's point should also be blue, that is, a malignant tumor.
Essence: if two samples are sufficiently similar, they have a higher probability of belonging to the same category.
However, looking at only one nearest sample may be inaccurate, so we look at K samples: if most of the K samples belong to the same category, the sample to be predicted is likely to belong to that category as well. Here, similarity is measured by distance.
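To make the procedure concrete, here is a minimal sketch of it in NumPy. The helper `knn_predict` and the sample coordinates are hypothetical illustrations (the article gives no actual numbers), and Euclidean distance is assumed as the similarity measure:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training samples."""
    # Euclidean distance from x_new to every training point (assumed metric).
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]   # indices of the k closest points
    votes = Counter(y_train[nearest])     # count the neighbors' labels
    return votes.most_common(1)[0][0]     # the majority label wins

# Hypothetical data: [tumor size, time of discovery], labels as in the figure.
X_train = np.array([[3.4, 2.1], [3.6, 2.4], [3.9, 2.0],   # blue: malignant
                    [1.2, 0.8], [1.5, 1.1], [1.0, 0.9]])  # red: benign
y_train = np.array(['malignant'] * 3 + ['benign'] * 3)

print(knn_predict(X_train, y_train, np.array([3.5, 2.2])))  # -> malignant
```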
Here is another example:
● In the figure above, the three points closest to the green point include two red points and one blue point. With a red-to-blue ratio of 2:1, the green point is most likely red, so the final judgment is a benign tumor (a runnable check follows this list).
● As these examples show, the k-nearest neighbor algorithm is well suited to classification problems in supervised learning.
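That 2:1 vote can also be reproduced with an off-the-shelf implementation. The article names only pandas and NumPy, so using scikit-learn here is an assumption for illustration; the data are the same hypothetical points as in the sketch above:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Same hypothetical clusters as the NumPy sketch above.
X_train = np.array([[3.4, 2.1], [3.6, 2.4], [3.9, 2.0],   # blue: malignant
                    [1.2, 0.8], [1.5, 1.1], [1.0, 0.9]])  # red: benign
y_train = ['malignant'] * 3 + ['benign'] * 3

clf = KNeighborsClassifier(n_neighbors=3)  # k = 3, classify by majority vote
clf.fit(X_train, y_train)

# A point between the clusters: its three nearest neighbors split 2 red : 1 blue.
print(clf.predict([[2.3, 1.6]]))  # -> ['benign']
```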