1/ Introduction
In this article, let's take a look at how to quickly implement the kNN algorithm with scikit-learn. Scikit-learn has many datasets built in, so we don't have to make up our own fake data. Here we use the iris dataset and the handwritten digits dataset.

First import the required modules and load the iris data:
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier  # the kNN classifier
from sklearn.model_selection import train_test_split  # splits data into training and test sets
import numpy as np
iris = datasets.load_iris()
iris_x = iris.data
iris_y = iris.target

The data attribute holds the features of the dataset as a two-dimensional array in the format [[...], [...], [...], ...]. The target attribute holds the labels of the dataset in the format [x, x, x, x, ...].

Use the train_test_split() function to split the data into a training set and a test set:
x_train, x_test, y_train, y_test = train_test_split(iris_x, iris_y, test_size=0.2, random_state=666)

Then initialize the kNN classifier and call fit. Since the kNN algorithm is relatively simple, fit essentially just stores the training data. The fit methods of more complex models do more processing on the data.
knn = KNeighborsClassifier()
knn.fit(x_train, y_train)

Then predict on the test set and use the score method to see how well the classifier does. Note that score takes the test features and the true labels, not the predictions:
y_predict = knn.predict(x_test)
knn.score(x_test, y_test)
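To make the point about fit more concrete, here is a minimal sketch of a kNN classifier written with NumPy. The class name SimpleKNN and the toy data are hypothetical and exist only for illustration; this is not how scikit-learn implements KNeighborsClassifier internally. It only shows that "training" can amount to storing the data, with all the work happening at prediction time.

```python
from collections import Counter

import numpy as np


class SimpleKNN:
    """Hypothetical minimal kNN classifier, for illustration only."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, x_train, y_train):
        # "Training" only stores the data; no parameters are learned.
        self.x_train = np.asarray(x_train, dtype=float)
        self.y_train = np.asarray(y_train)
        return self

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        predictions = []
        for point in x:
            # Euclidean distance from this point to every training sample
            distances = np.sqrt(((self.x_train - point) ** 2).sum(axis=1))
            # Indices of the k closest training samples
            nearest = np.argsort(distances)[: self.k]
            # Majority vote among the labels of those neighbors
            votes = Counter(self.y_train[nearest].tolist())
            predictions.append(votes.most_common(1)[0][0])
        return np.array(predictions)


# Toy data (made up): two well-separated clusters with labels 0 and 1
x_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_knn = SimpleKNN(k=3).fit(x_toy, y_toy)
toy_pred = toy_knn.predict([[0.1, 0.1], [5.0, 5.1]])
print(toy_pred)
```

Each query point is assigned the label of the cluster it sits in, because its three nearest neighbors all belong to that cluster.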
2/ Complete code
# Iris classification
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np
# Load the iris dataset: features in data, labels in target
iris = datasets.load_iris()
iris_x = iris.data
iris_y = iris.target
x_train, x_test, y_train, y_test = train_test_split(iris_x,
                                                    iris_y,
                                                    test_size=0.2,
                                                    random_state=666)
knn = KNeighborsClassifier()
knn.fit(x_train, y_train)
y_predict = knn.predict(x_test)
knn.score(x_test, y_test) # To see how well the model works
# Handwritten digit recognition
# Same recipe, same taste: next, look at handwritten digit recognition
# The iris dataset has 150 samples, while the handwritten digits dataset has 1,797 samples.
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np
digits = datasets.load_digits()
digits_x = digits.data
digits_y = digits.target
x_train, x_test, y_train, y_test = train_test_split(digits_x,
                                                    digits_y,
                                                    test_size=0.2,
                                                    random_state=666)
knn = KNeighborsClassifier()
knn.fit(x_train, y_train)
y_predict = knn.predict(x_test)
knn.score(x_test, y_test)
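As a quick check of what score reports, the sketch below recomputes it by hand on the digits data. It assumes the same split settings as the code above; manual_accuracy is a name introduced here for illustration. For a classifier, score returns the mean accuracy, that is, the fraction of test samples whose predicted label equals the true label.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=666
)

knn = KNeighborsClassifier()
knn.fit(x_train, y_train)
y_predict = knn.predict(x_test)

# Fraction of test samples whose predicted label matches the true label
manual_accuracy = np.mean(y_predict == y_test)
print(manual_accuracy, knn.score(x_test, y_test))
```

The two printed values should agree, which is why score is called with x_test and y_test rather than with y_predict.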
3/ Parameter adjustment
There are two kinds of parameters: hyperparameters and model parameters. Hyperparameters are parameters that must be specified before running the machine learning algorithm, whereas model parameters are learned from the training data. A simple loop over candidate values can be used to select the best hyperparameters. kNN has no model parameters, so here we only need to tune the hyperparameters. One hyperparameter of k-nearest neighbors (kNN) is the value of k; another is whether neighbors are weighted by distance. When they are, we take the reciprocal of the distance as the weight, so closer neighbors count more in the vote. In sklearn, KNeighborsClassifier has a weights parameter whose default value is "uniform", in which case distance is not taken into account; setting it to "distance" weights each neighbor by the reciprocal of its distance.
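The effect of the reciprocal-distance weight can be seen in a small worked example. The neighbor labels and distances below are made up for illustration, and the vote function is a hypothetical helper, not part of sklearn. It also assumes every distance is nonzero, since the reciprocal of zero is undefined.

```python
from collections import defaultdict

# Hypothetical neighbors of one query point: (label, distance to the query)
neighbors = [("red", 1.0), ("blue", 3.0), ("blue", 4.0)]


def vote(neighbors, weights="uniform"):
    """Return the winning label and the per-label vote totals."""
    totals = defaultdict(float)
    for label, distance in neighbors:
        # uniform: each neighbor counts once; distance: closer neighbors count more
        totals[label] += 1.0 if weights == "uniform" else 1.0 / distance
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


uniform_winner, uniform_totals = vote(neighbors, weights="uniform")
distance_winner, distance_totals = vote(neighbors, weights="distance")
print(uniform_winner, uniform_totals)
print(distance_winner, distance_totals)
```

With uniform weights, blue wins by two votes to one. With reciprocal-distance weights, red scores 1/1 = 1 while blue scores 1/3 + 1/4 ≈ 0.58, so the single close neighbor outweighs the two distant ones.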
<1> Try to find the best k
# Find the k with the highest score by looping over candidate values
best_score = 0.0
best_k = -1
for k in range(1, 11):
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(x_train, y_train)
    score = knn_clf.score(x_test, y_test)
    if score > best_score:
        best_k = k
        best_score = score
# Print the best k and its score
print("best_k = " + str(best_k))
print("best_score = " + str(best_score))
<2> Try to find the best k under uniform and distance weights
best_method = ""
best_score = 0.0
best_k = -1
# Try both weighting schemes, and every k from 1 to 10 for each
for method in ["uniform", "distance"]:
    for k in range(1, 11):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn_clf.fit(x_train, y_train)
        score = knn_clf.score(x_test, y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_method = method
print("best_k = " + str(best_k))
print("best_score = " + str(best_score))
print("best_method = " + best_method)
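The nested loops above can also be written with scikit-learn's GridSearchCV. This is a different technique from the one used in this article, offered only as a sketch: instead of scoring each combination on the test set, it scores each one by cross-validation on the training set, so the best values it reports may differ from those found by the loops. The choice of cv=5 is an assumption made here, not something taken from the article.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=666
)

# Same search space as the two nested loops above
param_grid = {
    "n_neighbors": list(range(1, 11)),
    "weights": ["uniform", "distance"],
}

# Each combination is scored with 5-fold cross-validation on the
# training set, so the test set is only used once at the very end.
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search.fit(x_train, y_train)

print("best_params =", grid_search.best_params_)
print("best_cv_score =", grid_search.best_score_)
print("test_score =", grid_search.score(x_test, y_test))
```

Keeping the test set out of the search means the final test score is a fairer estimate of how the chosen hyperparameters perform on unseen data.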