Today, we’ll take a closer look at, and implement, eight of the top Python machine learning algorithms.

Let’s begin our tour of machine learning algorithms in Python programming.

8 Python Machine Learning algorithms – You must Learn

Here are the Python machine learning algorithms:

1. Linear regression

Linear regression is a supervised Python machine learning algorithm that looks at continuous features and predicts a continuous result. Depending on whether it runs on a single variable or on many features, we call it simple linear regression or multiple linear regression.

This is one of the most popular, and often underrated, Python ML algorithms. It assigns optimal weights to the variables in order to create a line y = ax + b for predicting the output. We often use linear regression to estimate real values, such as the cost of a house, based on continuous variables. The regression line is the line that best fits Y = a * X + b and represents the relationship between the independent variables and the dependent variable.
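As a minimal sketch of how those optimal weights arise (toy numbers, assuming only NumPy), the slope a and intercept b of Y = a * X + b can be computed in closed form with ordinary least squares, which is the same fit that scikit-learn's LinearRegression performs below:

import numpy as np

# Hypothetical one-feature data: years of experience vs. salary in $1000s
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([30, 35, 45, 47, 55], dtype=float)

# Ordinary least squares for Y = a * X + b
a = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b = Y.mean() - a * X.mean()

print(a, b)        # fitted slope and intercept
print(a * 6 + b)   # predicted value for a new point X = 6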

Are you familiar with the Python machine learning environment setup?

Let’s plot this for the diabetes dataset.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]      # use a single feature

# splitting the data and targets into training and test sets
diabetes_X_train = diabetes_X[:-30]
diabetes_X_test = diabetes_X[-30:]
diabetes_y_train = diabetes.target[:-30]
diabetes_y_test = diabetes.target[-30:]

regr = linear_model.LinearRegression()            # LinearRegression object
regr.fit(diabetes_X_train, diabetes_y_train)      # use the training set to train the model
# LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

diabetes_y_pred = regr.predict(diabetes_X_test)   # make predictions on the test set
regr.coef_
# array([941.43097333])

mean_squared_error(diabetes_y_test, diabetes_y_pred)
# 3035.0601152912695

r2_score(diabetes_y_test, diabetes_y_pred)        # variance score
# 0.410920728135835

plt.scatter(diabetes_X_test, diabetes_y_test, color='lavender')
plt.plot(diabetes_X_test, diabetes_y_pred, linewidth=2)
plt.xticks(())
plt.yticks(())
plt.show()

[Figure: Python machine learning algorithms – linear regression]

2. Logistic regression

Logistic regression is a supervised classification Python machine learning algorithm used to estimate discrete values such as 0/1, yes/no, and true/false, based on a given set of independent variables. We use the logistic function to predict the probability of an event, which gives an output between 0 and 1.

Although its name says ‘regression’, this is actually a classification algorithm. Logistic regression fits the data to a logit function, which is why it is also called logit regression. Let’s plot it.
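Before the plot, here is a minimal sketch (hypothetical toy data, assuming scikit-learn) of the probability output described above: the model passes a linear combination of the inputs through the logistic (sigmoid) function, so the predicted probabilities always land between 0 and 1.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: hours studied vs. pass (1) / fail (0)
x = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(x, y)

# Probabilities of each class for a new student who studied 3.5 hours;
# the two values are between 0 and 1 and sum to 1
print(clf.predict_proba([[3.5]]))
print(clf.predict([[3.5]]))   # the discrete 0/1 decision

# The same probability of class 1, computed by hand as the sigmoid of the linear score
z = clf.coef_ * 3.5 + clf.intercept_
print(1 / (1 + np.exp(-z)))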

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

xmin, xmax = -7, 7            # test set: a straight line with Gaussian noise
n_samples = 77
np.random.seed(0)
x = np.random.normal(size=n_samples)
y = (x > 0).astype(float)
x[x > 0] *= 3
x += .4 * np.random.normal(size=n_samples)
x = x[:, np.newaxis]

clf = linear_model.LogisticRegression(C=1e4)      # classifier
clf.fit(x, y)

plt.figure(1, figsize=(3, 4))
plt.clf()
plt.scatter(x.ravel(), y, color='lavender', zorder=17)

x_test = np.linspace(-7, 7, 277)

def model(x):
    return 1 / (1 + np.exp(-x))                   # the logistic (sigmoid) function

loss = model(x_test * clf.coef_ + clf.intercept_).ravel()
plt.plot(x_test, loss, color='pink', linewidth=2.5)

ols = linear_model.LinearRegression()             # an ordinary linear fit, for comparison
ols.fit(x, y)
plt.plot(x_test, ols.coef_ * x_test + ols.intercept_, linewidth=1)
plt.axhline(.4, color='.4')

plt.ylabel('y')
plt.xlabel('x')
plt.xticks(range(-7, 7))
plt.yticks([0, 0.4, 1])
plt.ylim(-.25, 1.25)
plt.xlim(-4, 10)
plt.legend(('Logistic Regression', 'Linear Regression'), loc='lower right', fontsize='small')
plt.show()

[Figure: Python machine learning algorithms – logistic regression]

3. Decision tree

Decision trees belong to supervised Python machine learning and are used for both classification and regression – although mostly for classification. The model takes an instance, traverses the tree, and compares the instance’s important features against the conditional statements learned at each node. Whether it descends to the left or the right sub-branch depends on the outcome. Usually, the more important features sit closer to the root.

This Python machine learning algorithm can work with both categorical and continuous dependent variables. Here we split the population into two or more homogeneous sets. Let’s look at the algorithm.
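Before the full balance-scale example, here is a small sketch (with made-up labels) of the Gini impurity and entropy measures that the criterion parameter used below refers to; the tree keeps the split that makes the resulting branches most homogeneous:

import numpy as np

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1 - np.sum(p ** 2)

def entropy(labels):
    # Shannon entropy: -sum p * log2(p)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

left = ['B', 'B', 'B', 'R']          # labels falling into the left branch of a candidate split
right = ['R', 'R', 'R', 'R', 'B']    # labels falling into the right branch

print(gini(left), gini(right))       # lower values mean more homogeneous branches
print(entropy(left), entropy(right))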

import pandas as pd
from sklearn.model_selection import train_test_split   # sklearn.cross_validation in older versions
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

def importData():
    # import the balance-scale dataset
    balance_data = pd.read_csv(
        'https://archive.ics.uci.edu/ml/machine-learning-databases/balance-scale/balance-scale.data',
        header=None)
    print(len(balance_data))
    print(balance_data.shape)
    print(balance_data.head())
    return balance_data

def splitdataset(balance_data):
    # split the dataset into features, target, and train/test sets
    x = balance_data.values[:, 1:5]
    y = balance_data.values[:, 0]
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=100)
    return x, y, x_train, x_test, y_train, y_test

def train_using_gini(x_train, x_test, y_train):
    # training with the Gini index
    clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                      max_depth=3, min_samples_leaf=5)
    clf_gini.fit(x_train, y_train)
    return clf_gini

def train_using_entropy(x_train, x_test, y_train):
    # training with entropy
    clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                         max_depth=3, min_samples_leaf=5)
    clf_entropy.fit(x_train, y_train)
    return clf_entropy

def prediction(x_test, clf_object):
    # make predictions
    y_pred = clf_object.predict(x_test)
    print(f"Predicted values: {y_pred}")
    return y_pred

def cal_accuracy(y_test, y_pred):
    print(confusion_matrix(y_test, y_pred))
    print(accuracy_score(y_test, y_pred) * 100)
    print(classification_report(y_test, y_pred))

data = importData()
# 625
# (625, 5)
#    0  1  2  3  4
# 0  B  1  1  1  1
# 1  R  1  1  1  2
# 2  R  1  1  1  3
# 3  R  1  1  1  4
# 4  R  1  1  1  5

x, y, x_train, x_test, y_train, y_test = splitdataset(data)
clf_gini = train_using_gini(x_train, x_test, y_train)
clf_entropy = train_using_entropy(x_train, x_test, y_train)

y_pred_gini = prediction(x_test, clf_gini)

cal_accuracy(y_test, y_pred_gini)
# [[ 0  6  7]
#  [ 0 67 18]
#  [ 0 19 71]]
# 73.40425531914893

y_pred_entropy = prediction(x_test, clf_entropy)

cal_accuracy(y_test, y_pred_entropy)
# [[ 0  6  7]
#  [ 0 63 22]
#  [ 0 20 70]]
# 70.74468085106383

4. Support Vector Machine (SVM)

SVM is a supervised classification Python machine learning algorithm that draws a line dividing the different categories of your data. In this ML algorithm, we compute the vector to optimize the line, making sure that the nearest points in each group lie as far as possible from the boundary. While you will almost always find this to be a linear vector, it does not have to be.

In this Python machine learning tutorial, we plot each data item as a point in n-dimensional space, where n is the number of features and the value of each feature is the value of a particular coordinate.
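The plots below only sketch candidate separating lines by hand. For contrast, a minimal sketch of actually fitting a linear SVM and reading off its support vectors (assuming scikit-learn's SVC; the blob parameters here are illustrative) looks like this:

from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two well-separated blobs, similar in spirit to the plot below
x, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.60)

clf = SVC(kernel='linear', C=1.0)
clf.fit(x, y)

print(clf.support_vectors_)       # the nearest points, which define the maximum-margin line
print(clf.predict([[2.0, 2.0]]))  # class of a new point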

First, let’s draw a data set.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs   # sklearn.datasets.samples_generator in older versions

x, y = make_blobs(n_samples=500, centers=2, random_state=0, cluster_std=0.40)
plt.scatter(x[:, 0], x[:, 1], c=y, s=50, cmap='plasma')
plt.show()

[Figure: Python machine learning algorithms – SVM, the raw blobs]

xfit = np.linspace(-1, 3.5)
plt.scatter(x[:, 0], x[:, 1], c=y, s=50, cmap='plasma')
for m, b, d in [(1, 0.65, 0.33), (0.5, 1.6, 0.55), (-0.2, 2.9, 0.2)]:
    yfit = m * xfit + b
    plt.plot(xfit, yfit, '-k')
    plt.fill_between(xfit, yfit - d, yfit + d, edgecolor='none', color='#AFFEDC', alpha=0.4)
plt.xlim(-1, 3.5)
plt.show()

[Figure: Python machine learning algorithms – SVM, candidate separating lines with margins]

5. Naive Bayes

Naive Bayes is a classification method based on Bayes’ theorem. It assumes independence between the predictors: a naive Bayes classifier assumes that the features in a class are unrelated to any other feature. Consider a fruit: it is an apple if it is round, red, and 2.5 inches in diameter. A naive Bayes classifier says that each of these characteristics independently contributes to the probability that the fruit is an apple, even if the features actually depend on each other.

Naive Bayesian models are easy to build for very large datasets. Not only is this model very simple, it also often outperforms far more sophisticated classification methods. So let’s set it up.
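First, a back-of-the-envelope sketch of the independence assumption using the apple description above. All the probabilities here are made up purely for illustration; the point is that each feature's likelihood is simply multiplied in as if it were independent of the others:

# Hypothetical estimates from a fruit dataset
p_apple = 0.30                 # prior P(apple)
p_round_given_apple = 0.95     # P(round | apple)
p_red_given_apple = 0.80       # P(red | apple)
p_small_given_apple = 0.70     # P(~2.5 in diameter | apple)

p_other = 0.70                 # prior P(not apple)
p_round_given_other = 0.40
p_red_given_other = 0.20
p_small_given_other = 0.30

# Naive Bayes: multiply the per-feature likelihoods as if they were independent
score_apple = p_apple * p_round_given_apple * p_red_given_apple * p_small_given_apple
score_other = p_other * p_round_given_other * p_red_given_other * p_small_given_other

# Normalise to get P(apple | round, red, ~2.5 in)
print(score_apple / (score_apple + score_other))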

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x = iris.data
y = iris.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

gnb = GaussianNB()
mnb = MultinomialNB()

y_pred_gnb = gnb.fit(x_train, y_train).predict(x_test)
cnf_matrix_gnb = confusion_matrix(y_test, y_pred_gnb)
cnf_matrix_gnb
# array([...,
#        [ 0, 18,  0],
#        [ 0,  0, 11]], dtype=int64)

y_pred_mnb = mnb.fit(x_train, y_train).predict(x_test)
cnf_matrix_mnb = confusion_matrix(y_test, y_pred_mnb)
cnf_matrix_mnb
# array([...,
#        [ 0,  0, 18],
#        [ 0,  0, 11]], dtype=int64)

6. KNN (K-Nearest Neighbors)

KNN is a Python machine learning algorithm for classification and regression – mostly for classification. It is a supervised learning algorithm that stores all available cases and classifies new cases by comparing distances, usually with the Euclidean distance function. A new case is assigned to the class that is most common among its K nearest neighbors: each neighbor casts a vote, and the majority wins.
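Before switching to scikit-learn's KNeighborsClassifier, here is a from-scratch sketch (toy data) of exactly the two steps just described: measure the Euclidean distance from the new case to every stored case, then take a majority vote among the K closest.

import numpy as np
from collections import Counter

def knn_predict(x_train, y_train, new_point, k=3):
    # Euclidean distance from the new case to every stored case
    distances = np.sqrt(((x_train - new_point) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]     # indices of the K closest neighbors
    votes = Counter(y_train[nearest])       # majority vote among their labels
    return votes.most_common(1)[0][0]

x_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(x_train, y_train, np.array([2, 2])))  # lands among the class-0 points
print(knn_predict(x_train, y_train, np.array([6, 5])))  # lands among the class-1 points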

I. Train and test on the entire dataset

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

iris = load_iris()
x = iris.data
y = iris.target

logreg = LogisticRegression()
logreg.fit(x, y)
# LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
#                    intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
#                    penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
#                    verbose=0, warm_start=False)

logreg.predict(x)
# array([0, 0, 0, ..., 2, 2, 2])   (150 predicted labels)

y_pred = logreg.predict(x)
len(y_pred)
# 150

metrics.accuracy_score(y, y_pred)
# 0.96

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x, y)
# KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
#                      metric_params=None, n_jobs=1, n_neighbors=5, p=2,
#                      weights='uniform')

y_pred = knn.predict(x)
metrics.accuracy_score(y, y_pred)
# 0.9666666666666667

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(x, y)
# KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
#                      metric_params=None, n_jobs=1, n_neighbors=1, p=2,
#                      weights='uniform')

y_pred = knn.predict(x)
metrics.accuracy_score(y, y_pred)
# 1.0

II. Split into train/test sets

x.shape
# (150, 4)
y.shape
# (150,)

from sklearn.model_selection import train_test_split   # sklearn.cross_validation in older versions
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=4)
x_train.shape
# (90, 4)
x_test.shape
# (60, 4)
y_train.shape
# (90,)
y_test.shape
# (60,)

logreg = LogisticRegression()
logreg.fit(x_train, y_train)
y_pred = logreg.predict(x_test)
metrics.accuracy_score(y_test, y_pred)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
# KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
#                      metric_params=None, n_jobs=1, n_neighbors=5, p=2,
#                      weights='uniform')

y_pred = knn.predict(x_test)
metrics.accuracy_score(y_test, y_pred)
# 0.9666666666666667

k_range = range(1, 26)
scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train)
    y_pred = knn.predict(x_test)
    scores.append(metrics.accuracy_score(y_test, y_pred))

scores
# [0.95, 0.95, 0.9666666666666667, 0.9666666666666667, 0.9666666666666667,
#  0.9833333333333333, 0.9833333333333333, ..., 0.9666666666666667, 0.95, 0.95]

import matplotlib.pyplot as plt
plt.plot(k_range, scores)
plt.xlabel('k for kNN')
plt.ylabel('test accuracy')
plt.show()

[Figure: test accuracy against k for kNN (K-Nearest Neighbors)]

Read Python statistics – P values, correlation, T tests, KS tests

7. K-means

K-means is an unsupervised algorithm that solves the clustering problem. It sorts the data into a chosen number of clusters, so that the data points within one cluster are homogeneous, and heterogeneous with respect to the points in the other clusters.
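The scikit-learn call below hides the iteration, so here is a rough sketch of what K-means does under the hood (random toy points, not the article's data): each pass assigns every point to its nearest centroid, then moves each centroid to the mean of the points assigned to it.

import numpy as np

np.random.seed(1)
points = np.vstack([np.random.randn(20, 2) + [0, 0],
                    np.random.randn(20, 2) + [5, 5]])

k = 2
centroids = points[[0, -1]]   # simple initial guesses, one point from each blob

for _ in range(10):
    # assignment step: each point goes to its nearest centroid
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # update step: move each centroid to the mean of its assigned points
    centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])

print(centroids)   # should end up near (0, 0) and (5, 5)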

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
from sklearn.cluster import KMeans

style.use('ggplot')

x = [1, 5, 1.5, 8, 1, 9]
y = [2, 8, 1.7, 6, 0.2, 12]
plt.scatter(x, y)
plt.show()

x = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])
kmeans = KMeans(n_clusters=2)
kmeans.fit(x)
# KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
#        n_clusters=2, n_init=10, n_jobs=1, precompute_distances='auto',
#        random_state=None, tol=0.0001, verbose=0)

centroids = kmeans.cluster_centers_
labels = kmeans.labels_
centroids
# array([[1.16666667, 1.46666667],
#        [7.33333333, 9.        ]])
labels
# array([0, 1, 0, 1, 0, 1])

colors = ['g.', 'r.', 'c.', 'b.']
for i in range(len(x)):
    print(x[i], labels[i])
    plt.plot(x[i][0], x[i][1], colors[labels[i]], markersize=10)
# [1. 2.] 0
# [5. 8.] 1
# [1.5 1.8] 0
# [8. 8.] 1
# [1. 0.6] 0
# [9. 11.] 1

plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=150, linewidths=5, zorder=10)

plt.show()

[Figure: Python machine learning algorithms – K-means clustering]

8. Random Forest

A Random Forest is a collection of decision trees. To classify a new object based on its attributes, each tree gives a classification, i.e. it votes for a class, and the category with the most votes across the forest wins.
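A quick sketch of that voting idea, assuming scikit-learn's RandomForestClassifier and its estimators_ attribute (scikit-learn actually averages the trees' predicted class probabilities, which usually agrees with a plain majority vote); the full iris walkthrough follows below.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(iris.data, iris.target)

sample = iris.data[100].reshape(1, -1)   # one flower from the dataset

# Each tree in the forest casts its own vote...
tree_votes = [int(tree.predict(sample)[0]) for tree in clf.estimators_]
print(tree_votes)

# ...and the forest's prediction is the class with the most support
print(np.bincount(tree_votes).argmax(), clf.predict(sample)[0])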

import numpy as np
import pylab as pl

x = np.random.uniform(1, 100, 1000)
y = np.log(x) + np.random.normal(0, .3, 1000)
pl.scatter(x, y, s=1, label='log(x) with noise')
pl.plot(np.arange(1, 100), np.log(np.arange(1, 100)), c='b', label='log(x) true function')
pl.xlabel('x')
pl.ylabel('f(x) = log(x)')
pl.legend(loc='best')
pl.title('Basic Log Function')
pl.show()

[Figure: Python machine learning algorithms – a noisy log(x) dataset]

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
df.head()
#    sepal length (cm)  sepal width (cm)  ...  is_train species
# 0                5.1               3.5  ...      True  setosa
# 1                4.9               3.0  ...      True  setosa
# 2                4.7               3.2  ...      True  setosa
# 3                4.6               3.1  ...      True  setosa
# 4                5.0               3.6  ...     False  setosa
# [5 rows x 6 columns]

train, test = df[df['is_train'] == True], df[df['is_train'] == False]
features = df.columns[:4]
clf = RandomForestClassifier(n_jobs=2)
y, _ = pd.factorize(train['species'])
clf.fit(train[features], y)
# RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
#                        max_depth=None, max_features='auto', max_leaf_nodes=None,
#                        min_impurity_decrease=0.0, min_impurity_split=None,
#                        min_samples_leaf=1, min_samples_split=2,
#                        min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=2,
#                        oob_score=False, random_state=None, verbose=0,
#                        warm_start=False)

preds = iris.target_names[clf.predict(test[features])]
pd.crosstab(test['species'], preds, rownames=['actual'], colnames=['preds'])
# preds       setosa  versicolor  virginica
# actual
# setosa          12           0          0
# versicolor       0          17          2
# virginica        0           1         15

So, that was the Python machine learning algorithms tutorial. I hope you liked it.

Today we discussed eight important Python machine learning algorithms. Which one do you think has the most potential? I hope you keep following along; there are more articles to come. If you are interested in big data, you can follow my WeChat official account: Big Data Technical Engineer.

It shares articles every day, along with big data fundamentals, hands-on projects, Java interview tips, Python learning materials, and more, all free to learn from; reply with the keyword to receive them.