"This is the third day of my participation in the First Challenge 2022. For more details, see: First Challenge 2022."
Preface
Most of the time, so-called machine learning in practice means taking an existing model, making some simple modifications, and then starting to "refine the elixir"; the main work is tuning parameters, which is why the community jokingly calls it "parameter tuning" or "alchemy". So I would like to sort out and summarize some commonly used machine learning models: partly as personal study notes, and partly so that friends who click in can copy the code and start "refining" right away; the goal is for everything to work "out of the box".
A note before reading: my level is limited, so this humble beginner apologizes to the experts in advance.
The write-up is ordered roughly by time, which generally follows the development of machine learning algorithms. Every model comes with a PyTorch implementation and a brief introduction to its principle. This article covers the perceptron, the ancestor of neural networks. The main text begins below.
Perceptron preliminaries
The perceptron, also known as the "artificial neuron" or "naive perceptron", is the basic building block of neural networks. This article first introduces the basic principle of the perceptron, and then gives a PyTorch implementation of the perceptron model on concrete classification tasks.
1.Rosenblatt
Rosenblatt is regarded as the originator of neural networks. He proposed the perceptron in 1957 and, in 1960, built a hardware implementation of it. However, this work was questioned by Marvin Minsky and Seymour Papert, which left perceptron research quiet for nearly 20 years. It was not until Hinton and colleagues popularized the backpropagation (BP) algorithm in the 1980s that neural networks became popular again.
2. Fundamentals
Assuming the input space (feature space) is $x \in \mathbb{R}^n$ and the output space is $y \in \{+1, -1\}$, the function from the input space to the output space

$$f(x) = \mathrm{sign}(w \cdot x + b)$$

is called the perceptron, where $w$ is the weight (weight vector), $b$ is the bias, and $\mathrm{sign}$ is the sign function:

$$\mathrm{sign}(z) = \begin{cases} +1, & z \ge 0 \\ -1, & z < 0 \end{cases}$$
Given a data set $T = \{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$, the classification learning process of the perceptron is equivalent to solving the following minimization problem:

$$\min_{w, b} L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$
where $M$ is the set of misclassified points; in other words, the perceptron is driven by its misclassified points. For a misclassified point $(x_i, y_i)$, stochastic gradient descent (SGD) updates $w$ and $b$ as:

$$w \leftarrow w + \eta y_i x_i, \qquad b \leftarrow b + \eta y_i$$
where $\eta$ is called the learning rate.
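For completeness, these update rules follow from the gradients of $L(w, b)$ with respect to $w$ and $b$ (a standard derivation, not spelled out in the original):

$$\nabla_w L(w, b) = -\sum_{x_i \in M} y_i x_i, \qquad \nabla_b L(w, b) = -\sum_{x_i \in M} y_i$$

SGD picks one misclassified point at a time and takes a step of size $\eta$ against its contribution to the gradient, which gives exactly the updates above.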
Single-layer perceptron: classifying toy data
- Import packages
import numpy as np
import matplotlib.pyplot as plt
import torch
%matplotlib inline
- Load the data
data = np.genfromtxt('../data/perceptron_toydata.txt', delimiter='\t')
X, y = data[:, :2], data[:, 2]
y = y.astype(int)
print('Class label counts:', np.bincount(y))
print('X.shape:', X.shape)
print('y.shape:', y.shape)
The output is:
Class label counts: [50 50]
X.shape: (100, 2)
y.shape: (100,)
Shuffle the data and randomly split it into training and test sets:
shuffle_idx = np.arange(y.shape[0])
shuffle_rng = np.random.RandomState(123)
shuffle_rng.shuffle(shuffle_idx)
X, y = X[shuffle_idx], y[shuffle_idx]

X_train, X_test = X[shuffle_idx[:70]], X[shuffle_idx[70:]]
y_train, y_test = y[shuffle_idx[:70]], y[shuffle_idx[70:]]
After z-score standardization, the data has zero mean and unit variance; standardization does not change the shape of the feature distribution.
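Concretely, z-score standardization transforms each feature as

$$x' = \frac{x - \mu}{\sigma}$$

where $\mu$ and $\sigma$ are the per-feature mean and standard deviation estimated on the training set (the same values are reused for the test set, as in the code below).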
Models that depend on distances or feature scales generally require normalization/standardization, for example KNN (k-nearest neighbors), k-means clustering, the perceptron, and SVM.
Tree-based models and their ensembles (Boosting and Bagging), such as random forest, XGBoost, and LightGBM, as well as naive Bayes, are not sensitive to the scale of feature values, so these models generally do not need normalization/standardization.
# Normalize (mean zero, unit variance)
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
The scatter plot of the data is shown below; the points can clearly be separated into two classes.
plt.scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], label='class 0', marker='o')
plt.scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], label='class 1', marker='s')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.legend()
plt.show()
- Model definition
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


def custom_where(cond, x_1, x_2):
    # element-wise select: x_1 where cond is True, x_2 where it is False
    return (cond * x_1) + ((~cond) * x_2)


class Perceptron():
    def __init__(self, num_features):
        self.num_features = num_features
        self.weights = torch.zeros(num_features, 1,
                                   dtype=torch.float32, device=device)
        self.bias = torch.zeros(1, dtype=torch.float32, device=device)

    def forward(self, x):
        linear = torch.add(torch.mm(x, self.weights), self.bias)
        predictions = custom_where(linear > 0., 1, 0).float()
        return predictions

    def backward(self, x, y):
        predictions = self.forward(x)
        errors = y - predictions
        return errors

    def train(self, x, y, epochs):
        for e in range(epochs):
            for i in range(y.size()[0]):
                # use view because backward expects a matrix (i.e., 2D tensor)
                errors = self.backward(x[i].view(1, self.num_features), y[i]).view(-1)
                self.weights += (errors * x[i]).view(self.num_features, 1)
                self.bias += errors

    def evaluate(self, x, y):
        predictions = self.forward(x).view(-1)
        accuracy = torch.sum(predictions == y).float() / y.size()[0]
        return accuracy
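One detail worth noting: this implementation works with 0/1 labels and predictions rather than the $\pm 1$ convention of the theory section, and the learning rate is implicitly $\eta = 1$. Since errors = y - predictions is zero for correctly classified samples, only misclassified points change the weights, which matches the misclassification-driven update rule above.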
- Model training
ppn = Perceptron(num_features=2)
X_train_tensor = torch.tensor(X_train, dtype=torch.float32, device=device)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32, device=device)
ppn.train(X_train_tensor, y_train_tensor, epochs=10)
print('Model parameters:')
print('Weights: %s' % ppn.weights)
print('Bias: %s' % ppn.bias)
The output is:
Model parameters:
Weights: tensor([[1.2734], [1.3464]])
Bias: tensor([-1.])
- Model evaluation
X_test_tensor = torch.tensor(X_test, dtype=torch.float32, device=device)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32, device=device)
test_acc = ppn.evaluate(X_test_tensor, y_test_tensor)
print('Test set accuracy: %.2f%%' % (test_acc*100))
The output is:
Test set accuracy: 93.33%
The learned decision boundary is rendered below.
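The boundary is the set of points where $w_1 x_1 + w_2 x_2 + b = 0$; solving for the second feature gives

$$x_2 = -\frac{w_1 x_1 + b}{w_2}$$

which the code below evaluates at $x_1 = -2$ and $x_1 = 2$ to draw the line.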
w, b = ppn.weights, ppn.bias
x_min = -2
y_min = ( (-(w[0] * x_min) - b[0])
/ w[1] )
x_max = 2
y_max = ( (-(w[0] * x_max) - b[0])
/ w[1] )
fig, ax = plt.subplots(1, 2, sharex=True, figsize=(7, 3))
ax[0].plot([x_min, x_max], [y_min, y_max])
ax[1].plot([x_min, x_max], [y_min, y_max])
ax[0].scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], label='class 0', marker='o')
ax[0].scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], label='class 1', marker='s')
ax[1].scatter(X_test[y_test==0, 0], X_test[y_test==0, 1], label='class 0', marker='o')
ax[1].scatter(X_test[y_test==1, 0], X_test[y_test==1, 1], label='class 1', marker='s')
ax[1].legend(loc='upper left')
plt.show()
Multilayer perceptron model & handwritten digit recognition
- Import packages
import time
import numpy as np
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch

if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
- Parameter Settings
# Device
device = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")
# Hyperparameters
random_seed = 1
learning_rate = 0.1
num_epochs = 10
batch_size = 64
# Architecture
num_features = 784
num_hidden_1 = 128
num_hidden_2 = 256
num_classes = 10
- Load the data
train_dataset = datasets.MNIST(root='data',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='data',
                              train=False,
                              transform=transforms.ToTensor())

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size,
                         shuffle=False)

# Checking the dataset
for images, labels in train_loader:
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    break
transforms.ToTensor() scales the input images to the [0, 1] range. The output is:
Image batch dimensions: torch.Size([64, 1, 28, 28])
Image label dimensions: torch.Size([64])
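As a quick sanity check (a small sketch that reuses the images batch fetched above; it is not part of the original article), the pixel range can be confirmed directly:

# after transforms.ToTensor(), pixel values should lie in [0, 1]
print('Min pixel value:', images.min().item())
print('Max pixel value:', images.max().item())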
- Model definition
class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(MultilayerPerceptron, self).__init__()

        ### 1st hidden layer
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)
        # re-initialize the weights manually; if omitted, PyTorch applies its default initialization
        self.linear_1.weight.detach().normal_(0.0, 0.1)
        self.linear_1.bias.detach().zero_()
        # self.linear_1_bn = torch.nn.BatchNorm1d(num_hidden_1)

        ### 2nd hidden layer
        self.linear_2 = torch.nn.Linear(num_hidden_1, num_hidden_2)
        self.linear_2.weight.detach().normal_(0.0, 0.1)
        self.linear_2.bias.detach().zero_()

        ### Output layer
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)
        self.linear_out.weight.detach().normal_(0.0, 0.1)
        self.linear_out.bias.detach().zero_()

    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        # out = self.linear_1_bn(out)
        out = self.linear_2(out)
        out = F.relu(out)
        # out = F.dropout(out, p=dropout_prob, training=self.training)
        logits = self.linear_out(out)
        probas = F.log_softmax(logits, dim=1)
        return logits, probas


torch.manual_seed(random_seed)
model = MultilayerPerceptron(num_features=num_features,
                             num_classes=num_classes)
model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
As noted in the commented-out lines in the code above, BatchNorm speeds up deep network training by reducing internal covariate shift, and Dropout uses samples from a Bernoulli distribution to randomly zero elements of the input tensor with probability p; it is a common way to combat overfitting.
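To make the commented-out pieces concrete, below is a minimal sketch of a variant model with BatchNorm and Dropout enabled. The class name MultilayerPerceptronReg and the value of dropout_prob are assumptions for illustration only; this variant was not used for the training run in this article, and weight initialization is left to PyTorch's defaults for brevity.

dropout_prob = 0.5  # assumed value, not taken from the original article

class MultilayerPerceptronReg(torch.nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)
        # BatchNorm1d normalizes the activations of each mini-batch
        self.linear_1_bn = torch.nn.BatchNorm1d(num_hidden_1)
        self.linear_2 = torch.nn.Linear(num_hidden_1, num_hidden_2)
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)

    def forward(self, x):
        out = F.relu(self.linear_1(x))
        out = self.linear_1_bn(out)
        out = F.relu(self.linear_2(out))
        # dropout is active only in training mode (model.train()) and disabled in eval mode
        out = F.dropout(out, p=dropout_prob, training=self.training)
        logits = self.linear_out(out)
        probas = F.log_softmax(logits, dim=1)
        return logits, probas

Whether BatchNorm is placed before or after the activation is itself a design choice; the ordering here simply follows the commented-out lines in the original model.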
- Model training
def compute_accuracy(net, data_loader):
    net.eval()
    correct_pred, num_examples = 0, 0
    with torch.no_grad():
        for features, targets in data_loader:
            features = features.view(-1, 28*28).to(device)
            targets = targets.to(device)
            logits, probas = net(features)
            _, predicted_labels = torch.max(probas, 1)
            num_examples += targets.size(0)
            correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float() / num_examples * 100
The function above computes the classification accuracy; the training loop follows:
start_time = time.time()
minibatch_cost = []
epoch_acc = []
for epoch in range(num_epochs):
    model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        features = features.view(-1, 28*28).to(device)
        targets = targets.to(device)

        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        cost.backward()

        ### UPDATE MODEL PARAMETERS
        optimizer.step()

        ### LOGGING
        minibatch_cost.append(cost)
        if not batch_idx % 50:
            print('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f'
                  % (epoch+1, num_epochs, batch_idx,
                     len(train_loader), cost))

    with torch.set_grad_enabled(False):
        acc = compute_accuracy(model, train_loader)
        epoch_acc.append(acc)
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs, acc))

    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))

print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))
Visualization of the training process
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(range(len(minibatch_cost)), minibatch_cost)
plt.ylabel('Train loss')
plt.xlabel('Minibatch')
plt.show()
plt.plot(range(len(epoch_acc)), epoch_acc)
plt.ylabel('Train Acc')
plt.xlabel('Epoch')
plt.show()
Note: the plotting code above will fail, because every element of minibatch_cost is a tensor that still carries a gradient and cannot be converted to NumPy directly. Detach it (and move it to the CPU) first:

minibatch_cost = [a.detach().cpu().numpy() for a in minibatch_cost]
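An alternative sketch (a small change to the logging step of the training loop, not what the original code ran) avoids the conversion entirely by storing plain Python floats from the start:

### LOGGING (variant): cost.item() returns a detached Python float
minibatch_cost.append(cost.item())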
The loss and accuracy curves from running 50 epochs are shown below:
- Model evaluation
Accuracy on the test set
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))
The result is as follows:
Test accuracy: 98.04%
for features, targets in test_loader:
    break

# move the batch to the same device as the model before the forward pass
_, predictions = model.forward(features[:4].view(-1, 28*28).to(device))
predictions = torch.argmax(predictions, dim=1)
predictions = predictions.tolist()

fig, ax = plt.subplots(1, 4)
for i in range(4):
    ax[i].imshow(features[i].view(28, 28), cmap=matplotlib.cm.binary)
    ax[i].set_title("Predicted:" + str(predictions[i]))
plt.show()
Thank you
Thank you all for reading this far. If you found it helpful:
- Please give it a like, so that more people can see this content (it is hard being an unliked beginner; experts, please don't flame me -_-)
- Share your thoughts with me in the comments section, and feel free to record your own thought process there.
Thank you again for your encouragement and support!