@[toc]
Definition
Neural networks are not very complicated: the simplest ones boil down to simple generalized linear regression plus gradient descent.
Application: neural networks are used very widely. The simplest use is data fitting and prediction, and because of their topology they can also be used for classification. For example, if we extract features from an image and then predict and classify those features, we get simple image recognition (and of course there are convolutional neural networks and the like beyond that).
This post walks through the simplest neural network algorithm and how to implement one by hand (in Python).
The basic principle
From regression to a network
In fact, the simplest way to describe how a neural network works is by brute force.
To take a simple example (using gradient descent on a unary linear regression), suppose we have a set of inputs
x = [1 2 3 4 5]
and the corresponding set of outputs
y = [1 2 3 4 5]
Suppose the relationship between them is y = w*x + b.
Now all we need to do is find the values of w and b.
The easiest approach, as you can probably guess, is to generate random values for w and b and then correct them based on the error. To do that we need a way to measure the error between the actual and predicted outputs, and then use it to adjust w and b; for example, we can use the squared error D(y_actual, y_predicted).
Say x = 1, w = 2, and b = 1. After computing (in practice we would of course average over all the x and y values), we get:
y_predicted = 3, squared error = 4
And then we can apply the gradient descent algorithm
This squared error is called the loss function: Loss = (w*x + b - y)^2
Taking the partial derivatives of the loss with respect to w and b, we get the update rules:
w' = w - (∂Loss/∂w) * step
b' = b - (∂Loss/∂b) * step
where ∂Loss/∂w = 2*(w*x + b - y)*x and ∂Loss/∂b = 2*(w*x + b - y).
Then we run this in a loop, and we end up with w = 1 and b = 0.
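Put together, a minimal sketch of that loop might look like the following (the starting values, learning rate, and iteration count here are my own choices for illustration, not anything prescribed):

```python
import numpy

x = numpy.array([1, 2, 3, 4, 5], dtype=float)
y = numpy.array([1, 2, 3, 4, 5], dtype=float)

w, b = 2.0, 1.0      # arbitrary starting guesses
step = 0.05          # learning rate

for _ in range(1000):
    y_pre = w * x + b
    # partial derivatives of Loss = mean((w*x + b - y)^2)
    dw = (2 * (y_pre - y) * x).mean()
    db = (2 * (y_pre - y)).mean()
    w -= step * dw
    b -= step * db

print(round(w, 3), round(b, 3))   # converges to w = 1.0, b = 0.0
```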
So far this looks like nothing more than a demonstration of gradient descent, but if we draw it as a flow chart, it looks like this:
This is essentially a “neural network” with only one node in the middle.
Basic neural network structure
Now we can introduce the idea of a neural network. In the computation above, w and b start from random initial values (or, commonly, from zero) and pass through a single node. As the fitting problem becomes more complex, different initial values of w and b may lead to different results, and a single node no longer seems rigorous enough. So, borrowing from biology (or, as I prefer to put it, "two heads are better than one"), we can use multiple nodes: split the input x among the nodes according to weights, let each node do its own computation, then combine the outputs according to weights again. Each node ends up with its own w and b, and the final result is assembled from all of them.
So it gets a little bit complicated and it looks like this:
And if that still doesn't seem like enough, you'll find plenty of scarier-looking diagrams online, such as:
or
But even so, the general principle is the same.
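As a rough sketch of what "splitting x among several nodes by weight" means in practice (this vectorized form is my own illustration, not part of the demo below), one fully connected layer with several nodes is just a weighted sum per node followed by an activation:

```python
import numpy

def layer(x, W, b):
    # each row of W holds one node's input weights; the sigmoid squashes each node's sum
    return 1 / (1 + numpy.exp(-(W @ x + b)))

# e.g. 3 hidden nodes fed by 2 inputs
W = numpy.random.normal(size=(3, 2))
b = numpy.random.normal(size=3)
x = numpy.array([1.0, 2.0])
print(layer(x, W, b))   # 3 hidden outputs, each between 0 and 1
```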
Implementation
Instead of writing my own demo from scratch, I used a simple demo that someone else had implemented. After all, a hand-rolled version is still no match for a ready-made framework; there are many details to consider, but the basic principle is the same.
The target
The neural network we are going to simulate this time (two inputs, one hidden layer with two nodes, and a single output) looks like this:
As the previous example should make clear, the first thing we need to do is choose a suitable (or guessed) fitting function for the hidden layer.
Here this function is called the activation function.
We choose the most common one, the sigmoid:
```python
def sigmoid(x):
    return 1 / (1 + numpy.exp(-x))
```
This is also an activation function I have used directly in mathematical modeling (there, of course, I built it with the MATLAB toolbox).
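As a quick sanity check (my own example, not from the referenced demo), the sigmoid squashes any real number into the range (0, 1), which is what makes it usable as an activation:

```python
import numpy

def sigmoid(x):
    return 1 / (1 + numpy.exp(-x))

print(sigmoid(0))     # 0.5
print(sigmoid(10))    # close to 1
print(sigmoid(-10))   # close to 0
```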
Using the activation function
Now that we have chosen an activation function, what we do is split x by weight, feed it into the activation function, and then combine the outputs by weight. Of course, we don't know the weights yet; training is precisely the process of determining the weights of each layer, and in the end we get a very complicated equation. (Yes, I do the same thing in mathematical modeling: since there was no multi-objective optimization available, I fit the weight relationship between the targets directly with a neural network, obtained a very complex equation, and then ran a genetic algorithm on it. Reference: www.nnetinfo.com/text/show/4…)
This is the code for the network in that picture:
```python
def feedforward(self, x):
    # weighted sums for the two hidden nodes, then the output node
    h1 = x[0]*self.w1 + x[1]*self.w2 + self.b1
    h1f = sigmoid(h1)
    h2 = x[0]*self.w3 + x[1]*self.w4 + self.b2
    h2f = sigmoid(h2)
    o1 = h1f*self.w5 + h2f*self.w6 + self.b3
    of = sigmoid(o1)
    return h1, h1f, h2, h2f, o1, of
```
Then we initialize by assigning random values first; the training comes later:
```python
class nerualnetwo():
    def __init__(self):
        self.w1 = numpy.random.normal()
        self.w2 = numpy.random.normal()
        self.w3 = numpy.random.normal()
        self.w4 = numpy.random.normal()
        self.w5 = numpy.random.normal()
        self.w6 = numpy.random.normal()
        self.b1 = numpy.random.normal()
        self.b2 = numpy.random.normal()
        self.b3 = numpy.random.normal()
```
Loss function
Just as in the earlier example, we need a way to tell whether the current w and b values are any good so that we can correct them. Again, I use the mean squared error:
```python
def mse_loss(y_tr, y_pre):
    return ((y_tr - y_pre)**2).mean()
```
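For example (made-up numbers, just to show the call):

```python
import numpy

def mse_loss(y_tr, y_pre):
    return ((y_tr - y_pre)**2).mean()

y_true = numpy.array([1, 0, 0, 1])
y_pred = numpy.array([0.9, 0.2, 0.1, 0.8])
print(mse_loss(y_true, y_pred))   # 0.025
```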
Feedback optimization (based on gradient descent)
Gradient descent is not the only way to optimize the weights of each node, but it is the easiest to implement and the easiest to understand (and the one I can actually write).
So gradient descent comes in again, but here there is a difference: the gradient has to pass through two levels,
- from the input to the hidden layer
- from the hidden layer to the output
So there are two functions we need to differentiate:
```python
def der_sigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))

der_L_y_pre = -2 * (y_tr - y_pre)
```
Note that -2*(y_tr - y_pre) is the partial derivative of the loss with respect to the prediction y_pre.
Because there are two layers, the gradient descent update looks like this:
```python
self.w1 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_w1
self.b1 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_b1
# with
der_h1_b1 = der_sigmoid(valcell[0])
der_h1_w1 = der_sigmoid(valcell[0]) * x[0]
```
**This treatment is exactly the chain rule for partial derivatives!** For w1, for instance, ∂L/∂w1 = (∂L/∂y_pre) · (∂y_pre/∂h1f) · (∂h1f/∂w1), which is the product der_L_y_pre * der_y_pre_h1 * der_h1_w1 in the code.
In other words, we take the partial derivative of the loss with respect to each w and b.
Here valcell holds h1, h1f, h2, h2f, o1, of, the values returned by feedforward. That's the whole idea!
Gradient correction
That is easy to say, but the whole process has to run in a loop; here I simply run it 1,000 times.
This is the heart of the whole thing:
```python
def train(self, data, all_y_tr):
    epochs = 1000
    learn_rate = 0.1
    for i in range(epochs):
        for x, y_tr in zip(data, all_y_tr):
            valcell = self.feedforward(x)
            y_pre = valcell[5]
            der_L_y_pre = -2 * (y_tr - y_pre)
            der_y_pre_h1 = der_sigmoid(valcell[4]) * self.w5
            der_y_pre_h2 = der_sigmoid(valcell[4]) * self.w6
            der_h1_w1 = der_sigmoid(valcell[0]) * x[0]
            der_h1_w2 = der_sigmoid(valcell[0]) * x[1]
            der_h2_w3 = der_sigmoid(valcell[2]) * x[0]
            der_h2_w4 = der_sigmoid(valcell[2]) * x[1]
            der_y_pre_w5 = der_sigmoid(valcell[4]) * valcell[1]
            der_y_pre_w6 = der_sigmoid(valcell[4]) * valcell[3]
            der_y_pre_b3 = der_sigmoid(valcell[4])
            der_h1_b1 = der_sigmoid(valcell[0])
            der_h2_b2 = der_sigmoid(valcell[2])
            # reassign weights and biases
            self.w1 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_w1
            self.w2 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_w2
            self.w3 -= learn_rate * der_L_y_pre * der_y_pre_h2 * der_h2_w3
            self.w4 -= learn_rate * der_L_y_pre * der_y_pre_h2 * der_h2_w4
            self.w5 -= learn_rate * der_L_y_pre * der_y_pre_w5
            self.w6 -= learn_rate * der_L_y_pre * der_y_pre_w6
            self.b1 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_b1
            self.b2 -= learn_rate * der_L_y_pre * der_y_pre_h2 * der_h2_b2
            self.b3 -= learn_rate * der_L_y_pre * der_y_pre_b3
        # output the current loss value every 10 epochs
        if i % 10 == 0:
            y_pred = numpy.apply_along_axis(self.simulate, 1, data)
            loss = mse_loss(all_y_tr, y_pred)
            print(i, loss)
```
At this point, after training, we know the weights of each layer, so we can use them for computation.
Operation function
```python
def simulate(self, x):
    h1 = x[0]*self.w1 + x[1]*self.w2 + self.b1
    h1f = sigmoid(h1)
    h2 = x[0]*self.w3 + x[1]*self.w4 + self.b2
    h2f = sigmoid(h2)
    o1 = h1f*self.w5 + h2f*self.w6 + self.b3
    of = sigmoid(o1)
    return of
```
This is the trained model itself: once training has determined the weights, you simply plug an input into this equation.
The overall code
```python
import numpy

def sigmoid(x):
    return 1 / (1 + numpy.exp(-x))

def der_sigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))

def mse_loss(y_tr, y_pre):
    return ((y_tr - y_pre)**2).mean()

class nerualnetwo():
    def __init__(self):
        self.w1 = numpy.random.normal()
        self.w2 = numpy.random.normal()
        self.w3 = numpy.random.normal()
        self.w4 = numpy.random.normal()
        self.w5 = numpy.random.normal()
        self.w6 = numpy.random.normal()
        self.b1 = numpy.random.normal()
        self.b2 = numpy.random.normal()
        self.b3 = numpy.random.normal()

    def feedforward(self, x):
        h1 = x[0]*self.w1 + x[1]*self.w2 + self.b1
        h1f = sigmoid(h1)
        h2 = x[0]*self.w3 + x[1]*self.w4 + self.b2
        h2f = sigmoid(h2)
        o1 = h1f*self.w5 + h2f*self.w6 + self.b3
        of = sigmoid(o1)
        return h1, h1f, h2, h2f, o1, of

    def simulate(self, x):
        h1 = x[0]*self.w1 + x[1]*self.w2 + self.b1
        h1f = sigmoid(h1)
        h2 = x[0]*self.w3 + x[1]*self.w4 + self.b2
        h2f = sigmoid(h2)
        o1 = h1f*self.w5 + h2f*self.w6 + self.b3
        of = sigmoid(o1)
        return of

    def train(self, data, all_y_tr):
        epochs = 1000
        learn_rate = 0.1
        for i in range(epochs):
            for x, y_tr in zip(data, all_y_tr):
                valcell = self.feedforward(x)
                y_pre = valcell[5]
                der_L_y_pre = -2 * (y_tr - y_pre)
                der_y_pre_h1 = der_sigmoid(valcell[4]) * self.w5
                der_y_pre_h2 = der_sigmoid(valcell[4]) * self.w6
                der_h1_w1 = der_sigmoid(valcell[0]) * x[0]
                der_h1_w2 = der_sigmoid(valcell[0]) * x[1]
                der_h2_w3 = der_sigmoid(valcell[2]) * x[0]
                der_h2_w4 = der_sigmoid(valcell[2]) * x[1]
                der_y_pre_w5 = der_sigmoid(valcell[4]) * valcell[1]
                der_y_pre_w6 = der_sigmoid(valcell[4]) * valcell[3]
                der_y_pre_b3 = der_sigmoid(valcell[4])
                der_h1_b1 = der_sigmoid(valcell[0])
                der_h2_b2 = der_sigmoid(valcell[2])
                self.w1 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_w1
                self.w2 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_w2
                self.w3 -= learn_rate * der_L_y_pre * der_y_pre_h2 * der_h2_w3
                self.w4 -= learn_rate * der_L_y_pre * der_y_pre_h2 * der_h2_w4
                self.w5 -= learn_rate * der_L_y_pre * der_y_pre_w5
                self.w6 -= learn_rate * der_L_y_pre * der_y_pre_w6
                self.b1 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_b1
                self.b2 -= learn_rate * der_L_y_pre * der_y_pre_h2 * der_h2_b2
                self.b3 -= learn_rate * der_L_y_pre * der_y_pre_b3
            if i % 10 == 0:
                y_pred = numpy.apply_along_axis(self.simulate, 1, data)
                loss = mse_loss(all_y_tr, y_pred)
                print(i, loss)

if __name__ == "__main__":
    data = numpy.array([[-2, -1], [25, 6], [17, 4], [-15, -6]])
    all_y_trues = numpy.array([1, 0, 0, 1])
    ner = nerualnetwo()
    ner.train(data, all_y_trues)
```
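If you then continue inside the same main block after ner.train(data, all_y_trues), you can feed a new sample through simulate to get a prediction between 0 and 1. The input values below are a made-up example of mine, not from the original post:

```python
    # continuing inside the __main__ block above, after training
    new_sample = numpy.array([-7, -3])   # a hypothetical new input
    print(ner.simulate(new_sample))      # a value between 0 and 1; for inputs resembling the label-1 samples it should end up near 1
```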
Conclusion
That is basically what a simple neural network looks like. At its heart there are just two things (with plenty of details around them):
- the activation function
- automatically correcting the weights
Choose a suitable activation function, apply it, and then use a self-correcting method (here, gradient descent) to minimize the loss function. As the number of layers grows, the network becomes more complex; of course, accuracy is not necessarily proportional to the number of layers, which I have tested myself.
References:
zhuanlan.zhihu.com/p/58964140
blog.csdn.net/Syuhen/arti…
www.nnetinfo.com/text/show/4