Contents

1. Perceptron

2. An example of a perceptron

3. Weights and threshold

4. Decision model

5. Vectorization

6. How a neural network operates

7. An example of a neural network

8. Continuity of output


The hottest technology right now is definitely artificial intelligence.

The underlying model of artificial intelligence is a “neural network”. Many complex applications (such as pattern recognition, automatic control) and advanced models (such as deep learning) are based on it. Learning artificial intelligence must start with it.

What is a neural network? Popular explanations seem to be lacking online.

A few days ago, I read Michael Nielsen’s open source textbook Neural Networks and Deep Learning and found it very easy to understand. Now, I will introduce what a neural network is according to this book.

1. Perceptron

Historically, scientists have wanted to mimic the human brain and build machines that can think. Why can humans think? Scientists found that the reason lies in the body's neural network.

  • Through nerve endings, external stimuli are converted into electrical signals, which are transmitted to nerve cells (also called neurons).
  • Countless neurons make up the nerve center.
  • The nerve center combines signals to make judgments.
  • The body responds to external stimuli according to instructions from the nerve center.

Since thinking is based on neurons, if we can make artificial neurons, we can form artificial neural networks to simulate thinking. In the 1960s, one of the earliest models of artificial neurons, called the Perceptron, was developed and is still used today.

(Figure: a perceptron, drawn as a circle, with inputs x1, x2, x3 and a single output)

The circle in the figure above represents a perceptron. It accepts several inputs (x1, x2, x3…) and produces one output, just as a nerve ending senses various changes in the external environment and finally generates an electrical signal.

To simplify the model, we agree that each input has only two possible values: 1 or 0. If all the inputs are 1, all the conditions hold and the output is 1; if all the inputs are 0, none of the conditions hold and the output is 0.

2. An example of a perceptron

Let’s look at an example. The city is holding its annual game and animation exhibition, and Xiao Ming cannot decide whether to visit it this weekend.

He decided to consider three factors.

  • Weather: Will it be sunny this weekend?
  • Companion: Can he find someone to go with him?
  • Price: Is admission affordable?

This constitutes a perceptron: the three factors above are the external inputs, and the final decision is the perceptron's output. If all three factors are Yes (denoted by 1), the output is 1 (visit); if all three are No (denoted by 0), the output is 0 (do not visit).

3. Weights and threshold

Reading this far, you must be asking: what is the output if some factors hold and others do not? For example, the weather is fine this weekend and the tickets are not expensive, but Xiao Ming cannot find a companion. Should he still visit?

In reality, factors are rarely of equal importance: some are decisive, others secondary. Therefore, these factors can be assigned weights to represent their different importance.

  • Weather: weight 8
  • Companion: weight 4
  • Price: weight 4

The weights above indicate that weather is the decisive factor, while companion and price are secondary factors.

If all three factors are 1, the sum of their weights is 8 + 4 + 4 = 16. If the weather and price factors are 1 and the companion factors are 0, the sum becomes 8 + 0 + 4 = 12.

In this case, a threshold also needs to be specified. If the sum is greater than the threshold, the perceptron outputs 1; otherwise it outputs 0. Assuming the threshold is 8, then since 12 > 8, Xiao Ming decides to visit. The threshold reflects the strength of his preference: the lower the threshold, the more he wants to go; the higher the threshold, the less he wants to go.

The decision-making process above can be expressed mathematically as follows.

output = 0, if ∑ wj·xj ≤ threshold
output = 1, if ∑ wj·xj > threshold

In the formula above, x represents the various external factors and w the corresponding weights.
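
To make this concrete, here is a minimal Python sketch of Xiao Ming's perceptron (the weights, threshold, and inputs come from the example above; the function name is my own):

def perceptron(inputs, weights, threshold):
    # Weighted sum of the binary (0/1) inputs.
    total = sum(w * x for w, x in zip(weights, inputs))
    # Output 1 if the sum exceeds the threshold, otherwise 0.
    return 1 if total > threshold else 0

weights = [8, 4, 4]   # weather, companion, price
threshold = 8
# Weather = 1, companion = 0, price = 1: the sum is 12 > 8, so the output is 1 (visit).
print(perceptron([1, 0, 1], weights, threshold))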

4. Decision model

A single perceptron forms a simple decision model that can already be used. In the real world, actual decision models are much more complex: they are multilayer networks composed of many perceptrons.

(Figure: a multilayer network of perceptrons)

In the figure above, the bottom-layer perceptrons receive the external input, make their judgments, and then emit signals that serve as inputs to the perceptrons in the layer above, until the final result is obtained. (Note: each perceptron still produces a single output, but that output can be sent to multiple targets.)

In this diagram, the signal travels in one direction only: the output of a lower-layer perceptron is always the input of an upper-layer perceptron. In reality, signals can also travel in a loop, with A feeding B, B feeding C, and C feeding back into A. Such a network is called a recurrent neural network, and it is not covered in this article.

 

5. Vectorization

To simplify the later discussion, the model above needs to be restated mathematically.

  • Write the external factors x1, x2, x3 as the vector <x1, x2, x3>, abbreviated x
  • Write the weights w1, w2, w3 as the vector (w1, w2, w3), abbreviated w
  • Define the operation w · x = ∑ wx, i.e. the dot product of w and x equals the sum of the products of the factors and their weights
  • Define b as the negative of the threshold: b = -threshold

The perceptron model looks like this.

output = 0, if w·x + b ≤ 0
output = 1, if w·x + b > 0
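
In this vectorized form, the same decision can be sketched as follows (a minimal illustration; the numbers are reused from the example above):

def perceptron(x, w, b):
    # w · x computed as the sum of elementwise products, plus b = -threshold.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

# Weather = 1, companion = 0, price = 1, with b = -8: z = 12 - 8 > 0, so output 1.
print(perceptron([1, 0, 1], [8, 4, 4], -8))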

6. How a neural network operates

To build a neural network, three conditions need to be met.

  • Input and output
  • The weights (w) and thresholds (b)
  • The structure of multilayer perceptron

In other words, you need to be able to draw the kind of diagram that appears above.

The hardest part is determining the weights (w) and the thresholds (b). So far, both values have been set subjectively, but in practice they are hard to estimate, so there must be a method for finding them.

This method is called trial and error. With all other parameters held constant, make a small change in w (or b), denoted Δw (or Δb), and then observe what happens to the output. Repeat this process until you find the set of w and b that produces the most accurate output; that is the set we want. This process is called training the model; a minimal sketch of one such step follows.
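
Here is a minimal sketch of a single trial-and-error step (the starting parameters and the size of Δw are made up for illustration):

def output(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

x = [1, 0, 1]
w, b = [8.0, 4.0, 4.0], -8.0
before = output(x, w, b)
w[0] += 0.5                # a small change Δw applied to the first weight
after = output(x, w, b)
print(before, after)       # compare the output before and after the nudge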

 

Thus, a neural network operates as follows (see the toy sketch after this list).

  • Determine the inputs and outputs
  • Find one or more algorithms that can derive the outputs from the inputs
  • Find a data set with known answers, and use it to train the model and estimate w and b
  • Once new data arrives, feed it into the model to obtain results, while continuing to correct w and b
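
Putting these steps together, a toy trial-and-error training loop might look like this (the data set and the random-nudge update rule are invented purely for illustration):

import random

# Toy data set with known answers: (inputs, desired output).
data = [([1, 1, 1], 1), ([0, 0, 0], 0), ([1, 0, 1], 1), ([0, 1, 0], 0)]

def predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def accuracy(w, b):
    return sum(predict(x, w, b) == y for x, y in data) / len(data)

w, b = [0.0, 0.0, 0.0], 0.0
best = accuracy(w, b)
for _ in range(1000):
    # Nudge w and b slightly at random; keep the change only if accuracy does not drop.
    w2 = [wi + random.uniform(-0.1, 0.1) for wi in w]
    b2 = b + random.uniform(-0.1, 0.1)
    if accuracy(w2, b2) >= best:
        w, b, best = w2, b2, accuracy(w2, b2)

print(best, w, b)  # the most accurate combination found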

As you can see, the whole process requires a great deal of computation. This is why neural networks have only become practical in the last few years: ordinary CPUs handle the workload poorly, so training is usually done on GPUs built for machine learning.

 

7. An example of a neural network

The following uses automatic license plate recognition as an example to explain how a neural network works.

Automatic license plate recognition is when a camera on a highway takes a picture of a license plate and a computer recognizes the number in the picture.

 

In this example, the license plate photo is the input and the license plate number is the output, while the sharpness of the photo can serve as a weight (w). One or more image-comparison algorithms then act as the perceptrons. An algorithm's result is a probability, for example a 75% probability that the digit is 1. This requires setting a threshold (b) (say, 85% confidence) below which a result is discarded as invalid.

A set of already-recognized license plate photos is fed into the model as training data, and the parameters are adjusted repeatedly until the combination with the highest accuracy is found. New photos can then be fed in and results obtained directly.
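
A minimal sketch of the thresholding idea (the recognizer, its per-digit confidences, and the 85% cutoff are placeholders, not a real recognition algorithm):

def accept(probabilities, b=0.85):
    # probabilities: hypothetical per-digit confidences from some image-comparison algorithm.
    digit, p = max(probabilities.items(), key=lambda kv: kv[1])
    # Accept the best guess only if its confidence clears the threshold b.
    return digit if p >= b else None

print(accept({"1": 0.75, "7": 0.10}))  # 0.75 < 0.85 -> None (result invalid)
print(accept({"1": 0.92, "7": 0.05}))  # 0.92 >= 0.85 -> "1"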

8. Continuity of output

 

One problem with the model above is that the output is assumed to have only two values: 0 and 1. However, the training method requires that a small change in w (or b) produce a correspondingly small change in the output. An output that can only be 0 or 1 is too insensitive to guarantee correct training, so the output must be transformed into a continuous function.

This requires a bit of simple mathematical manipulation.

First, denote the perceptron's computed result wx + b as z.

z = wx + b

Then, compute the following formula, and denote the result σ(z).

σ(z) = 1 / (1 + e^(-z))

The reason is that if z tends to positive infinity, z → +∞ (indicating a strong perceptron match), then σ(z) → 1; if z tends to negative infinity, z → -∞ (indicating a strong mismatch), then σ(z) → 0. In other words, as long as σ(z) is used as the output, the output becomes a continuous function.
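
A quick numerical check of this behavior (a minimal sketch):

import math

def sigmoid(z):
    # σ(z) = 1 / (1 + e^(-z))
    return 1 / (1 + math.exp(-z))

print(sigmoid(10))   # ≈ 0.99995: close to 1 for large positive z
print(sigmoid(-10))  # ≈ 0.000045: close to 0 for large negative z
print(sigmoid(0))    # 0.5: exactly in between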

The original output curve looked something like this:

(Figure: a step function that jumps abruptly from 0 to 1)

And now it looks like this:

(Figure: the smooth S-shaped sigmoid curve, rising gradually from 0 to 1)

In fact, it can be proved that Δσ satisfies the following formula.

Δσ ≈ ∑ (∂σ/∂wj) Δwj + (∂σ/∂b) Δb

That is, the relationship between Δσ and Δw and Δb is linear, with the partial derivatives as the rates of change. This makes it feasible to work out exactly what w and b should be.
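
A rough numerical check of this linearity (the input, parameters, and step sizes are made up for illustration):

import math

def sigma(w, x, b):
    return 1 / (1 + math.exp(-(w * x + b)))

w, b, x = 1.0, 0.0, 0.5
dw, db = 0.01, 0.01
# Actual change in σ after nudging w and b.
actual = sigma(w + dw, x, b + db) - sigma(w, x, b)
# Linear estimate using the partial derivatives ∂σ/∂w = σ'(z)·x and ∂σ/∂b = σ'(z),
# where σ'(z) = σ(z)·(1 - σ(z)).
s = sigma(w, x, b)
estimate = s * (1 - s) * x * dw + s * (1 - s) * db
print(actual, estimate)  # the two values are nearly equal for small Δw and Δb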