Hello and welcome to our deep learning section.

Our previous machine learning series has wrapped up: we went over the algorithms and models commonly used in machine learning, along with their principles and implementations. A few topics, such as Markov models, hidden Markov models, and conditional random fields, were not covered, but they are relatively minor and not used very often, so we won't go through them all; if you are interested, you can research them on your own. If you can understand models like GBDT and SVM, those will pose no problem.

Introduction to Deep Learning

Deep learning has become so popular in recent years that many people equate it with artificial intelligence. The "deep" here does not refer to the depth of the learning, nor does it mean the concept is especially profound; it simply refers to a "deep" neural network composed of many layers.


In fact, neural networks, deep neural networks, and deep learning algorithms all have decades of history. For example, back-propagation, the well-known core algorithm of deep learning, was popularized back in the 1980s, more than thirty years ago. That is not a short history, but the field has gone through ups and downs, and it is only in the last few years that it has taken off and become almost a household word.

The reason is simply that training neural networks is so computationally expensive that earlier computing resources could not support it. Back then, deep learning was a niche field: few people were doing research, and funding was limited. In the last few years, the rapid development of computing resources, combined with the breakthrough of using GPUs to accelerate neural network training, has made the impossible possible. On top of that, landmark events such as AlphaGo's victory over Lee Sedol attracted a great deal of resources and attention, and the field has naturally flourished.

Since deep learning is built on neural network techniques, the content of this series will stay within the scope of neural networks. Today we start with the most basic unit of a neural network: the perceptron.

Perceptron

The perceptron is the smallest building block of a neural network. We know that in living organisms the smallest unit of the nervous system is the neuron, which is a single cell and looks something like this. Most of you have probably seen a picture of a neuron in a biology textbook.


The perceptron can be understood as a bionic model of the neuron. As we all know, a neuron connects to many other nerve cells, and together these neurons form a huge network. Neurons send electrical signals and chemicals to one another, producing a complex set of reactions. The human brain can be understood as an extremely large and complex neural network, but exactly how the brain and biological networks of neurons work is still not well understood.

Yet the deep learning models that computer scientists have built to simulate neurons and neural networks have achieved some remarkable results. The perceptron can be thought of as an abstraction of a neuron: a neuron receives signals from several other neurons and sends a signal on to others. If we treat the incoming signals as inputs and the outgoing signal as the output, we get the structure of the perceptron.


To keep the structure as simple as possible, we treat each input signal as a floating-point number, give each signal its own weight, sum all the weighted signals, and finally pass the sum through a mapping function to produce the output signal.

Let's look at the simplest perceptron as an example:


This perceptron has only two inputs, $x_1$ and $x_2$, so we can easily compute $y = w_1 x_1 + w_2 x_2$. Here $y$ is a floating-point number, and we can apply a sign function to it. The sign function simply classifies according to a threshold: say we set a threshold $\theta$; if $y > \theta$ the output is 1, otherwise it is 0.
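In code, this two-input thresholded perceptron could look something like the following sketch (the weights and threshold here are arbitrary illustrative values):

import numpy as np

def perceptron(x, w, theta):
    # weighted sum of the inputs, followed by a sign-style threshold activation
    y = np.dot(w, x)
    return 1 if y > theta else 0

# example: two inputs with hand-picked weights
print(perceptron(np.array([1.0, 0.5]), np.array([0.4, 0.6]), 0.5))  # prints 1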

We keep talking about neurons and perceptrons, but the principle is really simple. Let's write down the general formula: $y = f\left(\sum_i w_i x_i + b\right)$. Here $f$ is called the activation function, and the bias $b$ plays the role of the threshold above (with $b = -\theta$). The activation function in the example above is the sign function, which classifies by threshold. Three activation functions are commonly used in neural networks: the sigmoid function we are already familiar with, the ReLU function, and the tanh function. Let's take a look at each of them.


The equation of the ReLU function is simple: $f(x) = \max(0, x)$. But it works very well, and it is one of the most commonly used activation functions in neural networks. It converges faster than sigmoid, and the reason is simple: the sigmoid curve is very flat at both ends, so the derivative there is very close to zero, and gradient descent naturally converges slowly. We mentioned this earlier when we introduced the sigmoid function.


Graphically, the tanh function looks very similar to the sigmoid function. One difference is their range: sigmoid outputs values in (0, 1), while tanh outputs values in (-1, 1). Although it seems that only the range differs, the difference matters quite a bit. For one thing, their sensitive intervals are different, with tanh having a larger sensitive range; for another, the sigmoid output is always positive, which can cause problems in some scenarios.
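For reference, here is a minimal NumPy sketch of the three activation functions mentioned above:

import numpy as np

def sigmoid(x):
    # squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # passes positive inputs through unchanged and zeroes out negative ones
    return np.maximum(0.0, x)

def tanh(x):
    # squashes any real input into the range (-1, 1)
    return np.tanh(x)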

As you can see, the perceptron is really just a linear equation with an activation function applied on top. To some extent, the logistic regression model can also be viewed as a perceptron. One thing you might wonder is: why do we need an activation function at the end of a linear equation? What happens if we leave it out?

The answer is that we cannot leave it out, and the reason is simple: when we put multiple neurons together to form a network, if each neuron's computation is purely linear, then the whole neural network is equivalent to a single linear operation. This can be proved mathematically, so we need to add something to the perceptron to make it non-linear.
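To see why, consider stacking just two purely linear layers with weight matrices $W_1, W_2$ and bias vectors $b_1, b_2$:

$$
y = W_2\left(W_1 x + b_1\right) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2) = W' x + b',
$$

which is again a single linear function of $x$. The same argument applies to any number of stacked linear layers, so it is the activation function that gives the network its extra expressive power.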

Perceptrons and logic circuits

Finally, let's look at a practical example. The simplest example is, of course, the logic gates in a circuit. AND gates and OR gates are all similar: each has two inputs and one output (a NOT gate has just one input).

Let's take the AND gate as an example:
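x1  x2 | x1 AND x2
 0   0 | 0
 0   1 | 0
 1   0 | 0
 1   1 | 1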


We are all familiar with the binary AND operation: both inputs must be 1 for the output to be 1. It is also very easy to write this as a perceptron:

import numpy as np

def AND(x1, x2):
    # weighted sum of the two inputs, compared against a threshold
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    theta = 0.7
    return int(w.dot(x) > theta)
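With these weights, AND(0, 1) returns 0 while AND(1, 1) returns 1, matching the truth table above.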

In the same way we can write OR gates, NOT gates, and NAND gates; that is not too difficult. But there is one case that a single perceptron cannot solve, and that is XOR. Let's start with the truth table for XOR:
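x1  x2 | x1 XOR x2
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0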


We cannot separate the XOR data with a single perceptron. As we saw when introducing the SVM model, the XOR data is not linearly separable; that is, we cannot split it with a single dividing line.


However, the XOR problem is not unsolvable. A single perceptron cannot separate the data, but if we connect perceptrons in series to form a simple neural network, the problem becomes solvable. Suppose we have already implemented the AND gate, the OR gate, and the NAND gate (an AND followed by a NOT). We can implement an XOR gate like this:

def XOR(x1, x2):
    # XOR is 1 exactly when the two inputs differ:
    # NAND rules out the (1, 1) case, OR rules out (0, 0),
    # and AND combines the two intermediate signals
    s1 = NAND(x1, x2)
    s2 = OR(x1, x2)
    return AND(s1, s2)
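The NAND and OR helpers assumed above can be written in the same style as the AND gate; here is one possible sketch (the specific weights and thresholds are just one workable choice, not the only one):

import numpy as np

def NAND(x1, x2):
    # negated AND: outputs 0 only when both inputs are 1
    x = np.array([x1, x2])
    w = np.array([-0.5, -0.5])
    theta = -0.7
    return int(w.dot(x) > theta)

def OR(x1, x2):
    # outputs 1 when at least one input is 1
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    theta = 0.2
    return int(w.dot(x) > theta)

With these in place, XOR(0, 0) = 0, XOR(0, 1) = 1, XOR(1, 0) = 1, and XOR(1, 1) = 0, exactly as the truth table requires.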

The structure of this combination of perceptrons looks like this:


The TensorFlow website provides a web application called Playground, which lets us get an intuitive feel for how a neural network is put together and how it is trained. For the XOR problem, for example, we can set the number of layers in the network and how many neurons each layer contains, as well as the activation function of the neurons, and then watch the training process and see what the various parameters do.


You can click the link below to try the experiment yourself. The takeaway is this: a single perceptron has weak fitting ability, but when multiple neurons are combined into a neural network it becomes very powerful and can learn the non-linear relationships in the data. Learning non-linear relationships was always a troublesome problem in the machine learning era, and neural networks solve it well. Besides having more parameters and stronger fitting capacity, a key reason neural networks outperform ordinary machine learning models is precisely that they learn the non-linear relationships in data very well.

TensorFlow-Playground

That is pretty much it for the perceptron. The perceptron is the foundation of neural networks, and there really is not much to it: in both structure and form it closely resembles the logistic regression we covered in the machine learning series. There are no major technical difficulties; you just need to get an intuitive feel for it.

I sincerely wish you all a fruitful day. If you liked today's content, please give it a like, share, and follow.



– END –