The previous article in this series, "Quick reference for machine learning (Python code)" (www.jianshu.com/p/fbe59dc46)…, briefly mentioned and introduced logistic regression. This article goes from the neuron to the model structure of logistic regression, and then extends it to the deep neural network model.

First, on intelligence

In exploring the mystery of human intelligence, people of different eras and academic backgrounds have had different ideas about what intelligence is and how to realize it. Some advocate building artificial intelligence with explicit logic systems, namely symbolism. Connectionism, the direction discussed in this article, advocates using mathematical models to simulate the structure of the brain in order to achieve intelligence.

So why can the brain think? Scientists found that the answer lies in the body's neural networks, which are made up of neurons:

1. External stimuli are converted into electrical signals by nerve endings and transmitted to neurons.

2. The dendrites of a neuron receive the electrical signals; the neuron processes them to determine whether its activation threshold is reached, and then outputs an excitatory or inhibitory signal. Finally, the axon transmits the signal to other cells.

Countless neurons make up the nerve center. The nerve center combines signals to make judgments.

The human body responds to external stimuli according to the instructions of the nerve center.

Second, neurons

Since neurons are the basis of intelligence, and it is these properties of neurons that give the brain its strong capacity for computation and decision-making, scientists invented the mathematical model of the artificial neuron on this principle, and combined artificial neurons into artificial neural network models. (Note: the neurons mentioned below refer specifically to artificial neurons.)

The basic structure of an artificial neuron is shown above. It takes an input of a certain dimension (e.g., a 3-dimensional input x1, x2, x3), and each input is multiplied by a corresponding weight (e.g., w1, w2, w3). Multiplying each input by its weight can be seen as weighting the inputs, that is, the neuron attaches a different importance to each input.

The neuron then sums the weighted inputs (a weighted sum) and adds the intercept term b (the intercept term can be seen as a direct adjustment of the neuron's threshold); finally, the activation function f applies a nonlinear transformation to produce the final output value, i.e., y = f(w1·x1 + w2·x2 + w3·x3 + b).
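As an illustration, here is a minimal sketch of such a neuron in NumPy; the input values, weights, and the choice of sigmoid as f are arbitrary assumptions for the example:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """Weighted sum of the inputs plus the intercept term, passed through f."""
    z = np.dot(w, x) + b      # weighted sum + intercept term b
    return sigmoid(z)         # nonlinear transformation f(z)

# 3-dimensional input x1, x2, x3 with weights w1, w2, w3, as in the text.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.7, -0.2])
b = 0.1
print(neuron(x, w, b))
```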

There are many kinds of activation functions, such as sigmoid, tanh, sign, ReLU, and softmax (activation functions will be discussed in the next topic). The role of the activation function is to apply a nonlinear operation to the neuron, in keeping with the universal approximation theorem: "If a feedforward neural network has a linear output layer and at least one hidden layer, then given enough neurons, it can approximate any continuous function on a compact subset of ℝⁿ to arbitrarily high precision." This shows that the activation function is the premise that allows a deep neural network to learn to fit arbitrary functions.
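For reference, these common activation functions can be written in a few lines of NumPy (a hedged sketch; the test values are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 2.0])
for f in (sigmoid, tanh, relu, softmax):
    print(f.__name__, f(z))
```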

Third, from neurons to logistic regression

A single neuron whose activation function is the sigmoid is exactly the model structure of the logistic regression we know. Logistic regression is a generalized linear classification model, and its model structure can be regarded as a single-layer neural network: an input layer plus an output layer consisting of a single neuron with a sigmoid activation function, with no hidden layer. (Note: the input layer is not counted in the number of network layers.)

In the logistic regression model, we feed the data features x into the input layer; the activation function σ (the sigmoid function) in the output layer nonlinearly transforms the input into a value in the range 0~1 via sigmoid(wx + b). The learning and training process uses gradient descent to find suitable parameters w and b for the model ŷ = sigmoid(wx + b), minimizing the error between the model output ŷ and the actual value y.
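A minimal sketch of this training loop, assuming toy data generated on the spot and an illustrative learning rate; the names (X, y, lr) are not from the article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                            # 100 samples, 3 features
true_w, true_b = np.array([1.5, -2.0, 0.5]), 0.3
y = (sigmoid(X @ true_w + true_b) > 0.5).astype(float)   # made-up toy labels

w, b, lr = np.zeros(3), 0.0, 0.1
for epoch in range(1000):
    y_hat = sigmoid(X @ w + b)               # model output ŷ = sigmoid(wx + b)
    grad_w = X.T @ (y_hat - y) / len(y)      # gradient of the cross-entropy loss
    grad_b = np.mean(y_hat - y)
    w -= lr * grad_w                         # gradient descent update
    b -= lr * grad_b

print("learned w:", w, "b:", b)
```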

Fourth, from logistic regression to deep neural networks

From the introduction above, we know that a neural network is a network formed by connecting neurons in layers, and that logistic regression is a single-layer neural network. When we add at least one hidden layer to this single-layer network of only an input layer and an output layer, we get a deep neural network. A deep neural network consists of three kinds of network layers: the input layer, the hidden layers, and the output layer.

  • Input layer: the layer that takes in the data features. The number of input features corresponds to the number of neurons in this layer.
  • Hidden layer: the middle layers of the network; there can be zero or many of them. Each hidden layer takes the output of the previous layer as its input, computes its result, and passes it on to the next layer. The hidden layers are the key to a neural network's performance; they are usually composed of neurons with activation functions, which further process high-level abstract features and strengthen the nonlinear expressiveness of the network. The number of hidden layers directly affects how well the model fits.
  • Output layer: the layer that produces the final result. The number of neurons in the output layer corresponds to the number of classification labels. (Note: in binary classification, if the output layer uses a sigmoid activation, it has one neuron; if it uses a softmax classifier, it has two.)

The data features (x) enter at the input layer; the result computed by each layer is passed on to the next, and finally reaches the output layer. Each network layer is composed of a certain number of neurons. Each neuron can be regarded as a computing unit that takes a weighted sum of its inputs, so the computed results are directly controlled by the weights the neurons hold (i.e., the model parameters w). A neuron may also contain an activation function that further applies a nonlinear transformation to the weighted sum, such as sigmoid(wx + b).
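Putting the pieces together, here is an illustrative forward pass through input → hidden → output layers; the layer sizes and random weights are arbitrary assumptions for the sketch, not values from the article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Pass x through each (W, b) pair: a = sigmoid(W @ a + b)."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)   # weighted sum + intercept, then activation
    return a

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),   # hidden layer: 3 inputs -> 4 units
    (rng.normal(size=(1, 4)), np.zeros(1)),   # output layer: 4 -> 1 (binary label)
]
x = np.array([0.5, -1.2, 3.0])
print(forward(x, layers))
```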