Neurons in the brain

The human brain may contain more than 100 billion neurons, each connected to many others in several directions, forming an extremely large network of neurons and connections. It is these networks that give rise to our thoughts and consciousness.

A brain neuron is a nerve cell made up of a cell body, dendrites, an axon, synapses, and so on.

  • The cell body consists of the nucleus, cytoplasm, and cell membrane. It is the metabolic center of the neuron and the part that receives and processes information.
  • Dendrites are the tree-like fibers that extend outward from the cell body. They are the neuron's input channels, receiving information from other neurons.
  • The axon is the longest and thickest fiber extending from the cell body, i.e. the nerve fiber, and is the neuron's output channel. Axons come in two forms, myelinated and unmyelinated, which carry information at different speeds. The axon ends in a number of branching fibers called nerve terminals, which are where the neuron's output leaves the cell.
  • A synapse is the point of contact between one neuron's nerve terminals and another neuron's dendrite or cell body. Neurons connect to one another through synapses, and information passes between cells across these connections. Each neuron has roughly 10³ to 10⁴ synapses.

Simulation of the brain

Neural networks are an attempt to simulate the inner workings of the brain, and the origins of this model in computing are fairly early, going back to the mid-1940s, when computers themselves were still new.

In 1943, McCulloch and Pitts published the paper "A Logical Calculus of the Ideas Immanent in Nervous Activity", which for the first time proposed a mathematical way to model the activity of neurons in the human brain.

As shown in the figure above, the inputs x play the role of signals arriving from other neurons' axons: they reach the dendrites through synapses, the dendrites apply certain operations (weights) before the signals enter the cell body, where they are summed, and the result is passed through an activation function and output along the axon.
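To make the model concrete, here is a minimal sketch of such an artificial neuron in Python; the specific weights, bias, and step activation are illustrative assumptions, not something taken from the original paper.

```python
import numpy as np

def artificial_neuron(x, w, b, activation):
    """Weighted sum of the inputs (dendrites/cell body), then an activation (axon output)."""
    z = np.dot(w, x) + b          # the "cell body" sums the weighted inputs
    return activation(z)          # the "axon" carries the activated output

step = lambda z: 1 if z > 0 else 0   # McCulloch-Pitts style threshold activation

# Example: a neuron that behaves like a logical AND gate (weights chosen by hand)
x = np.array([1, 1])
w = np.array([1.0, 1.0])
print(artificial_neuron(x, w, b=-1.5, activation=step))  # -> 1
```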

This simple model is the basic building block of neural networks in machine learning. It might make us exclaim that the brain is really this simple, but the truth is otherwise: human understanding of the brain is still so scant that it is fair to say almost no substantive progress has been made, and the brain in this model is so much cruder than the real one that, if God made us, he would hardly let us guess his design this easily.

Although the brain's neurons had now been modeled mathematically, we did not know whether the model was correct, and there was no clear way to adjust its weight parameters.

Perceptron model

By the 1950s, one of the simplest models of the artificial neuron had been developed: the perceptron. A "perception machine"? The name makes it sound like something tangible, a real machine like a computer. Indeed, the first hardware implementation appeared in the 1960s, and the whole device was called a perceptron; later, however, "perceptron" came to refer to the algorithm, which is what it really is.

Building on the ideas of its predecessors, the perceptron introduced a feedback-loop learning mechanism: the weights are adjusted according to the error between the output computed for a sample and the correct answer.

The general process is as follows (a code sketch follows the list):

  • Initialize the weight parameters with random numbers.
  • Pass an input vector x into the network.
  • Compute the network's output y' from the given input vector and the current weights. The perceptron's output function is

    y' = f(∑ᵢ wᵢxᵢ), where f(z) = 1 if z > 0 and −1 otherwise.

  • If y' ≠ y, adjust every connection weight wᵢ by the increment Δwᵢ = y·xᵢ.
  • Return to step 2.
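A minimal sketch of this procedure in Python (a bias term b and the {−1, +1} label convention are added here for completeness; they are assumptions, not spelled out in the text above):

```python
import numpy as np

def train_perceptron(X, y, epochs=20):
    """Perceptron learning rule sketch: labels y are assumed to be -1 or +1."""
    w = np.random.randn(X.shape[1])      # step 1: random initial weights
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):         # step 2: feed each input vector
            y_pred = 1 if np.dot(w, xi) + b > 0 else -1   # step 3: threshold output
            if y_pred != yi:             # step 4: update only on a mistake
                w += yi * xi             # delta w_i = y * x_i
                b += yi
    return w, b

# Example: the linearly separable AND function (labels in {-1, +1})
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print([1 if np.dot(w, xi) + b > 0 else -1 for xi in X])  # expected: [-1, -1, -1, 1]
```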

Introducing gradient descent

Unlike the perceptron's learning mechanism, ADALINE is another algorithm for training a neural network model; it is more advanced than the perceptron because it introduces gradient descent.

The general process is as follows (again, a code sketch follows the list):

  • Initialize the weights with random numbers.
  • Pass an input vector x into the network.
  • Compute the output y' of the neural network from the given input vector and the weights.
  • The final output value is the weighted sum

    y' = ∑ᵢ wᵢxᵢ

  • Calculate the error by comparing the model's output with the correct label o:

    E = (o − y')² / 2

  • Adjust the weights iteratively using the gradient descent update

    wᵢ ← wᵢ + η·(o − y')·xᵢ

  • Return to step 2.
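A corresponding sketch of ADALINE in Python (the learning rate, bias term, and per-sample updates are illustrative choices, not prescribed by the text):

```python
import numpy as np

def train_adaline(X, y, lr=0.01, epochs=50):
    """ADALINE sketch: linear output, squared error, gradient-descent weight updates."""
    w = np.random.randn(X.shape[1]) * 0.01
    b = 0.0
    for _ in range(epochs):
        for xi, oi in zip(X, y):
            y_pred = np.dot(w, xi) + b        # linear (identity) output
            error = oi - y_pred               # o - y'
            w += lr * error * xi              # w_i <- w_i + eta * (o - y') * x_i
            b += lr * error
    return w, b

# Example: labels in {-1, +1}; the final class is the sign of the linear output
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1], dtype=float)
w, b = train_adaline(X, y)
print(np.sign(X @ w + b))
```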

Limitations of earlier models

It can be seen that the perceptron and ADALINE already possess the basic elements of a neural network model. Both are single-layer networks, used mainly for binary classification, and both can learn a binary classification function.

Early neural models were in fact very limited, and in some respects useless. Minsky and Papert published the book Perceptrons in 1969, showing that a perceptron can only handle linearly separable problems and is completely helpless against other, more complex ones.

The XOR function is the classic example: no single straight line can correctly divide its two classes. This is exactly the awkward situation the perceptron faces; when the data are not linearly separable, it simply cannot separate the two categories correctly. At this point, neural network research entered its winter.
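As a quick illustration (an assumed experiment, not from the original text), running the perceptron rule above on XOR never reaches an error-free pass, no matter how long it trains:

```python
import numpy as np

# XOR truth table: the two classes cannot be split by a single straight line
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, 1, 1, -1])          # XOR labels in {-1, +1}

w, b = np.zeros(2), 0.0
for epoch in range(1000):             # far more epochs than a separable problem needs
    errors = 0
    for xi, yi in zip(X, y):
        if (1 if np.dot(w, xi) + b > 0 else -1) != yi:
            w += yi * xi
            b += yi
            errors += 1
    if errors == 0:                    # never happens for XOR
        break
print("misclassified in the last pass:", errors)   # stays > 0
```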

Multilayer perceptron

Since a perceptron with a single neuron cannot solve nonlinear problems, can it be extended to multiple neurons, stacked into multiple network layers? Groups of neurons are then connected so that the output of one neuron can be fed into another.

In a multi-layer network, data enters at the first layer and flows from each neuron to the corresponding neurons in the next layer. It is weighted, summed, and passed on through the hidden layers, and finally processed at the output layer. Learning in a multi-layer network, however, requires the backpropagation algorithm. Multiple layers increase the complexity of learning: from input to final output the network forms a long nest of functions, which makes training harder, but with the help of the chain rule things become much easier.

The general process is as follows (a sketch of backpropagation on XOR follows the list):

  • Compute the feedforward signals from the input through to the output.
  • Compute the output error E from the predicted value and the target value.
  • Propagate the error signals backwards: each layer's error signal δ is obtained by weighting the next layer's error signal by the connecting weights and multiplying by the gradient of that layer's activation function.
  • Compute the gradient of the parameters from the backpropagated error signal δ and the feedforward input signal a of the layer:

    ∂E/∂w = δ · a

  • Use the computed gradients to update the parameters:

    w ← w − η · ∂E/∂w
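Putting these steps together, here is a minimal backpropagation sketch in Python that trains a two-layer network on XOR; the hidden size, learning rate, and epoch count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output
lr = 0.5

for _ in range(10000):
    # feedforward signals
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # output error and backpropagated error signals (chain rule)
    delta_out = (out - y) * out * (1 - out)          # dE/dz at the output
    delta_h = (delta_out @ W2.T) * h * (1 - h)       # weighted back, times activation gradient
    # gradient = error signal x feedforward input; then gradient-descent update
    W2 -= lr * h.T @ delta_out;  b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_h;    b1 -= lr * delta_h.sum(axis=0)

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))  # should be close to [0, 1, 1, 0]
```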

Types of problems

Neural networks can be used for both regression and classification problems; the usual structural difference lies in the output layer. If we want a real-valued result, we should not use a squashing function such as the sigmoid at the output, because such functions limit the output to a fixed range, while for regression we often want an unbounded, continuous numerical result.

  • For regression/function-approximation problems, the least-squares error function can be used, with a linear activation in the output layer and sigmoid activations in the hidden layers.
  • For binary classification problems, the cross-entropy cost function is usually used, with sigmoid activations in both the output layer and the hidden layers.
  • For multi-class classification problems, the cross-entropy cost function is usually used, with a softmax output layer and sigmoid activations in the hidden layers.
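For illustration, these output/loss pairings can be written as simple Python functions (the function names here are my own, not standard library calls):

```python
import numpy as np

def mse(y_true, y_pred):                      # regression: linear output + squared error
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p):          # binary classification: sigmoid output + cross-entropy
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def softmax(z):                               # multi-class: softmax output ...
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_onehot, p):   # ... + cross-entropy over the classes
    return -np.mean(np.sum(y_onehot * np.log(p), axis=-1))

z = np.array([[2.0, 1.0, 0.1]])
print(softmax(z), categorical_cross_entropy(np.array([[1, 0, 0]]), softmax(z)))
```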

Deep neural network

In the second decade of the 21st century, deep learning became the most prominent line of research in artificial intelligence. In 2011 the Google X lab took 10 million images from YouTube and fed them to a deep-learning-enabled Google Brain, which three days later had learned to recognize cats on its own, without human help. In 2012 Microsoft used deep learning to give a speaker real-time speech recognition and translation, in effect simultaneous interpretation.

Although deep learning emerged in the 1980s, it was not very effective at the time because of limited hardware and data resources. It was not until around 2009, as Hinton and his students kept working in the field, that they found unexpected success: applying deep learning to speech recognition, they broke the previous record, cutting errors by about 25%. Deep learning began to catch on.

A major reason for the large performance gains of deep learning is that a deep neural network more closely resembles the human brain and therefore better mimics how the brain works.

Convolutional neural network

Convolutional neural networks were developed primarily to tackle problems of machine vision, but they are now used in other areas as well. Their development runs mainly through LeNet-5 -> AlexNet -> VGG -> GoogLeNet -> ResNet, and so on.

The convolutional layer was invented in the 1980s, but hardware limitations prevented the construction of complex networks, and it was not put to practical use until the 1990s.

In 1998, LeCun proposed combining convolutional layers, pooling layers, and fully connected layers to solve the problem of handwritten digit recognition, with results that were quite good compared with other classical machine-learning models. The architecture is as follows: a 32 x 32 input, features extracted by convolution and then downsampled, followed by another convolution and downsampling, then full connections and Gaussian connections. This is LeNet-5.
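A rough sketch of this architecture in PyTorch is shown below; ReLU activations and max pooling are substituted here for the original tanh units, average pooling, and Gaussian connections, so treat it as an approximation of LeNet-5 rather than the 1998 model itself.

```python
import torch
import torch.nn as nn

class LeNet5Like(nn.Module):
    """Approximate LeNet-5: conv -> pool -> conv -> pool -> fully connected layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28, 6 feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 28x28 -> 14x14 (downsampling)
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10, 16 feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),       # stands in for the original Gaussian connections
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet5Like()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```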

The exponential growth in the availability of data and processing power further enhanced these models, especially with the advent of ImageNet, an open dataset of millions of tagged and classified images.

In the 2012 ILSVRC challenge, Hinton and his student Alex Krizhevsky developed the AlexNet deep convolutional network. Its structure is similar to LeNet-5, but the convolutional stack is deeper and the total number of parameters reaches tens of millions: several convolutional layers, each producing up to several hundred feature maps. This is AlexNet.

A strong contender in the 2014 ILSVRC challenge was the VGG model, developed by the Visual Geometry Group at the University of Oxford. Compared with AlexNet, its main change was reducing the convolution kernels to 3x3. The overall structure is otherwise similar, though the convolutional configuration can vary; ReLU is used as the activation function, max pooling for pooling, and softmax at the output to produce the class probabilities.

In 2014, the GoogLeNet model won the ILSVRC challenge, the first time a large company took the title in this series; it has been won by large companies with large budgets ever since. GoogLeNet is composed mainly of nine Inception modules. Its number of parameters was reduced to a little over ten million, and its accuracy surpassed AlexNet, with the error rate falling from 16.4% to 6.7%.

In 2015, with the publication of "Rethinking the Inception Architecture for Computer Vision", Google researchers released a new Inception architecture. It mainly addresses the covariate shift problem by normalizing both the original input and the output values of each layer. In addition, the sizes of the convolution kernels were changed, the total depth of the network was increased, and the convolutions were factorized.

ResNet was proposed in 2015 by Dr. Kaiming He, then a researcher at Microsoft Research and now a research scientist at Facebook AI Research. ResNet had an outstanding record, taking first place in five competitions that year.

Recurrent neural network

The recurrent neural network (RNN) was proposed mainly to process sequential data. What is sequential data? Data in which earlier inputs and later inputs are related. For example, in a sentence the words before and after are related: given "I am hungry and ready to go to XX", we can judge from the earlier words that "XX" is probably "eat". That is sequential data.

There are many variations of recurrent neural networks, such as LSTM and GRU.

In a traditional neural network, from the input layer through several hidden layers to the output layer, adjacent layers are fully connected, while the nodes within the same layer are not connected. Such a model is essentially powerless to predict sequential data.

Recurrent neural networks are good at processing sequential data: they remember previous information and let it participate in the computation of the current output. In theory, a recurrent neural network can process sequences of any length.

For example, it can make character-level predictions, as shown in the figure below. Suppose there are only four kinds of characters and the training sample is the word "hello": inputting "h" should predict the next character "e", then "e" produces "l", "l" produces "l", and finally "l" produces "o".
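A minimal sketch of such a character-level model in Python (the weights here are random and untrained, so the printed predictions will only match "e, l, l, o" once the matrices W_xh, W_hh, W_hy have been learned; the sizes are illustrative):

```python
import numpy as np

chars = ['h', 'e', 'l', 'o']                          # only four kinds of characters
idx = {c: i for i, c in enumerate(chars)}
hidden = 8

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden, 4))        # input  -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden, hidden))   # hidden -> hidden (the recurrence)
W_hy = rng.normal(scale=0.1, size=(4, hidden))        # hidden -> output scores

h = np.zeros(hidden)
for c in "hell":                                      # feed h, e, l, l; predict the next char each step
    x = np.zeros(4); x[idx[c]] = 1.0                  # one-hot input
    h = np.tanh(W_xh @ x + W_hh @ h)                  # hidden state carries the previous context
    y = W_hy @ h                                      # scores over the four characters
    print(c, "->", chars[int(np.argmax(y))])          # would print e, l, l, o after training
```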
