1. What is a neural network?

In the housing price prediction model, the scattered data points can be fitted with a straight line, but since a price cannot be negative, the fit is passed through a "rectified linear unit" (ReLU), giving the size → price mapping function. When you have many inputs besides size (number of rooms, zip code, neighborhood wealth level), you stack individual neurons together to form a larger neural network. On the left is the input layer, where we feed in the features; in the middle are the hidden units, which are densely connected, and the neural network itself decides what each hidden node computes; on the right is the output layer. Neural networks are remarkably good at learning an accurate mapping function from x to y.
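The single-neuron version of this model can be sketched in a few lines. This is a minimal illustration, not the course's actual model: the weight `w` and bias `b` are made-up values standing in for a fitted line.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z) keeps predicted prices non-negative."""
    return np.maximum(0.0, z)

# A single "neuron" mapping house size to price.
# w (price per unit size) and b (offset) are illustrative, not fitted values.
w, b = 3.0, -50.0

def predict_price(size):
    return relu(w * size + b)

sizes = np.array([10.0, 20.0, 40.0, 80.0])
print(predict_price(sizes))  # small houses clamp to 0, never negative
```

Stacking several such neurons, each with its own weights, and feeding their outputs into further neurons is exactly what forms the hidden layer described above.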

2. Supervised learning

So far, almost all of the economic value created by neural networks has been based on a type of machine learning called supervised learning.

(1) Common neural network model

  • NN: standard neural network (general prediction tasks)
  • CNN: convolutional neural network (image processing)
  • RNN: recurrent neural network (good at one-dimensional sequence data with a temporal component)

(2) Structured and unstructured data

  • Unstructured data is easier for humans to understand
  • Structured data is easier for machines to understand
  • With deep learning, machines are getting much better at understanding unstructured data

3. The rise of deep learning

(1) Scale!

The vertical axis represents the performance of the deep learning model, and the horizontal axis represents the amount of training data.

  • As data gets bigger and bigger, deep learning gets better and better
  • As neural networks get bigger and bigger, deep learning performs better and better
  • When the training set is small, the relative ranking of different algorithms is unclear; in the big-data regime, neural networks pull steadily ahead of other algorithms

(2) Three influencing factors

  • Data
  • Computation
  • Algorithms

From the sigmoid function to the ReLU function. The sigmoid activation is

$$s(x) = \frac{1}{1 + e^{-x}}.$$

The problem with this function in machine learning is that its slope is nearly 0 on both tails, i.e., the gradient vanishes there, so learning becomes very slow. Replacing the activation with the rectified linear unit (ReLU),

$$f(x) = \max(0, x),$$

keeps the gradient at 1 for all positive inputs, so gradient descent learns much faster.
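The vanishing-gradient contrast can be checked numerically. A minimal sketch (the derivative formulas below are the standard ones, not taken from this text):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Standard identity: s'(x) = s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # ReLU's gradient is 1 for x > 0, 0 otherwise
    return (np.asarray(x) > 0).astype(float)

# Far from the origin, sigmoid's gradient collapses toward 0,
# while ReLU's gradient stays 1 for positive inputs.
for x in [-10.0, 0.0, 10.0]:
    print(f"x={x:6.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x)}")
```

At x = 10, the sigmoid gradient is on the order of 1e-5, which is why gradient descent stalls in the saturated regions; ReLU avoids this for positive activations.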