An overview

It is said that a neural network is a universal function fitter. How should we understand this claim? Let's run some experiments to build a more intuitive understanding. To keep things intuitive and easy to follow, we use a neural network to fit single-variable functions, that is, $y = f(x)$.

The experiments

1. The function $y = x$

The training samples

As shown in the figure:

  • The blue dots represent the training samples, which are sampled from the function $y = x$
  • The orange line is the function currently computed by the neural network, which has not yet been trained and deviates greatly from the samples

Approach

To fit a straight line, what network structure do we need? To answer that, we first need to understand a single neuron.

The form of a single neuron is: $y = \sigma(wx + b)$

  • $w$ and $b$ are parameters to be determined
  • $\sigma$ is the activation function

If you remove $\sigma$, the form becomes $y = wx + b$, which is exactly a straight line. In other words, we can fit this function with a single neuron that has no activation function.
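
As a concrete illustration, here is a minimal training sketch. PyTorch, the sampling range, and the learning rate are assumptions here, since the post does not name its framework or settings:

```python
import torch
import torch.nn as nn

# Training samples drawn from the target function y = x.
x = torch.linspace(-1, 1, 50).unsqueeze(1)
y = x.clone()

# A single neuron with no activation function: y = w*x + b.
model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for step in range(20):  # 20 training steps, as in the experiment
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

w, b = model.weight.item(), model.bias.item()
print(f"y = {w:.2f}x + {b:.2f}")  # approaches y = 1.0x + 0.0
```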

The experiment

As shown in the figure above, using a single output neuron, the neural network fits the target function well after 20 steps of training. The obtained parameters are shown in the figure below:

The corresponding function is $y = 1.0x + 0.1$, which is very close to the target function, and it gets even closer with more training steps.

2. The function $y = |x|$

The training samples

The function is a piecewise function:

$$y = \begin{cases} x & x \ge 0 \\ -x & x < 0 \end{cases}$$

Approach

Since this is no longer a single straight line, we need a nonlinear activation function to bend it. And since the target contains no curved segments, the piecewise-linear ReLU is an appropriate activation function:

Looking at the curve of the ReLU function, it is a horizontal line on one side and a sloped line on the other. If we take two ReLU curves and stack them back to back, won't we get the target curve?

The final results are as follows:

The two hidden neurons are:


  • $y_1 = \mathrm{ReLU}(-x)$
  • $y_2 = \mathrm{ReLU}(x)$

The output neuron is $y = y_1 + y_2$, which gives exactly the target curve.

(The results above were obtained by setting the parameters by hand, without any training.)
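
Here is a sketch that sets these parameters by hand and verifies the result (PyTorch is an assumption; the original experiment may have used a different tool):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 2),              # two hidden neurons
    nn.ReLU(),
    nn.Linear(2, 1, bias=False),  # output neuron: y = y1 + y2
)

with torch.no_grad():
    # Hidden layer computes y1 = ReLU(-x) and y2 = ReLU(x).
    model[0].weight.copy_(torch.tensor([[-1.0], [1.0]]))
    model[0].bias.zero_()
    # Output layer sums them: y = y1 + y2.
    model[2].weight.copy_(torch.tensor([[1.0, 1.0]]))

x = torch.linspace(-3, 3, 7).unsqueeze(1)
with torch.no_grad():
    print(model(x).squeeze())  # tensor([3., 2., 1., 0., 1., 2., 3.]), i.e. |x|
```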

3. The function


$$y = \begin{cases} x+3 & -3 \le x < 0 \\ 3-x & 0 \le x < 3 \\ 0 & \text{otherwise} \end{cases}$$

The number of hidden neurons required increases to 4; a hand-set sketch is shown below.
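
For reference, here is a hand-set sketch in the same style as before (PyTorch assumed). Note that, analytically, three ReLUs with kinks at $-3$, $0$, and $3$ already suffice, via $y = \mathrm{ReLU}(x+3) - 2\,\mathrm{ReLU}(x) + \mathrm{ReLU}(x-3)$; the trained network in the experiment uses four:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 3),              # three hidden ReLU neurons
    nn.ReLU(),
    nn.Linear(3, 1, bias=False),
)

with torch.no_grad():
    model[0].weight.copy_(torch.tensor([[1.0], [1.0], [1.0]]))
    model[0].bias.copy_(torch.tensor([3.0, 0.0, -3.0]))      # kinks at -3, 0, 3
    model[2].weight.copy_(torch.tensor([[1.0, -2.0, 1.0]]))  # y = h1 - 2*h2 + h3

x = torch.tensor([[-4.0], [-3.0], [-1.5], [0.0], [1.5], [3.0], [4.0]])
with torch.no_grad():
    print(model(x).squeeze())  # tensor([0.0, 0.0, 1.5, 3.0, 1.5, 0.0, 0.0])
```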

4. The function
$y = 1.8\sin(3x)/x$

The required network is more complex, and the fitted curve is no longer perfect; see the sketch below.
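
A rough training sketch for this case follows. The network shape, the sampling range, and all hyperparameters here are assumptions; the original network may differ:

```python
import torch
import torch.nn as nn

# 200 evenly spaced points on [-5, 5]; an even point count never samples
# x = 0 exactly, where the target function is undefined.
x = torch.linspace(-5, 5, 200).unsqueeze(1)
y = 1.8 * torch.sin(3 * x) / x

# A small MLP with two hidden ReLU layers.
model = nn.Sequential(
    nn.Linear(1, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.4f}")  # small, but the fit is not perfect
```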

Conclusion

As the target function becomes more complex:

  • The corresponding neural network also becomes more complex
  • The amount of training data required also grows
  • Training becomes harder and harder
  • The result becomes less intuitive and more difficult to explain

Conversely, more complex networks with more data can fit more complex functions. In theory, a neural network can fit any function, but of course the network would have to be arbitrarily large, and so would the amount of data.

Reference software

The neural network