Overview
In deep learning, we often see news of a model that tops the leaderboards. A major breakthrough on a neural network task depends first on the data set and second on the model structure.
The ImageNet data set is responsible for the breakthrough in the image field. A data set matters because it records information about the objective function. However, as we know from the previous section, a data set is imperfect: it cannot record the objective function completely, and some information is lost. The quality of a data set lies in how much information about the objective function it retains.
The task of the training stage is to recover the objective function $o(\mathbf x)$ from the information in the data set (the data set function $d(\mathbf x)$). Because of various constraints, we only obtain a function $f(\mathbf x)$ that approximates the objective function.
A good model has a functional form closer to the objective function, so it compensates better for the defects of the data set and achieves better results.
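As a rough illustration of this point, the sketch below assumes a hypothetical objective function o(x) = x^2 and a three-point data set sampled from it; neither comes from the original text. It fits two functional forms by least squares and compares them at a point outside the data.

```python
def objective(x):
    # Hypothetical objective function, chosen only for illustration.
    return x * x

# Data set: a few samples of the objective function.
data = [(x, objective(x)) for x in (0.0, 1.0, 2.0)]

def fit_linear(points):
    """Least-squares fit of the form f(x) = a*x + b."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = (sum((x - mx) * (y - my) for x, y in points)
         / sum((x - mx) ** 2 for x, _ in points))
    b = my - a * mx
    return lambda x: a * x + b

def fit_quadratic(points):
    """Least-squares fit of the one-parameter form f(x) = a*x^2."""
    a = sum(x * x * y for x, y in points) / sum(x ** 4 for x, _ in points)
    return lambda x: a * x * x

x_new = 4.0  # a point the data set says nothing about
for name, fit in (("linear", fit_linear), ("quadratic", fit_quadratic)):
    f = fit(data)
    print(name, "error at x=4:", abs(f(x_new) - objective(x_new)))
```

Both forms see the same three points, but only the form that resembles the assumed objective function stays accurate away from them.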
Visual display
Two points
To reuse the example from the previous section: this data set has only two points, so its information retention rate is very low.
Data set:
Different assumptions are made on its functional form:
Linear form:
Parabolic form:
By adjusting the parameter $W$, we can obtain infinitely many parabolas, all of which fit the data set function $d(\mathbf x)$ perfectly.
There are many other functional forms, and the number of forms is itself infinite. Other functional forms can also fit the data set function $d(\mathbf x)$ perfectly.
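The claim about infinitely many parabolas can be checked with a small sketch. The two points below are hypothetical (the original data set is not reproduced here), and the free parameter `w` is an illustrative stand-in for the parameter mentioned above: adding `w * (x - x1) * (x - x2)` to the straight line changes the curve everywhere except at the two data points.

```python
# Hypothetical two-point data set; the original points are not given here.
POINTS = [(0.0, 1.0), (2.0, 5.0)]

def line(x):
    """The unique straight line through the two points."""
    (x1, y1), (x2, y2) = POINTS
    slope = (y2 - y1) / (x2 - x1)
    return y1 + slope * (x - x1)

def parabola(x, w):
    """A parabola through both points for any w (w = 0 gives the line)."""
    (x1, _), (x2, _) = POINTS
    return line(x) + w * (x - x1) * (x - x2)

for w in (-1.0, 0.5, 3.0):
    fits = all(abs(parabola(x, w) - y) < 1e-12 for x, y in POINTS)
    print(f"w={w:+.1f}: fits data={fits}, value at x=1: {parabola(1.0, w):.2f}")
```

Every value of `w` reproduces the data exactly, yet the curves disagree at x = 1, which is exactly the information the data set does not contain.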
Three points
Linear form:
Polynomial form:
Five points
Linear form:
Polynomial form:
As the number of data points increases, the functional form moves closer to a straight line, but there are still infinitely many possibilities.
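The same argument works for any finite number of points. The sketch below is one possible construction, not the one used for the original figures: it takes the lowest-degree interpolating polynomial (Lagrange form) and adds `c * prod(x - x_i)`, a term that vanishes at every data point. The five points sampled from y = 2x + 1 and the parameter `c` are assumptions made for illustration.

```python
def lagrange(points, x):
    """Lowest-degree polynomial through all points, evaluated at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def family_member(points, x, c):
    """Interpolant plus c * prod(x - x_i); the extra term is zero on the data."""
    bump = c
    for xi, _ in points:
        bump *= (x - xi)
    return lagrange(points, x) + bump

# Hypothetical data: five samples of the line y = 2x + 1.
five = [(float(x), 2.0 * x + 1.0) for x in range(5)]

for c in (0.0, 0.1, -1.0):
    worst = max(abs(family_member(five, x, c) - y) for x, y in five)
    print(f"c={c:+.1f}: max error on data={worst:.1e}, "
          f"value at x=5: {family_member(five, 5.0, c):.2f}")
```

With `c = 0` the construction reduces to the straight line; any other `c` gives a degree-five polynomial that matches all five points and still departs from the line outside them.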
Conclusion
The form of the objective function cannot be determined from the data alone; infinitely many forms remain possible.
Design the structure of the neural network
The information in the data set is insufficient, so we need additional information from elsewhere to guide the structural design of the neural network and make up for what is missing.
Special structure
Good models in all kinds of deep learning tasks use highly specialized structures, such as:
- Image tasks: two-dimensional CNN
- Text tasks: Embedding, one-dimensional CNN, RNN, CRF, Transformer, etc.
Many specialized structures are designed with reference to how the objective function processes its input. A CNN, for example, imitates the organization of the visual nerve. Although the exact form of the objective function is unknown, we can often learn part of its information and get better results by imitating the way it processes its input.
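To make the idea of a specialized structure concrete, here is a minimal pure-Python sketch of a two-dimensional convolution (strictly, a cross-correlation, as in most deep learning libraries). The 4x4 image and the 2x2 kernel are hypothetical. The structure itself encodes two assumptions about images: each output depends only on a small neighbourhood, and the same weights are reused at every position.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation: one small kernel slides over the whole image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical 4x4 image with a vertical edge, and a 2x2 edge-detecting kernel.
image = [[0, 0, 1, 1] for _ in range(4)]
kernel = [[-1, 1], [-1, 1]]
response = conv2d(image, kernel)

# Weight counts for mapping the 16 input pixels to the 9 outputs.
conv_weights = len(kernel) * len(kernel[0])
dense_weights = (len(image) * len(image[0])) * (len(response) * len(response[0]))

for row in response:
    print(row)
print("conv weights:", conv_weights, "dense weights:", dense_weights)
```

The response is non-zero only in the column where the edge sits. A fully connected layer producing the same nine outputs would need 144 weights instead of 4, and nothing in its structure would tell it that an edge looks the same wherever it appears.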
Structure design
The structure of a neural network is the skeleton of the algorithm and directly determines its ultimate potential. If the skeleton is poorly designed, no amount of training will make the result better than mediocre.
Improper structure:
Proper structure:
The structure of a neural network is not learned during training and usually has to be designed by hand. Architecture search algorithms also exist: they try many different architectures and select the best one according to the final training result. This requires a huge amount of computing power and is practical only for a few well-funded institutions.
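The search loop described here can be sketched at toy scale. Everything in the block is an assumption made for illustration: the candidate "structures" are one-parameter forms f(x) = a * g(x) rather than real network architectures, the data is sampled from y = 3x^2, and selection uses a held-out validation set, which is one common way to judge the final result.

```python
import math

# Hypothetical candidate structures: each is a form f(x) = a * g(x).
CANDIDATES = {
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
    "exponential": lambda x: math.exp(x),
}

def train(g, data):
    """Closed-form least squares for the single parameter a."""
    a = sum(g(x) * y for x, y in data) / sum(g(x) ** 2 for x, _ in data)
    return lambda x: a * g(x)

def validation_error(f, data):
    """Mean squared error on held-out data."""
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)

def search(train_data, val_data):
    """Train every candidate and keep the one with the lowest validation error."""
    scores = {
        name: validation_error(train(g, train_data), val_data)
        for name, g in CANDIDATES.items()
    }
    return min(scores, key=scores.get), scores

# Hypothetical data sampled from y = 3 * x^2.
train_data = [(x, 3.0 * x * x) for x in (0.5, 1.0, 1.5)]
val_data = [(x, 3.0 * x * x) for x in (2.0, 2.5)]

best, scores = search(train_data, val_data)
print("selected structure:", best)
```

Each candidate costs one full training run. Here that is a single division, but with real networks the cost of the loop grows with the number of candidates multiplied by the cost of training each one, which is where the computing power goes.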
To design a good neural network structure by hand, you need to:
- Have some understanding of, or a reasonable guess about, the objective function of the task;
- Be familiar with the common neural network structures and understand how they work;
- Use these structures to assemble a structure similar to the objective function.
Conclusion
- The data set cannot provide enough information, so the structure of the neural network is needed to make up for what is missing.
- The information in the neural network structure comes from understanding and imitating how the objective function processes its input.
Question
Why design the structure of a neural network when it can simulate any function?