Preface



The first picture clearly shows the relationship between artificial intelligence, machine learning, and deep learning. With these concepts in mind, we can move on to the discussion below.

Elements of deep learning

In order to define deep learning and understand how it differs from other machine learning methods, we first need to know what machine learning algorithms actually do. Broadly speaking, machine learning discovers the rules for performing a data-processing task from examples that contain the expected results. Machine learning therefore needs the following three elements.

  • Input data point. For example, if your task is speech recognition, those data points might be voice files that record people speaking. If your task is to tag images, these data points could be images.
  • Examples of the expected output. For a speech recognition task, these examples might be human-produced transcripts of the sound files. For an image tagging task, the expected output might be tags such as “dog” or “cat”.
  • A way to measure the effectiveness of an algorithm. This measure is used to calculate the difference between the current output of the algorithm and the expected output. The measured result is a feedback signal that regulates the way the algorithm works. This adjustment step is what we call learning.
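The three elements above can be sketched in a few lines of Python. Everything here is a toy illustration (the two-pixel “images”, the rule-based model, and the accuracy measure are all invented for the example), not a real machine learning pipeline:

```python
# 1. Input data points (here, tiny fake "images" as pixel lists).
inputs = [[0.9, 0.1], [0.2, 0.8]]

# 2. Examples of the expected output (the tags a human assigned).
expected = ["dog", "cat"]

# 3. A way to measure how well the algorithm is doing: here, simple accuracy.
def accuracy(predictions, targets):
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

# A deliberately naive "model" to score against the expected output.
def model(image):
    return "dog" if image[0] > image[1] else "cat"

predictions = [model(x) for x in inputs]
print(accuracy(predictions, expected))  # the feedback signal used for adjustment
```

In a real system, the measured result would be fed back to adjust the model automatically; that adjustment step is the learning.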

Machine learning models transform input data into meaningful outputs, a process “learned” from known examples of inputs and outputs. Thus, the core problem of machine learning and deep learning is to meaningfully transform data: in other words, to learn representations of the input data that bring it closer to the expected output.

What does deep learning look like?

Deep learning is a subfield of machine learning: it is a new way of learning representations from data, with an emphasis on learning successive layers that correspond to increasingly meaningful representations. The word “deep” in “deep learning” does not refer to any deeper level of understanding achieved by the approach, but rather to this idea of successive layers of representation. The number of layers in a model is called the depth of the model. Other names for the field include layered representations learning and hierarchical representations learning.

Modern deep learning often involves dozens or even hundreds of successive representation layers, all of which are learned automatically from the training data. In contrast, other machine learning approaches tend to learn only one or two layers of representation, and are therefore sometimes called shallow learning. To get an idea of what deep learning looks like, see the graph below.

Understanding how deep learning works in three diagrams

By now you know that machine learning maps inputs (such as images) to targets (such as the tag “cat”) by looking at many examples of inputs and targets. You also know that deep neural networks perform this input-to-target mapping through a series of simple data transformations (layers), and that these transformations are learned by observing examples. Let’s look at how this learning happens. In a neural network, the specific operation that each layer performs on its input data is stored in the layer’s weights, which are essentially a string of numbers. In technical terms, the transformation implemented by each layer is parameterized by its weights. Weights are sometimes also called the parameters of the layer. In this context, learning means finding a set of values for the weights of all layers in the network such that the network correctly maps each example input to its target. But here’s the thing: a deep neural network can have tens of millions of parameters. Finding the right values for all of them can be a daunting task, especially since changing the value of one parameter affects the behavior of all the others. The figure below illustrates this.
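As a toy illustration of “a layer’s transformation is parameterized by its weights”, here is a single dense layer in plain Python. The function name, the weight values, and the choice of tanh as the nonlinearity are all illustrative, not taken from the text:

```python
import math

# One dense layer: its transformation is determined entirely by its
# weights and bias, which are just plain numbers.
def dense_layer(x, weights, bias):
    # Weighted sum of the inputs, followed by a simple nonlinearity (tanh).
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return math.tanh(z)

x = [1.0, 2.0]
weights = [0.5, -0.25]  # "learning" means finding good values for these
bias = 0.1
print(dense_layer(x, weights, bias))  # tanh(0.5*1.0 - 0.25*2.0 + 0.1) = tanh(0.1)
```

Changing any weight changes the layer’s output, which is why the weights of all layers have to be tuned together.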



To control something, you first need to be able to observe it. To control the output of a neural network, you need to be able to measure the distance between that output and the expected value. This is the job of the network’s loss function, also called the objective function. The loss function takes the network’s predicted value and the true target value (that is, what you want the network to output) and computes a distance value, which measures how well the network has done on this example. The figure below illustrates this.
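A minimal sketch of a loss function, assuming mean squared error as the distance measure (the text does not specify a particular one):

```python
# Mean squared error between the network's predictions and the true targets.
def mse_loss(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

predictions = [0.8, 0.2]  # what the network currently outputs
targets = [1.0, 0.0]      # what we want it to output
print(mse_loss(predictions, targets))  # a single distance value, here 0.04
```

The smaller this value, the closer the network’s output is to the expected output.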



The fundamental trick of deep learning is to use this distance value as a feedback signal to fine-tune the weights, so as to reduce the loss on the current example. This adjustment is the job of the optimizer, which implements what is called the backpropagation algorithm, the core algorithm of deep learning. The figure below illustrates this.
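A minimal sketch of what the optimizer does, shown here as plain gradient descent on a one-weight model (full backpropagation through many layers is omitted; the model, gradient formula, and learning rate are illustrative choices):

```python
# Nudge a weight in the direction that reduces the loss, using the
# gradient as the feedback signal. For the one-weight model y = w * x
# with squared-error loss, the gradient with respect to w is
# 2 * (w*x - target) * x.
def gradient_step(w, x, target, learning_rate=0.1):
    prediction = w * x
    grad = 2 * (prediction - target) * x
    return w - learning_rate * grad

w = 0.0
x, target = 1.0, 2.0
for _ in range(50):
    w = gradient_step(w, x, target)
print(round(w, 4))  # approaches 2.0, the value that minimizes the loss
```

Backpropagation generalizes this idea: it computes such gradients for every weight in every layer by applying the chain rule backwards through the network.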



The weights of a neural network are assigned random values at first, so the network merely implements a series of random transformations. Its output is naturally far from the ideal, and the loss value is correspondingly high. But as the network processes more and more examples, the weights are gradually adjusted in the right direction, and the loss gradually decreases. This is the training loop, which, repeated enough times (typically dozens of iterations over thousands of examples), yields weight values that minimize the loss function. A network with minimal loss, whose outputs are as close as possible to the targets, is a trained network. Once again, this is a simple mechanism that, once scaled up sufficiently, works like magic.
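The whole training loop described above can be sketched for a one-weight toy model: random initialization, forward pass, loss measurement, and a weight update, repeated over the examples. The model y = w * x stands in for a real network, and all values here are illustrative:

```python
import random

random.seed(0)
w = random.uniform(-1.0, 1.0)        # weights start out random
examples = [(1.0, 3.0), (2.0, 6.0)]  # (input, target) pairs: target = 3 * input
learning_rate = 0.05

for epoch in range(100):             # repeat the loop enough times
    for x, target in examples:
        prediction = w * x                    # forward pass (random at first)
        loss = (prediction - target) ** 2     # distance from the target
        grad = 2 * (prediction - target) * x  # feedback signal
        w -= learning_rate * grad             # optimizer adjusts the weight

print(round(w, 3))  # close to 3.0, the weight that minimizes the loss
```

A real network repeats exactly this cycle, only with millions of weights adjusted at once via backpropagation.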