Author | Liu Xin

Deep learning is essentially a deep artificial neural network. It is not an isolated technology, but one that combines mathematics, statistical machine learning, computer science, and artificial neural networks. Understanding deep learning is inseparable from the most basic undergraduate mathematics: mathematical analysis (advanced mathematics), linear algebra, probability theory, and convex optimization. Mastering deep learning techniques is inseparable from hands-on practice centered on programming. Without a solid foundation in mathematics and computer science, deep learning is a castle in the air.

Therefore, beginners who want to succeed with deep learning need to understand why this foundational knowledge matters. In addition, this article introduces the entry path to deep learning from the theoretical dimensions of network structure and optimization, and analyzes an advancement path built on hands-on practice with deep learning frameworks.

Finally, this article shares practical experience with deep learning and ways to keep up with cutting-edge developments in the field.


Mathematical Foundations

If you can read the mathematical formulas in deep learning papers fluently and derive new methods independently, you already have the necessary mathematical foundation.

Mastering the content of four mathematics courses, namely mathematical analysis, linear algebra, probability theory, and convex optimization, together with familiarity with the basic theories and methods of machine learning, is a prerequisite for getting started with deep learning. Solid foundations in mathematics and machine learning are indispensable for understanding the operations and gradient derivations of each layer in a deep network, for formalizing problems, and for deriving loss functions.

  • Mathematical analysis: In the advanced mathematics courses offered to engineering majors, calculus is the main content. For general deep learning research and applications, focus on reviewing fundamentals such as functions and limits, derivatives (especially the derivative of a composite function), differentials, integrals, power series expansions, and differential equations. In the optimization process of deep learning, computing the first derivative of a function is the most basic operation, and the mean value theorem, Taylor's formula, and Lagrange multipliers should be more than vaguely familiar. The fifth edition of Advanced Mathematics from Tongji University is recommended.

  • Linear algebra: Operations in deep learning are typically expressed as vector and matrix operations, and linear algebra is precisely the branch of mathematics that studies vectors and matrices. Focus on reviewing vectors, linear spaces, systems of linear equations, matrices, matrix operations and their properties, and vector calculus. You should know the exact mathematics behind the Jacobian and Hessian matrices, and given a loss function in matrix form you should be able to derive its gradient (see the first sketch after this list). The sixth edition of Linear Algebra from Tongji University is recommended.

  • Probability theory: Probability theory is the branch of mathematics that studies the quantitative laws of random phenomena. Random variables appear throughout deep learning, including in stochastic gradient descent, parameter initialization methods (such as Xavier initialization), and the Dropout regularization algorithm (a small sketch of the latter two follows this list). Beyond the basic concepts of random phenomena (random experiments, sample spaces, probability, conditional probability, and so on) and random variables and their distributions, you also need the law of large numbers and the central limit theorem, parameter estimation, and hypothesis testing; going a step further, you can study stochastic processes such as Markov chains. The Zhejiang University edition of Probability Theory and Mathematical Statistics is recommended.

  • Convex optimization: Building on the three mathematics courses above, convex optimization can be regarded as an applied course. For deep learning, however, since the commonly used optimization methods rely only on first-order gradient information for stochastic gradient descent, practitioners do not need much "advanced" convex optimization knowledge. Understanding the basic concepts of convex sets, convex functions, and convex optimization; grasping the general notion of duality; mastering common unconstrained optimization methods such as gradient descent, stochastic gradient descent, and Newton's method (plain gradient descent is illustrated in the first sketch after this list); and knowing a little about optimization under equality and inequality constraints is enough for a theoretical understanding of optimization in deep learning. Convex Optimization by Stephen Boyd is recommended.

  • Machine learning: At the end of the day, deep learning is just one approach to machine learning, and statistical machine learning is the de facto methodology of the field. Taking supervised learning as an example, you should master typical techniques such as linear models for regression and classification, support vector machines and kernel methods, and random forests, as well as model selection and model inference, regularization techniques, model ensembles, the Bootstrap method, and probabilistic graphical models. To go further, you need to know specialized areas such as semi-supervised, unsupervised, and reinforcement learning. The Elements of Statistical Learning is a classic textbook.
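To make the linear algebra and optimization points concrete, here is a minimal NumPy sketch of my own (not from the original article): for the least-squares loss in matrix form, L(w) = ||Xw - y||^2, the gradient is 2 X^T (Xw - y), and plain gradient descent follows it downhill. The data and step size are arbitrary choices for demonstration.

```python
import numpy as np

# Toy regression data: 100 samples, 3 features, known true weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)

# Loss L(w) = ||Xw - y||^2; its gradient in matrix form is 2 X^T (Xw - y).
def grad(w):
    return 2 * X.T @ (X @ w - y)

# Plain gradient descent with a fixed step size.
w = np.zeros(3)
lr = 0.001
for _ in range(2000):
    w -= lr * grad(w)

print(w)  # close to w_true
```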
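Likewise, for the probability bullet, a minimal sketch (again my own illustration; the layer sizes and drop probability are arbitrary) of Xavier initialization and inverted Dropout:

```python
import numpy as np

rng = np.random.default_rng(0)

# Xavier (Glorot) initialization: scale by fan-in and fan-out so that
# activations keep a roughly constant variance across layers.
def xavier_init(fan_in, fan_out):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Inverted dropout: randomly zero activations at train time and rescale
# by 1/keep_prob so that nothing needs to change at test time.
def dropout(x, keep_prob=0.5, train=True):
    if not train:
        return x
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

W = xavier_init(256, 128)
h = np.maximum(0, rng.normal(size=(32, 256)) @ W)  # a ReLU layer
h = dropout(h, keep_prob=0.8)
```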


Computer Fundamentals

Deep learning ultimately proves itself in practice. Therefore, knowledge of GPU server hardware selection, proficiency with the Linux operating system and shell programming, and familiarity with the C++ and Python languages are necessary conditions for a hands-on deep learning expert. There is now a term, "full-stack deep learning engineer", which reflects what is demanded of practitioners: a strong theoretical foundation in mathematics and machine learning, proficiency in computer programming, and the necessary knowledge of systems architecture.

  • Programming languages: The two most used languages in deep learning are C++ and Python. To date, C++ is still the preferred language for implementing high-performance systems; the most widely used deep learning frameworks, including TensorFlow, Caffe, and MXNet, are without exception written in C++ at the core. The scripting language at the upper layer is Python, used for data preprocessing, defining network models, running training procedures, visualizing data, and so on. The MXNet community also maintains extension packages for Lua, R, Scala, Julia, and other languages, a sign of flourishing diversity. Two recommended textbooks are C++ Primer (5th edition) and Core Python Programming (2nd edition).

  • Linux operating system: Deep learning systems usually run on the open-source Linux operating system, and Ubuntu is currently the distribution most commonly used in the deep learning community. The main requirements are the Linux file system, basic command-line operations, and shell programming, along with proficiency in a text editor such as Vim. The basics should be second nature: you should not need to reach for a search engine when you have to batch-replace a string in a file, or copy a file between two machines with scp. A recommended reference is Bird Brother's Linux Private Kitchen.

  • CUDA programming: Deep learning requires GPU parallel computing, and CUDA is an important tool for it. The CUDA toolkit is a GPU programming suite provided by NVIDIA, and the cuBLAS library is widely used in practice. NVIDIA's official online documentation at http://docs.nvidia.com/cuda/ is recommended.

  • Other computer science fundamentals: To master deep learning technology, you should not be satisfied with merely calling a few mainstream frameworks from Python. Reading the source code to understand the underlying implementation of deep learning algorithms is the only way to advance. At that point, knowledge of data structures and algorithms (especially graph algorithms), distributed computing (understanding the common distributed computation models), and the basics of GPU and server hardware (so that when someone says the number of PCI-E lanes is the bottleneck for CPU-GPU data exchange, you know what they mean) will serve you well.


Introduction to Deep Learning

Next, I will introduce how to get started with deep learning from two perspectives: theory and practice.

  • Getting started with deep learning theory: A diagram (Figure 1) can be used to review the key theories and methods of deep learning. Starting from the MCP neuron model, you need to master basic structural units such as convolutional layers and pooling layers, activation functions such as the sigmoid, loss functions such as softmax, and classical network structures such as the perceptron and the MLP. Next, master network training methods, including backpropagation (BP), mini-batch SGD, and learning-rate policies. Finally, understand two important theoretical problems in training deep networks: vanishing gradients and exploding gradients (a minimal end-to-end sketch of these pieces follows after this item).

Taking the convolutional neural network as an example, Figure 2 shows the knowledge needed to get started. The starting point is Hubel and Wiesel's research on the cat visual cortex, followed by the Japanese scholar Kunihiko Fukushima's neocognitron model (in which the convolutional structure had already appeared). The first CNN model was born in 1989, and the network later known as LeNet was born in 1998. With the advent of ReLU and Dropout, and the historic opportunities presented by GPUs and big data, CNNs saw a historic breakthrough in 2012 with the birth of the AlexNet architecture. After 2012, the evolution of CNNs can be summarized along four lines: 1. deeper networks; 2. enhancing the function of the convolution module, with ResNet and its many variants fusing the above two ideas; 3. from classification to detection, the most recent milestone being Mask R-CNN, the ICCV 2017 best paper; 4. adding new functional modules.
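To tie the theory pieces together, here is a minimal end-to-end sketch of my own devising (not from the article): a two-layer MLP with a sigmoid hidden layer and a softmax cross-entropy loss, trained by backpropagation with mini-batch SGD on toy data. All hyperparameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-class classification data.
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + (X[:, 0] > 1).astype(int)  # labels 0..2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Two-layer MLP: 2 -> 16 (sigmoid) -> 3 (softmax).
W1 = rng.normal(scale=0.5, size=(2, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 3)); b2 = np.zeros(3)

lr, batch = 0.5, 32
for step in range(2000):
    idx = rng.choice(len(X), batch)           # mini-batch SGD
    xb, yb = X[idx], y[idx]

    # Forward pass.
    h = sigmoid(xb @ W1 + b1)
    p = softmax(h @ W2 + b2)

    # Backward pass (backpropagation of the cross-entropy loss).
    dlogits = p.copy()
    dlogits[np.arange(batch), yb] -= 1        # dL/dlogits for softmax + CE
    dlogits /= batch
    dW2 = h.T @ dlogits; db2 = dlogits.sum(0)
    dh = dlogits @ W2.T * h * (1 - h)         # sigmoid derivative
    dW1 = xb.T @ dh; db1 = dh.sum(0)

    # SGD update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

acc = (softmax(sigmoid(X @ W1 + b1) @ W2 + b2).argmax(1) == y).mean()
print(f"train accuracy: {acc:.2f}")
```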

  • Getting started with deep learning practice: Mastering an open-source deep learning framework, and then studying its code in depth, is the only way to truly master deep learning technology. The most widely used frameworks include TensorFlow, Caffe, MXNet, and PyTorch. There is no shortcut to learning a framework: following the official site's step-by-step configuration and tutorials, participating in discussions in the GitHub community, and Googling promptly when stuck are good ways to get started quickly (a minimal training-loop sketch follows below).

Once you have an initial grasp of a framework, further improvement depends on the specific research problem. A quick and effective strategy is to start from the authoritative benchmarks in your field: for example, LFW and MegaFace in face recognition, ImageNet and Microsoft COCO in image recognition and object detection, and Pascal VOC in image segmentation. By replicating or improving others' methods and practicing data preparation, model training, and parameter tuning by hand until you can reach the current best results on your field's benchmark, the practical part of getting started is essentially complete.

Subsequent progress requires constant exploration and refinement in practice: handling large-scale training data skillfully, balancing accuracy and speed proficiently, mastering tuning tricks, quickly reproducing or improving others' work, being able to implement new methods, and so on.
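As a taste of the framework workflow just described (define a model, run a training procedure), here is a minimal sketch using PyTorch, one of the frameworks named above; the data, model, and hyperparameters are placeholders of my own choosing.

```python
import torch
from torch import nn

# Placeholder data: 256 samples, 20 features, 3 classes.
X = torch.randn(256, 20)
y = torch.randint(0, 3, (256,))

# Define a small network.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Training loop: forward, loss, backward, update.
for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```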


Deep learning practical experience

Here I share practical experience in four areas.

1. Sufficient data. Large amounts of annotated data still largely determine the accuracy of deep learning models, and every practitioner needs to recognize that data is extremely important. There are three main ways to obtain data: public datasets (mainly open to academia, such as ImageNet and LFW), paid data from third-party data companies, and data generated by one's own business.

2. Proficient programming and implementation. Implementing deep learning algorithms requires solid programming ability, and proficient Python programming is the foundation. If you need to modify the underlying implementation or add new algorithms, proficiency in C++ becomes essential. One telling phenomenon: computer vision researchers who once got by with Matlab alone now find themselves taking remedial courses in Python and C++.

3. Sufficient GPU resources. Training deep learning models relies on abundant GPU resources; parallelizing training across multiple machines and GPUs can effectively speed up model convergence, allowing algorithm validation and parameter tuning to finish sooner. It is common for a company or laboratory specializing in deep learning to have dozens to hundreds of GPUs.

4. Innovative methods. Take the ImageNet competition, the authority in the field of deep learning: from 2012, when deep learning first won the competition, to the final competition in 2017, methodological innovation has always been the core driving force of progress. If you are satisfied with merely adding more data, deepening the network, or tweaking a few SGD parameters, it is hard to produce truly first-class results.

In my own experience, methodological innovation can bring unbelievable results. I once took part in the Tianchi image retrieval competition organized by Alibaba, where I proposed an innovation: a new loss function for noisily labeled data. It greatly improved the accuracy of the deep model and won me the championship that year.


Frontiers of deep learning

[Sources of frontier information]

In practice, you must keep up with the latest progress in deep learning, which means reading papers. In addition to regularly scanning arXiv, checking the citations of representative works on Google Scholar, and following top conferences such as ICCV, CVPR, and ECCV, Zhihu's deep learning column and Reddit are punctuated with discussions (and brilliant jokes) about the latest papers.

Some high-quality WeChat public accounts, such as the VALSE frontier technology review, Deep Learning Lecture Hall, and PaperWeekly, also frequently push cutting-edge deep learning content and can serve as sources of information. At the same time, follow the Facebook/Quora pages of scholars such as LeCun and Bengio, and the Weibo account 爱可可-爱生活, and you will often be pleasantly surprised.

[Recommended focus]

  • New network architectures. With no fundamental breakthrough in deep learning optimization methods, which SGD still represents, modifying the network structure is a way to improve model accuracy quickly. Since 2015, new architectures represented by the many improvements on ResNet have mushroomed, with DenseNet, SENet, and ShuffleNet among the representatives (a sketch of the residual block at the heart of this line of work follows this list).

  • New optimization methods. Throughout the history of artificial neural networks, from the MCP model in 1943 to 2017, optimization methods have always been the soul of progress. Breakthroughs in optimization techniques, represented by error backpropagation (BP) and stochastic gradient descent (SGD), or the proposal of a new generation of activation functions after sigmoid/ReLU, are highly desirable. I believe recent work such as "Learning to Learn by Gradient Descent by Gradient Descent" and the Swish activation function deserve attention (a one-line sketch of Swish follows this list). Whether they can achieve a fundamental breakthrough, that is, completely replace current optimization methods or the ReLU activation function, remains to be seen.

  • New learning paradigms. Deep reinforcement learning and generative adversarial networks (GANs) stand out. Recently, AlphaZero once again demonstrated the power of deep reinforcement learning: relying on no human experience at all, and through self-play in the game of Go, it surpassed AlphaGo Master, which had itself swiftly defeated all the top human players. Similarly, generative adversarial networks and their variants herald an era in which learning algorithms generate their own training data. My own company is also exploring the combination of deep reinforcement learning and GANs to augment cross-modal training data.

  • New datasets. Datasets are the training ground of deep learning algorithms, so the evolution of datasets mirrors the progress of deep learning technology. Take face recognition as an example: in the post-LFW era, the MegaFace and Microsoft MS-Celeb-1M datasets have embraced large-scale face recognition and face recognition under label noise. In the post-ImageNet era, Visual Genome is attempting to build a visual knowledge base that includes objects, attributes, relationship descriptions, and question-answer pairs.
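To anchor the architecture discussion, here is a minimal sketch of a basic ResNet-style residual block in PyTorch; this is illustrative code following common conventions, not the original authors' implementation.

```python
import torch
from torch import nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic ResNet-style block: two 3x3 convolutions plus a skip connection."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # the skip connection: output = F(x) + x

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```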
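The Swish activation named in the optimization bullet is also easy to state: f(x) = x * sigmoid(beta * x); with beta = 1 it is the SiLU, and as beta grows it approaches ReLU. A one-line sketch (mine, for illustration):

```python
import torch

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); beta=1 is SiLU, large beta approaches ReLU.
    return x * torch.sigmoid(beta * x)

print(swish(torch.tensor([-2.0, 0.0, 2.0])))
```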

About the author: Liu Xin, Ph.D. in engineering, graduated from the Institute of Computing Technology, Chinese Academy of Sciences, under the supervision of Professor Shiguang Shan. He works mainly on the research and industrial application of computer vision and deep learning, and currently serves as CEO of an artificial intelligence startup, Zhongkesituo.
