This article is compiled from the arXiv paper by Md Zahangir Alom et al.

Since AlexNet was proposed by Alex Krizhevsky and his colleagues at the University of Toronto in 2012, deep learning has grown from a powerful branch of machine learning into the driver of today’s AI boom. The technology has been applied to so many different domains, and so many new models and architectures have been developed, that it has become hard to tease out the relationships between network types. Recently, researchers from the University of Dayton published a comprehensive review of the development of deep learning in recent years and pointed out the main technical challenges the field currently faces. Heart of the Machine finds this a very detailed overview paper, suitable both for readers approaching deep learning from scratch and for those who already have a foundation.
Address: arxiv.org/abs/1803.01…


In recent years, deep learning, as a new branch of machine learning, has achieved great success in many fields and continues to develop rapidly, constantly creating new application patterns and new opportunities. Deep learning methods can be divided into supervised, semi-supervised, and unsupervised learning according to whether the training data carries label information. Experimental results demonstrate state-of-the-art performance of these methods in image processing, computer vision, speech recognition, machine translation, art, medical imaging, medical information processing, robot control, biology, natural language processing (NLP), cybersecurity, and other fields. This report gives a brief overview of the development of deep learning methods, including deep neural networks (DNN), convolutional neural networks (CNN), recurrent neural networks (RNN, including long short-term memory (LSTM) and gated recurrent units (GRU)), autoencoders (AE), deep belief networks (DBN), generative adversarial networks (GAN), and deep reinforcement learning (DRL). In addition, the paper covers cutting-edge developments and advanced variants of deep learning techniques, as well as an exploration and evaluation of deep learning methods across application fields. We also cover recently developed frameworks, SDKs, and benchmark datasets used to evaluate deep learning methods. Previous survey papers, however, do not discuss some of the large deep learning models and recently developed generative methods [1].



Introduction

Since the 1950s, machine learning, a sub-field of artificial intelligence, has been revolutionizing one field after another, and deep learning, which grew out of machine learning, has produced the biggest breakthroughs to date, achieving remarkable success in almost every application domain. Figure 1 shows the taxonomy of AI. Deep learning (a deep architecture for learning, or hierarchical learning) is a class of machine learning techniques that emerged around 2006. In deep learning, learning means estimating the model parameters so that the learned model (algorithm) can perform a specific task; in an artificial neural network (ANN), for example, the parameters are the weight matrices. Deep learning places several hidden layers between the input layer and the output layer, so that nonlinear processing units at different stages form a hierarchy for feature learning and pattern classification [1, 2]. Learning methods based on data representations are also known as representation learning [3]. According to the recent literature, representation learning based on deep learning involves a hierarchy of features or concepts, where high-level concepts can be defined from low-level ones, and low-level concepts can in turn help define high-level ones. Some articles describe deep learning as a universal learning method that can solve almost any problem across application domains rather than being limited to specific tasks [4].



A. Types of deep learning methods

Like machine learning, deep learning methods can be divided into the following categories: supervised, semi-supervised (partially supervised), and unsupervised. In addition, there is a further category of learning methods called reinforcement learning (RL) or deep reinforcement learning (DRL), which is often discussed under the umbrella of semi-supervised or unsupervised learning.

Figure 1: The taxonomy of AI: artificial intelligence, machine learning, neural networks, deep learning, and spiking neural networks (SNN)



1) Supervised learning

A learning technique that uses labeled data. In this case, the environment contains a corresponding set of inputs and outputs (x_t, y_t) ~ ρ. For example, given an input x_t, the agent makes a prediction ŷ_t = f(x_t) and obtains a loss value l(y_t, ŷ_t). The agent then iteratively adjusts the network parameters to better approximate the desired outputs. After successful training, the agent can answer questions from the environment correctly. Supervised learning mainly includes the following networks: deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN), including long short-term memory (LSTM) and gated recurrent units (GRU). These networks are detailed in Sections 2, 3, and 5, respectively.
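
To make this loop concrete, here is a minimal sketch of supervised learning in Python (the toy data and the simple logistic-regression model are invented here purely for illustration): the environment supplies pairs (x_t, y_t), the agent predicts ŷ_t, receives a loss, and iteratively adjusts its parameters.

```python
import numpy as np

# Toy environment: input-output pairs (x_t, y_t) drawn from a fixed distribution.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                              # inputs x_t
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = (X @ true_w > 0).astype(float)                         # desired outputs y_t

w = np.zeros(5)                                            # model parameters (a weight vector)
lr = 0.1
for epoch in range(200):
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w)))                 # agent's prediction y_hat_t
    loss = -np.mean(y * np.log(y_hat + 1e-9)
                    + (1 - y) * np.log(1 - y_hat + 1e-9))  # loss l(y_t, y_hat_t)
    grad = X.T @ (y_hat - y) / len(X)                      # gradient of the loss
    w -= lr * grad                                         # iterative parameter update
print(f"final training loss: {loss:.4f}")
```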



2) Semi-supervised learning

A learning technique that uses partially labeled data (in some of the literature this setting is also discussed as reinforcement learning). The methodology is investigated in Section 8 of this paper. In some cases, deep reinforcement learning (DRL) and generative adversarial networks (GAN) are used as semi-supervised learning techniques. In addition, RNNs, including LSTM and GRU, can also be used for semi-supervised learning. GAN is discussed in Section 7.
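
As an aside, one simple way to exploit partially labeled data is pseudo-labeling: train on the labeled subset, then treat the model's confident predictions on unlabeled data as extra labels. This technique is not one the survey discusses; the sketch below (scikit-learn on made-up data) is only meant to illustrate what "partially labeled" means in practice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)            # ground truth (mostly hidden)
labeled, unlabeled = slice(0, 30), slice(30, 300)  # only 10% of labels available

clf = LogisticRegression().fit(X[labeled], y[labeled])   # train on labeled subset
proba = clf.predict_proba(X[unlabeled])
confident = proba.max(axis=1) > 0.9                      # keep confident predictions
X_aug = np.vstack([X[labeled], X[unlabeled][confident]])
y_aug = np.concatenate([y[labeled], proba[confident].argmax(axis=1)])
clf = LogisticRegression().fit(X_aug, y_aug)             # retrain on augmented data
```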



3) Unsupervised learning

A learning technique that does not use labeled data. In this case, the agent learns an internal representation or important features to discover unknown relationships or structure in the input data. Unsupervised learning usually covers clustering, dimensionality reduction, and generative techniques. Several deep learning methods excel at clustering and nonlinear dimensionality reduction, such as autoencoders (AE), restricted Boltzmann machines (RBM), and GAN. In addition, RNNs (such as LSTM) and RL are also used for semi-supervised learning [243]. AE and RBM are detailed in Section 6, and GAN in Section 7.
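
For instance, an autoencoder performs nonlinear dimensionality reduction without any labels: it is trained only to reconstruct its input through a narrow bottleneck. A minimal sketch follows (written in PyTorch, an assumed framework choice, on a random stand-in batch):

```python
import torch
import torch.nn as nn

# Encoder compresses 784-dimensional inputs (e.g., flattened 28x28 images)
# down to an 8-dimensional code; the decoder reconstructs the input from it.
encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 8))
decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 784))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(128, 784)                      # stand-in batch of unlabeled data
for step in range(100):
    x_hat = decoder(encoder(x))               # reconstruction from the low-dim code
    loss = nn.functional.mse_loss(x_hat, x)   # reconstruction error; no labels used
    opt.zero_grad()
    loss.backward()
    opt.step()
```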



4) Deep reinforcement learning (DRL)

A learning technique for acting in an unknown environment. DRL began with Google DeepMind in 2013 [5, 6]. Since then, several advanced RL-based methods have been proposed. The setting can be described as follows: the environment samples an input x_t ~ ρ; the agent makes a prediction ŷ_t = f(x_t); and the agent receives a cost c_t ~ P(c_t | x_t, ŷ_t), where P is an unknown probability distribution. In other words, the environment asks the agent a question and gives a noisy score as the answer. This approach is sometimes called semi-supervised learning, and many semi-supervised and unsupervised learning methods have been built on this concept (Section 8). In RL we do not have a straightforward forward loss function, which makes learning harder than in traditional supervised settings. The fundamental differences between RL and supervised learning are these: first, we do not have full access to the function we are optimizing and must query it through interaction; second, we interact with a state-based environment, where the input x_t depends on previous actions.
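
The following sketch illustrates this query-through-interaction setting in its simplest form, a multi-armed bandit (a deliberately minimal stand-in, not the DRL systems cited above): the agent never sees the cost function directly, only noisy scores for the actions it tries.

```python
import numpy as np

rng = np.random.default_rng(2)
true_cost = np.array([0.9, 0.3, 0.6])       # hidden from the agent
est = np.zeros(3)                           # agent's running cost estimates
counts = np.zeros(3)

for t in range(1000):
    if rng.random() < 0.1:
        a = int(rng.integers(3))            # explore: try a random action
    else:
        a = int(np.argmin(est))             # exploit: pick the cheapest action so far
    c = true_cost[a] + rng.normal(0, 0.1)   # noisy cost c_t ~ P(c | a), queried by acting
    counts[a] += 1
    est[a] += (c - est[a]) / counts[a]      # incremental running-mean update
print("estimated best action:", int(np.argmin(est)))  # converges to action 1
```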

Figure 2: Classification of deep learning methods



B. Feature learning

The key difference between traditional machine learning and deep learning is how features are extracted. Traditional machine learning methods rely on hand-crafted feature extraction algorithms, including the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), GIST, RANSAC, histograms of oriented gradients (HOG), local binary patterns (LBP), empirical mode decomposition (EMD) for speech analysis, and others. The extracted features are then fed to learning algorithms such as support vector machines (SVM), random forests (RF), principal component analysis (PCA), kernel PCA (KPCA), linear discriminant analysis (LDA), and Fisher discriminant analysis (FDA) for classification or further feature reduction. In addition, ensemble approaches apply multiple learning algorithms to the features of a single task or dataset and make a decision based on the combined results of the different algorithms.

Table 1: Different feature learning methods



In deep learning, by contrast, features are learned automatically and represented hierarchically across multiple layers. This is the point on which deep learning surpasses traditional machine learning approaches. The table above shows how the different feature learning methods divide up the learning steps.
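
A small sketch makes the contrast concrete. In a traditional pipeline one would compute fixed descriptors (e.g., HOG) and feed them to a separately trained classifier; in the CNN below (PyTorch, an assumed framework choice), low-level features, high-level features, and the classifier are all learned jointly from the raw pixels.

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),   # stage 1: low-level features (edges, blobs)
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),  # stage 2: higher-level combinations
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # classifier on the learned features
)
logits = cnn(torch.rand(8, 1, 28, 28))           # 28x28 grayscale inputs, 10 classes
# Training this network end-to-end optimizes the features and the classifier together,
# instead of hand-designing the features and training only the classifier.
```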



C. The timing and domain of applying deep learning

AI is useful in the following areas, and deep learning plays an important role in all of them:

1. Lack of human expertise (e.g., navigation on Mars);

2. Expertise that humans cannot yet explain (speech, cognition, vision, and language understanding);

3. Problems whose solutions change over time (tracking, weather forecasting, preference, stock, and price prediction);

4. Solutions that must adapt to specific cases (biometrics, personalization);

5. Problems whose scale is beyond limited human reasoning (computing webpage rankings, matching ads on Facebook, sentiment analysis).

At present, deep learning is used in almost every field. For this reason, this approach is sometimes called the universal learning approach. Figure 4 shows some sample applications.

Figure 4: Examples of applications where deep learning achieves state-of-the-art results



D. Cutting-edge development of deep learning

Deep learning has some outstanding achievements in the fields of computer vision and speech recognition, as described below:


1) Image classification on ImageNet data set

The benchmark for deep learning applications in image classification is the Large-Scale Visual Recognition Challenge (LSVRC). Based on deep learning and convolutional neural networks, the best methods achieve excellent accuracy on ImageNet [11]. Russakovsky et al. published an article on the ImageNet dataset and the best accuracies researchers have achieved in recent years [285]. The chart below shows the progress of deep learning since 2012. ResNet-152 now reaches a 3.57% error rate, below the human error rate of about 5%.

Figure 5: Accuracy of ImageNet tests using different deep learning models



2) Automatic speech recognition

Deep learning’s first success in speech recognition came on small-scale recognition tasks using the TIMIT dataset, a common benchmark for evaluation. The TIMIT acoustic-phonetic continuous speech corpus contains recordings of 630 speakers of eight major dialects of American English, each reading ten sentences. The figure below summarizes the error rates, including early results, measured as phone error rate (PER) over the past 20 years. The bar chart clearly shows that recently developed deep learning methods (at the top of the figure) outperform earlier machine learning methods on the TIMIT dataset.
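
For reference, PER is the edit (Levenshtein) distance between the predicted and reference phone sequences, divided by the length of the reference. A minimal sketch with made-up phone strings:

```python
def per(ref, hyp):
    """Phone error rate: edit distance between phone sequences / reference length."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # substitution / deletion / insertion
    return d[-1][-1] / len(ref)

# 1 substitution + 1 insertion against a 5-phone reference -> PER = 0.4
print(per("sh iy hh ae d".split(), "sh iy hh eh d d".split()))
```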

Figure 6: Phone error rate (PER) on the TIMIT dataset



E. Why use deep learning



1) General learning methods

Deep learning is sometimes called universal or general learning because it can be applied to almost any application domain.



2) Robustness

Deep learning methods do not require features to be designed in advance. Features that are optimal for the task at hand are learned automatically. As a result, robustness to natural variations in the data is obtained automatically.



3) Generalization

The same deep learning approach can be used across different applications or data types; this is often referred to as transfer learning. Moreover, the approach is helpful when insufficient data is available. Several papers have been published based on this concept (transfer learning is discussed in more detail in Section 9).
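
A typical transfer-learning recipe (an illustration, not the specific papers referenced above) reuses a network pretrained on a large dataset such as ImageNet, freezes its feature extractor, and retrains only a new output layer on the small target dataset. A sketch assuming a recent version of torchvision:

```python
import torch.nn as nn
import torchvision.models as models

# Load a ResNet-18 pretrained on ImageNet and freeze its learned features.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False

# Replace the final layer with a fresh head for, say, 5 target classes;
# only this layer is then trained on the (small) target dataset.
model.fc = nn.Linear(model.fc.in_features, 5)
```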



4) Scalability

Deep learning methods are highly scalable. In a 2015 paper, Microsoft described a network called ResNet [11]; its deepest variant has 1,202 layers, and networks of this kind are typically deployed at supercomputing scale. Lawrence Livermore National Laboratory (LLNL) in the US is developing a framework for such networks that can scale to thousands of nodes [24].
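
What makes such depth trainable is the residual (skip) connection: each block learns only a correction F(x) that is added to its input, so gradients can flow through the identity path. Below is a simplified block in the spirit of [11] (omitting batch normalization and striding), in PyTorch:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # y = x + F(x): the identity shortcut lets very deep stacks train.
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

deep = nn.Sequential(*[ResidualBlock(16) for _ in range(50)])  # stack many blocks
out = deep(torch.rand(1, 16, 32, 32))
```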



F. Challenges of deep learning

  • Big data analytics using deep learning
  • Scalability of deep learning approaches
  • The ability to generate data, which matters when data is unavailable for learning a system (especially for computer vision tasks such as inverse graphics)
  • Energy-efficient techniques for special-purpose devices, including mobile intelligence and FPGAs
  • Multi-task and transfer learning (generalization), or multi-module learning, i.e., learning from different domains or with different models together
  • Dealing with causality in learning
Figure 7: The relationship between deep learning performance and data quantity



Second, in most cases, solutions to large-scale problems are deployed on high-performance computing (HPC) systems (supercomputers, clusters, sometimes called cloud computing), which offer great potential for data-intensive business computing. But as data grows in velocity, variety, veracity, and volume, it becomes increasingly difficult for enterprise servers to keep up in storage and compute performance. Most papers consider these demands and propose efficient HPC using heterogeneous computing systems. For example, Lawrence Livermore National Laboratory (LLNL) has developed a framework called Livermore Big Artificial Neural Network (LBANN) for large-scale (supercomputing-scale) deployment of deep learning, which definitively answers the question of whether deep learning is scalable [24].

Third, generative models are another challenge for deep learning. One example is GAN, an excellent data-generation method that can produce samples with the same distribution as the training data [28]. Fourth, multi-task and transfer learning, which we discuss in Section 9. Fifth, a great deal of research addresses computationally efficient deep learning methods, in terms of both network architecture and hardware; Section 10 discusses this problem.

Can we build general-purpose models for multiple domains and tasks? Focusing on multimodal systems, Google recently published a paper, One Model To Learn Them All [29], which introduces a new approach that learns from several application domains at once, including ImageNet, multiple translation tasks, image captioning (the MS-COCO dataset), a speech recognition corpus, and English parsing. We discuss the main challenges and corresponding solutions throughout this survey. Other multi-task techniques have also been proposed in the past few years.

Finally, a causal learning system based on graphical models defines how to infer a causal model from data. Recently, deep learning methods for such problems have begun to emerge [33]. However, many other challenging problems have not been effectively solved in recent years, for example image or video captioning [34], text-to-image synthesis [36] using GAN [35], and style transfer from one domain to another.

Recently, several surveys of deep learning have been published, including one very high-quality overview; however, it does not cover the recently developed generative model GAN [28]. In addition, while it mentions reinforcement learning, it does not touch on the recent trend of deep reinforcement learning methods [1, 39]. In most cases, existing surveys are organized around individual deep learning methods. The main objective of this report is to present the overall picture of deep learning and its related fields, including deep supervised learning (e.g., DNN, CNN, and RNN), unsupervised learning (e.g., AE, RBM, and GAN; GAN is sometimes also used for semi-supervised learning tasks), and deep reinforcement learning, which in some cases is considered a semi-supervised or unsupervised approach. We consider the latest trends in the field and the applications developed from this technology. In addition, we include the frameworks and benchmark datasets commonly used to evaluate deep learning techniques, as well as relevant conference and journal names.

The rest of this paper is organized as follows: Section 2 gives a detailed survey of DNN; Section 3 discusses CNN; Section 4 introduces advanced techniques for training deep learning models effectively; Section 5 discusses RNN; AE and RBM are discussed in Section 6; GAN and its applications are discussed in Section 7; reinforcement learning is introduced in Section 8; Section 9 explains transfer learning; Section 10 covers efficient approaches and hardware for deep learning; Section 11 discusses deep learning frameworks and software development kits (SDKs); Section 12 presents benchmark results for different application domains; and Section 13 concludes.