This article is part of the notes for Andrew Ng's deep learning course [1].

Author: Huang Haiguang [2]

Main authors: Haiguang Huang; Xingmu Lin (all drafts of Course 4, weeks 1 and 2 of Course 5, and the first three weeks of Course 3); Zhu Yansen (all drafts of Course 3); He Zhiyao (week 3 drafts of Course 5); Wang Xiang; Hu Han; laughing; Zheng Hao; Li Huaisong; Zhu Yuepeng; Chen Weihe; the cao; Lu Haoxiang; Qiu Muchen; Tang Tianze; Zhang Hao; Victor Chan; endure; jersey; Shen Weichen; Gu Hongshun; when the super; Annie; Zhao Yifan; Hu Xiaoyang; Duan Xi; Yu Chong; Zhang Xinqian

Editorial staff: Huang Haiguang, Chen Kangkai, Shi Qinglu, Zhong Boyan, Xiang Wei, Yan Fenglong, Liu Cheng, He Zhiyao, Duan Xi, Chen Yao, Lin Jianyong, Wang Xiang, Xie Shichen, Jiang Peng

Note: Notes and assignments (including data and original assignment files) and videos can be downloaded on Github [3].

I will publish the course notes on the official account “Machine Learning Beginners”, please pay attention.

Neural Networks and Deep Learning

Week 1: Introduction to Deep Learning

1.1 Welcome (Welcome)

The first video explains what deep learning is and what it can do. In Ng's words:

Deep learning has already transformed traditional Internet businesses such as web search and advertising. But deep learning is also enabling brand-new products and businesses that help people in many ways: better health care (one of the things deep learning does really well is reading X-ray images), personalized education, precision agriculture, even self-driving cars, and so on. If you want to learn these deep learning tools and apply them to do these remarkable things, this course will help you do just that. When you have completed this series of specialized courses on Coursera, you will be more confident to continue your deep learning journey. In the next decade, I think all of us have the opportunity to create an amazing world and society, and that is the power of AI. I hope you will play an important role in creating that AI-powered society.

I think AI is the new electricity. About a hundred years ago, the electrification of our society transformed every major industry, from transportation to manufacturing to health care to communications, and today we are clearly seeing AI bring about an equally dramatic transformation. Among the various branches of AI, the most rapidly developing is deep learning, so deep learning is now one of the most sought-after skills in the tech world.

You will acquire and master those skills through this course and the subsequent courses in this specialization.

Here’s what you’ll learn:

In this series of Coursera courses, also called a specialization, the first course (Neural Networks and Deep Learning) teaches the basics of neural networks and deep learning. This course lasts four weeks, and each course in the specialization lasts two to four weeks.

In this first course, you will learn how to build neural networks (including a deep neural network) and how to train them on data. At the end of the course, you will use a deep neural network to recognize cats.

For some reason, the first class will use cats as objects.

The second course lasts three weeks. In it, you will study the practical aspects of deep learning: how to build a neural network and really make it perform well. You will learn about hyperparameter tuning, regularization, diagnosing bias and variance, and advanced optimization algorithms such as Momentum and Adam, choices that can otherwise seem like black magic depending on how you build your network.

In the third course, we will spend two weeks learning how to structure your machine learning project. It turns out that the best strategies for building a machine learning system have changed in the era of deep learning.

One example: the way you split your data into a training set, a development set (also called a validation set), and a test set changes how quickly you can make progress in deep learning.

So what’s the best practice?

Having your training set and your test set come from different distributions also makes a big difference in deep learning, so how do you deal with that?

If you have heard of end-to-end deep learning, you will also learn more about it in course 3, including whether or not you need to use it. The material in course 3 is relatively unique; I will share with you what we have learned about building and improving many deep learning systems. This material is rarely taught in university deep learning classes, and I think it will help you make your deep learning systems work better.

In course 4, we will talk about convolutional neural networks (CNNs), which are often used in the image domain, and you will learn how to build such models.

And finally, in course 5, you’ll learn about sequence models and how to apply them to natural language processing, among other things.

Sequence models include recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). You will learn what these terms mean in course 5 and be able to apply them to natural language processing (NLP) problems.

In any case, you will learn about these models in course 5 and be able to apply them to sequence data. For example, natural language is a sequence of words. You will also see how these models can be applied to speech recognition, music generation, and other problems.

So, through these courses, you’ll learn these tools of deep learning, and you’ll be able to use them to do amazing things and advance your career.

Andrew Ng


1.2 What is a neural network? (What is a Neural Network)

We often use the term deep learning to refer to the process of training neural networks, sometimes the training of particularly large-scale neural networks. So what exactly is a neural network? In this video, I will go over some intuitive basics.

Let’s start with an example of a house-price forecast.

Suppose you have a data set that contains information about six houses. So, you know the size of the house in square feet or square meters, and you know the price of the house. At this point, you want to fit a function that predicts house prices based on square footage.

If you’re familiar with linear regression, you might say, “OK, let’s fit a straight line with this data.” So you might get a line that looks like this.

But, as you may have noticed, prices are never negative. So instead of a straight line that might make the price negative, we bend the line a little so that it ends up at zero. This thick blue line is ultimately your function for predicting price from square footage: part of it is zero, and part of it is a straight line that fits the data well. You might think of this function as just a house-price fit.

As a neural network, this is probably the simplest one possible. We take the size of the house as the input to the neural network (call it x), pass it through a node (a little circle), and finally output the price (denote it y). This little circle is actually a single neuron. Your network then implements the function shown on the left.

You will often see this function in the neural network literature. It stays at zero for a while and then becomes a straight line. This function is called the ReLU activation function, which stands for Rectified Linear Unit. "Rectify" means taking the maximum of zero and the input, which is why you get a function of this shape.

You don’t have to worry about not understanding the ReLU function now, you’ll see it again later in the course.
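For readers who like to see it in code, here is a minimal sketch of the ReLU function and the single "price" neuron in Python with NumPy. The weight and bias values are made up purely for illustration, not parameters learned from real data:

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

# A single "housing price" neuron: price = ReLU(w * size + b).
# The weight w and bias b below are invented illustrative values.
w, b = 150.0, -30000.0
sizes = np.array([100.0, 200.0, 400.0])  # house sizes in square feet (toy data)
prices = relu(w * sizes + b)
print(prices)  # small houses clip to 0: the predicted price is never negative
```

Running this, the two smaller houses land in the flat part of the ReLU and get a predicted price of 0, while the largest gets a positive price, exactly the "bent line" described above.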

That is a single-neuron network. A larger neural network, whatever its size, is formed by stacking many of these individual neurons together. If you think of each neuron as a single Lego brick, you build a larger neural network by stacking bricks.

Let's look at an example where we use more than just the size of the house to predict its price. Now you have some other features of the house, such as the number of bedrooms. Family size may also be an important factor affecting the price: can the house accommodate a family of three, four, or five people? It is really the size of the house and the number of bedrooms together that determine whether a house can fit your family's size.

In addition, the zip code may serve as a feature that tells you about walkability: whether the neighborhood is highly walkable, whether you can walk to the grocery store or to school, or whether you need to drive. Some people prefer to live in walkable areas. The zip code, together with the wealth of the neighborhood (in the US at least, and in other countries too), may also indicate how good the nearby schools are.

Each of the little circles in the diagram can be part of a ReLU (rectified linear unit) or some other slightly nonlinear function. Based on the size of the house and the number of bedrooms, you can estimate family size; based on the zip code, walkability; based on the zip code and wealth, school quality. Finally, you might think that these are the things that determine how much people are willing to spend on a house.

For a house, these are all features relevant to its price. In this scenario, family size, walkability, and school quality can all help you predict the price. So x is all four of these inputs, and y is the price you are trying to predict. By stacking these individual neurons together, we now have a slightly larger neural network. And here is where some of the magic comes in: although I have just described the network as if you had to compute family size, walkability, and school quality yourself, you do not.

Part of the magic of neural networks is that when you implement one, all you have to do is give it the input x and it produces the output y, because it figures out everything in between by itself from however many training samples you provide. So what you actually build is this: a neural network with four inputs, where the input features x might be the size of the house, the number of bedrooms, the zip code, and the wealth of the neighborhood. Given these input features, the neural network's job is to predict the corresponding price y. Notice also that the circles in the middle are called hidden units, and each of them receives all four input features as its input. That is, rather than saying the first node represents "family size" and depends only on certain features, in a neural network you let the network decide for itself what each node should represent, and you give it all four inputs to compute whatever it wants. Therefore, we say that the input layer and the middle layer are densely connected.

It is worth noting that, given enough data about x and y (that is, enough training samples containing both x and y), neural networks are remarkably good at computing functions that accurately map from x to y.
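The stacked-neuron picture above can be sketched in a few lines of NumPy: four input features feed three hidden ReLU units, which feed one output. The weights here are random placeholders (training on (x, y) pairs would learn real values), and the feature values for x are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Layer sizes: 4 input features -> 3 hidden units -> 1 output (the price).
W1 = rng.normal(size=(3, 4))  # hidden-layer weights (random placeholders)
b1 = np.zeros((3, 1))
W2 = rng.normal(size=(1, 3))  # output-layer weights (random placeholders)
b2 = np.zeros((1, 1))

# One example x: size, bedrooms, zip-code feature, wealth (invented values).
x = np.array([[2100.0], [3.0], [1.0], [0.7]])

a1 = relu(W1 @ x + b1)  # hidden units: the network decides what these represent
y_hat = W2 @ a1 + b2    # predicted price (untrained, so the value is arbitrary)
print(y_hat.shape)      # (1, 1): one scalar prediction
```

Note that every hidden unit receives all four inputs, which is the "densely connected" structure described above; nothing in the code designates one unit as "family size".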

This is a basic neural network. You will find that neural networks are especially effective and powerful in supervised learning settings, where you take an input x and map it to an output y, just as we saw in the housing price prediction example.

In the next video, let's go over some more examples of supervised learning, some of which will convince you that neural networks can be very useful and that you can actually apply them.

1.3 Supervised Learning with Neural Networks

There are many different kinds of neural networks, and given how well some of them work, it may surprise you that almost all the economic value created by neural networks so far has come from one type of machine learning, called supervised learning. Let's see what that means.

In supervised learning you have some input x, and you want to learn a function that maps it to some output y. For example, in the housing price prediction example we just saw, you input some features of a house and try to output, or estimate, the price y. Here are some other examples of areas where neural networks have been applied very effectively.

One of the most lucrative applications of deep learning today is online advertising, maybe not the most inspiring, but certainly profitable. Given information about an ad and information about the user, a website decides whether or not to show you that ad.

Neural networks have gotten very good at predicting whether you will click on an ad, and by showing users the ads they are most likely to click on, neural networks have become incredibly profitable for many companies. The ability to show you the ads you are most likely to click on has a direct impact on the revenue of some of the largest online advertising companies.

Computer vision has also made great strides in the past few years, largely thanks to deep learning. You can input an image and have the network output an index, say from 1 to 1000, indicating which of 1,000 different categories the photo belongs to; you might use this to tag photos.

The recent advances in deep learning in speech recognition are also very exciting. You can now feed audio clips into a neural network and have it output a transcript. Machine translation has also made great strides thanks to deep learning. You can use a neural network to input an English sentence and then output a Chinese sentence.

In autonomous driving, you can input an image of what is in front of the car, along with information from radar, and train a neural network to tell the car exactly where the other cars are on the road; this is a key component of autonomous driving systems.

So deep learning systems can already create enormous value when you cleverly choose what the x and the y should be for your particular problem, and then fit this supervised learning component into a bigger system such as an autonomous vehicle. It also turns out that slightly different types of neural networks suit different applications; for example, for the real estate application we discussed in the last video, do we use a common standard neural network architecture?

For real estate and online advertising, a relatively standard neural network, like the one we have seen, may be enough. For image applications, we often use convolutional neural networks, often abbreviated CNN. For sequence data, such as audio, there is a temporal component: audio plays out over time, so audio is most naturally represented as a one-dimensional time series (sometimes called a temporal sequence). For sequence data, a recurrent neural network, abbreviated RNN, is often used. Language, whether English or Chinese, also arrives one letter or word at a time, so language is likewise naturally represented as sequence data, and more complex versions of RNNs are often used in these applications.

For more complex applications like autonomous driving, where you have an image (suggesting a CNN-style structure) together with radar information (which is quite different), you might end up with a more custom, or more complex, hybrid neural network architecture. To be more concrete about what standard NN, CNN, and RNN structures look like: in the literature you may have seen a picture like this, which is a standard neural network.

You may also have seen a picture like this, and this is an example of a convolutional neural network.

We’ll see the principle and implementation of this diagram later in the course, but convolutional networks (CNN) are usually used for image data.

You’ll probably see pictures like this as well, and you’ll learn how to implement it later in the lesson.

Recurrent neural networks (RNNs) are well suited to this kind of one-dimensional sequence data, which has a temporal component.

You have probably also heard of machine learning applied to structured and unstructured data. Structured data is essentially database-style data. In housing price prediction, for example, you might have a database with columns that tell you the size and the number of bedrooms; that is structured data. Similarly, when predicting whether a user will click on an ad, you may have information about the user, such as age, together with information about the ad, plus the classification label you want to predict; this is also structured data. In structured data, each feature, such as the size of a house, the number of bedrooms, or a user's age, has a well-defined meaning.

Unstructured data, on the other hand, refers to things like audio, raw audio, or images or text that you want to recognize. The feature here could be a pixel value in an image or a single word in text.

Historically, unstructured data has been much harder for computers to process than structured data. Humans, by contrast, have evolved to be very good at understanding audio signals and images; text is a more recent invention, but people are also very good at interpreting it.

One of the most exciting things about the rise of neural networks is that, thanks to deep learning, computers are now much better at interpreting unstructured data than they were a few years ago. This creates opportunities for many new and exciting applications in speech recognition, image recognition, and natural language processing, far more than existed even two or three years ago. Because people have an innate ability to understand unstructured data, you have probably heard most in the media about neural networks' successes on unstructured data: when a neural network recognizes a cat, that is genuinely cool, and we all know what it means.

But it turns out that much of the short-term economic value that neural networks are creating is also based on structured data: better advertising systems, better product recommendations, and a better ability to process the giant databases many companies use to make accurate predictions.

So in this course, many of the techniques we discuss will apply to both structured and unstructured data. To explain the algorithms, we will draw slightly more on examples that use unstructured data, but as you think about applying neural networks within your own team, I hope you will find that the algorithms are useful for both kinds of data.

Neural networks have transformed supervised learning and are creating enormous economic value. It turns out, though, that the basic technical ideas behind neural networks have mostly been around for decades, some of them for many decades. So why are they only now taking off and working so well? In the next video, we will discuss why neural networks have only recently become the incredibly powerful tool that you can use.

1.4 Why the rise of deep learning? (Why is Deep Learning taking off?)

This video discusses the main factors driving the popularity of deep learning: the scale of data, the scale of computation, and algorithmic innovation.

Deep learning and neural networks have been around for decades, so why are they suddenly popular now? This lesson will focus on some of the key drivers that have made deep learning so popular, and will help you find the best time to apply it within your organization.

Over the past few years, many people have asked me why deep learning suddenly works so well. When I answer this question, I usually draw them a graph: on the horizontal axis I plot the amount of data available for a task, and on the vertical axis I plot the performance of a machine learning algorithm on that task, for example the accuracy of a spam filter or an ad-click predictor, or the accuracy with which a neural network in a self-driving car judges the positions of other cars. If you plot the performance of a traditional machine learning algorithm as a function of the amount of data, you get a curve like the one in the picture: performance improves at first as you add more data, but after a while it plateaus, as if the algorithm does not know what to do with very large amounts of data. And for many problems over the past decades, society only had relatively small amounts of data anyway.

Thanks to the digitization of society, the amount of data available is now enormous. We spend much of our time in digital realms, on computers, on websites, in mobile apps, and these activities all create data. At the same time, cheap cameras built into mobile phones, accelerometers, and all kinds of sensors in the Internet of Things mean we are collecting more and more data. So over just the last 20 years, for many applications we have accumulated far more data than traditional machine learning algorithms can effectively exploit.

What neural networks show is this: if you train a small neural network, its performance might look like the yellow curve below; a slightly larger, medium-sized network (the blue curve) performs somewhat better; and a very large neural network (the green curve) just keeps getting better and better as you feed it more data. So note two things. If you want high performance, you need two ingredients: a neural network large enough to take advantage of a huge amount of data, and, out toward the right end of the horizontal axis, a lot of data. We therefore often say that scale has been driving deep learning progress, where "scale" means both the size of the neural network (a network with many hidden units, many parameters, and many connections) and the scale of the data. In fact, the most reliable way to get better performance from a neural network today is often either to train a larger network or to feed it more data. This only works up to a point, because eventually you run out of data, or the network becomes so large that it takes too long to train; but simply scaling up has carried us a long way through the world of deep learning. To make the figure technically a bit more precise: the amount of data on the horizontal axis refers to labeled data, that is, training samples that contain both the input x and the label y. Let me also introduce a notation: the lowercase letter m denotes the size of the training set, the number of training samples, and it is this m that runs along the horizontal axis of the figure.

In the small-training-set regime, the ranking of the different algorithms is actually not well defined, so if you do not have a large training set, the outcome depends much more on your feature engineering skill, and that skill determines the final performance. Suppose one person trains an SVM (support vector machine) with well-chosen features while another trains a network at larger scale; on a small training set, the SVM may well do better. So in the left region of the graph, the ranking between algorithms is not clear-cut, and final performance depends more on your skill with hand-engineered features and other algorithmic details. It is only in the big-data regime, on the right side of the graph where m is very large, that we consistently see large neural networks dominating the other approaches. So if any of your friends ask you why neural networks are so popular, I encourage you to draw this picture for them.

So it is fair to say that in the early days of deep learning's rise, progress was driven by the scale of data and the scale of computation, that is, by our ability to train very large neural networks, whether on a CPU or a GPU. But increasingly, and especially in the last few years, we have also seen tremendous algorithmic innovation, much of it aimed at making neural networks run faster.

As a concrete example, one of the great breakthroughs in neural networks was switching from the sigmoid activation function to the ReLU function, which we mentioned earlier in the course.

Don't worry if you can't follow every detail here. One problem with using the sigmoid function in machine learning is that in the regions where the function is nearly flat, its gradient is close to zero, so learning becomes very slow: when you run gradient descent and the gradient is nearly zero, the parameters update very slowly, and so the learning rate of progress is very slow. By changing the activation function to the ReLU function (rectified linear unit), whose gradient is 1 for all positive inputs, the gradient is much less likely to gradually shrink to zero. (The gradient, the slope of the ReLU, is 0 to the left of zero.) Simply switching from the sigmoid function to the ReLU function has made the gradient descent algorithm run much faster; this is an example of a perhaps relatively simple algorithmic innovation.

Ultimately, the impact of such algorithmic innovation is faster computation, and there are many examples where changing the algorithm lets the code run faster, which in turn lets us train larger neural networks, or train them in a reasonable time even at large scale with all the data. Fast computation matters for another reason too: the process of training a neural network is highly iterative. Often you start with an idea for a network architecture, write code to implement it, run an experiment that tells you how well the network performs, use the result to modify details of your network, and then keep repeating this loop. When training takes a long time, each trip around the loop takes a long time, and there is a huge difference in productivity between a fast loop and a slow one.
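The vanishing-gradient point can be checked numerically. Below is a small sketch comparing the derivative of the sigmoid, which is sigma(z) * (1 - sigma(z)), with the (sub)gradient of ReLU; the sample z values are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: sigma(z) * (1 - sigma(z)).
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # (Sub)gradient of ReLU: 0 for z <= 0, 1 for z > 0.
    return (z > 0).astype(float)

z = np.array([-10.0, -1.0, 0.5, 10.0])  # arbitrary sample inputs
print(sigmoid_grad(z))  # tiny in the tails: gradient descent barely moves
print(relu_grad(z))     # stays at 1.0 for every positive input
```

The sigmoid gradient peaks at 0.25 (at z = 0) and collapses toward zero for large positive or negative z, while the ReLU gradient stays at exactly 1 for all positive inputs, which is why the switch speeds up gradient descent.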
When you can get an idea, try it, and see the result in 10 minutes, or at most within a day, rather than training your neural network for a month (which is sometimes worth it), you can try many more ideas, and you are far more likely to discover a network that works well for your application. So faster computation has really helped speed up the rate at which you get experimental results, and it has helped both practitioners and researchers in deep learning iterate much faster and improve their ideas quickly. All of this has made the deep learning research community incredibly vibrant, inventing new algorithms and making continuous progress, and these forces keep deep learning growing.

The good news is that these forces are still at work, making deep learning better and better. Society is still generating more and more digital data; specialized hardware such as GPUs, together with faster networking of many kinds, keeps increasing the computation we can apply, so I am confident our ability to build very large neural networks will keep improving; and the research community continues to produce extraordinary innovations on the algorithmic front. Based on this, we can remain optimistic that deep learning will keep getting better for many years to come.

1.5 About this Course

You are nearing the end of the first week of the first course of this specialization. First, a quick overview of what you will learn in the coming weeks:

As mentioned in the first video, this specialization has five courses, and you are currently in the first: Neural Networks and Deep Learning. In this course you will learn the most important foundations. By the end of this first course, you will know how to build a deep neural network and make it work.

Here are some details about the first course, which has four weeks of study materials:

Week 1: Introduction to deep learning. At the end of each week, there will also be ten multiple choice questions to test your understanding of the material;

Week 2: the basics of neural network programming. You will learn about the structure of a neural network's computations, gradually refine the algorithms, and think about how to implement neural networks efficiently. Starting from week 2, there are programming exercises (a paid feature) in which you implement the algorithms yourself;

Week 3: having learned the framework of neural network programming, you will be able to write a neural network with one hidden layer, so you will learn all the key concepts needed to make a neural network work;

Week 4: Build a deep neural network.

That's it for this video. Hopefully, after watching it, you can check your understanding with the ten multiple choice questions on the course website, to sort out what you already know and what you do not. You can keep trying until you get them all right and fully understand the concepts.

1.6 Course Resources

I hope you enjoyed this course, and to help you complete it, there will be a list of course resources.

First of all, if you have any questions, want to discuss anything with other students or with the teaching staff (including me), or want to report an error, the forum is the best place to go; the teaching staff and I will monitor it regularly. The forum is also a good place to get answers from your classmates, and if you like, you can answer classmates' questions too. To get to the forum from the course homepage:

Click the Forum tab to enter the forum.

The forum is the best place to ask questions, but if for some reason you need to contact us directly, you can send an email to the address below. We try to read every email and address common issues, although given the volume of mail we cannot promise a quick reply to every message. In addition, some companies want to provide deep learning training to their employees; if you are responsible for employee training and would like to bring experts in to train hundreds or more employees in deep learning, please contact us at the corporate email address. We are also in the early stages of academic outreach to universities; if you are a university leader or administrator and would like to offer a deep learning course at your university, please contact us at the university email address. The addresses are below. Good luck!

Contact us: [email protected]

Companies: [email protected]

Universities: [email protected]

References

[1]

Deep learning courses: mooc.study.163.com/university/…

[2]

Huang Haiguang: github.com/fengdu78

[3]

GitHub: github.com/fengdu78/de…