In this day and age, what's scary isn't that you're an expert; it's an expert who can also write.
As an AI100 think-tank expert, Zhiliang always explains things eloquently, and every conversation with him leaves you with a pile of practical, hard-earned insight.
Years of hands-on experience have made him a god-tier presence, whether in his Zhihu answers or on various blogs.
Just a few days ago, I talked with Zhiliang about getting started with AI, and his endless stream of insights struck me as remarkably practical. After repeated requests, he agreed to write down his years of experience, and the result is this piece of nearly ten thousand words. Whether it is his passionate predictions for the future, his meticulous analysis of hands-on practice, or his earnest persuasion, he holds nothing back.
This article mainly solves three problems:
1. Should developers turn to machine learning?
2. What does it take to switch to machine learning?
3. How do you start machine learning?
In the end, Zhiliang said that after finishing it, he felt as if his body had been hollowed out.
Clearly, he really poured himself into it!
Enjoy.
Author | Zhiliang
Editor | Pigeon (rgznai100)
Once upon a time, we read Bill Gates's biography and marveled at the genius who could hand-write an OS in college, while sighing: nowadays, how could a handful of people with a few computers possibly write an OS?
Steve Jobs's garage startup, however heartwarming as chicken soup for the soul, belongs to the past. What remains are countless experienced programmers struggling to make the transition away from C++/Delphi.
In the era of the rising Internet, Ma Huateng and a handful of people developed OICQ, laying the foundation for the vast Tencent empire of today.
In the smartphone boom, you could make a fortune copying a game; now, even if you pour your heart into an app, promotion costs at least a million.
The era of the Internet (and mobile Internet) is drawing to a close, and grassroots entrepreneurship has become extremely difficult.
Java programmers, the face of the Internet era, have begun to fade into the sunset; Android/iOS programmers, the face of the mobile Internet era, should also be feeling that the job market is getting tougher.
Every wave that arrives means an unoccupied blue ocean and a crop of newly grown giants. What else? It means huge demand for talent, a tight hiring market, and high salaries and opportunities for practitioners.
The most common thing we do is watch the aftermath of the last wave go by, lamenting that we were born at the wrong time, not realizing that the next wave has already arrived.
Yes, we are talking about AI.
Andrew Ng, Baidu's former chief scientist, has said that he believes machine learning will change the world the way electricity did. And more and more people have begun to describe the industrial revolution that artificial intelligence will bring with the term "the fourth industrial revolution":
The first three industrial revolutions freed mankind from heavy manual labor, fine manual labor, and simple computational labor. This time, machine learning, together with artificial intelligence, may well free humans from expending enormous effort on simple thinking and judgment tasks.
For example, in China alone there are 1.3 million taxis and over 10 million cargo vehicles. That is to say, more than 10 million people make driving their main occupation every day. If autonomous driving becomes widespread, tens of millions of people, in China alone, will be freed up, which means the old drivers will have to find a whole new way to make a living.
Take security: nearly a million people across the country sit in front of surveillance screens every day (yes, the guard who gets knocked out with a chop to the neck in every spy/crime movie). But when smart cameras arrive, those surveillance staff will lose out to AI that can judge abnormal situations and recognize tens of thousands of faces within 0.1 seconds.
Porn moderators at video sites will lose out to AI that has no hormones but can screen hundreds of videos per second;
Simultaneous interpreters will lose out to AI that has no latency and can talk to people simultaneously in dozens of different languages.
Then there are web content moderators, courier address sorters, and so on. These jobs, whether we have heard of them or not, consume countless people's time and energy, and they will be taken over by AI one by one.
This is not simply a matter of unemployment and re-employment. It is a new social restructuring, a wrenching overhaul of the way we take life for granted.
Even as machine learning takes these people out of work and forces them into other jobs, our society has embarked on a historic revolution.
Let’s take an example.
Take autonomous driving, an area where the giants are investing heavily. Many people's intuition is simply: once I buy a car, driving won't be so tiring anymore.
Yes, but it’s not that simple.
Let’s dig into this problem.
Do you really need to buy a car if all cars go self-driving?
When we buy a car, we aren't buying four wheels and an engine; we are buying "the ability to travel quickly, whenever we want", so that we no longer have to endure overcrowded buses, taxis we can't hail in an emergency, remote places we can't get back from, and so on.
With autonomous driving, will there still be driver shift changes, drivers bunching up in busy areas, remote places where you can't get a car, or ride-hailing orders that no one accepts? If not, why would we all need to buy a car and put up with traffic and parking problems?
There will be plenty of driverless cars waiting quietly in huge parking lots; place an order, and a few minutes later one drives up and takes you wherever you want to go.
Let’s think a little bit further along this line.
When driverless cars flood our streets, do they necessarily need to be five-seaters? Perhaps, based on analysis of real operating data, single-seat and two-seat vehicles will become the mainstream of driverless cars, which would not only make travel more convenient and reduce waste, but also greatly reduce traffic volume and relieve congestion.
In the future there may well be no traffic congestion at all. Imagine the real-time status of every vehicle being reported over the network to a control authority; the authority coordinates with vehicles remotely according to real-time road conditions, and the vehicles automatically plan more suitable routes to relieve the traffic pressure.
Think further: when intelligent driving is reliable enough, is the current traffic-light mechanism still necessary? For that matter, won't all of today's traffic rules undergo enormous change?
That is how much room for imagination a single technology opens up when it changes the entire structure and rules of travel. Every technology in the AI wave will rewrite its industry from top to bottom. As for what each industry will be rewritten into, I'm afraid no one can say yet.
The only thing we can foresee is that, while we may no longer need to learn foreign languages or get a driver's license, we will need more machine learning experts and related developers.
We will need more AI practitioners
Actually, anyone working in IT should already sense this intuitively. Over the past two years, more and more people are talking about machine learning and neural networks, and news about "artificial intelligence" is everywhere, with a strong air of "get in the circle now or be left outside it".
At the same time, however, plenty of developers who have never touched the field have been intimidated by the hype.
What is a convolutional neural network? What is convex optimization? Do I have to go back and relearn calculus, linear algebra, and probability? There are so many formulas; I can't understand any of them. And I heard you can't do this without a PhD from a famous school?
It's not just ordinary programmers; artsy programmers and, well, every other flavor of programmer says the same thing.
I said, hehe.
As I mentioned in a Zhihu answer a long time ago, as a developer I think the machine learning world divides into several layers:
➤ Academic researchers
Their job is to theoretically interpret all aspects of machine learning, to try to figure out “why models/parameters work better this way”, to provide better models for other practitioners, and even to push the theoretical research one step further.
Very few people reach this level. Talent is an insurmountable mountain, and opportunity and effort are also indispensable.
➤ Algorithm improvers
They may not be able to answer the question "why does my method work?", and they may not have historic results like Hinton's or LeCun's, but they can use their experience and some creative ideas to make existing models perform better, or come up with improved models of their own.
These people are usually the backbone of the machine learning giants or of fast-growing unicorns. Which model to use is not a problem for them; they usually have a fixed set of choices depending on their environment.
At this level, insight and ideas matter most, and the differences between tools don't matter that much; they might get you a result a few days earlier or a few weeks later, but they won't decide whether you get a result at all.
➤ Industrial implementers
These people generally do not dig too deep into the algorithms themselves, that is, into the implementation of the various algorithms and the internal structure of the various models. They are more likely to reproduce good work from papers, or take work others have reproduced, and try to apply it to industry.
With the classification sorted, let's get down to business.
What does it take to become a machine learning developer? How do you go from a C++/Java/Android/iOS programmer to a machine learning developer?
There is only one answer:
Just Do IT, boy
As a programmer, running a program once beats reading the book ten times, and finishing and running a program of your own beats hours spent buried in books. It is while writing code that we discover what we don't know, and can then study machine learning with a clear target.
Basic knowledge
Of course, don't build a tall tower on shifting sand (raise your hand if that line sounds familiar): there is some basic knowledge you do need to master. In the field of computer vision, for example, based on our team's internal training experience, to develop machine learning applications independently it is best to complete the following courses:
➤ Getting to know the basics of concepts and algorithms
Coursera Machine Learning Course – by Andrew Ng
Machine Learning Coursera
https://www.coursera.org/learn/machine-learning
➤ Intermediate: multilayer neural networks, convolution, and softmax regression:
Stanford Machine Learning course UFLDL
UFLDL Tutorial
http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
Focus on softmax regression, convolution and pooling
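Softmax itself is worth seeing in code at least once. Here is a minimal NumPy sketch (my own illustration, not taken from the course; the function name and example numbers are made up) that turns raw class scores into a probability distribution, shifting by the maximum score for numerical stability:

import numpy as np

def softmax(scores):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, 0.1])))  # roughly [0.66, 0.24, 0.10]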
➤ Advanced: computer vision, recent progress, and the implementation and application of convolutional neural networks:
Stanford Computer Vision CS231n
Stanford CS231n, Deep Learning and Computer Vision, on NetEase Cloud Classroom: http://study.163.com/course/introduction/1003223001.htm
These courses will probably consume all of your spare time for one to two months. But trust me, it's worth it. There are plenty of courses online, and many excellent free ones, but as a starting point I couldn't find a better fit than these three.
If you can't spare one or two months of free time, I'll quietly let you in on the single most basic requirement for getting into machine learning:
You can do matrix multiplication
Seriously. In this era of highly encapsulated frameworks, gradients don't need to be computed by hand, losses don't need to be computed by hand, and backpropagation is handled so well that you can write your first program without even understanding the following concepts:
- A model simply maps the input space to the output space through a series of matrix operations (or similar operations). The values of the matrices involved in those operations are called weights, and the optimal values have to be found through continual iteration.
- How far the current weights are from the optimal values is expressed as a number called the loss, and the function that computes this number is called the loss function.
- Whether the current weights should be increased or decreased is determined by taking the derivative of the loss function; this derivative is called the gradient.
- The method of updating the weights using the loss and the gradient is called backpropagation.
- The iterative method itself is called gradient descent.
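To make those terms concrete, here is a minimal sketch in plain NumPy (not from the courses above; the data, shapes, and learning rate are all made up for illustration) that fits y = 2x + 1 with a single weight matrix, a squared-error loss, a hand-computed gradient, and gradient-descent updates:

import numpy as np

# Toy data: noisy samples of y = 2x + 1 (values invented for illustration)
x = np.random.rand(100, 1)
y = 2 * x + 1 + 0.01 * np.random.randn(100, 1)

w = np.zeros((1, 1))   # weights: the matrix that maps inputs to outputs
b = np.zeros(1)

learning_rate = 0.5
for step in range(200):
    pred = x.dot(w) + b                  # forward pass: a matrix operation
    loss = np.mean((pred - y) ** 2)      # loss: how far we are from the optimum
    grad_pred = 2 * (pred - y) / len(x)  # derivative of the loss w.r.t. the prediction
    grad_w = x.T.dot(grad_pred)          # gradient propagated back to the weights
    grad_b = grad_pred.sum()
    w -= learning_rate * grad_w          # gradient descent: step against the gradient
    b -= learning_rate * grad_b

print(w, b, loss)  # w should approach 2 and b should approach 1

A couple of hundred steps is enough to watch the loss shrink, which is the whole story behind the five terms above.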
A program written this way certainly won't understand the "why", but honestly, neither did I twenty years ago when I wrote my first Hello World in C++. I believe every programmer who commits to machine learning development has plenty of perseverance and courage, and will not lack the motivation and determination to keep learning.
Choosing a framework
Well, next you need to pick a framework as the development environment for your first application. (AI100 editor's note: what follows is genuinely all practical, battle-tested experience; keep your eyes peeled.)
There are quite a few machine learning frameworks at the moment; from the user's point of view they roughly fall into two camps:
➤ Academia-friendly: Theano, Torch, Caffe
In academic research it is routine to propose new models, new algorithms, and new functions; making new breakthroughs is the most basic requirement of academic work. So these frameworks make it easy to customize models or modify the internals in depth, and many new results provide reference implementations on these frameworks at the same time the paper is published. Performance is also good.
The trade-off is a development language that is either hard (Caffe: C++) or niche (Torch: Lua), or some odd shortcomings (Theano: extremely slow compilation).
Moreover, none of these frameworks seem to have given much thought to the question of "how do I deliver this as a service". Want to deploy to a server? Caffe is the simplest of the three, and even that means a long, painful stretch of trial and error.
➤ Industry-friendly: TensorFlow, MXNet, Caffe
Industry cares more about "getting something done and making it work", so these frameworks first need to support parallel training. TensorFlow and MXNet support multi-machine multi-GPU, single-machine multi-GPU, and multi-machine single-GPU parallelism, and Caffe supports single-machine multi-GPU, although none of them scale as well as you would hope.
In our tests, Tensorflow’s dual-card parallelism only achieved about 1.5 times the performance of a single card, and the more cards, the lower the ratio.
Caffe is better, but parameter synchronization and gradient calculation take time anyway, so no framework can scale without a performance penalty.
With multiple machines the performance loss is even greater, and most of the time it feels unacceptable (optimizations in this area will have to wait for another occasion; if you are working on something similar, you are welcome to discuss it).
TensorFlow provides a relatively complete deployment mechanism (TensorFlow Serving) and a direct path to deployment on mobile. MXNet and Caffe, on the other hand, are compiled directly into your application, which works but is, frankly, cumbersome.
As for disadvantages: apart from Caffe, the other two frameworks don't track academic developments very closely. TensorFlow still has no official implementation of PReLU, and only a few days ago introduced a set of detection models.
MXNet is more aggressive about this, but limited by its small developer community, a lot of work has to wait for community experts to contribute it, or you have to build it yourself.
So is Caffe the best framework to use, balancing academia and deployment, flexibility and performance?
I have to say, I really do think so. Of course, there is one precondition: you need to know C++…
If you're not a C++ developer, that language isn't much easier to learn than machine learning itself.
For most of you interested in machine learning development (as opposed to research), I recommend Tensorflow as your first development framework.
Besides the above advantages, the most important factor is that it is popular. You’ll always have a group of like-minded people to consult or work with on any problem. This is very important for beginners.
Well, the choice is as simple as that.
Also, as a matter of conscience: whatever framework you choose, don't try to run it on Windows. Even with MXNet for Windows or the new Windows build of TensorFlow, just don't (and don't ask me how I know)… Go install Linux; Ubuntu 14.04 or 16.04 is recommended.
Machine configuration for learning
OK, next we need a machine on which to install the framework and write our hello-AI. Yet everywhere I look, people are asking:
What configuration do I need to learn machine learning?
Do I need to buy a GTX1080/Titan/Tesla?
How many graphics cards should I install? One? Two? Or four?
The answers tend to be: "You must have a GPU, at least a 1080; without a four-way Titan rig you can't even say hello."
Well, not really.
If it's just for getting started and learning, CPU versus GPU makes no difference at all to learning the code and the framework, and when you run toy datasets like MNIST or CIFAR there isn't much difference either. On my machine, for example, running the CIFAR demo on the i7 CPU versus the GTX 1080 Ti gives roughly 770 pics/s versus 2,200 pics/s; the GPU's advantage is less than threefold.
See, the gap isn’t that big.
Here's a tip, though: if you plan to use the CPU version of TensorFlow, it's best to build it from source rather than install it via pip, because compiling on your own machine automatically enables all the acceleration instruction sets your CPU supports (SSE4.1/SSE4.2/AVX/AVX2/FMA), which speeds up CPU computation considerably.
In our tests, with the full set of acceleration instructions enabled, training speed increased by about 30% and prediction speed roughly doubled.
Of course, if you really want to use a complex model on a real production problem, the model complexity and data volume are nothing like a toy dataset such as CIFAR. Running one of our own models on that same CIFAR dataset, with all other parameters and conditions identical, gives the following speeds on an i5 / i7 / 960 / GTX 1080 / GTX 1080 Ti respectively (in pics/s, higher is better):
19/25/140/460/620
Now the gap shows: the 1080 Ti is roughly 25 times faster than the i7 CPU. For online inference, the GPU also holds a 10-20x performance advantage. The more complex the model, the more obvious the GPU's advantage.
With all this in mind, if it’s just for getting started, I recommend not buying a GPU machine, but using your existing machine, using the CPU version, to learn the framework and basics. When you have a solid grasp of the basics, you will naturally want to run some more complex models and more “real” data. At this time, you can consider buying a GPU to shorten the training time.
When choosing a GPU, I've heard some friends recommend a dual GTX 1070 setup. On paper the 1070 delivers about 75% of the 1080's performance at half the price, so dual 1070s seem to win on every count.
However, keep in mind that two cards do not give twice the performance of one: on current TensorFlow it is only about 1.5x, roughly the same as a single 1080. A dual-card motherboard, power supply, and case cooling also need extra thought, so in cost-effectiveness terms it isn't really a win.
That said, if your graphics budget is stuck in the 5,000-6,000 range, dual 1070s do have their advantages: you can learn how to parallelize across multiple cards, you can run two different jobs at once when you're not in a hurry, and together the cards give you 16 GB of video memory, and so on. With these considerations, dual 1070s really are the best choice for beginners, provided you can't afford a pair of 1080s/Titans (hahaha).
If you plan to use a laptop as your main learning machine, my advice is: don't, unless you have plenty of Linux experience or don't intend to use GPU acceleration. Many laptops run into driver problems after installing Linux, and the heat generated under GPU acceleration can also hurt system stability. Without much experience, you can easily lose hours of valuable learning time to a single such problem.
➤ Pitfalls when installing TensorFlow
In general, installing the CPU version via pip on a clean system, following the instructions on the official site, should go smoothly.
A common rookie mistake is forgetting to run:
sudo pip install --upgrade pip
without which pip cannot find the TensorFlow package during installation.
The most common pitfalls with the GPU version are:
- Forgetting to turn off lightdm before installing the driver.
No big deal; just run
sudo stop lightdm
and try again. On Ubuntu 16.04, use
sudo systemctl stop lightdm
- The second prompt during the CUDA installation:
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 xxx.xx?
Answering yes here is a mistake!
Remember to answer no, and do not install the driver bundled with CUDA. This point is especially important; get it wrong and the machine often ends up stuck in a loop at the graphical login password screen.
Choosing a starter dataset
Mnist? Cifar? ImageNet? COCO? What are these?
➤ MNIST
No matter which textbook or framework you choose, when you first get into machine learning you are sure to run into the name MNIST (pronounced em-nist).
This is a handwritten digit database built by Yann LeCun. Each entry is a fixed 784 bytes, a 28×28 grid of grayscale pixels, and looks something like this:
The task is a 10-way classification: output the digit that each handwritten image represents. Because it is small (about 10 MB), has plenty of samples (60,000 training images), and can be attacked with almost anything (NN/CNN/SVM/KNN all work), its status is roughly the Hello World of machine learning. LeCun's MNIST website lists the best scores of various models on this dataset; the current best is about 99.7%, held by a CNN.
MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges: http://yann.lecun.com/exdb/mnist/
Because the dataset is so small, even on a CPU you can train an NN on it in seconds, or a simple CNN model within minutes.
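If you want to poke at the data yourself, here is a minimal sketch using the tutorial helper that shipped with TensorFlow 1.x (the version current when this article was written; the download directory name is arbitrary):

from tensorflow.examples.tutorials.mnist import input_data

# Downloads and unpacks the four MNIST files into ./MNIST_data/ on first run
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# The helper holds out 5,000 of the 60,000 training images for validation
print(mnist.train.images.shape)  # (55000, 784): each row is a flattened 28x28 image
print(mnist.train.labels.shape)  # (55000, 10): one-hot digit labels

images, labels = mnist.train.next_batch(64)  # mini-batches for training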
➤ CIFAR
For those who want to start with images, the CIFAR (pronounced see-far) database is a better entry point.
The CIFAR-10 and CIFAR-100 datasets
http://www.cs.toronto.edu/~kriz/cifar.html
The database comes in two versions, CIFAR-10 and CIFAR-100. As the name suggests, CIFAR-10 has 10 categories, each with 5,000 training images and 1,000 test images; every image is a 32×32-pixel, 3-channel bitmap. It looks something like this:
CIFAR-100, on the other hand, has 100 categories, each reduced to 500 training images plus 100 test images, with the image size unchanged.
It is a better introduction to image processing than MNIST because the images are 3-channel, real photographs, albeit at low resolution, and some have fairly complex backgrounds, which is closer to real-world image-processing scenarios. MNIST's grayscale input and clean backgrounds are comparatively simple, and its 99.7% accuracy leaves little room for improvement.
TensorFlow provides a CIFAR tutorial:
https://www.tensorflow.org/tutorials/deep_cnn
With the code: Tensorflow/Models
https://github.com/tensorflow/models/tree/fb96b71aec356e054678978875d6007ccc068e7a/tutorials/image/cifar10
➤ ImageNet and MS COCO
ImageNet and COCO (http://mscoco.org/) are two industrial-grade image datasets. When we refer to them, ImageNet usually means the ILSVRC2012 training set and COCO the COCO-2014 training set.
ImageNet has a huge number of images (more than a million, in 1,000 categories) with annotations; most of them look something like this:
COCO has fewer images (more than 80,000, in 80 categories), but each image is annotated with object outlines and carries classification labels plus five descriptive sentences (in English). It goes something like this:
So once we get into real work, we can choose a dataset that fits our specific needs as a benchmark or as a pretraining dataset.
Running the CIFAR demo: GPU selection and video memory allocation
Once TensorFlow is installed, we can get our first CIFAR demo running as quickly as this:
git clone https://github.com/tensorflow/models.git my_models
cd my_models/tutorials/image/cifar10/
python cifar10_train.py
OK, just a few minutes after downloading the data, we can see our first “image recognition model” in training.
During training, the log continuously prints the loss. Besides tracking the loss, though, we also want to see how accurately the current model recognizes images, which the cifar10_train.py script does not provide. We also need to run
python cifar10_eval.py
This script continuously verifies the accuracy of the most recent checkpoint.
If you are using a GPU, you will find that once the training script is running it fills up all of the video memory, and launching the evaluation script then reports a pile of out-of-memory (OOM) errors. This is TensorFlow's default behavior: it occupies all the memory of every visible graphics card, whether or not it actually needs that much.
The solution to this problem is simple.
First, we can specify which graphics cards TensorFlow may use for training by setting an environment variable on the command line before launching:
export CUDA_VISIBLE_DEVICES="0,2"
where "0,2" lists the GPU indices you want to use, numbered from 0 and separated by commas.
Alternatively, create a GPUOptions object in your code and set visible_device_list="0,2" for the same effect.
Second, we can limit how much video memory TensorFlow uses, letting it grow on demand instead of filling the card at startup: create a GPUOptions object and set allow_growth=True.
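Put together, a minimal sketch for the session-style TensorFlow 1.x API of that era (the device indices are just an example) looks like this:

import tensorflow as tf

# Expose only GPUs 0 and 2 to this process, and grab video memory on demand
# instead of claiming every visible card's full memory at startup
gpu_options = tf.GPUOptions(visible_device_list="0,2", allow_growth=True)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # build and run your graph here
    pass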
The official CIFAR example reaches about 86% accuracy, which is a rather poor result these days: current models usually reach around 97%, and even a casually trained model without careful tuning easily reaches about 93%. You can try modifying the model defined in cifar10.py to get a better result.
Finished running it. Now what?
So, after running this example, you’re actually a machine learning engineer.
Then you can collect some data of your own and train a recognition engine of your own; or try to optimize the model and taste the pain of the so-called tuning crowd; or go straight to implementing more advanced networks such as ResNet and Inception to conquer CIFAR; or try NLP or reinforcement learning.
In short, these things are not nearly as difficult as they seem.
Of course, no matter which path, learning, progress and self-motivation are unavoidable courses.
In a new field, vigor and vitality inevitably mean an endless stream of new results. Completing the three courses above only takes you from outsider to insider, with the basic qualifications to enter the field and catch the wave; whether you ride the wave or get swallowed by it comes down to the old saying: no pain, no gain. Studying hard may not guarantee success, but not studying hard guarantees you nothing.
Congratulations to those of you who make it through all the courses. According to the data we have, only about 10% of programmers have decided to switch to AI and actually acted on it, and of those, fewer than 30% manage to finish the basic courses. In other words, by the time you complete the last assignment of CS231n, you are already in the top 5% of developers.
At this level, there are usually two more difficult tasks ahead: keeping up with papers and catching up on math.
In machine learning, refreshing arXiv has become something of an addiction: newer results pop up almost every week, and every month or two some piece of accepted knowledge gets rewritten. It is hard to even imagine. So keeping track of academic developments has become a required course for practitioners. Even industrial implementers, if they cannot grasp what academia is doing right away, will sooner or later face the dilemma of a rival suddenly pulling far ahead.
Fortunately, the ethos of machine learning differs from the traditional academic world in one big way: thanks in part to Iron Man Elon Musk's OpenAI, academia has developed a culture of preprinting papers on arXiv and open-sourcing the code. Understanding those papers and that code, however, is far from easy. When you read the Self-Normalizing Neural Networks paper, you will understand what "you can't do this stuff without a PhD from a famous school" really means:
[1706.02515] Self-Normalizing Neural Networks
https://arxiv.org/abs/1706.02515
In any case, thanks to this open academic environment, all practitioners stand on the same starting line: the latest results are no longer monopolized by the big companies, and small companies can overtake on the bend and occupy a niche market with state-of-the-art products.
At the same time, it puts more pressure on everyone: hard-won experience, skills, models, and data can suddenly become worthless in the face of one disruptive result, and you may face a dimensionality-reduction strike from newcomers at any moment.
This is a grueling field.
It’s an exciting field.
This is a fascinating area.
This is an exciting area.
This is heaven.
This is hell.
We’ve been waiting here for you. Come, boy?
Zhiliang: co-founder of Lulang Software, TensorFlow contributor, and head of the machine learning team behind the Flower Partner app.
He has more than ten years of development and management experience. He started a mobile Internet business in 2010, and his team has built several mobile products with tens of millions of users. Recently, the AI flower-recognition app his team developed reached millions of users within a few months of launch with zero promotion. He won first prize in the 2017 Alibaba Cloud API Solution Competition and was invited to be one of the nine speakers at the WeChat Open Class Beijing station. Since the machine learning developers on his team are mostly mobile Internet developers retrained through internal training, his judgment on the industry and his experience in transitioning programmers into machine learning are well worth learning from.
Copyright notice: this article is an exclusive contribution from an AI100 think-tank expert, edited by AI100. To republish, reply "Republish" in the WeChat backend for authorization. The content of this article represents the author's views alone.