An original article by Heart of the Machine
Contributors: Li Zannan, Li Yazhou, Huang Xiaotian
In 2016, the AI industry saw soaring speech-recognition accuracy, major breakthroughs in neural machine translation, and a boom in image style transfer. Expectations for AI grew even higher in 2017, and a great deal of exciting research kept coming out of companies and universities. This article attempts to take stock of the most important research achievements in artificial intelligence in 2017.
From a keynote by Jeff Dean, head of Google Brain: the number of machine learning papers submitted to arXiv is growing at the pace of Moore's Law. Is artificial intelligence technology itself advancing as fast?
AlphaGo: Starting from "zero"
Today, whenever we talk about artificial intelligence, we cannot help but mention AlphaGo, the famous Go program developed by Google's DeepMind, which stoked AI fever once again in 2017. From the start of the year, when it faced a succession of human Go masters on online Go platforms under the name "Master", to the "final man-machine battle" against Ke Jie and others in May, and then to another Nature paper on "AlphaGo Zero" in October, AlphaGo repeatedly demonstrated the power of computers at Go. AlphaZero, which followed, generalized that power to other domains.
Heart of the Machine also followed AlphaGo's story from beginning to end. During the man-machine matches in May, apart from live reporting, we invited Martin Müller, professor at the University of Alberta and a top computer Go expert, and Dr. Yuxi Li, author of the survey "Deep Reinforcement Learning: An Overview", to watch the live broadcast of the matches with us. Müller's team has developed Monte Carlo methods for game-tree search and planning, large-scale parallel search, and combinatorial game theory. Indeed, David Silver and Aja Huang, the first and second authors of DeepMind's first Nature paper on AlphaGo and core contributors to the program's design and development, both studied under him.
After defeating top Go players such as Ke Jie, AlphaGo's story appeared to be over, as DeepMind announced the end of its man-machine match programme. On October 18, however, DeepMind surprised the world again with yet another Nature paper, "Mastering the Game of Go Without Human Knowledge". In it, DeepMind unveiled a new version, AlphaGo Zero, trained without any human knowledge: after three days and millions of games of self-play, it could easily beat the version of AlphaGo that had defeated Lee Sedol by 100-0. DeepMind founder Demis Hassabis said: "Zero is the most powerful, efficient and versatile version of AlphaGo to date, and we will soon see this technology applied to other areas."
It did not take long for Hassabis's promise to materialize. During NIPS 2017 in December, DeepMind published a paper titled "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm", describing AlphaZero, a new-generation algorithm built on the AlphaGo Zero techniques that generalizes to tasks in other domains. The algorithm starts from scratch and surpasses human performance on multiple tasks through self-play reinforcement learning. After less than 24 hours of training, it reportedly beat the strongest existing chess and shogi programs, which themselves already exceed the level of human world champions, and easily defeated the AlphaGo Zero that had trained for three days.
On December 11, DeepMind released a Go teaching tool covering some 6,000 major opening variations from recent Go history, all annotated with win rates assessed by AlphaGo, in the hope of advancing human play. After the release, one of AlphaGo's key researchers, Dr. Aja Huang, announced that he was leaving the project for other work at DeepMind, bringing AlphaGo's Go research to a close.
Read more:
- DeepMind confirms Master's identity: a full review of AlphaGo's re-emergence
- Ke Jie loses the first game by a quarter of a stone: Heart of the Machine's exclusive conversation with Martin Müller, mentor of AlphaGo's developers
- Ke Jie resigns the second game: another exclusive interview with Martin Müller, mentor of AlphaGo's developers
- AlphaGo wins the team match against five top players and has the last laugh
- Without human knowledge: DeepMind's new Go program AlphaGo Zero lands in Nature once again
- It's not just Go! After AlphaGo Zero, DeepMind launches the generalized reinforcement learning algorithm AlphaZero
Texas Hold 'em AI beats the humans: DeepStack and Libratus
The complex poker game Texas Hold 'em has been mastered by artificial intelligence, and not just once: bots developed by two different research teams beat human professionals at heads-up (one-on-one) no-limit Texas Hold 'em. In January, Libratus, an AI program developed at Carnegie Mellon University (CMU), defeated four professional human players (Jason Les, Dong Kim, Daniel McAulay and Jimmy Chou) in a 20-day heads-up competition at the Rivers Casino in Pittsburgh, Pennsylvania. Meanwhile, "DeepStack: Expert-level artificial intelligence in heads-up no-limit poker", a study by researchers at the University of Alberta in Canada, Charles University in Prague and the Czech Technical University, was published in the prestigious journal Science, showing that AI has reached expert level at no-limit poker.
Unlike Go, where all information is open, Texas Hold 'em is a game of imperfect information, which mirrors real-world problems such as auctions and business negotiations, so the technical breakthrough in Texas Hold 'em also signals that the development of artificial intelligence is accelerating.
Interestingly, DeepStack and Libratus approach the same problem in different ways. DeepStack uses deep learning, training on a huge number of randomly generated poker situations (more than 11 million) to give it an "intuition" for its chances of winning during an actual match; Libratus relies on game-solving techniques that approximate a Nash equilibrium.
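As an illustration of the equilibrium-finding idea only, and not CMU's actual code, here is a minimal regret-matching loop, the basic building block behind counterfactual regret minimization. In this toy (all names and constants are ours), two regret-matching players train against each other at rock-paper-scissors, and their average strategies converge toward the Nash equilibrium of (1/3, 1/3, 1/3).

```python
import random

# Regret matching: the basic building block of counterfactual regret minimization,
# the family of equilibrium-finding methods used by poker bots such as Libratus.
# This toy is an illustration only, not CMU's code.

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    return [0, 1, -1][(a - b) % 3]

def strategy_from(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / ACTIONS] * ACTIONS

regrets = [[0.0] * ACTIONS, [0.0] * ACTIONS]        # one regret table per player
strategy_sums = [[0.0] * ACTIONS, [0.0] * ACTIONS]  # accumulated for the average strategy

for _ in range(200_000):
    strategies = [strategy_from(r) for r in regrets]
    actions = [random.choices(range(ACTIONS), weights=s)[0] for s in strategies]
    for p in range(2):
        opp = actions[1 - p]
        for a in range(ACTIONS):
            # Regret = how much better action a would have done than the action played.
            regrets[p][a] += payoff(a, opp) - payoff(actions[p], opp)
        strategy_sums[p] = [s + w for s, w in zip(strategy_sums[p], strategies[p])]

avg = [x / sum(strategy_sums[0]) for x in strategy_sums[0]]
print([round(x, 3) for x in avg])  # approaches [0.333, 0.333, 0.333]
```

Full-scale poker solvers work on a vastly larger game tree and add abstraction and subgame solving on top, but the same regret-driven update sits at their core.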
Read more:
- Industry | AI to challenge humans at Texas Hold 'em for a $200,000 prize
- Headline | The Texas Hold 'em man-machine battle ends: Libratus beats the world's top poker players
- Academic | New paper introduces the poker-playing DeepStack: AI reaches the level of professional players
- NIPS 2017 Best Paper: CMU's research on imperfect-information games behind Libratus wins the award
Self-normalizing neural networks
"Self-Normalizing Neural Networks", a machine learning paper posted to arXiv in June this year and accepted at NIPS 2017, was written by Günter Klambauer, Thomas Unterthiner, Andreas Mayr and Sepp Hochreiter of Johannes Kepler University Linz in Austria. The paper attracted great attention in the community after it was submitted. It proposes the scaled exponential linear unit (SELU), which brings a self-normalization property: the analysis tracks a mapping g that carries the mean and variance of activations from one layer to the next and shows that it has a stable fixed point, producing the normalizing effect. It is worth noting that co-author Sepp Hochreiter proposed LSTM together with Jürgen Schmidhuber, and the earlier ELU activation also came from his group. As for the paper itself, the NIPS version is only 9 pages long, but it comes with a 93-page appendix of proofs like the one shown below:
One wonders how the reviewers felt at the time. In essence, the method modifies the ELU activation slightly so that unit activations tend toward zero mean and unit variance (provided the network is deep enough). If this turns out to be the right direction, batch normalization could become obsolete and model training would be considerably faster; at least in the paper's experiments, SELU beats the accuracy of BN + ReLU.
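Concretely, SELU(x) = λx for x > 0 and λα(e^x − 1) for x ≤ 0, with the fixed constants α ≈ 1.6733 and λ ≈ 1.0507 derived in the paper. The quick NumPy check below is our own sketch, not the authors' code; it simply illustrates the zero-mean/unit-variance tendency they describe.

```python
import numpy as np

# The SELU activation from "Self-Normalizing Neural Networks": a scaled ELU whose
# fixed constants push activations toward zero mean and unit variance.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805  # written as lambda in the paper

def selu(x):
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1))

# Quick numerical check of the self-normalizing effect: feeding standard-normal
# inputs through SELU keeps the output statistics close to zero mean / unit variance.
x = np.random.randn(1_000_000)
y = selu(x)
print(y.mean(), y.std())  # both stay close to 0 and 1 respectively
```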
Link to the paper: arxiv.org/abs/1706.02…
Read more:
- Taking the machine learning community by storm: "Self-Normalizing Neural Networks" proposes the new activation function SELU
GANs and their variants
In 2016, Yann LeCun called GANs one of the most important breakthroughs in deep learning, and that year already saw GAN variants such as the energy-based GAN and the least squares GAN. By early 2017, GAN variants were proliferating, including the Wasserstein GAN (WGAN) paper, which sparked heated discussion in the community shortly after its release, with some calling it "amazing".
Since Ian Goodfellow proposed GANs in 2014, they have suffered from problems such as unstable training, generator and discriminator losses that fail to indicate training progress, and a lack of diversity in generated samples. Subsequent variants tried to address these issues with limited success. Wasserstein GAN managed to do the following (see the training-step sketch after this list):
- It thoroughly addresses the instability of GAN training, eliminating the need to carefully balance how much the generator and discriminator are trained
- It basically solves the mode-collapse problem, ensuring diversity in the generated samples
- Training finally has a quantity, analogous to cross-entropy or accuracy, that indicates progress: the smaller this value, the better the GAN has trained and the higher the quality of the generated images (as shown in the figure)
- All of these benefits come without a carefully designed network architecture; even the simplest multilayer fully connected network can achieve them
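The sketch below is our own PyTorch-style simplification of the weight-clipping variant from the WGAN paper; the tiny fully connected models and hyperparameters are illustrative choices, not the paper's exact setup. It shows how the critic and generator updates fit together and how the critic loss yields the progress-tracking quantity mentioned above.

```python
import torch

# Minimal sketch of WGAN training with weight clipping (the variant in the original
# paper). Models and hyperparameters below are illustrative, not the paper's setup.

Z_DIM, DATA_DIM, CLIP = 100, 784, 0.01
generator = torch.nn.Sequential(torch.nn.Linear(Z_DIM, 256), torch.nn.ReLU(),
                                torch.nn.Linear(256, DATA_DIM))
critic = torch.nn.Sequential(torch.nn.Linear(DATA_DIM, 256), torch.nn.ReLU(),
                             torch.nn.Linear(256, 1))
g_opt = torch.optim.RMSprop(generator.parameters(), lr=5e-5)
c_opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

def critic_step(real_batch):
    """Train the critic to maximize E[D(real)] - E[D(fake)], then clip its weights."""
    fake_batch = generator(torch.randn(real_batch.size(0), Z_DIM)).detach()
    loss = -(critic(real_batch).mean() - critic(fake_batch).mean())
    c_opt.zero_grad()
    loss.backward()
    c_opt.step()
    for p in critic.parameters():       # weight clipping keeps the critic roughly 1-Lipschitz
        p.data.clamp_(-CLIP, CLIP)
    return -loss.item()                 # estimate of the Wasserstein distance

def generator_step(batch_size=64):
    """Train the generator to raise the critic's score on generated samples."""
    loss = -critic(generator(torch.randn(batch_size, Z_DIM))).mean()
    g_opt.zero_grad()
    loss.backward()
    g_opt.step()

# Typical loop: several critic steps per generator step. The returned Wasserstein
# estimate is the quantity that tracks training progress, as noted in the list above.
real = torch.randn(64, DATA_DIM)        # placeholder for a batch of real data
for _ in range(5):
    w_estimate = critic_step(real)
generator_step()
print(w_estimate)
```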
Beyond WGAN, many other GAN variants emerged in 2017, which we list here as resources:
- FAIR proposes WGAN, an alternative to the usual GAN training method
- Least squares GAN: more stable than a conventional GAN and faster to converge than WGAN
- New Google paper proposes AdaGAN, an adaptive generative adversarial network for boosting generative models
- Academic | Yann LeCun's latest paper: energy-based generative adversarial networks
- Academic | Yoshua Bengio's team releases three papers proposing three new generative adversarial networks
- Resources | Google open-sources TFGAN, a lightweight library for generative adversarial networks
At the end of the year, however, a Google Brain paper sounded a note of caution about the current GAN craze. In "Are GANs Created Equal? A Large-Scale Study", the researchers carefully tested six popular variants, including Wasserstein GAN, and concluded that they found no evidence that any of the algorithms was superior to the original one (see: None of the six improvements beats the original algorithm: new Google research casts doubt on the state of GANs). Perhaps we should focus more on new architectures instead.
Deep neural networks meet speech synthesis
In recent years, deep neural networks have transformed computers' ability to understand natural speech, for example in speech recognition and machine translation. Computer-generated speech (speech synthesis, or text-to-speech, TTS), however, was still largely based on so-called concatenative TTS, a traditional approach with serious shortcomings in naturalness and comfort. Whether deep neural networks can drive progress in speech synthesis the way they drove progress in speech recognition became one of the open research questions in artificial intelligence.
In 2016, DeepMind proposed WaveNet, which attracted a lot of attention in the industry. WaveNet generates the raw audio waveform directly and achieves excellent results in text-to-speech and general audio generation. For practical applications, however, its computational cost was too high to use directly in a product, leaving plenty of room for improvement.
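To illustrate the core idea (this is our own sketch, not DeepMind's implementation, and the channel sizes and depth are arbitrary), here is a minimal PyTorch stack of dilated causal convolutions, the mechanism that lets WaveNet-style models see a long window of past audio samples while predicting each sample only from earlier ones.

```python
import torch

# Minimal sketch of a dilated causal convolution stack in the spirit of WaveNet:
# each layer doubles the dilation, so the receptive field grows exponentially with
# depth while every output sample depends only on past samples.

class DilatedCausalConv(torch.nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        self.dilation = dilation
        self.conv = torch.nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

    def forward(self, x):
        # Left-pad so the convolution never looks at future samples (causality).
        x = torch.nn.functional.pad(x, (self.dilation, 0))
        return torch.relu(self.conv(x))

layers = torch.nn.Sequential(*[DilatedCausalConv(32, 2 ** i) for i in range(8)])
audio = torch.randn(1, 32, 16000)   # (batch, channels, samples), e.g. 1 s at 16 kHz
out = layers(audio)
print(out.shape)                    # torch.Size([1, 32, 16000]); receptive field of 2^8 samples
```

The real WaveNet adds gated activations, residual and skip connections, and an output distribution over quantized sample values, but the dilated causal stack above is what makes long-range autoregressive audio modelling tractable; its sequential sample-by-sample generation is also why the original model was so expensive at inference time.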
In 2017, deep-learning speech-synthesis methods moved from the lab into products. From Heart of the Machine's coverage, the following studies stand out:
- Google: Tacotron, WaveNet (for Google Assistant)
- Baidu: Deep Voice, Deep Voice 2 (NIPS 2017), Deep Voice 3 (ICLR 2018)
- Hybrid Unit Selection TTS System
Read more:
- Industry | Baidu introduces Deep Voice: a real-time neural speech synthesis system
- Industry | Baidu proposes Deep Voice 2: neural TTS supporting multi-speaker text-to-speech
- Industry | Baidu releases Deep Voice 3: a convolutional, attention-based TTS system
- Industry | One year on, DeepMind's WaveNet speech synthesis officially goes into production
- Academic | Major paper from Google and Nvidia: cross-language speech-to-text transcription
- Academic | Yoshua Bengio's team proposes Char2Wav for end-to-end speech synthesis
Large-scale parallel training on ImageNet
Deep learning flourishes on large neural networks and large data sets, but these in turn demand long training times that impede research and development. Distributed synchronous SGD offers a promising solution by spreading SGD minibatches across a pool of parallel workers. For this approach to be efficient, the workload of each worker must be large enough, which implies a nontrivial growth in SGD batch size. In June, Facebook introduced a distributed synchronous SGD training method with very large batches, sparking a race to "train ImageNet fast". With more institutions joining in, UC Berkeley researchers had brought ResNet-50 training on ImageNet down to 48 minutes by November.
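A key ingredient reported for such large-batch training is a linear learning-rate scaling rule combined with a gradual warmup. The sketch below uses illustrative constants of our own choosing to show how such a schedule can be computed; it is not a reproduction of any team's exact recipe.

```python
# Sketch of the linear learning-rate scaling rule with gradual warmup used in
# large-batch ImageNet training: when the global batch size grows by a factor k,
# the base learning rate is multiplied by k, and it is ramped up over the first
# few epochs to avoid instability early in training. Constants are illustrative.

BASE_LR = 0.1          # reference learning rate for a 256-image batch
BASE_BATCH = 256
WARMUP_EPOCHS = 5

def learning_rate(epoch, total_batch_size):
    scaled_lr = BASE_LR * total_batch_size / BASE_BATCH   # linear scaling rule
    if epoch < WARMUP_EPOCHS:
        # Ramp linearly from the base LR to the scaled LR during warmup.
        return BASE_LR + (scaled_lr - BASE_LR) * epoch / WARMUP_EPOCHS
    return scaled_lr

# Example: an 8192-image global batch spread across many workers.
for epoch in (0, 1, 5, 30):
    print(epoch, learning_rate(epoch, 8192))
```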
Read more:
- New research | Facebook: accurate large-minibatch SGD trains ImageNet in just 1 hour
- After the one-hour ImageNet result, large-batch training scales to batches of 32,000 samples
- Training ImageNet in 24 minutes on a $1.2 million machine: UC Berkeley demonstrates new parallel processing methods
Revolutionizing deep learning: Geoffrey Hinton and Capsule
As we all know, the latest wave of AI was triggered by deep learning and its progress. But can this approach lead us to general artificial intelligence? Geoffrey Hinton, a leader of deep learning and one of the authors of the key backpropagation mechanism, was among the first to propose abandoning backpropagation and reinventing deep learning. His proposed innovation is the Capsule.
The Capsule is a new neural network building block proposed by deep learning pioneer Geoffrey Hinton and others in their effort to move beyond the limitations of backpropagation-trained networks. In the paper "Dynamic Routing Between Capsules", the authors describe it as follows: "A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity, such as an object or an object part. We use the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters. Capsules at one level make predictions, via transformation matrices, for the instantiation parameters of higher-level capsules. When multiple predictions agree (the paper uses dynamic routing to bring the predictions into agreement), a higher-level capsule becomes active."
The activities of the neurons within a Capsule represent the various properties of a particular entity present in the image. These properties can include many different parameters, such as pose (position, size, orientation), deformation, velocity, reflectance, color, texture and so on. The length of the output vector represents the probability that the entity is present, so its value must lie between 0 and 1.
Heart of the Machine also took a closer look at the paper Hinton published in October. Its key point is that the inputs and outputs of a Capsule layer are vectors. The vector construction can be thought of as the PrimaryCaps layer using 8 standard Conv2D operations to build vectors of length 8, so each Capsule unit is equivalent to a combination of 8 convolutional units. In addition, Hinton et al. use a dynamic routing mechanism between Capsule layers; this way of updating the coupling coefficients does not require backpropagation.
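The sketch below is our own PyTorch illustration with arbitrary shapes, not the authors' reference code. It shows the squash non-linearity that keeps vector lengths in (0, 1), so they can be read as presence probabilities, and the routing-by-agreement update whose coupling coefficients need no backpropagation.

```python
import torch

# Minimal sketch of the squash non-linearity and routing-by-agreement update from
# "Dynamic Routing Between Capsules". Shapes and iteration counts are illustrative.

def squash(s, dim=-1, eps=1e-8):
    # Shrink a vector so its length lies in (0, 1) while keeping its direction,
    # letting the length be read as the probability that the entity is present.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, iterations=3):
    # u_hat: predictions from lower capsules, shape (batch, n_lower, n_upper, dim).
    b = torch.zeros(u_hat.shape[:3])                 # routing logits
    for _ in range(iterations):
        c = torch.softmax(b, dim=2)                  # coupling coefficients per lower capsule
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)     # weighted sum over lower capsules
        v = squash(s)                                # upper-capsule outputs (batch, n_upper, dim)
        # Increase coupling where prediction and output agree (dot-product agreement);
        # this update of the coupling coefficients needs no backpropagation.
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
    return v

u_hat = torch.randn(2, 1152, 10, 16)                 # e.g. PrimaryCaps -> DigitCaps predictions
print(dynamic_routing(u_hat).shape)                  # torch.Size([2, 10, 16])
```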
In addition to the Capsule paper published by Hinton et al., there was another paper, "Matrix Capsules with EM Routing", which replaces the dynamic routing of the first paper with EM routing and achieves better results.
Read more:
- Finally, Geoffrey Hinton’s much-talked-about Capsule paper is out
- Abandoned by Geoffrey Hinton: why is backpropagation being questioned? (BP derivation included)
- A brief analysis of Hinton's recently proposed Capsule
- Opinion | Geoffrey Hinton: abandon backpropagation, our artificial intelligence needs to start over
Beyond neural networks? Vicarious proposes a new probabilistic generative model
A four-layer recursive cortical network representing the letter A
Despite being challenged by Yann LeCun and others, a paper on a generative vision model from the well-known startup Vicarious was published in Science. The new probabilistic generative model, called the Recursive Cortical Network (RCN), achieves strong performance and high data efficiency across a variety of computer vision tasks, with recognition, segmentation and reasoning capabilities, and beats deep neural networks on difficult benchmarks such as scene text recognition. The researchers say this approach could take us toward general artificial intelligence.
The model shows excellent localization and occlusion-reasoning abilities and is 300 times more data-efficient in training. It also breaks text-based CAPTCHAs, the automated Turing tests used to tell computers and humans apart, segmenting the characters without relying on heuristics specific to any particular CAPTCHA.
"I think of CAPTCHA as a 'complete AI problem': if you solve this type of problem completely, you have general AI," Vicarious CTO Dileep George told Heart of the Machine. To fully crack CAPTCHAs, the model must be able to recognize any text: not just CAPTCHAs, but any kind of font someone might write on a piece of paper (like WordArt in PowerPoint).
The Recursive Cortical Network will be used not only to break CAPTCHAs but also for control, reasoning and robotics. For the past two years Vicarious has been working in the lab on applying the technology to industrial robots. Industrial robotics is how Vicarious is bringing its technology to market, but that is not the end goal: the company hopes to achieve highly advanced AI around 2040.
Read more:
- A probabilistic generative model that goes beyond neural networks
- We talked to Vicarious AI’s CTO when everyone was talking about their Science paper.
From TPU to NPU: Neural network processors sweeping all devices
The most recent wave of AI began around 2011 with the surge of deep learning. From recognizing speech and training virtual assistants to communicate naturally, to detecting lane lines and driving cars fully autonomously, data scientists are scaling new heights in artificial intelligence as the technology advances. Solving these increasingly complex problems requires increasingly complex deep learning models. Behind all of this, the rapid development of GPU technology has played an important role: breakthroughs in hardware computing power are a key reason deep learning has advanced this far.
In recent years, people have come to recognize the importance of computing chips for artificial intelligence, and more and more chips are dedicated to accelerating AI tasks; 2017 was a year of continuous commercialization of deep learning chips. Whether it is Google's TPU behind AlphaGo or Nvidia's Tesla V100 with its new Tensor Core architecture, dedicated chips for deep learning on the server side have gained mass adoption and become an essential part of cloud infrastructure. On mobile, SoCs with dedicated machine learning acceleration reached users with the launch of the Apple A11 (Neural Engine) and the Huawei Kirin 970 (NPU). Today, neural networks can be squeezed into smartphones to power a variety of apps, the servers of tech giants are handling countless machine learning requests with unprecedented efficiency, and the artificial intelligence ecosystem we envision is taking shape.
Read more:
- Headline | Google releases its TPU research paper: how was the dedicated neural network processor forged?
- Understanding the Tensor Core for deep learning: what is it?
- Heart of the Machine exclusive analysis: Huawei's first mobile AI chip, the Kirin 970
Conclusion
From the generalization of AlphaGo to Geoffrey Hinton's Capsule, 2017 was a year in which the AI industry not only brought technology into real-world use but also pushed toward the ultimate challenge of general AI. As our exploration of deep learning deepens, the strengths and weaknesses of the new technologies have come into view. Beyond building products and serving users, exploring new directions is an urgent task. What changes are in store for AI in 2018? Let's wait and see.
Read more:
Heart of the Machine: the top 10 AI highlights of 2016
This article is original content from Heart of the Machine; please contact this official account for authorization before reprinting.