This year’s ICML (International Conference on Machine Learning) is coming!
On July 19 (Beijing time), the ICML 2021 Expo hosted by Baidu will officially take place; it is the only Expo applied for by a Chinese company.
At this Expo, Baidu researchers will present Baidu's latest progress and accumulated industrial practice, built on PaddlePaddle, in computer vision, natural language processing, speech, quantum computing, and other technical fields, across ten keynote talks.
Baidu makes its debut at ICML to show off its multi-faceted AI technology
ICML (International Conference on Machine Learning), the top annual international conference on machine learning organized by the International Machine Learning Society (IMLS), is an important stage where the international machine learning community discusses cutting-edge research results and practical applications.
The Expo held by Baidu will demonstrate, from multiple perspectives, the technical strengths of PaddlePaddle in deep learning and the depth of its accumulated industrial practice.
PaddlePaddle, China's first open-source, feature-rich, industrial-grade deep learning platform, has attracted 3.2 million developers and serves 120,000 enterprises and institutions across industry, energy, finance, medical care, agriculture, urban management, and other fields.
Not long ago, the PaddlePaddle open-source framework was officially upgraded to version 2.1, with optimizations to automatic mixed precision, dynamic graphs, the high-level API, and more. In terms of model suites, ERNIE has released four new open-source pre-trained models. Deployment options and the hardware ecosystem have also continued to expand.
Ten substantive keynote talks, with sparks of technical exchange to look forward to
Technology advances through researchers' dedicated study, but it also needs exchange, where ideas collide and inspire one another. The ICML 2021 Expo held by Baidu contains ten keynote talks. Baidu looks forward to exchanging ideas with top AI talent from around the world, and to sharing and discussing the latest technical achievements and application experience of Baidu's PaddlePaddle.
The keynote talks are introduced below:
PaddleCV: Rich and Practical CV Models from Industrial Practice
Topic 1:
PaddleCV: rich and practical CV models from industrial practice
To meet the needs of low-cost development and rapid integration, PaddlePaddle focuses on building a large-scale official model library, which includes mainstream models honed over a long period of industrial practice as well as models that have won international competitions. Senior algorithm engineers from Baidu will share the technology behind PaddleCV, PaddlePaddle's vision model library.
PaddleCV is a vision model library that PaddlePaddle has focused on developing. It provides developers with a variety of end-to-end development kits and a large number of vision models for scenarios such as image classification, object detection, image segmentation, text recognition, and image generation. The PaddleOCR and PaddleDetection development kits are widely used by many enterprises. The PaddlePaddle development kits are tailored to the actual research and development process of enterprises, and serve enterprises in energy, finance, industry, agriculture, and many other fields.
GP-NAS: Gaussian Process based Neural Architecture Search
Topic 2:
GP-NAS: Automatic model structure search technology based on Gaussian processes
By automatically searching the structure of deep neural networks, NAS (Neural Architecture Search) has surpassed manually designed model structures in a variety of computer vision tasks. GP-NAS aims to address three important questions in NAS: How can the correlation between a model structure and its performance be measured? How can the correlation between different model structures be evaluated? How can these correlations be learned from a small number of samples?
To this end, GP-NAS first models these correlations from a Bayesian perspective. It introduces a new NAS method based on Gaussian processes, in which the correlations are modeled through customized kernel and mean functions. Moreover, both the mean function and the kernel function can be learned online, allowing adaptive modeling of complex correlations in different search spaces.
In addition, by incorporating a sampling method based on mutual information, the mean and kernel functions of GP-NAS can be estimated with a minimal number of samples. Once the mean and kernel functions have been learned, GP-NAS can predict the performance of any model structure across different scenarios and platforms, and can theoretically obtain the confidence of these predictions.
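To make the idea of predicting performance together with a confidence more concrete, here is a minimal sketch of plain Gaussian process regression over architecture encodings. It is not the GP-NAS implementation: the fixed RBF kernel, the constant prior mean, the two-dimensional encodings, and all function names are assumptions chosen for illustration, whereas GP-NAS is described as learning customized kernel and mean functions online.

```python
import math

def rbf_kernel(a, b, length_scale=1.0):
    """Similarity between two architecture encodings (assumed fixed RBF kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def gp_predict(train_x, train_y, query, noise=1e-6, prior_mean=0.0):
    """Return the GP posterior mean and variance of the performance at `query`."""
    n = len(train_x)
    gram = [[rbf_kernel(train_x[i], train_x[j]) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    k_star = [rbf_kernel(x, query) for x in train_x]
    residual = [y - prior_mean for y in train_y]
    alpha = solve(gram, residual)   # K^{-1} (y - m)
    beta = solve(gram, k_star)      # K^{-1} k_*
    mean = prior_mean + sum(k * a for k, a in zip(k_star, alpha))
    variance = rbf_kernel(query, query) - sum(k * b for k, b in zip(k_star, beta))
    return mean, max(variance, 0.0)  # clamp tiny negative rounding errors

# Hypothetical data: three sampled architectures and their measured accuracies.
sampled_archs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
accuracies = [0.70, 0.75, 0.72]
mean, variance = gp_predict(sampled_archs, accuracies, [0.5, 0.5], prior_mean=0.7)
```

In this sketch the posterior variance is small near architectures that have already been evaluated and grows toward the prior variance far from them, which is the sense in which a Gaussian process attaches a confidence to each predicted performance.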
GP-NAS has not only achieved state-of-the-art (SOTA) results on the CIFAR-10 and ImageNet classification tasks, but has also performed very well on face recognition tasks. The talk will also discuss search space design and supernet consistency, and share the experience of GP-NAS in winning a number of international competitions.
Multimodal-based 3D Object Detection
Topic 3:
Multimodal 3D object detection
Accurately estimating the three-dimensional position of surrounding objects is very important for an autonomous driving system. To ensure safety, driverless cars usually use a variety of sensors (such as cameras and LiDAR) to perceive their surroundings. In this talk, Baidu will introduce 3D object detection algorithms based on different sensors.
First, the talk will introduce two 3D object position estimation algorithms that work from a single image frame: one that uses CAD models and one that is CAD-model-free. Cameras are low-cost and provide detailed texture and color information. Experiments show that detection works well for nearby objects.
Compared with camera-based estimation algorithms, LiDAR-based 3D object detection algorithms perform significantly better. However, improving detection of rare categories remains an open research question. The talk will then introduce a simple and effective rendering-based 3D object augmentation strategy that properly accounts for occlusion between different foreground objects and between foreground and background objects. Test results on public datasets show that this algorithm improves detection for all categories, especially rare ones.
Finally, the talk will introduce a simple and effective multimodal fusion framework based on 2D/3D scene segmentation, which exploits the advantages of images and point clouds at the same time and effectively improves 3D object detection results. An upgraded version of the algorithm previously won the nuScenes 3D object detection challenge at ICRA 2021. At present, the PaddlePaddle framework supports general 3D point cloud understanding, including point-cloud-based 3D object detection and segmentation, as well as 3D object position estimation from a single image frame. In the future, more point-cloud-based 3D deep learning models will be open-sourced.
PaddleSeg: A High-Efficient Development Toolkit for Image Segmentation
Topic 4:
PaddleSeg: an efficient image segmentation toolkit
Semantic segmentation is a very important and challenging vision task with significant application value in human-computer interaction, augmented reality, and autonomous driving. This talk will introduce PaddleSeg, a semantic segmentation platform based on PaddlePaddle that provides implementations of many classic semantic segmentation algorithms (FCN, DeepLab, PSPNet, etc.). The talk will also introduce some new semantic segmentation algorithms that Baidu has recently developed on top of PaddleSeg.
Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond
Topic 5:
Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond
Deep learning models have reached or even surpassed human levels in many fields, such as computer vision, natural language processing, biology, and medicine. However, deep learning models have always been used as black boxes, and the decision-making process and judgment criteria have always been difficult to understand.
Drawing on current mainstream interpretability algorithms, this talk will systematically introduce the interpretability of deep learning models, including why interpretability matters, how interpretability algorithms are classified, and how to evaluate the reliability of these algorithms. Baidu has open-sourced InterpretDL, an interpretability algorithm library based on PaddlePaddle that integrates more than ten interpretability algorithms, including Baidu's latest research on interpretability; the talk will introduce two of these works in detail. InterpretDL fully decouples algorithms from models and provides detailed tutorials, making it convenient to use and suitable for both academia and industry.
Paddle Graph Learning and Its Applications
Topic 6:
PGL, the PaddlePaddle graph neural network framework, and its applications
In this talk, Baidu will introduce PGL, an efficient and easy-to-use large-scale graph neural network framework. The defining feature of graph neural networks is that they can model the connections between samples, but encoding relationships between samples is generally cumbersome in conventional deep learning frameworks. PGL adopts the message passing paradigm as the programming interface for graph neural networks, which makes them very convenient to program. In addition, PGL includes many performance optimizations for graph neural network scenarios, including parallel message aggregation and multi-GPU parallel full-batch training, which greatly improve the industrial practicality of graph neural networks.
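As a rough illustration of the message passing paradigm mentioned above, the following sketch runs one round of send, receive, and update on a tiny directed graph in plain Python. It does not use the PGL API; the function name, the sum aggregation, and the additive update rule are assumptions made only for this example.

```python
def message_passing_step(num_nodes, edges, features):
    """One round of sum-aggregation message passing on a directed graph.

    edges: list of (src, dst) pairs; features: one feature list per node.
    """
    dim = len(features[0])
    inbox = [[0.0] * dim for _ in range(num_nodes)]
    # Send: every edge carries the source node's features as its message.
    for src, dst in edges:
        message = features[src]
        # Receive: the destination node sums all incoming messages.
        for i in range(dim):
            inbox[dst][i] += message[i]
    # Update: combine each node's own features with its aggregated messages.
    return [[own + agg for own, agg in zip(features[v], inbox[v])]
            for v in range(num_nodes)]

# Hypothetical three-node graph with one-dimensional node features.
edges = [(0, 1), (2, 1), (1, 0)]
features = [[1.0], [2.0], [3.0]]
updated = message_passing_step(3, edges, features)  # [[3.0], [6.0], [3.0]]
```

Node 1 receives messages from nodes 0 and 2, so its updated value is 2 + 1 + 3 = 6, while node 2 has no incoming edges and keeps its own value. A framework built around this paradigm lets the user write only the message and aggregation functions and handles the per-edge bookkeeping itself.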
This talk will show the progress PGL has made in graph neural network research, and share concrete cases of how to build industrial-scale graph neural network applications by integrating a trillion-scale graph engine with parameter servers.
Unified Modal Learning: Motivation, Practice and Beyond
Topic 7:
Unified modal learning: motivation, practice, and beyond
Existing pre-training techniques focus on solving single-modal tasks or multimodal tasks separately, ignoring the benefits and challenges of using one unified pre-trained model to solve single-modal and multimodal problems at the same time. Humans, on the other hand, are very good at learning by association from heterogeneous, multi-source data to better understand concepts related to the physical world.
Based on this, Baidu proposed Unified Modal Learning, which aims to learn jointly from large-scale image, text, and image-text pair data, and can handle single-modal and multimodal tasks at the same time. Building on PaddlePaddle, Baidu proposed UNIMO, a unified modal learning framework, which has achieved leading results on a number of natural language processing and vision-language multimodal tasks. Baidu hopes that unified modal learning will offer a possible path toward general artificial intelligence, one that can be built together with the community.
FedCube: Federated Learning and Data Federation for Collaborative Data Processing
Topic 8:
FedCube: federated learning and data federation for collaborative data processing
In recent years, data and computing resources have often been distributed across users' terminals and across devices in different regions or organizations. Because of legal or regulatory restrictions, distributed data and computing resources cannot be directly aggregated or shared between regions or organizations for data processing or machine learning tasks. Federated learning and data federation make effective use of distributed data and computing resources to train machine learning models and process data collaboratively, while complying with laws and regulations and ensuring data security and privacy.
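To illustrate how a model can be trained without pooling raw data, here is a minimal sketch of federated averaging on a one-parameter linear model. The source does not say which algorithm FedCube or any Baidu system uses; federated averaging, the toy datasets, and every name below are assumptions for illustration only.

```python
def local_update(weight, data, lr=0.05, steps=5):
    """Run a few gradient steps of least squares for y = w * x on one client's data."""
    w = weight
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(global_w, client_datasets, rounds=20):
    """Each round, clients train locally and the server averages their weights."""
    total = sum(len(d) for d in client_datasets)
    for _ in range(rounds):
        # Raw (x, y) pairs stay on each client; only weights are sent back.
        local_ws = [local_update(global_w, d) for d in client_datasets]
        # Weight each client's model by its number of local examples.
        global_w = sum(w * len(d) for w, d in zip(local_ws, client_datasets)) / total
    return global_w

# Hypothetical clients whose private data all follow y = 3 * x.
client_a = [(1.0, 3.0), (2.0, 6.0)]
client_b = [(3.0, 9.0)]
learned_w = federated_average(0.0, [client_a, client_b])
```

Only model weights cross the client boundary in this sketch. A production system would typically add protections such as secure aggregation, which are outside the scope of this example.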
In this talk, Baidu will present the functional architecture of federated learning systems, including PaddleFL, and introduce Baidu's research based on its federated learning system.
Generalizing from a Few Examples by PaddleFSL
Topic 9:
PaddleFSL: a few-shot learning toolkit based on PaddlePaddle
The field of artificial intelligence (AI) is booming, but existing techniques often require massive amounts of data and powerful high-performance computing devices. In contrast, human beings can quickly learn rules from a few examples by drawing on what they already know, which shows that current artificial intelligence is still far from "human-like". Few-shot learning (FSL), which studies how to generalize quickly to new tasks with only a few labeled examples, is an important step toward closing the gap between artificial intelligence and human learning.
In addition, FSL makes it possible to learn from rare cases, such as drug discovery, where only a few labeled molecules are available for predicting the properties of new molecules. Given the high cost of obtaining high-quality labeled data, applying FSL can also help reduce the cost of collecting, labeling, processing, and computing over large-scale supervised data in industrial applications.
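One common way to generalize from a handful of labeled examples is to average each class's examples into a prototype and assign a query to the nearest one. The sketch below shows that idea on hand-written two-dimensional embeddings; it is not the PaddleFSL API, and the embeddings, labels, and function names are hypothetical.

```python
import math

def build_prototypes(support_set):
    """Average the embeddings of each class's few labeled examples."""
    prototypes = {}
    for label, vectors in support_set.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[i] for v in vectors) / len(vectors)
                             for i in range(dim)]
    return prototypes

def classify(query, prototypes):
    """Assign the query to the class whose prototype is closest (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical 2-way, 2-shot task with pre-computed embeddings.
support = {
    "cat": [[1.0, 1.0], [1.2, 0.8]],
    "dog": [[-1.0, -1.0], [-0.8, -1.2]],
}
prototypes = build_prototypes(support)
prediction = classify([0.9, 1.1], prototypes)  # "cat"
```

In practice the embeddings would come from a network trained on many related tasks, so that two labeled examples per class are enough to place a useful prototype.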
In this talk, Baidu will introduce PaddleFSL, an FSL toolkit based on PaddlePaddle. It includes many easy-to-use FSL methods, supports common applications such as image classification and relation extraction, and is easy to extend to new applications. Baidu hopes that PaddleFSL will make it easier for researchers and developers in academia and industry to explore FSL in a variety of scenarios.
Paddle Quantum: Towards Quantum Artificial Intelligence
Topic 10:
Paddle Quantum: towards quantum artificial intelligence
Artificial intelligence is an important driver of a new round of industrial change, and quantum computing is a highly anticipated cutting-edge technology. Their fusion has given rise to a new direction, quantum artificial intelligence, and this talk presents Baidu's latest progress in it. Building on the PaddlePaddle deep learning platform, Baidu has developed Paddle Quantum, China's first quantum machine learning toolkit, which aims to accelerate innovation at the intersection of artificial intelligence and quantum computing.
Recently, Paddle Quantum released version 2.1, which improves running efficiency by an average of 20%. With modules such as quantum neural networks, quantum kernel methods, and noisy quantum circuits, developers can easily carry out application research and development in artificial intelligence, combinatorial optimization, and quantum chemistry on Paddle Quantum. Using deep learning to empower quantum technology, LOCCNet discovered a new entanglement purification scheme that achieves better results than existing schemes. In addition, QML.baidu.com provides rich tutorials and case studies to help developers get started. The Baidu Quantum Platform is built on Quanlse, Paddle Quantum, and Quantum Leaf. It aims to connect users closely with quantum services, empower education, scientific research, industrial production, and other fields, build an open and sustainable Baidu quantum ecosystem, and ultimately realize the vision that "everyone can use quantum".