By Derrick Mwiti
I’ve had the pleasure of writing a number of articles exploring the cutting edge of machine learning and deep learning research in 2019 (you can find many here [1]), and I wanted to take a moment to highlight the papers that interest me the most. I’ll also share links to their code implementations so you can try them out.
1. Contrastive Representation Distillation
This paper applies a family of contrastive objectives to model distillation, capturing correlations and higher-order dependencies in the teacher network's outputs. The objectives are adapted to the task of transferring knowledge from one neural network to another.
Paper: arxiv.org/abs/1910.10…
Code: github.com/HobbitLong/RepDistiller
The paper considers three distillation settings:
- Model compression
- Transferring knowledge from one modality (e.g. RGB) to another (e.g. depth)
- Distilling an ensemble of networks into a single network
The core idea of contrastive learning is to learn representations that place positive example pairs close together and negative example pairs far apart in some metric space.
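This idea can be illustrated with a minimal InfoNCE-style objective in numpy. The function name, cosine-similarity setup, and temperature below are illustrative simplifications, not the paper's exact CRD loss:

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: pull the positive pair together and
    push the negatives apart, measured by cosine similarity."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    # positive pair sits at index 0 of the logits
    logits = np.array([cos(anchor, positive) / temperature]
                      + [cos(anchor, n) / temperature for n in negatives])
    # softmax cross-entropy against the positive
    return float(-logits[0] + np.log(np.sum(np.exp(logits))))

rng = np.random.default_rng(0)
a = rng.normal(size=8)
# positive close to the anchor, negatives random -> small loss
loss_close = info_nce_loss(a, a + 0.01 * rng.normal(size=8),
                           [rng.normal(size=8) for _ in range(5)])
# positive random, negatives close to the anchor -> large loss
loss_far = info_nce_loss(a, rng.normal(size=8),
                         [a + 0.01 * rng.normal(size=8) for _ in range(5)])
```

The loss is low only when the positive pair is the most similar in the batch, which is exactly the pressure that aligns student and teacher representations.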
2. Network Pruning via Transformable Architecture Search
This paper tackles network pruning. It proposes applying neural architecture search directly to networks with flexible channel counts and layer depths, learning the number of channels per layer by minimizing the loss of the pruned network.
Paper: arxiv.org/abs/1905.09…
Code: github.com/D-X-Y/NAS-P…
The feature map of the pruned network is composed of K feature map fragments sampled according to a learned probability distribution. The loss is backpropagated both to the network weights and to the parameters of that distribution.
The pruning method proposed in the paper has three stages:
- Train a large, unpruned network with a standard classification training procedure.
- Search for the depth and width of a small network via Transformable Architecture Search (TAS), which seeks the optimal network size.
- Transfer knowledge from the unpruned network to the searched small network with a simple knowledge distillation (KD) method.
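The width search in stage two can be loosely sketched as drawing from a learnable distribution over candidate channel counts via Gumbel-softmax. This is a simplified illustration; the real TAS couples these soft samples with the pruned network's training loss:

```python
import numpy as np

rng = np.random.default_rng(42)
candidate_widths = [16, 32, 64, 128]      # candidate channel counts for one layer
logits = np.zeros(len(candidate_widths))  # learnable parameters of the width distribution

def sample_width_probs(logits, tau=1.0):
    """Draw a soft one-hot sample over candidate widths (Gumbel-softmax).
    In TAS the loss is backpropagated through such soft samples so the
    width distribution itself is learned; here we only show the sampling."""
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = np.exp((logits + gumbel) / tau)
    return y / y.sum()

probs = sample_width_probs(logits)
# the soft sample mixes the candidate widths; training sharpens it toward one
expected_width = float(sum(w * p for w, p in zip(candidate_widths, probs)))
```

As training lowers the temperature `tau`, the soft samples approach hard one-hot choices, i.e. a concrete pruned width per layer.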
3. Learning Data Augmentation Strategies for Object Detection
Although this is not a model architecture per se, this paper presents a way to learn data augmentation transformations for object detection datasets that can be transferred to other object detection datasets. The transformations are applied at training time. The code linked below trains detectors with the learned policies:
Paper: arxiv.org/abs/1906.11…
Code: github.com/tensorflow/…tree/master/models/official/detection
In this model, an augmentation policy is defined as a set of N sub-policies, one of which is selected at random for each image during training. Operations applied in this model include distorting color channels, geometrically distorting the image, and distorting only the pixel content inside bounding box annotations.
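The sampling scheme can be sketched as follows. The op names, probabilities, and magnitudes below are made up for illustration; the learned policies in the paper use their own searched values:

```python
import random

# A learned policy is a set of sub-policies; each training image gets one
# sub-policy chosen at random. Every (op, prob, magnitude) triple here is
# illustrative, not a policy from the paper.
POLICY = [
    [("color_jitter", 0.6, 4), ("translate_x_bbox", 0.3, 2)],
    [("rotate", 0.4, 3), ("equalize", 0.8, None)],
]

def apply_policy(image, policy, rng=random):
    sub_policy = rng.choice(policy)          # sample one sub-policy per image
    applied = []
    for op_name, prob, magnitude in sub_policy:
        if rng.random() < prob:              # each op fires with its own probability
            applied.append((op_name, magnitude))
            # a real implementation would transform `image` (and its boxes) here
    return image, applied

random.seed(0)
_, ops = apply_policy("dummy-image", POLICY)
```

Because the policy is just this small table of triples, it is cheap to transfer to a new detection dataset: only the table moves, not the search.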
4. XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLNet is an exciting paper in the Transformer space. It is a generalized autoregressive pretraining method that learns bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order, rather than using a fixed forward or backward factorization.
Paper: arxiv.org/abs/1906.08…
Code: github.com/zihangdai/x…
Instead, it maximizes the expected log-likelihood of the sequence over all possible permutations of the factorization order. Thanks to these permutations, the context for each position can consist of tokens from both the left and the right; bidirectional context is captured because each position learns to use contextual information from all positions.
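A toy sketch makes the mechanism concrete: under a sampled factorization order, each token may attend only to the tokens that precede it in that order, which can lie on either side of it in the original sequence. The helper name and toy positions are illustrative:

```python
import random

def attention_context(order):
    """For a sampled factorization order, map each sequence position to the
    set of positions it is allowed to attend to: exactly those that come
    earlier in the sampled order, regardless of their original location."""
    context = {}
    for i, pos in enumerate(order):
        context[pos] = sorted(order[:i])
    return context

random.seed(1)
tokens = list(range(5))              # positions 0..4 of a toy sequence
order = random.sample(tokens, k=5)   # one permutation of the factorization order
ctx = attention_context(order)
```

The first position in the sampled order sees nothing; the last sees every other position. Averaged over many sampled orders, every position ends up conditioning on both its left and right neighbors, which is how bidirectional context emerges without masking.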
5. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (ACL 2019)
Transformer-XL (for "extra long") learns dependencies beyond a fixed length without breaking temporal coherence. It introduces a segment-level recurrence mechanism and a new positional encoding scheme. Transformer-XL learns dependencies 80% longer than RNNs and 450% longer than vanilla Transformers. Implementations are available in both TensorFlow and PyTorch.
Paper: arxiv.org/abs/1901.02…
Code: github.com/kimiyoung/transformer-xl
The authors introduce recurrence into their deep self-attention network. Instead of computing the hidden states from scratch for each new segment, they reuse the hidden states obtained for the previous segment, which act as a memory for the current segment.
This creates a recurrent connection between segments. Modeling long-term dependencies becomes possible because information is passed along these recurrent connections. The authors also introduce a more efficient relative positional encoding scheme that lets attention generalize to dependency lengths longer than those observed during training.
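The caching pattern can be sketched in a few lines. Averaging stands in for self-attention purely for illustration; the point is that each segment's context includes the cached hidden states of the previous segment:

```python
import numpy as np

def process_segment(segment, memory):
    """Transformer-XL style recurrence sketch: the current segment attends
    over the cached hidden states of the previous segment plus itself.
    (Real models also stop gradients through the memory.)"""
    context = segment if memory is None else np.concatenate([memory, segment], axis=0)
    # stand-in for self-attention: each position summarizes the whole context
    hidden = np.tile(context.mean(axis=0), (segment.shape[0], 1))
    return hidden, hidden  # new hidden states double as the next segment's memory

rng = np.random.default_rng(0)
segments = [rng.normal(size=(4, 8)) for _ in range(3)]  # 3 segments of length 4
memory = None
for seg in segments:
    hidden, memory = process_segment(seg, memory)
```

Because each memory already summarizes the segment before it, the effective context grows with every step even though each forward pass only touches one segment plus one cached memory.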
6. Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos (AAAI 2019)
This paper addresses the unsupervised learning of scene depth and robot ego-motion, with supervision provided by monocular video. This is done by introducing geometric structure into the learning process: scenes and individual objects, the camera's ego-motion, and object motions are all modeled from monocular video input. The authors also introduce an online refinement method.
Paper: arxiv.org/abs/1811.06…
Code: github.com/tensorflow/…els/tree/master/research/struct2depth
The authors introduce an object motion model that shares the same architecture as the ego-motion network but is trained specifically to predict the motion of individual objects in 3D.
It takes a sequence of RGB images as input, complemented by pre-computed instance segmentation masks. The job of the motion model is to learn to predict the transformation vector of each object in 3D space, which is then used to model the object's appearance in each of the target frames.
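Applying such a predicted per-object rigid motion can be sketched as follows. For brevity this uses only a rotation about the vertical axis plus a translation; the actual model predicts a full 6-DoF transform per segmented object:

```python
import numpy as np

def apply_object_motion(points, translation, yaw):
    """Apply a predicted per-object rigid motion (here simplified to a
    rotation about the vertical axis plus a translation) to the 3D points
    belonging to one segmented object."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])       # rotation about the y (up) axis
    return points @ R.T + translation  # rotate, then translate each point

# one toy object point, moved by a quarter turn and a 2 m push along z
pts = np.array([[1.0, 0.0, 0.0]])
moved = apply_object_motion(pts, translation=np.array([0.0, 0.0, 2.0]), yaw=np.pi / 2)
```

Warping each object's points with its own transform, instead of one global transform, is what lets the model explain moving objects that would otherwise violate the static-scene assumption.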
7. Auto-Keras: An Efficient Neural Architecture Search System
This paper proposes a framework that enables Bayesian optimization to guide network morphism for efficient neural architecture search. Based on this approach, the authors built an open-source AutoML system called Auto-Keras.
Paper: arxiv.org/abs/1806.10…
Code: github.com/keras-team/autokeras
The core of the method is to explore the search space by morphing neural architectures under the guidance of a Bayesian optimization (BO) algorithm. Since the NAS search space is not a Euclidean space, the authors design a neural network kernel function: the edit distance required to morph one architecture into another.
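A toy version of such a kernel can use plain Levenshtein distance over layer-type sequences; Auto-Keras' actual edit distance over network morphisms is richer, so treat this as a sketch:

```python
import math

def edit_distance(arch_a, arch_b):
    """Levenshtein distance between two architectures encoded as sequences
    of layer types -- a toy stand-in for the paper's morphism edit distance."""
    m, n = len(arch_a), len(arch_b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if arch_a[i - 1] == arch_b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # delete a layer
                          d[i][j - 1] + 1,       # insert a layer
                          d[i - 1][j - 1] + cost)  # substitute a layer
    return d[m][n]

def arch_kernel(a, b, rho=0.5):
    # turn the distance into a similarity so Bayesian optimization can use it
    return math.exp(-rho * edit_distance(a, b))

net1 = ["conv3", "conv3", "pool", "dense"]
net2 = ["conv3", "conv5", "pool", "dense"]
k = arch_kernel(net1, net2)
```

With a kernel like this, BO can model "how similar are these two networks" without ever embedding architectures into a Euclidean space.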
8. Depth-Aware Video Frame Interpolation (CVPR 2019)
This paper proposes a video frame interpolation method that detects occlusion by exploiting depth information. The authors develop a depth-aware flow projection layer that synthesizes intermediate flows which preferentially sample closer objects over farther ones.
Paper: arxiv.org/abs/1904.00…
Code: github.com/baowenbo/DA…
Hierarchical features are learned by gathering contextual information from neighboring pixels. The output frame is then generated by warping the input frames, depth maps, and contextual features according to the optical flow and local interpolation kernels.
The proposed depth-aware video frame interpolation (DAIN) model efficiently generates high-quality video frames from optical flow, local interpolation kernels, depth maps, and contextual features.
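The depth-aware projection idea, where closer objects dominate when several flow vectors land on the same intermediate-frame pixel, can be sketched for a single pixel (the numbers are illustrative):

```python
import numpy as np

def depth_aware_blend(flows, depths):
    """When several flow vectors project onto the same intermediate-frame
    pixel, DAIN-style projection weights them by inverse depth, so closer
    objects dominate. Toy single-pixel version."""
    w = 1.0 / np.asarray(depths)          # inverse depth: near -> large weight
    w = w / w.sum()                       # normalize the weights
    return (np.asarray(flows) * w[:, None]).sum(axis=0)

# two colliding flow candidates: a near object (depth 1) and a far one (depth 9)
flow = depth_aware_blend(flows=[[2.0, 0.0], [0.0, 2.0]], depths=[1.0, 9.0])
```

Here the near object's flow gets 90% of the weight, which is what resolves occlusion in favor of the foreground.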
9. OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
OpenPose is an open source real-time system for multi-person 2D pose estimation, including body, foot, hand and face key points. This paper presents a real-time method for detecting 2D human posture in images and videos.
Paper: arxiv.org/abs/1812.08…
Code: github.com/CMU-Perceptual-Computing-Lab/openpose_train
The proposed approach uses a nonparametric representation called Part Affinity Fields (PAFs). The method takes an image as CNN input and predicts confidence maps for detecting body parts, along with PAFs for part association. The authors also open-source an annotated foot dataset with 15K labeled instances of human feet.
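The part-association step can be sketched as a line integral of the PAF along a candidate limb: sample points between two detected joints and average the field's agreement with the limb direction. The toy field and helper name below are illustrative:

```python
import numpy as np

def paf_score(paf, joint_a, joint_b, n_samples=10):
    """Score a candidate limb between two detected joints by averaging the
    dot product of the Part Affinity Field with the limb's unit direction
    at points sampled along the segment (a simplified line integral)."""
    a, b = np.asarray(joint_a, float), np.asarray(joint_b, float)
    direction = b - a
    norm = np.linalg.norm(direction)
    if norm == 0:
        return 0.0
    direction = direction / norm
    scores = []
    for t in np.linspace(0.0, 1.0, n_samples):
        x, y = np.round(a + t * (b - a)).astype(int)  # pixel on the segment
        scores.append(np.dot(paf[y, x], direction))   # field vs limb direction
    return float(np.mean(scores))

# toy PAF pointing right everywhere on a 5x5 grid
paf = np.zeros((5, 5, 2))
paf[..., 0] = 1.0
score_good = paf_score(paf, (0, 2), (4, 2))  # limb along the field -> high score
score_bad = paf_score(paf, (2, 0), (2, 4))   # limb perpendicular -> ~0
```

High-scoring joint pairs are then matched greedily (bipartite matching in the paper) to assemble full skeletons, one per person.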
10. FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation
This paper proposes a joint upsampling module called Joint Pyramid Upsampling (JPU) to replace dilated convolutions, which consume a lot of time and memory. It works by formulating the extraction of high-resolution feature maps as a joint upsampling problem.
Paper: arxiv.org/abs/1903.11…
Code: github.com/wuhuikai/FastFCN
The method uses a fully convolutional network (FCN) as the backbone and applies JPU to upsample the low-resolution final feature map, yielding a high-resolution feature map. Replacing dilated convolutions with JPU causes no loss in performance.
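The upsample-and-fuse step can be sketched naively as follows; real JPU uses learned separable convolutions across the feature pyramid rather than nearest-neighbor upsampling plus concatenation, so this only shows the data flow:

```python
import numpy as np

def naive_joint_upsample(feat_low, feat_high):
    """Toy stand-in for JPU: bring the low-resolution backbone feature map
    up to the high-resolution grid and fuse it with the higher-resolution
    features by concatenation along the channel axis."""
    scale = feat_high.shape[0] // feat_low.shape[0]
    up = np.repeat(np.repeat(feat_low, scale, axis=0), scale, axis=1)  # nearest-neighbor
    return np.concatenate([up, feat_high], axis=-1)

low = np.ones((4, 4, 8))     # e.g. stride-32 features from the FCN backbone
high = np.ones((16, 16, 4))  # e.g. stride-8 features
fused = naive_joint_upsample(low, high)
```

The payoff is that the backbone can run at low resolution (cheap, no dilation), while the segmentation head still sees a high-resolution fused map.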
Conclusion
Hopefully this gave you some insight into machine learning and deep learning research in 2019. I have tried to include links to the original papers and their code wherever possible. Try them out and let us know your progress.
References
[1] heartbeat.fritz.ai research – gu…