Since CVPR 2018 announced its list of accepted papers, Synced has introduced a number of them to readers and had planned to cover this one later. Unexpectedly, it went on to win the CVPR 2018 Best Paper Award (your editor clearly underestimated it), so we are recommending it to you ahead of schedule.
Introduction
Object recognition, depth estimation, edge detection, pose estimation, and so on are common visual tasks that the research community considers useful and has studied extensively. Some of these tasks are clearly related: we know that surface normals and depth are related (one is the derivative of the other), and that vanishing points in a scene help with localization. Other relationships are far less obvious: how, for example, could keypoint detection and shading in a scene jointly help with pose estimation?
Moreover, a model that exploits the correlations between tasks needs less supervision, uses less computation, and behaves in a more predictable way. Incorporating such a structure is a first stepping stone toward a provably efficient comprehensive/general perception model [34, 4], that is, one capable of solving a large number of tasks before the demand for supervision or computation becomes intractable. However, the structure of this task space and its implications remain largely unknown. These correlations matter, but discovering them is complicated by the imperfections of our learning models and optimizers.
In this paper, the researchers attempt to uncover this underlying structure and propose a framework for mapping the space of visual tasks. By "structure", they mean a set of computationally discovered relations that specify which tasks supply useful information to which other tasks, and how much (see Figure 1).
A fully computational approach is therefore adopted, with neural networks serving as the computational functions. In a feedforward network, each layer produces a successively more abstract representation of the input, containing the information needed to map the input to the output. If tasks are related in some form [83, 19, 58, 46], however, these representations should also carry statistics that are useful for solving other tasks (outputs). The method thus asks how easily a solution to one task can be read out from a representation trained for another task, and uses this to compute an affinity matrix between tasks. Such transfers are sampled exhaustively, and a globally optimal transfer policy is extracted from them with a Binary Integer Programming formulation. The results show that this model can solve tasks with far less data than learning each task independently, and that the resulting structure remains valid on common datasets (ImageNet [78] and Places [104]).
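To make the transfer-readout idea concrete, here is a minimal PyTorch-style sketch. It is not the authors' actual architecture: the Encoder, ReadoutDecoder, layer sizes, and the depth-prediction target are illustrative assumptions. The point is only the mechanism: a source-task encoder stays frozen while a small decoder is trained to read the target task out of its representation; the quality achievable this way is what feeds the task affinity matrix.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy stand-in for a pretrained source-task encoder (kept frozen)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class ReadoutDecoder(nn.Module):
    """Small decoder that tries to predict a target task from the frozen source representation."""
    def __init__(self, out_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, out_channels, 4, stride=2, padding=1),
        )
    def forward(self, z):
        return self.net(z)

def transfer_loss(source_encoder, decoder, images, target_labels, loss_fn):
    """Loss for one transfer edge (source representation -> target task).
    The lower this loss after training the decoder, the higher the estimated affinity."""
    with torch.no_grad():                      # source encoder stays frozen
        z = source_encoder(images)
    pred = decoder(z)
    return loss_fn(pred, target_labels)

# Example usage on random data, with depth prediction as a hypothetical target task.
encoder = Encoder().eval()
decoder = ReadoutDecoder(out_channels=1)
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
images = torch.randn(4, 3, 64, 64)
depth = torch.randn(4, 1, 64, 64)
loss = transfer_loss(encoder, decoder, images, depth, nn.L1Loss())
loss.backward()
opt.step()
```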
The fully computational, representation-based approach proposed in this paper avoids imposing prior (and possibly false) assumptions on the task space. This is crucial because priors about task relationships are usually derived from human intuition or analytical knowledge, whereas neural networks need not operate by the same principles [63, 33, 40, 45, 102, 88]. For example, although we might expect depth to transfer better to surface normals (taking derivatives is easy), in the computational framework the reverse direction transfers better (that is, it works better for neural networks).
Taskonomy: Disentangling Task Transfer Learning
Paper address: http://taskonomy.stanford.edu/taskonomy_CVPR2018.pdf
Are visual tasks related? For example, can surface normals be used to simplify the estimation of image depth? Intuitively, positive answers to these questions imply the existence of a structure among visual tasks. Understanding this structure is of great value: it is the concept underlying transfer learning, and it provides a principled way to identify redundancy among tasks, for example, in order to seamlessly reuse supervision across related tasks or to solve many tasks in one system without piling up complexity.
We propose a fully computational approach to modeling the structure of the space of visual tasks, by finding (first-order or higher-order) transfer-learning dependencies among 26 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, such as the nontrivial relationships that emerge, and exploit them to reduce the need for labeled data. For example, we show that the total number of labeled data points needed to solve a set of 10 tasks can be reduced by roughly two-thirds (compared with independent training) while keeping performance nearly the same. We provide a set of tools for computing and probing this taxonomic structure, including a solver that users can employ to design efficient supervision policies for their own use cases.
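For intuition about what such a solver does, the following toy sketch picks a set of source tasks under a supervision budget so that the summed transfer quality over the target tasks is maximized. The affinity numbers are made up, and a brute-force search stands in for the paper's Binary Integer Program, which is practical here only because the task set is tiny.

```python
from itertools import combinations

# Hypothetical task-affinity scores: affinity[target][source] is the estimated
# quality of transferring from `source` to `target` (numbers are illustrative).
affinity = {
    "depth":     {"normals": 0.9, "edges": 0.4, "keypoints": 0.3},
    "normals":   {"depth": 0.7, "edges": 0.5, "keypoints": 0.3},
    "reshading": {"normals": 0.8, "depth": 0.6, "edges": 0.2},
}
sources = ["depth", "normals", "edges", "keypoints"]

def best_policy(budget):
    """Brute-force stand-in for the source-selection problem: choose at most
    `budget` source tasks (each one costs full supervision) so that the summed
    transfer quality over all target tasks is maximal."""
    best_score, best_set = -1.0, None
    for k in range(1, budget + 1):
        for chosen in combinations(sources, k):
            # Each target task uses its best available source among the chosen ones.
            score = sum(max(aff.get(s, 0.0) for s in chosen)
                        for aff in affinity.values())
            if score > best_score:
                best_score, best_set = score, set(chosen)
    return best_set, best_score

print(best_policy(budget=2))
```

The real taxonomy solves this selection jointly over all tasks and budgets with Binary Integer Programming, but the objective it trades off, supervision cost against transfer quality, is the same as in this miniature version.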