Preface:

Today, computers can outperform humans when it comes to using billions of images to solve specific tasks. In the real world, though, it is rare to build or find a data set with that many images.

How can we overcome this problem? In the field of computer vision, we can use data augmentation (DA), or collect and label additional data. DA is a powerful technique and can be an important part of the solution. Labeling additional samples is a time-consuming and expensive task, but it does deliver better results.

If the data set is really small, neither technique may help. Imagine a task where we need to build a classifier with only one or two samples per class, and each sample is very difficult to find.

This calls for innovative approaches. Few-shot Learning (FSL) is one of them.

In this article, we will introduce:

What is few-shot learning – definition, purpose, and examples of FSL problems

Few-shot learning variants – N-shot learning, few-shot learning, one-shot learning, zero-shot learning

Few-shot learning methods – meta-learning, data-level, parameter-level

Meta-learning algorithms – definition, metric learning, gradient-based meta-learning

Few-shot image classification algorithms – model-agnostic meta-learning, matching, prototypical and relation networks

Few-shot object detection – YOLOMAML

What is few-shot learning?

Few-shot Learning (FSL) is a sub-area of machine learning in which a model is trained to classify new data from only a few training samples with supervised information.

FSL is a fairly young field that needs more research and refinement. In this article, we will focus on FSL in computer vision, where models can already work well with relatively few training samples.

For example, let’s say we work in the healthcare industry and face the problem of classifying bone diseases from X-ray images.

Some rare pathologies may lack sufficient images for the training set. This is exactly the type of problem that can be solved by building an FSL classifier.

Few-shot learning variants

Depending on its variations and extreme cases, FSL can be divided into four types:

N-Shot Learning (NSL)

Few-shot Learning (FSL)

One-Shot Learning (OSL)

Zero-Shot Learning (ZSL)

When we talk about FSL, we usually mean N-way-K-shot classification.

N represents the number of classes and K the number of training samples per class. For example, a 5-way-1-shot task means distinguishing between 5 classes given a single training sample for each.

N-shot learning is seen as the broadest of these concepts, meaning that few-shot, one-shot and zero-shot learning are sub-fields of NSL.

Zero-Shot Learning (ZSL)

The goal of zero-shot learning is to classify unseen classes without any training samples.

This may sound far-fetched, but think of it this way: can you classify objects without ever seeing them? If you have a general idea of an object, its appearance, properties, and function, it shouldn’t be a problem. This is the approach used in ZSL, and judging by current trends, zero-shot learning will soon become more effective.

One-shot and few-shot

In one-shot learning, there is only one sample for each class. Few-shot learning has two to five samples per class, making it a more flexible version of OSL.

When we talk about the concept as a whole, we use the term few-shot learning. But the field is young, so people use these terms in different ways.

Few-shot learning methods

First, let’s define the N-way-K-shot classification problem.

Assume a training set that includes N class labels and, for each class, K labeled images (a small number, fewer than 10 samples per class), plus Q test images.

We want to classify the Q test images among the N classes. The N * K samples in the training set are the only samples we have. The main problem here is that there is not enough training data.
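To make this concrete, here is a minimal sketch in plain Python, on made-up data, of how a single N-way-K-shot episode could be sampled from a labeled image collection. The function sample_episode is a hypothetical helper (not from any library), and it assumes Q test (query) images per class:

```python
import random

def sample_episode(images_by_class, n_way, k_shot, q_queries):
    """Sample one N-way-K-shot episode from a dict mapping
    class label -> list of images for that class."""
    # Pick N classes at random from all available classes
    classes = random.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Each sampled class contributes K training and Q test images
        picked = random.sample(images_by_class[label], k_shot + q_queries)
        support += [(img, label) for img in picked[:k_shot]]
        query += [(img, label) for img in picked[k_shot:]]
    return support, query

# Hypothetical 5-way-2-shot episode with 3 test images per class
data = {f"class_{i}": [f"img_{i}_{j}" for j in range(20)] for i in range(10)}
support_set, query_set = sample_episode(data, n_way=5, k_shot=2, q_queries=3)
print(len(support_set), len(query_set))  # 10 (= 5*2) and 15 (= 5*3)
```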

The first step in an FSL task is to gain experience from other, similar problems. This is why few-shot learning is characterized as a meta-learning problem.

In a traditional classification problem, we try to learn how to classify from the training data and use the test data for evaluation. In meta-learning, we learn how to learn to classify from a given set of training data. We use one set of classification problems to solve other, unrelated sets.

When solving FSL problems, two approaches are usually considered:

Data-level approach (DLA)

Parameter-level approach (PLA)

Data-level approach

This approach is really simple. It is based on the idea that if you don’t have enough data to build a reliable model and avoid overfitting or underfitting, you should simply add more data.

This is why many FSL problems are solved by drawing on additional information from a large base data set. The key feature of the base data set is that it does not contain the classes from the training set of the few-shot task. For example, if we want to classify a particular bird species, the base data set can contain images of many other birds.

We can also generate more data ourselves. To achieve this, we can use data augmentation or even generative adversarial networks (GANs).
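As an illustration, a simple augmentation pipeline built with torchvision might look like the sketch below; the specific transforms and their parameters are arbitrary choices, not part of any particular FSL method:

```python
import torch
from PIL import Image
from torchvision import transforms

# A typical augmentation pipeline: every pass over the same image
# yields a slightly different training sample.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

image = Image.new("RGB", (256, 256))  # stand-in for a real training image
views = torch.stack([augment(image) for _ in range(4)])  # 4 augmented views
print(views.shape)  # torch.Size([4, 3, 224, 224])
```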

Parameter-level approach

From the parameter-level point of view, it is quite easy to overfit on few-shot learning samples, as they usually come from extensive, high-dimensional spaces.

To overcome this problem, we should limit the parameter space and use regularization and appropriate loss functions. The model will then generalize from the limited number of training samples.
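As a small illustration, in PyTorch the parameter space can be constrained by adding an L2 penalty through the optimizer's weight_decay argument; the toy model, data and values below are arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A tiny classifier for a 5-class toy problem
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64),
                      nn.ReLU(), nn.Linear(64, 5))

# weight_decay adds an L2 penalty on the parameters, shrinking the
# effective parameter space and reducing overfitting on tiny data sets.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-3)

x = torch.randn(10, 1, 28, 28)   # toy batch: N*K = 10 images
y = torch.randint(0, 5, (10,))   # toy labels for 5 classes
loss = F.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```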

On the other hand, we can improve model performance by directing it through the extensive parameter space. Since the amount of training data is small, a standard optimization algorithm may not give reliable results.

This is why, at the parameter level, models are trained to find the best route through the parameter space and deliver the best predictions. As mentioned above, this technique is called meta-learning.

Meta-learning algorithms

In the classical paradigm, when we have a specific task, an algorithm learns whether its performance on that task improves with experience. In the meta-learning paradigm, we have a set of tasks, and an algorithm learns whether its performance on each task improves with experience and with the number of tasks. Such an algorithm is called a meta-learning algorithm.

Suppose we have a test task TEST. We train our meta-learning algorithm on a batch of training tasks TRAIN. The training experience gained from trying to solve the TRAIN tasks is then used to solve the TEST task.

Solving an FSL task involves a series of steps. Imagine the classification problem we mentioned earlier. First, we need to select a base data set, and selecting it is critical.

Now we have the N-way-K-shot classification problem (let’s name it TEST) and a large base data set that we will use as a meta-learning training set (TRAIN).

The entire meta-training process consists of a finite number of episodes.

From TRAIN, we sample N classes and, for each class, K support-set images and Q test images. In this way, we form a classification task similar to our final TEST task.

At the end of each episode, the model’s parameters are trained to maximize the accuracy on the Q test-set images. This is where the model learns the ability to solve unseen classification problems.

The overall efficiency of the model is measured by its accuracy on the TEST classification task.

In recent years, researchers have published a number of meta-learning algorithms for solving FSL classification problems. All of them can be divided into two categories: metric-learning and gradient-based meta-learning algorithms.

Metric learning

When we talk about metric learning, we usually mean techniques that learn a distance function over objects.

In general, metric learning algorithms learn to compare data samples. In the case of few-shot classification, they classify test samples according to their similarity to the training samples. When working with images, this basically means training a convolutional neural network to output image embedding vectors, which are then compared with other embedding vectors to predict classes.
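The sketch below illustrates this idea with a toy stand-in for a real CNN backbone: a test image is classified by the cosine similarity between its embedding and the training-set embeddings. All data and shapes here are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy embedding network standing in for a real CNN backbone
embed = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten())

support = torch.randn(10, 3, 32, 32)                   # N*K = 10 training images
support_labels = torch.arange(5).repeat_interleave(2)  # 5-way, 2-shot
query = torch.randn(1, 3, 32, 32)                      # one test image

with torch.no_grad():
    s_emb = F.normalize(embed(support), dim=1)  # unit-length embeddings
    q_emb = F.normalize(embed(query), dim=1)
    sims = q_emb @ s_emb.T                      # cosine similarities, (1, 10)
    pred = support_labels[sims.argmax(dim=1)]   # class of the nearest sample
print(pred)
```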

Gradient-based meta-learning

For gradient-based approaches, a meta-learner and a base-learner need to be built.

The meta-learner is a model that learns across episodes, while the base-learner is a model that is initialized and trained by the meta-learner within each episode.

Imagine an episode of meta-training containing a classification task defined by an N * K-image training set and a Q-image test set:

1. A meta-learner model is chosen,

2. The episode starts,

3. The base-learner (usually a CNN classifier) is initialized,

4. It is trained on the training set (the exact algorithm used to train the base-learner is defined by the meta-learner),

5. The base-learner predicts the classes of the test set,

6. The meta-learner’s parameters are trained on the loss resulting from the misclassifications.

From this point on, the pipeline may differ depending on the chosen meta-learner.
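To make these steps concrete, here is a minimal, runnable sketch on toy random data (the support and query tensors play the roles of the episode's training and test sets). The pairing chosen here, an embedding network as the meta-learner and a small linear head retrained from scratch each episode as the base-learner, is our own illustrative choice rather than a specific published algorithm; all shapes and hyperparameters are arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Meta-learner: an embedding network trained across episodes
embed = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
meta_opt = torch.optim.Adam(embed.parameters(), lr=1e-3)
n_way, k_shot, n_query = 5, 2, 3

for episode in range(10):                             # 2. the episode starts
    # Toy stand-in for a sampled episode
    support = torch.randn(n_way * k_shot, 3, 32, 32)
    query = torch.randn(n_way * n_query, 3, 32, 32)
    s_labels = torch.arange(n_way).repeat_interleave(k_shot)
    q_labels = torch.arange(n_way).repeat_interleave(n_query)

    head = nn.Linear(16, n_way)                       # 3. initialize the base-learner
    head_opt = torch.optim.SGD(head.parameters(), lr=0.1)
    feats = embed(support).detach()                   # features frozen for the inner loop
    for _ in range(5):                                # 4. train it on the training set
        inner_loss = F.cross_entropy(head(feats), s_labels)
        head_opt.zero_grad()
        inner_loss.backward()
        head_opt.step()

    logits = head(embed(query))                       # 5. base-learner predicts the test set
    meta_loss = F.cross_entropy(logits, q_labels)     # 6. misclassification loss trains
    meta_opt.zero_grad()                              #    the meta-learner's parameters
    meta_loss.backward()
    meta_opt.step()
```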

Few-shot image classification algorithms

In this section, we will cover:

Model-agnostic meta-learning (MAML)

Matching Networks

Prototypical Networks

Relation Networks

Model-agnostic meta-learning

MAML is based on the idea of gradient-based meta-learning (GBML). GBML means the meta-learner gains prior experience from training the base-model and learns a feature representation common to all tasks. Whenever there is a new task to learn, the meta-learner is tweaked a little, using its previous experience together with the small amount of new training data brought by the new task.

However, we don’t want to start from a random parameter initialization. If we do, our algorithm will not converge to good performance after only a few updates.

MAML aims to solve this problem.

MAML provides a good initialization of the meta-learner’s parameters to achieve optimal fast learning on new tasks with only a small number of gradient steps, while avoiding the overfitting that can occur when training on a small data set.

Here’s how it works:

1. The meta-learner creates a copy of itself (C) at the beginning of each episode,

2. C is trained on the episode (with the help of the base-model, as we discussed earlier),

3. C makes predictions on the test set,

4. The loss computed from these predictions is used to update the meta-learner,

5. This continues until all episodes have been trained on.

The great advantage of this technique is that it is considered agnostic to the choice of the underlying model’s architecture. The MAML method is therefore widely used with many machine learning algorithms that require rapid adaptation, especially deep neural networks.
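Here is a minimal sketch of this loop in PyTorch on toy random data. For readability it implements the first-order variant of MAML, where gradients are not propagated back through the inner-loop updates; the model, task and learning rates are arbitrary stand-ins:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(20, 5)  # toy base-model for 5-way tasks
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr, inner_steps = 0.1, 3

for episode in range(10):
    # Toy stand-in task: random training (xs) and test (xq) data
    xs, ys = torch.randn(10, 20), torch.randint(0, 5, (10,))
    xq, yq = torch.randn(15, 20), torch.randint(0, 5, (15,))

    # 1-2. Copy the meta-learner's parameters at the start of the episode
    fast = [p.detach().clone().requires_grad_(True) for p in model.parameters()]

    # Inner loop: train the copy C on the training set for a few steps
    for _ in range(inner_steps):
        logits = F.linear(xs, fast[0], fast[1])
        grads = torch.autograd.grad(F.cross_entropy(logits, ys), fast)
        fast = [(p - inner_lr * g).detach().requires_grad_(True)
                for p, g in zip(fast, grads)]

    # 3-4. C predicts on the test set; first-order MAML applies the
    # test-loss gradients of the adapted parameters directly to the
    # meta-learner's original parameters.
    q_loss = F.cross_entropy(F.linear(xq, fast[0], fast[1]), yq)
    meta_grads = torch.autograd.grad(q_loss, fast)
    meta_opt.zero_grad()
    for p, g in zip(model.parameters(), meta_grads):
        p.grad = g
    meta_opt.step()
```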

Matching networks

Matching Networks (MN) was the first metric learning algorithm designed to solve FSL problems.

The matching network algorithm requires a large base data set to solve few-shot learning tasks. As shown above, this data set is split into episodes. For each episode, the matching network then applies the following steps:

1. Every image from the training and test sets is fed to a CNN, which outputs an embedding for it,

2. Each test image is classified using a softmax over the cosine distances between its embedding and the training-set embeddings,

3. The cross-entropy loss on the resulting classification is backpropagated through the CNN.

In this way, the matching network learns to compute image embeddings. This allows MN to classify images without any prior knowledge of the particular classes. Everything is done by comparing different instances of the classes.

Because the classes differ from episode to episode, the matching network computes image features that are relevant for discriminating between classes in general. In contrast, in the case of standard classification, the algorithm learns features specific to each class.
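A minimal sketch of these three steps on toy data, with a tiny stand-in CNN, might look like this; it reproduces the softmax-over-cosine-distances idea rather than the full published model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy CNN standing in for the embedding network
cnn = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
opt = torch.optim.Adam(cnn.parameters(), lr=1e-3)

support = torch.randn(10, 3, 32, 32)             # 5-way, 2-shot training set
s_labels = torch.arange(5).repeat_interleave(2)
query = torch.randn(15, 3, 32, 32)               # 3 test images per class
q_labels = torch.arange(5).repeat_interleave(3)

# 1. Embed every training and test image with the CNN
s_emb = F.normalize(cnn(support), dim=1)
q_emb = F.normalize(cnn(query), dim=1)

# 2. A softmax over cosine similarities gives an attention weight for each
#    training image; summing the weights per class yields class probabilities.
attention = F.softmax(q_emb @ s_emb.T, dim=1)    # (15, 10)
one_hot = F.one_hot(s_labels, 5).float()         # (10, 5)
probs = attention @ one_hot                      # (15, 5)

# 3. The cross-entropy loss on the result backpropagates through the CNN
loss = F.nll_loss(torch.log(probs + 1e-8), q_labels)
loss.backward()
opt.step()
```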

It is worth mentioning that the authors actually propose some improvements to the original algorithm. For example, they augment it with a bidirectional LSTM, so that the embedding of each image depends on the embeddings of the others.

All of the improvements can be found in the original article. However, improving the performance of the algorithm in this way may make the computation take longer.

Prototypical networks

Prototypical Networks (PN) are similar to matching networks. Still, there are subtle differences that help improve performance; PN actually achieves better results than MN.

The PN procedure is essentially the same, but a test image’s embedding is not compared with every image embedding from the training set. Instead, prototypical networks offer an alternative approach.

In PN, class prototypes are formed: class embeddings obtained by averaging the embeddings of the images of that class. Test image embeddings are then compared only with these class prototypes.

It is worth mentioning that in the case of the one-shot Learning problem, the algorithm is similar to Matching Networks.

In addition, PN uses Euclidean distance instead of cosine distance, which is seen as a major part of its improvement.
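A compact sketch of the prototype computation and the Euclidean-distance classification rule, assuming the embeddings have already been produced by some backbone (all shapes here are illustrative):

```python
import torch

# Training-set embeddings for a 5-way, 2-shot episode (toy values)
s_emb = torch.randn(10, 64)
s_labels = torch.arange(5).repeat_interleave(2)
q_emb = torch.randn(15, 64)  # test-image embeddings

# One prototype per class: the mean of that class's training embeddings
prototypes = torch.stack([s_emb[s_labels == c].mean(0) for c in range(5)])

# Classify by (negative) squared Euclidean distance to each prototype
logits = -torch.cdist(q_emb, prototypes) ** 2  # (15, 5)
pred = logits.argmax(dim=1)
print(pred.shape)  # torch.Size([15])
```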

Relation networks

All the experimentation that went into building matching and prototypical networks eventually led to the creation of relation networks (RN). RN builds on the ideas of PN, but with major changes to the algorithm.

The distance function is not defined in advance; it is learned by the algorithm. RN has its own relation module that performs this operation.

The overall structure is as follows. The relation module sits on top of the embedding module, which computes embeddings and class prototypes from the input images. The relation module takes as input the concatenation of a test image’s embedding with each class prototype, and outputs a relation score for each pair. Applying a softmax to the relation scores yields a prediction.
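A sketch of such a relation module is shown below. The MLP architecture and dimensions are illustrative stand-ins for the relation module described in the original paper:

```python
import torch
import torch.nn as nn

# Relation module: an MLP scoring the concatenation of a test-image
# embedding (64-d) with a class prototype (64-d), hence 128 inputs.
relation = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                         nn.Linear(64, 1), nn.Sigmoid())

prototypes = torch.randn(5, 64)  # one prototype per class (toy values)
q_emb = torch.randn(15, 64)      # test-image embeddings

# Concatenate every (test image, prototype) pair and score it
pairs = torch.cat([q_emb.unsqueeze(1).expand(-1, 5, -1),
                   prototypes.unsqueeze(0).expand(15, -1, -1)], dim=-1)
scores = relation(pairs).squeeze(-1)               # (15, 5) relation scores
pred = torch.softmax(scores, dim=1).argmax(dim=1)  # softmax gives the prediction
```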

Few-shot object detection

Obviously, we may encounter FSL problems in any computer vision task.

An N-way-K-shot object detection task includes a training set with N class labels and, for each class, K labeled images containing at least one object belonging to that class, plus Q test images.

Note a key difference from the few-shot image classification problem: in object detection, an image can contain multiple objects belonging to one or several of the N classes. The task may therefore face class imbalance, because the algorithm trains on at least K example objects of each class, but possibly many more.

YOLOMAML

The field of few-shot object detection is growing rapidly, but there are few effective solutions. The most stable solution to this problem is the YOLOMAML algorithm.

YOLOMAML mixes two parts: the YOLOv3 object detection architecture and the MAML algorithm.

As mentioned earlier, MAML can be applied to a wide variety of deep neural networks, which made it easy to combine the two parts.

YOLOMAML is a direct application of the MAML algorithm to the YOLO detector. For more information, check out the official GitHub repository:

github.com/ebennequin/…

Conclusion

In this article, we have found out what few-shot learning is, which FSL variants and problem-solving approaches exist, and which algorithms can be used to solve FSL tasks such as image classification and object detection.

Few-shot learning is a rapidly growing and promising field, but it is still challenging and under-studied, with much work, research and development to be done.

Link to original article:

neptune.ai/blog/unders…

Other articles

Summary of attention mechanisms

Summary of feature pyramids

Summary of data augmentation methods

Summary of CNN visualization techniques

Summary of CNN structure evolution – classic models

Summary of CNN structure evolution – lightweight models

Summary of CNN structure evolution – design principles

Summary of pooling techniques

Summary of non-maximum suppression

Summary of English literature reading methods

Summary of common ideas for paper innovation
