Few-shot learning and meta-learning basics

AI still relies largely on learning from big data; it is hard for a model to generalize quickly from very little data. Humans, by contrast, can quickly apply what they have learned in the past to learn new things. Learning with limited data is therefore an important direction for narrowing the gap between AI and humans.

Few-shot learning

Deep learning is data-hungry: it requires large amounts of data, labeled or unlabeled. Few-shot learning studies how to learn from only a handful of samples; for classification problems, each class has just one or a few examples. It is commonly divided into zero-shot learning (recognizing categories that never appeared in the training set) and one-shot/few-shot learning (each category has one or a few samples in the training set).

Humans are good at learning from small samples. Take zero-shot learning as an example: given the Chinese word for "give up", you are asked to pick the corresponding English word from "I", "your", "she", "them", and "abnegation". Even if you do not know the English word for "give up", you can compare it with each option: from earlier study you already know that "I", "your", "she", and "them" do not mean "give up", so you choose "abnegation".

(1) Support set: a small set of labeled samples that serves as the reference for learning.
(2) Query set: the samples to be classified by comparison with the support set; typically a query is a single sample.
(3) If the support set contains n categories with k samples each, the setup is called n-way k-shot, for example 5-way 1-shot.
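To make these terms concrete, here is a minimal sketch in plain Python of sampling one n-way k-shot episode; the `dataset` of `(sample, label)` pairs and the helper name are illustrative assumptions, not any library's API:

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, q_queries=1):
    """Sample one n-way k-shot episode (support + query) from a labeled dataset.

    `dataset` is assumed to be a list of (sample, label) pairs.
    """
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)

    # Pick n classes, then k support and q query samples from each.
    classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for label in classes:
        picks = random.sample(by_class[label], k_shot + q_queries)
        support += [(x, label) for x in picks[:k_shot]]
        query += [(x, label) for x in picks[k_shot:]]
    return support, query
```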

Meta Learning

Meta-learning means "learning how to learn": using previous knowledge and experience to guide the learning of new tasks, i.e., acquiring the ability to learn itself. It is often regarded as a foundation for Artificial General Intelligence (AGI) and as a possible way out of deep learning's data-hunger predicament. Current meta-learning methods fall mainly into metric-based, model-based, optimization-based, and data-augmentation-based approaches.

The core idea of metric-based meta-learning is similar to nearest-neighbor algorithms (k-NN classification, k-means clustering) and kernel density estimation. The probability such a method predicts over the known labels is a weighted sum of the sample labels in the support set; the weights, computed by a kernel function, represent the similarity between pairs of samples. The Convolutional Siamese Neural Network introduced a one-shot image classification method using twin networks. The Matching Network extracts features from the support set, measures cosine similarity in the embedding space, and classifies a test sample by its matching degree with the support samples. The Relation Network replaced the cosine and Euclidean distance metrics of the Matching Network and Prototypical Network with its proposed Relation Module, turning the metric into a learnable nonlinear classifier for judging relations. The Prototypical Network uses a clustering idea: it projects the support set into a metric space, takes the mean vector of each class under a Euclidean metric as its prototype, and classifies a test sample by its distance to each prototype. Although these trained models need no adjustment for the test task, they work poorly when the test task is far from the training tasks; and as tasks grow larger, pairwise comparison incurs expensive computation costs.
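To make the weighted-sum idea concrete, here is a minimal PyTorch sketch of cosine-kernel prediction in the spirit of the Matching Network; the tensor shapes and the `metric_predict` helper are illustrative assumptions, not any paper's reference code:

```python
import torch
import torch.nn.functional as F

def metric_predict(support_emb, support_labels, query_emb, n_way):
    """Predict query labels as a similarity-weighted sum of support labels.

    support_emb:    (n*k, d) embeddings of the support set
    support_labels: (n*k,) integer class labels
    query_emb:      (q, d) embeddings of the query set
    """
    # Cosine similarity between every query and every support sample.
    sims = F.cosine_similarity(query_emb.unsqueeze(1),
                               support_emb.unsqueeze(0), dim=-1)  # (q, n*k)
    weights = sims.softmax(dim=-1)                      # attention over support samples
    one_hot = F.one_hot(support_labels, n_way).float()  # (n*k, n_way)
    return weights @ one_hot                            # (q, n_way) class probabilities
```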

Meta-learning is learning how to learn, and few-shot learning is a goal it can achieve (training a usable model from very little data). Why does meta-learning look so much like few-shot learning? Because few-shot learning wants a learning algorithm that can learn after seeing only a little information, and such algorithms are typically obtained through meta-learning.

n-way is the number of categories in the support set; k-shot is the number of examples per category.

In few-shot classification, prediction accuracy is affected by the number of categories and samples in the support set: as the number of ways (categories) increases, accuracy decreases; as the number of shots increases, accuracy increases.

The basic approach: learn a function that measures similarity between samples.

Other

Omniglot dataset (official website; also available via TensorFlow)

Common datasets: Omniglot is the most common dataset in meta-learning. It is small, only a few megabytes, and well suited to academic use. Omniglot is somewhat similar to MNIST: MNIST has 10 classes with 6,000 samples per class, while Omniglot has many classes and few samples per class, more than 1,600 classes with only 20 samples each.
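For reference, torchvision bundles an Omniglot loader; a minimal sketch of downloading it (the root path and transform are just one reasonable choice):

```python
from torchvision import datasets, transforms

omniglot = datasets.Omniglot(
    root="./data",
    background=True,                  # the "background" split, used for (meta-)training
    transform=transforms.ToTensor(),
    download=True,
)
print(len(omniglot))  # number of (image, character-class) samples
```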

Another dataset is mini-ImageNet: 100 categories in total, each with 600 samples; the samples are small 84×84 images.

https://deepai.org/dataset/imagenet

The four classic metric-based methods: (1) Siamese Neural Network; (2) Matching Network; (3) Prototypical Network; (4) Relation Network.

Two training methods are introduced below. The first takes two samples at a time and compares their similarity; it requires a large labeled dataset with many samples per class. The training set is used to construct positive and negative pairs: positive pairs tell the network what counts as the same, negative pairs tell it what makes things different. A positive pair is built by taking one image and then randomly taking another image from the same category, with the label set to 1. A negative pair is built by randomly sampling one image, excluding its category, and then randomly selecting an image from the remaining categories; the two images belong to different categories and the label is set to 0. A pair-sampling sketch follows this paragraph.
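A minimal sketch of this pair construction, assuming a `by_class` dict mapping each label to its list of images (the helper name is made up for illustration):

```python
import random

def sample_pair(by_class, positive):
    """Draw one training pair for the pairwise Siamese setup described above.

    Returns (img_a, img_b, label) with label 1.0 for same-class, 0.0 otherwise.
    """
    if positive:
        c = random.choice(list(by_class))
        img_a, img_b = random.sample(by_class[c], 2)   # two images, same class
        return img_a, img_b, 1.0
    c_a, c_b = random.sample(list(by_class), 2)        # two different classes
    return random.choice(by_class[c_a]), random.choice(by_class[c_b]), 0.0
```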

The network's output is expected to be close to the label; the discrepancy between label and prediction is recorded as the loss function, which can be cross entropy. With a loss defined, gradients can be computed by backpropagation, and gradient descent can then be used to update the model parameters.
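Putting the pieces together, here is one possible PyTorch sketch of a training step: a Koch-style scoring head on the element-wise embedding difference (one common design choice, not the only one), trained with binary cross entropy:

```python
import torch
import torch.nn.functional as F

class PairScorer(torch.nn.Module):
    """Embeds both images and scores pair similarity with a learned head."""
    def __init__(self, embed_net, dim):
        super().__init__()
        self.embed = embed_net
        self.head = torch.nn.Linear(dim, 1)  # maps |za - zb| to a logit

    def forward(self, img_a, img_b):
        za, zb = self.embed(img_a), self.embed(img_b)
        return self.head(torch.abs(za - zb)).squeeze(1)

def train_step(model, optimizer, img_a, img_b, label):
    """One gradient step; `label` is a float tensor of 1s/0s per pair."""
    optimizer.zero_grad()
    logit = model(img_a, img_b)
    loss = F.binary_cross_entropy_with_logits(logit, label)  # cross-entropy loss
    loss.backward()   # backpropagation computes the gradients
    optimizer.step()  # gradient descent updates the parameters
    return loss.item()
```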

The Siamese network scores similarity between pairs. Another method is the triplet loss, which requires three images per training round: first, one image is randomly selected from the training set as the anchor; then another image is randomly selected from the same category as the positive sample; finally, that category is excluded and an image is randomly sampled from the rest as the negative sample.
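A corresponding sketch for the triplet setup, reusing the `by_class` layout from above and PyTorch's built-in `TripletMarginLoss` (the margin value is an illustrative choice):

```python
import random
import torch

triplet_loss = torch.nn.TripletMarginLoss(margin=1.0)

def sample_triplet(by_class):
    """Anchor and positive from one class, negative from a different class."""
    c_pos = random.choice(list(by_class))
    anchor, positive = random.sample(by_class[c_pos], 2)
    c_neg = random.choice([c for c in by_class if c != c_pos])
    negative = random.choice(by_class[c_neg])
    return anchor, positive, negative

def triplet_step(embed_net, optimizer, anchor, positive, negative):
    optimizer.zero_grad()
    loss = triplet_loss(embed_net(anchor), embed_net(positive), embed_net(negative))
    loss.backward()
    optimizer.step()
    return loss.item()
```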

Prototypical Networks are based on the idea that for each category there exists a point in embedding space, called the class prototype, around which the embeddings of that class's samples cluster. The input is mapped into the embedding space by a nonlinear neural-network mapping, and the mean of each class's support embeddings is used as the class prototype. To classify, you simply check which class prototype the query embedding lies closest to.
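A minimal sketch of prototype-based classification (the shapes and helper name are illustrative):

```python
import torch

def prototypical_predict(support_emb, support_labels, query_emb, n_way):
    """Classify queries by distance to class prototypes (mean support embedding).

    support_emb: (n*k, d), support_labels: (n*k,), query_emb: (q, d).
    """
    protos = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(n_way)
    ])                                       # (n_way, d) class prototypes
    dists = torch.cdist(query_emb, protos)   # (q, n_way) Euclidean distances
    return (-dists).softmax(dim=-1)          # nearer prototype -> higher probability
```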

As noted above, a metric-based method's prediction is a kernel-weighted sum of the support-set labels, so learning a good kernel function is crucial for such a model. Metric meta-learning was proposed to address exactly this problem: its goal is to learn a metric or distance function between samples. What counts as a good metric varies from task to task, but it must capture the connections between inputs in the task space and help us solve the problem.


Metric-based meta-learning tries to extract as much as possible of the features contained in the task samples and uses feature comparison to determine a sample's class; how to extract the features that best characterize a sample has therefore become the focus of this direction. Meta-learned representations differ from those of ordinary learning and are more conducive to few-shot learning.

Using meta-learned feature representations can improve few-shot learning. The author groups this into two mechanisms: (1) fix the parameters of the feature-extraction module and only update (fine-tune) the parameters of the final classification layer; under this mechanism, the data points of each class cluster more tightly in feature space, so the classification boundary is less sensitive to the few samples provided during fine-tuning; (2) find a point in model-parameter space to serve as the base model, one that lies close to the optima of most task-specific models, so that when facing a new task the base model can be updated into a task-specific model with only a few gradient steps. A sketch of mechanism (1) follows. I will continue to publish articles on few-shot learning.
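A minimal sketch of mechanism (1), assuming `model` is a meta-learned feature extractor producing `feat_dim`-dimensional vectors (names and hyperparameters are illustrative):

```python
import torch

def finetune_head(model, support_loader, n_way, feat_dim, steps=100, lr=1e-2):
    """Freeze the meta-learned feature extractor; train only a fresh
    classification layer on the support set of the new task."""
    for p in model.parameters():
        p.requires_grad_(False)          # feature extractor stays fixed
    head = torch.nn.Linear(feat_dim, n_way)
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    for _ in range(steps):
        for x, y in support_loader:
            opt.zero_grad()
            loss = torch.nn.functional.cross_entropy(head(model(x)), y)
            loss.backward()
            opt.step()
    return head
```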

I eat on time every day without skimping, stay committed to discovering ever more of humanity's delicacies, and... well, I got fat.

Remember to eat on time too ♥️

Happy National Day! 😁