The original link: mp.weixin.qq.com/s/z9QbjeoLo…
Three earlier articles briefly introduced several classical algorithms commonly used in machine learning, including the currently very popular CNNs:
- Summary and Comparison of Common Machine Learning Algorithms (PART 1)
- Summary and Comparison of Commonly used Machine Learning Algorithms (Middle)
- Summary and Comparison of Commonly used Machine Learning Algorithms
These algorithms each have their own advantages, disadvantages, and applicable fields, so it is worth getting familiar with them; how to apply them, however, still depends on the specific problem. Common application directions of machine learning include the following:
- Computer Vision (CV)
- Natural Language Processing (NLP)
- Speech recognition
- Recommendation system
- Advertising, etc.
For more details, please refer to a previously recommended website:
paperswithcode.com/sota
The site divides the field in great detail into 16 general directions with a total of 1081 sub-directions. If you want to enter the field of machine learning, first choose a direction, and then get familiar with the algorithms and problem-solving techniques required in that direction.
This article mainly introduces applications of computer vision, which is one of the most popular and most developed of the 16 directions.
Computer vision can be divided into the following general directions:
- Image classification
- Object detection
- Image segmentation
- Style transfer
- Image reconstruction
- Super-resolution
- Image generation
- Face
- Other
Although everything listed here concerns images, video is also a research object of computer vision, so there are also video classification, detection, generation, and tracking. Because of space constraints, and because current research is mainly focused on images, I will not cover video applications for now.
For each direction, I give a brief introduction to the problem it addresses, along with recommended Github projects, papers, or review articles.
1. Image Classification
Image classification, also known as image recognition, is, as the name implies, the task of identifying what an image shows, or which category the objects in the image belong to.
Image classification can be divided into many sub-directions according to different criteria.
For example, according to the category label, it can be divided into:
- Binary classification, such as determining whether an image contains a face;
- Multi-class classification, such as bird identification;
- Multi-label classification: each image carries labels for multiple attributes. For example, in clothing classification, labels for color, texture, sleeve length, and so on can be added, so the output is not just a single category but also multiple attributes.
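The difference between the multi-class and multi-label cases above can be sketched in a few lines. This is only an illustration under assumptions that are not from the article: the scores are made-up model outputs, the class and attribute names are hypothetical, and softmax/sigmoid with a 0.5 threshold is one common convention rather than the only option.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (exactly one class wins)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Map one raw score to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_multiclass(logits, class_names):
    """Multi-class: pick the single most probable category."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best]


def predict_multilabel(logits, label_names, threshold=0.5):
    """Multi-label: every attribute is decided on its own, so several can be on."""
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    # Hypothetical scores for one bird image and one clothing image.
    print(predict_multiclass([0.2, 2.5, -1.0], ["sparrow", "robin", "crow"]))
    print(predict_multilabel([1.8, -0.7, 0.4], ["red", "striped", "long-sleeved"]))
```

Binary classification is the special case with a single sigmoid output (or two softmax outputs).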
According to the classification object, it can be divided into:
- General categories, such as birds, cars, cats, dogs, etc.;
- Fine-grained classification, currently a popular area of image classification: distinguishing finer categories within birds, flowers, cats, dogs, and so on. Some of these finer categories look very similar, while images of the same category can be hard to recognize as such because of occlusion, angle, illumination, and other factors.
According to the number of categories, it can also be divided into:
- Few-shot learning: learning from small samples, where the training set has only a small number of images per category; this includes one-shot and zero-shot learning;
- Large-scale learning: learning from large-scale samples, which is now the mainstream approach to classification, partly because deep learning requires large data sets.
Recommended Github projects are:
- Awesome Image Classification
- awesome-few-shot-learning
- awesome-zero-shot-learning
Paper:
- ImageNet Classification With Deep Convolutional Neural Networks, 2012
- Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.
- Going Deeper with Convolutions, 2015.
- Deep Residual Learning for Image Recognition, 2015.
- Inception-v4 and Inception-ResNet-v2, 2016
- ResNeXt, 2016
- NASNet, 2017
- ShuffleNetV2, 2018
- SKNet, 2019
Article:
- Introduction | From VGG to NASNet: an overview of image classification networks
- CNN Network Architecture Evolution: From LeNet to DenseNet
- Wei Xiu-Shen, Megvii Nanjing Research Institute: Review of fine-grained image analysis
- Annual progress in few-shot learning | VALSE 2018
Commonly used image classification data sets:
- MNIST: a handwritten digit data set containing 60,000 training images and 10,000 test images.
- CIFAR: divided into CIFAR-10 and CIFAR-100. The former contains 60,000 images in 10 categories, with 6,000 images per category. The latter has 100 categories with 600 images per category. Categories include animals such as cats, dogs, and birds, as well as airplanes, cars, and boats.
- ImageNet: probably the largest open source image data set, with about 15 million images and 22,000 categories.
2. Object Detection
Object detection usually involves two aspects: first finding the object, and then identifying it.
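A detection can therefore be pictured as a box (where the object is) plus a label (what it is). The sketch below is a minimal illustration and not any particular detector: the `Detection` class and `is_correct` helper are hypothetical names, and the intersection-over-union (IoU) overlap with a 0.5 threshold is an assumption borrowed from common benchmark practice; the article itself does not discuss evaluation.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple    # (x_min, y_min, x_max, y_max): where the object was found
    label: str    # what the object was identified as
    score: float  # confidence of the prediction


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct(det, gt_box, gt_label, iou_threshold=0.5):
    """A detection counts only if both the location and the class are right."""
    return det.label == gt_label and iou(det.box, gt_box) >= iou_threshold


if __name__ == "__main__":
    pred = Detection(box=(10, 10, 50, 50), label="dog", score=0.9)
    print(is_correct(pred, gt_box=(12, 12, 52, 52), gt_label="dog"))
```

Multi-object detection simply returns a list of such detections for one image.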
Object detection can be divided into single-object detection and multi-object detection, depending on the number of objects in the image, as shown in the following example:
The above two examples are images from the VOC 2012 dataset, but there are actually more complex scenarios, such as these example images from the MS COCO dataset:
In fact, there are many methods in the field of object detection, and their development history is as follows:
From the figure above you can see that there are several method families:
- R-CNN series, from R-CNN to Fast R-CNN, Faster R-CNN, and Mask R-CNN;
- YOLO series, from v1 to v3 in 2018
Github projects:
- awesome-object-detection
- Github.com/facebookres…
- Github.com/jwyang/fast…
Paper:
- R-CNN, 2013
- Fast R-CNN, 2015
- Faster R-CNN, 2015
- Mask R-CNN, 2017
- YOLO, 2015
- YOLOv2, 2016
- YOLOv3, 2018
- SSD, 2015
- FPN, 2016
Article:
- Object detection: R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD
- Tutorial | Single-stage object detection methods explained: YOLO and SSD
- From R-CNN to SSD: this should be the most comprehensive inventory of object detection algorithms
- From R-CNN to RFBNet: the evolution of object detection architectures over five years
Common data sets:
- VOC 2012
- MS COCO
3. Object Segmentation
Image segmentation builds on detection: the object has to be detected first, and then it is segmented.
Image segmentation can be divided into three types:
- Ordinary segmentation: separate the pixel regions that belong to different objects, such as separating the foreground region from the background region;
- Semantic segmentation: on top of ordinary segmentation, classify at the pixel level, so that pixels belonging to the same category receive the same label; for example, objects of different categories are separated;
- Instance segmentation: on top of semantic segmentation, segment each individual instance; for example, several dogs in a picture are segmented and identified as different individuals, not just as members of the same category.
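The difference between a semantic mask and an instance mask can be shown on a toy grid. This is a rough sketch under stated assumptions, not how instance segmentation models work: `instances_from_semantic` is a hypothetical helper that splits one class of a semantic mask into 4-connected regions, which only separates instances that do not touch each other.

```python
from collections import deque


def instances_from_semantic(mask, target_class):
    """Give each 4-connected region of `target_class` its own instance id.

    `mask` is a list of rows holding per-pixel class ids (a semantic mask).
    Returns (instance_mask, number_of_instances); 0 means "no instance".
    """
    h, w = len(mask), len(mask[0])
    inst = [[0] * w for _ in range(h)]
    next_id = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] != target_class or inst[r][c] != 0:
                continue
            next_id += 1  # found an unlabeled pixel: start a new instance
            inst[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over neighbouring pixels
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == target_class
                            and inst[ny][nx] == 0):
                        inst[ny][nx] = next_id
                        queue.append((ny, nx))
    return inst, next_id


if __name__ == "__main__":
    DOG = 1  # hypothetical class id; 0 is background
    semantic = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 1, 1],
        [0, 0, 0, 1, 1],
    ]
    instance_mask, count = instances_from_semantic(semantic, DOG)
    for row in instance_mask:
        print(row)
    print("dogs found:", count)
```

In the semantic mask both dogs share the label 1; in the instance mask each dog gets its own id.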
An example of image segmentation is shown below, followed by an example of instance segmentation in which different colors mark different instances.
Github projects:
- awesome-semantic-segmentation
Paper:
- U-Net, 2015
- DeepLab, 2016
- FCN, 2016
Article:
- In depth | Convolutional neural networks for image segmentation: from R-CNN to Mask R-CNN
- Review | A review of image segmentation
- A review of image semantic segmentation
4. Style Transfer
Style transfer refers to applying the styles of one domain or several images to other domains or images. For example, abstract styles are applied to realistic images.
An example of style transfer is as follows. Image A is the original, and images B-F are the results of applying different styles.
The data sets are generally commonly used image data sets plus some famous paintings, such as works by Van Gogh and Picasso.
Github projects:
- A simple, concise tensorflow implementation of style transfer (neural style)
- TensorFlow (Python API) implementation of Neural Style
- TensorFlow CNN for fast style transfer
Paper:
- A Neural Algorithm of Artistic Style, 2015
- Image Style Transfer Using Convolutional Neural Networks, 2016
- Deep Photo Style Transfer, 2017
Article:
- A brief history of Neural Style
- Style Transfer | A review of style transfer
- (Perceptual Losses)
- Image Style Transfer
- Style Transfer
5. Image Reconstruction
Image reconstruction, also known as image inpainting, aims to repair missing parts of an image, for example in old, damaged black-and-white photographs and films. Usually a common data set is used, and the regions to be repaired are created artificially in its images.
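Creating the region to be repaired artificially can be as simple as blanking out a random rectangle and remembering where it was. The following is a minimal sketch under assumptions of my own: images are plain nested lists of pixel values, the hole is a single rectangle filled with 0, and `make_inpainting_pair` is a hypothetical helper name. Papers such as the partial-convolution one use irregular holes instead.

```python
import random


def make_inpainting_pair(image, hole_h, hole_w, rng=None, fill=0):
    """Blank out one random rectangle of `image`.

    Returns (corrupted, mask), where mask[r][c] == 1 marks a removed pixel.
    The untouched `image` is the target a model would be trained to recover.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    if not (0 < hole_h <= h and 0 < hole_w <= w):
        raise ValueError("hole must fit inside the image")
    top = rng.randint(0, h - hole_h)    # randint is inclusive on both ends
    left = rng.randint(0, w - hole_w)
    mask = [
        [1 if top <= r < top + hole_h and left <= c < left + hole_w else 0
         for c in range(w)]
        for r in range(h)
    ]
    corrupted = [
        [fill if mask[r][c] else image[r][c] for c in range(w)]
        for r in range(h)
    ]
    return corrupted, mask


if __name__ == "__main__":
    original = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
    corrupted, mask = make_inpainting_pair(original, 2, 2, rng=random.Random(0))
    for row in corrupted:
        print(row)
```

Passing a seeded `random.Random` keeps the generated holes reproducible across runs.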
An example of restoration is shown below, and there are a total of four pictures that need to be restored. The example comes from the paper “Image Inpainting for Irregular Holes Using Partial Convolutions”.
Paper:
- Pixel Recurrent Neural Networks, 2016.
- Image Inpainting for Irregular Holes Using Partial Convolutions, 2018.
- Highly Scalable Image Reconstruction using Deep Neural Networks with Bandpass Filtering, 2018.
- Generative Image Inpainting with Contextual Attention, 2018
- Free-form Image Inpainting with Gated Convolution, 2018
- EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning, 2019
Github projects:
- Awesome-Image-Inpainting
- generative_inpainting
- edge-connect
Article:
- The goddess is being coded? A imagine a dash back effect beyond Adobe | is open source
- 2018 CVPR image inpainting
6. Super-resolution
Super-resolution is the task of generating an image with higher resolution and more detail than the original image. An example is shown in the figure below, from the paper “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.”
In general, super-resolution models can also be used for image restoration and inpainting, since these are closely related problems.
For data, existing image data sets are typically used: low-resolution versions of the images are generated and paired with the originals for model training.
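Generating the low-resolution half of such a training pair can be sketched as follows. This is only an illustration: images are plain nested lists of grayscale values, `downsample` and `make_sr_pair` are hypothetical helper names, and block averaging is used as a simple stand-in for the interpolation-based downscaling (for example bicubic) that published work often relies on.

```python
def downsample(image, factor):
    """Shrink `image` by averaging each factor x factor block of pixels."""
    h, w = len(image), len(image[0])
    if factor < 1 or h % factor or w % factor:
        raise ValueError("image size must be a multiple of the factor")
    out = []
    for r in range(0, h, factor):
        row = []
        for c in range(0, w, factor):
            block = [image[y][x]
                     for y in range(r, r + factor)
                     for x in range(c, c + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def make_sr_pair(high_res, factor=2):
    """Return (low_res_input, high_res_target) for super-resolution training."""
    return downsample(high_res, factor), high_res


if __name__ == "__main__":
    hr = [
        [0, 2, 4, 6],
        [2, 4, 6, 8],
        [10, 10, 20, 20],
        [10, 10, 20, 20],
    ]
    lr, target = make_sr_pair(hr, factor=2)
    print(lr)  # [[2.0, 6.0], [10.0, 20.0]]
```

A model is then trained to map the low-resolution input back to the high-resolution target.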
Github projects:
- Image Super-Resolution for Anime-style Art: a super-resolution application for anime images, 14K stars
- neural-enhance
- Image super-resolution through deep learning
Paper:
- Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2017.
- Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution, 2017.
- Deep Image Prior, 2017.
- ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks, 2018
Article:
- Image super-resolution reconstruction
- How will super-resolution technology evolve? These six ECCV 18 papers take you through them all at once
- A recent review of deep learning image super-resolution: From Models to Applications
- ESRGAN: Enhanced Super-Resolution Method Based on GAN
7. Image Synthesis
Image generation is the task of generating a modified part of an image, or a completely new image, based on an existing image. This application has developed rapidly in recent years, mainly because GANs have been a very hot research direction, and image generation is one of their major applications.
An example of image generation is as follows:
Github projects:
- tensorflow-generative-model-collections: code for various types of GANs
- the-gan-zoo: a collection of current GAN-related papers
- AdversarialNetsPapers
Paper:
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015.
- Conditional Image Generation with PixelCNN Decoders, 2016.
- Pix2Pix–Image-to-image Translation with Conditional Adversarial Networks, 2016
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.
- BigGAN: Large Scale GAN Training for High Fidelity Natural Image Synthesis, 2018
Article:
- Dry goods | management.but, GAN principle, text version (complete)
- In depth | Generative adversarial networks for beginners: understand the basic principles of GANs in one article (with resources)
- Exclusive | Transcript of the GAN talk at NIPS 2016: the father of GANs gives a comprehensive explanation of the principles and future of generative adversarial networks (slides attached)
- Nvidia releases GAN again! Multi-level feature style transfer face generator
8. Face
Face applications include face recognition, face detection, face matching, face alignment, and so on. This is probably the most popular and most mature application of computer vision, and it is already widely used in security and identity authentication scenarios such as face payment and face unlock.
Here I directly recommend a few Github projects, papers, articles, and data sets:
Github projects:
- awesome-face_recognition: a collection of face-related papers from the last decade
- face_recognition: a face recognition library that supports recognition, detection, matching, and other functions.
- facenet
Paper:
- FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015
- Face Recognition: From Traditional to Deep Learning Methods, 2018
- MSFD: Multi-scale Receptive Field Face Detector, 2018
- DSFD: Dual Shot Face Detector, 2018
- Neural Architecture Search for Deep Face Recognition, 2019
Article:
- Face recognition technology comprehensive summary: from traditional methods to deep learning
- Resources | From face detection to semantic segmentation: trained models for the OpenCV library
Data set:
- LFW
- CelebA
- MS-Celeb-1M
- CASIA-WebFace
- FaceScrub
- MegaFace
9. Other
There are actually many other directions, including:
- Image Captioning: Generate a description for a picture.
Show and Tell: A Neural Image Caption Generator, 2014.
- Text to Image: Generate images based on Text.
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks, 2017.
- Image Colorization: Changes an Image from black and white to color.
Colorful Image Colorization, 2016.
- Human Pose Estimation: Identify the poses of people in an image.
Cascaded Pyramid Network for Multi-Person Pose Estimation, 2017
Further directions include 3D, video, medical images, question answering, autonomous driving, tracking, and more. Check out this website:
Paperswithcode.com/area/comput…
Once you have chosen a direction and want to start learning it, I recommend first looking for review articles or papers on it in Chinese; of course, if your English reading ability is good, you can also read review articles in English. Use the reviews to decide which papers you need to read. I recommend starting with papers from the last three to five years; unless you need to know more about a particular algorithm, you do not need to read many of the older ones.
In addition, it is worth working on actual projects to deepen your understanding of the algorithms; running the code also helps you understand how an algorithm is implemented.
References
- Machinelearningmastery.com/application…
- paperswithcode.com/sota
Summary
This article briefly introduced several computer vision applications, including the problems they solve, and recommended several Github projects, papers, articles, and commonly used data sets.