

Ingenious OR ingenious

Author: Zhou Yan

Editor’s note: Research on action recognition is based mainly on video data and covers several sub-directions: action recognition proper (for example gesture recognition), object recognition, and pose estimation. All of these directions depend on representative video data. Unlike image recognition, which has mature and widely used data sets such as MNIST and ImageNet, action recognition has relatively few data sets, and they usually take up a lot of disk space. It is therefore worth choosing a data set carefully before starting related research.

Action recognition has recently become a hot topic in computer vision. In recent years, more and more related papers have appeared at CVPR, ICCV, NIPS and other machine learning and computer vision conferences. Here is a GitHub repo: github.com/jinwchoi/aw…


Obtaining the video data set that best fits your research can cost a lot of bandwidth and disk space. This article surveys the existing public data sets in the field of action recognition and describes the characteristics of each in detail, so that readers can pick the data set that suits their needs before downloading.

1. Classical data sets

  • KTH (www.nada.kth.se/cvap/action…):

This is the classic action recognition data set and still one of the most widely used in the literature. It contains 2,391 sequences in total, covering 6 actions, each performed by 25 subjects in 4 different scenarios. That gives 600 videos (25 × 6 × 4), each of which can be divided into 4 sub-sequences. The actions in KTH are performed in a fairly standard way and filmed with a fixed camera. Its size is also fairly adequate for training current models, so it is a very useful data set for simple action recognition tasks. Many open-source programs for processing KTH are available on GitHub and can be consulted as needed.
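The counts quoted above can be checked with a few lines of Python. This is only a sketch: the six action names are the ones commonly listed for KTH and are not given in this article, and the function name is made up for illustration.

```python
from itertools import product

# Assumption: these are the six KTH action classes as commonly listed;
# the article itself only states that there are six.
KTH_ACTIONS = ["walking", "jogging", "running",
               "boxing", "hand waving", "hand clapping"]


def kth_video_grid(num_subjects=25, actions=KTH_ACTIONS, num_scenarios=4):
    """Enumerate every (subject, action, scenario) combination."""
    subjects = range(1, num_subjects + 1)
    scenarios = range(1, num_scenarios + 1)
    return list(product(subjects, actions, scenarios))


if __name__ == "__main__":
    videos = kth_video_grid()
    print("videos:", len(videos))
    # Four sub-sequences per video is an upper bound; the article quotes
    # 2,391 sequences, slightly fewer than this product.
    print("upper bound on sub-sequences:", len(videos) * 4)
```

The product of four sub-sequences per video is an upper bound only, so it is not surprising that the quoted total of 2,391 is slightly lower.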

             

  • Weizmann (www.wisdom.weizmann.ac.il/~vision/Spa.):

Another classic data set. It also consists of videos of 10 typical actions shot with a fixed camera, and it additionally provides some clips in which other objects act as interference, which can be used to test a model's robustness.

An official background-removal program is also provided. However, with 90 regular clips and 21 robustness-test clips, the data set is small and insufficient for training current models. It may still be suitable for approaches that work with little data, such as transfer learning or one-shot learning.

             

  • INRIA XMAS (4drepository.inrialpes.fr/public/view…):

This data set provides videos of the same action from several camera angles, which can be regarded as a simple form of dynamic background. It covers 13 daily actions, each performed three times by 11 actors, who are free to choose their position and orientation. Downloading this data set is unusual in that it requires wget.

  • UCF Sports Action data set (www.crcv.ucf.edu/data/UCF_Sp.):

A data set mainly about sports. It is also of high quality and consists mainly of 13 common sports actions. The catch is that the number of clips per action is relatively small, but this data set has spawned several successors such as UCF-50 (crcv.ucf.edu/data/UCF50…).

              

  • Hollywood Human Action data set (www.di.ens.fr/~laptev/act.):

A data set built from Hollywood film footage. It contains 475 videos, a reasonable amount of data. However, a film shot often does not show a single action: many actions are mixed together, and the background is discontinuous because of shot changes, which may affect model training. This data set has also been followed by even larger ones: www.di.ens.fr/~laptev/act…

             

Summary: The above introduces the classical data sets, which generally contain little data and relatively simple scenes. Most were proposed around the 2000s, and the video resolution is generally low. For a more detailed introduction, see the 2014 review article "A Survey on Vision-Based Human Action Recognition".
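To make the comparison concrete, the figures quoted in this section can be put into a small catalog and filtered by requirement. The sketch below is only an illustration: the class, field and function names are hypothetical, values the article does not state are left as None, and the camera flags reflect only what the descriptions above say.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetInfo:
    name: str
    num_actions: Optional[int]     # None where the article gives no figure
    num_clips: Optional[int]       # None where the article gives no figure
    fixed_camera: Optional[bool]   # None where the article does not say


# Figures as quoted in this article; not independently verified here.
CLASSICAL = [
    DatasetInfo("KTH", 6, 2391, True),
    DatasetInfo("Weizmann", 10, 90 + 21, True),
    DatasetInfo("INRIA XMAS", 13, None, None),
    DatasetInfo("UCF Sports", 13, None, None),
    DatasetInfo("Hollywood", None, 475, False),
]


def shortlist(catalog: List[DatasetInfo], min_clips: int = 0,
              need_fixed_camera: bool = False) -> List[str]:
    """Return names of data sets that are known to meet the requirements."""
    selected = []
    for info in catalog:
        # An unknown clip count cannot satisfy a minimum-size requirement.
        if min_clips > 0 and (info.num_clips is None
                              or info.num_clips < min_clips):
            continue
        if need_fixed_camera and info.fixed_camera is not True:
            continue
        selected.append(info.name)
    return selected


if __name__ == "__main__":
    print(shortlist(CLASSICAL, min_clips=1000))
    print(shortlist(CLASSICAL, need_fixed_camera=True))
```

Treating unknown values as "does not qualify" is a deliberately conservative choice: a data set is only shortlisted when the article actually supports the requirement.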

2. Medium-sized data sets

  • HMDB (serre-lab.clps.brown.edu/resource/hm…)

The data set consists of 51 categories, with 100-200 clips per category on average. In terms of volume and number of categories it is fairly rich, but it is composed mainly of movie shots and everyday video footage, so the backgrounds are relatively complex and the videos include moving cameras and shot changes. This data set is therefore better suited to object recognition and detection.

             

  • SVW (cvlab.cse.msu.edu/project-svw…)

         

Summary: Medium-sized data sets generally have more data and more categories than the classical ones. This also reflects that, as available computing power grows, the prediction models that can be built become more complex and can handle more complex tasks.

3. Large-scale data sets suitable for deep learning

  • ActivityNet (github.com/activitynet…)

             

  • 20BN-jester (20bn.com/datasets/je…)

           

  • NTU RGB+D (rose1.ntu.edu.sg/datasets/ac…)

This data set provides a rich amount of data, and the video backgrounds are relatively fixed, which suits action recognition. It is also characterized by providing RGB, depth and skeleton data at the same time. The total size is up to 1.3 TB, and a larger version ("NTU RGB+D 120") is to be released later. Downloading requires applying for an account through the website; a reply usually arrives within one day, and carefully completed applications are generally approved.

     

Summary: Large-scale data sets are characterized mainly by a large amount of data and more categories. In addition, their websites generally do not offer a direct download and instead provide crawler-like download programs. These are mostly data sets that appeared in the last 3-5 years. Their size is generally at the GB or even TB level, so building and training models on them requires deep models and machines with more computing power.
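Because these data sets reach the TB level, it can be worth checking free disk space before starting a download. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the function name and the 1.5x headroom factor (room for archives plus extracted files) are assumptions made for illustration, not a recommendation from any data set provider.

```python
import shutil

TB = 1000 ** 4  # one decimal terabyte, in bytes


def has_room_for(dataset_bytes, path=".", headroom=1.5):
    """Return True if the disk holding `path` has at least
    dataset_bytes * headroom bytes free.

    `headroom` is an assumed safety factor, not a measured requirement.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= dataset_bytes * headroom


if __name__ == "__main__":
    # 1.3 TB is the total size quoted above for NTU RGB+D.
    ntu_bytes = int(1.3 * TB)
    print("Enough room for a 1.3 TB download:", has_room_for(ntu_bytes))
```

The target directory can be passed as `path`, so the check applies to the disk that will actually hold the data rather than to the current working directory.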

4. Scenario-specific data sets

The data sets above are well-known open data sets that are often used as algorithm benchmarks. For practical application scenarios, however, we often need more specialized data sets. There are many such niche data sets, and we will not try to collect them all; here is just one example.

Distracted Driver Detection is a driver-state detection data set with 10 states and a total of 22,425 images, about 4 GB in size.

             

Data set address:

https://www.kaggle.com/c/state-far…

Every year, many traffic accidents occur because the driver is not paying attention. A good driver assistance system should therefore monitor not only the situation outside the car but also the state of the driver inside it.

This data set comes from the Kaggle platform and contains 10 states as follows:

c0:safe driving

c1:texting-right

c2:talking on the phone-right

c3:texting-left

c4:talking on the phone-left

c5:operating the radio

c6:drinking

c7:reaching behind

c8:hair and makeup

c9:talking to passenger
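The ten states listed above map naturally onto a lookup table, and the per-category image count can be checked with a short script. This is a sketch under one stated assumption: that the training images are unpacked into one sub-folder per class id (c0 to c9) containing .jpg files. The function name and the example path are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Class ids and descriptions as listed in this article.
DRIVER_STATES = {
    "c0": "safe driving",
    "c1": "texting-right",
    "c2": "talking on the phone-right",
    "c3": "texting-left",
    "c4": "talking on the phone-left",
    "c5": "operating the radio",
    "c6": "drinking",
    "c7": "reaching behind",
    "c8": "hair and makeup",
    "c9": "talking to passenger",
}


def count_images_per_class(train_dir):
    """Count .jpg files in each class sub-folder of `train_dir`.

    Assumes a layout of train_dir/c0 ... train_dir/c9; folders with
    other names are ignored.
    """
    counts = Counter()
    for class_dir in Path(train_dir).iterdir():
        if class_dir.is_dir() and class_dir.name in DRIVER_STATES:
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() == ".jpg"
            )
    return counts


if __name__ == "__main__":
    train_dir = Path("train")  # hypothetical location of the images
    if train_dir.is_dir():
        for class_id, n in sorted(count_images_per_class(train_dir).items()):
            print(class_id, DRIVER_STATES[class_id], n)
```

A quick per-class count like this is a cheap way to confirm that a download is complete and that the categories are roughly balanced before training.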

Some samples are shown below, with about 2,000 images in each category, for a total of 22,425 images.

  

Conclusion:

This article gives only a basic introduction to and discussion of data sets in the field of action recognition. Many of them have not actually been processed or used by the author, so the treatment is not in-depth. Still, I hope the article serves as a starting point; for more detail, go directly to each data set's website, read the description, and download the data for study. Organizing your own data so that it supports the algorithm is a key step in doing research. Finally, I hope you can achieve more great results with these data.

Article statement


Editor in charge: Zhou Yan, Guan Jun

Grapes

The article is originally published by “Ingenious OR ingenious”
