0. Background
The main video classification datasets are summarized in the comparison table at the end of this post. This post introduces one of them, the small and convenient HMDB51 dataset, in detail.
1. Introduction to HMDB51
HMDB51 contains 51 action classes and 6,849 videos in total, with at least 101 videos per class, at a resolution of 320×240. The clips come mainly from YouTube and Google Video, and the whole dataset is about 2 GB.
Homepage: serre-lab.clps.brown.edu/resource/hm… (within China, downloading with Thunder (Xunlei) is reportedly faster).
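As a quick sanity check after downloading, a short script like the following can count the clips per class. This is only a sketch: it assumes the archive has been extracted into one directory per action class containing .avi files, and the root path is a placeholder.

```python
# Minimal sketch: count HMDB51 clips per class from a local extraction.
# Assumes one folder per action class, each holding .avi clips.
from pathlib import Path

DATASET_ROOT = Path("hmdb51")  # placeholder: wherever you extracted the data

counts = {
    class_dir.name: sum(1 for _ in class_dir.glob("*.avi"))
    for class_dir in sorted(DATASET_ROOT.iterdir())
    if class_dir.is_dir()
}

print(f"{len(counts)} classes, {sum(counts.values())} clips total")
for name, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {n:4d}")
```

If the extraction matches the layout above, the totals should agree with the figures quoted earlier (51 classes, 6,849 clips).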
The actions fall into five groups:
1) General facial actions: smile, laugh, chew, talk.
2) Facial actions with object manipulation: smoke, eat, drink.
3) General body movements: cartwheel, clap hands, climb, climb stairs, dive, fall on the floor, backhand flip, handstand, jump, pull up, push up, run, sit down, sit up, somersault, stand up, turn, walk, wave.
4) Body movements with object interaction: brush hair, catch, draw sword, dribble, golf swing, hit something, kick ball, pick, pour, push something, ride bike, ride horse, shoot ball, shoot bow, shoot gun, swing baseball bat, sword exercise, throw.
5) Body movements for human interaction: fencing, hug, kick someone, kiss, punch, shake hands, sword fight.
Statistics

[Figures: distributions of the per-clip annotations: action categories, visible body parts, camera motion, and viewpoint; clip quality, clip duration, and clip counts.]
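HMDB51 encodes these meta labels directly in each clip's filename. The parsing sketch below reflects the field order and codes as I recall them from the dataset README (visible body parts, camera motion, number of people, viewpoint, quality); verify against the official documentation before relying on it.

```python
# Hedged sketch: parse HMDB51 meta labels from a clip filename.
# Field order and codes are assumptions based on the dataset README.
def parse_meta(filename: str) -> dict:
    # Example name: "April_09_brush_hair_u_nm_np1_ba_goo_0.avi"
    stem = filename.rsplit(".", 1)[0]
    parts = stem.split("_")
    # Assumed layout: the last six fields are body parts, camera motion,
    # number of people, viewpoint, quality, and clip index; everything
    # before them is the source video title plus the action name.
    body, motion, npeople, view, quality, index = parts[-6:]
    return {
        "visible_body_parts": body,  # h=head, u=upper, f=full, l=lower
        "camera_motion": motion,     # cm=moving camera, nm=static camera
        "num_people": npeople,       # np1 / np2 / np3
        "viewpoint": view,           # fr=front, ba=back, le=left, ri=right
        "quality": quality,          # goo / med / bad
        "clip_index": int(index),
    }

print(parse_meta("April_09_brush_hair_u_nm_np1_ba_goo_0.avi"))
```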
Video stabilization
One of the major challenges in using clips extracted from real-world videos is the presence of significant camera/background motion, which interferes with local motion computation and should be corrected. To remove camera motion, standard image stitching techniques are used to align the frames of each clip. These techniques estimate the background plane by detecting and then matching salient features in pairs of adjacent frames. Distance measures, including the absolute pixel difference and the Euclidean distance between detected points, are used to compute correspondences between the two frames; the points with minimum distance are matched, and the RANSAC algorithm is used to estimate the geometric transformation between every pair of adjacent frames (independently for each pair). Using this estimate, each frame is warped so that the background stays aligned across the clip, producing the stabilized frames shown below (a code sketch of this pipeline follows the figure).
[Figure: original frames (left) vs. stabilized frames (right).]
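To make the pipeline concrete, here is a minimal OpenCV sketch of this kind of stabilization. The specific detector (ORB), matcher, and homography model are my assumptions; the authors only describe generic stitching features plus RANSAC.

```python
# Sketch of per-clip stabilization: match features between adjacent frames,
# estimate the background transform with RANSAC, and warp every frame into
# the coordinate system of the first frame.
import cv2
import numpy as np

def estimate_background_transform(prev_gray, cur_gray):
    """Estimate the homography mapping the current frame onto the previous one."""
    orb = cv2.ORB_create(1000)                       # detect salient features
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(cur_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)              # minimum-distance matching
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC rejects points on the moving actor and keeps the background plane.
    H, _ = cv2.findHomography(dst, src, cv2.RANSAC, 5.0)
    return H

def stabilize_clip(frames):
    """Warp every frame into the coordinate system of the first frame."""
    out = [frames[0]]
    H_acc = np.eye(3)
    h, w = frames[0].shape[:2]
    for prev, cur in zip(frames, frames[1:]):
        H = estimate_background_transform(
            cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY),
            cv2.cvtColor(cur, cv2.COLOR_BGR2GRAY))
        H_acc = H_acc @ H                            # chain pairwise transforms
        out.append(cv2.warpPerspective(cur, H_acc, (w, h)))
    return out
```

Estimating each pairwise transform independently and chaining them keeps errors local to a frame pair, at the cost of slow drift over long clips; HMDB51 clips are short, so this is usually acceptable.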
Other action recognition benchmarks
This line of work began with KTH: the KTH dataset contains six action classes with 100 clips each. It was followed by the Weizmann dataset, collected by the Weizmann Institute, which contains ten action categories with nine clips per category. Both datasets were recorded in controlled, simplified settings. INRIA then assembled the first realistic action dataset, collected from movies and annotated using movie scripts: the Hollywood Human Actions set contains eight action classes with 30 to 140 clips per class. Its extension, the Hollywood2 Human Actions set, offers a total of 3,669 clips across twelve categories of human actions and ten classes of scenes. The UCF group has also been collecting action datasets, mainly from YouTube: UCF Sports has nine sports classes and 182 clips in total, UCF YouTube contains 11 action classes, and UCF50 contains 50 action classes. The paper shows that videos from YouTube may be biased toward low-level cues, meaning that low-level features (such as color) can be more discriminative than mid-level features (such as motion and shape); a toy illustration of such a low-level baseline follows the table below.
Dataset | Year | # Actions | # Clips per action |
---|---|---|---|
KTH | 2004 | 6 | 100 |
Weizmann | 2005 | 9 | 9 |
IXMAS | 2006 | 11 | 33 |
Hollywood | 2008 | 8 | 30-140 |
UCF Sports | 2009 | 9 | 14-35 |
Hollywood2 | 2009 | 12 | 61-278 |
UCF YouTube | 2009 | 11 | 100 |
MSR | 2009 | 3 | 14-25 |
Olympic Sports | 2010 | 16 | 50 |
UCF50 | 2010 | 50 | min. 100 |
HMDB51 | 2011 | 51 | min. 101 |
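To illustrate what a "low-level baseline" means here, the sketch below classifies a clip by its average color histogram with a nearest-neighbor rule. Everything in it (the feature, the distance, the classifier) is illustrative, not the paper's exact protocol; if such a trivial feature scores well on a benchmark, that benchmark is likely biased by low-level cues.

```python
# Toy low-level baseline: average per-frame color histogram + nearest neighbor.
import cv2
import numpy as np

def color_histogram(video_path: str, bins: int = 8) -> np.ndarray:
    """Average RGB histogram over all frames of a clip."""
    cap = cv2.VideoCapture(video_path)
    hist, n = np.zeros(bins * 3), 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        per_channel = [cv2.calcHist([frame], [c], None, [bins], [0, 256]).ravel()
                       for c in range(3)]
        hist += np.concatenate(per_channel)
        n += 1
    cap.release()
    return hist / max(n, 1)  # normalize by frame count

def nearest_neighbor(query: np.ndarray,
                     train: list[tuple[np.ndarray, str]]) -> str:
    """Label the query clip with the class of the closest training histogram."""
    return min(train, key=lambda t: np.linalg.norm(t[0] - query))[1]
```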