0. Background

The main video classification datasets are summarized in the table below:

 

This post gives a detailed introduction to HMDB51, a small and convenient dataset.

 

1. Introduction to HMDB51

HMDB51 contains 51 action classes and 6,849 video clips in total, with at least 101 clips per class, at a resolution of 320×240. The clips come mainly from YouTube and Google Video, and the whole dataset is about 2 GB.

Homepage: serre-lab.clps.brown.edu/resource/hm… (in mainland China, downloading with Thunder/Xunlei is faster).
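Once the archive is downloaded and extracted, the clips are organized into one folder per action class, so a quick script can sanity-check the counts (51 classes, at least 101 clips each). Here is a minimal sketch; the function name is my own, and the folder-per-class layout with `.avi` files is an assumption you should verify against your local copy:

```python
import tempfile
from pathlib import Path

def summarize_hmdb51(root):
    """Count clips per action class, assuming one sub-folder per
    class containing .avi files (the layout of the official release;
    verify against your extracted copy)."""
    root = Path(root)
    return {d.name: len(list(d.glob("*.avi")))
            for d in sorted(root.iterdir()) if d.is_dir()}

# Tiny demo on a fake directory tree standing in for the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    for cls, n in [("brush_hair", 3), ("cartwheel", 2)]:
        d = Path(tmp) / cls
        d.mkdir()
        for i in range(n):
            (d / f"clip_{i}.avi").touch()
    counts = summarize_hmdb51(tmp)
    print(counts)  # {'brush_hair': 3, 'cartwheel': 2}
```

On the real dataset, `len(counts)` should be 51 and `min(counts.values())` should be at least 101.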

The actions fall into five groups:

1) General facial actions: smile, laugh, chew, talk.

2) Facial actions with object manipulation: smoke, eat, drink.

3) General body movements: cartwheel, clap hands, climb, climb stairs, dive, fall on the floor, backhand flip, handstand, jump, pull up, push up, run, sit down, sit up, somersault, stand up, turn, walk, wave.

4) Body movements with object interaction: brush hair, catch, draw sword, dribble, golf, hit something, kick ball, pick, pour, push something, ride bike, ride horse, shoot ball, shoot bow, shoot gun, swing baseball bat, sword exercise, throw.

5) Body movements involving human interaction: fencing, hug, kick someone, kiss, punch, shake hands, sword fight.

 

Statistics

Each clip is annotated along several dimensions: action category, visible body parts, camera motion, camera viewpoint, clip quality, and clip duration.

Video stabilization

One of the major challenges in using clips extracted from real-world videos is significant camera/background motion, which interferes with local motion estimation and must be corrected. To eliminate camera motion, standard image-mosaicking (stitching) techniques were used to align the frames of each clip. These techniques estimate the background plane by detecting and matching salient features in adjacent frames. Distance measures, including absolute pixel difference and the Euclidean distance between detected points, are used to compute correspondences between the two frames. Points with minimal distance are matched, and the RANSAC algorithm is used to estimate the geometric transformation between every pair of adjacent frames (independently for each pair). Using this estimate, each frame is warped so that the background is aligned across the clip.

[Figure: original clips vs. stabilized clips]
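The core of the stabilization step above is robustly fitting a geometric transform to noisy point matches, since independently moving foreground objects produce outlier correspondences. The toy sketch below illustrates the RANSAC idea on synthetic point matches, estimating only a 2-D translation for simplicity; a real pipeline would fit a full homography over detected features (e.g. with OpenCV's `cv2.findHomography`), and all names and numbers here are illustrative:

```python
import numpy as np

def ransac_translation(pts_a, pts_b, n_iters=200, thresh=2.0, seed=0):
    """Estimate a 2-D translation between matched point sets with RANSAC.

    A simplified stand-in for fitting the background transform between
    adjacent frames: repeatedly hypothesize from a minimal sample and
    keep the hypothesis with the most inliers.
    """
    rng = np.random.default_rng(seed)
    best_t, best_inliers = np.zeros(2), 0
    for _ in range(n_iters):
        i = rng.integers(len(pts_a))           # minimal sample: one match
        t = pts_b[i] - pts_a[i]                # candidate translation
        err = np.linalg.norm(pts_a + t - pts_b, axis=1)
        inliers = int((err < thresh).sum())    # matches consistent with t
        if inliers > best_inliers:
            best_t, best_inliers = t, inliers
    return best_t, best_inliers

# Synthetic data: the "background" shifts by (5, -3); 10 of 50 matches are
# outliers (independently moving foreground) that RANSAC should ignore.
rng = np.random.default_rng(1)
a = rng.uniform(0, 320, size=(50, 2))
b = a + np.array([5.0, -3.0])
b[:10] += rng.uniform(20, 40, size=(10, 2))    # corrupt 10 matches
t, n = ransac_translation(a, b)
print(t, n)  # recovers roughly (5, -3) with 40 inliers
```

With the translation recovered, each frame would then be warped by the inverse transform so the background stays fixed, which is the effect shown in the original/stabilized comparison figure.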

Other action recognition benchmarks

This line of work began with KTH: the KTH dataset contains six action classes, each with 100 clips. It was followed by the Weizmann dataset, collected by the Weizmann Institute, with 10 action categories and 9 clips per category. Both were recorded in controlled, simplified settings. The first realistic action dataset, collected from movies and annotated using movie scripts, was created at INRIA: the Hollywood Human Actions set contains 8 action classes, with 60 to 140 clips per class. Its extension, the Hollywood2 Human Actions set, offers 3,669 videos in ten scene types, distributed across twelve categories of human behavior. The UCF group has also been collecting action datasets, mainly from YouTube: UCF Sports has 9 sports classes and 182 clips in total, UCF YouTube contains 11 action classes, and UCF50 contains 50. This paper shows that videos from YouTube may be biased toward low-level features, meaning low-level features (i.e., color and points) are more discriminative than mid-level features (i.e., motion and shape).

Dataset        Year   #Actions   #Clips per action
KTH            2004   6          100
Weizmann       2005   9          9
IXMAS          2006   11         33
Hollywood      2008   8          30-140
UCF Sports     2009   9          14-35
Hollywood2     2009   12         61-278
UCF YouTube    2009   11         100
MSR            2009   3          14-25
Olympic        2010   16         50
UCF50          2010   50         min. 100
HMDB51         2011   51         min. 101
