The 20BN-JESTER dataset is a large collection of labeled video clips that show humans performing predefined hand gestures in front of a laptop camera or webcam. The dataset was created by a large number of crowd workers. It allows for training robust machine learning models to recognize human hand gestures. It is available free of charge for academic research. Commercial licenses are available upon request.
A paper with supplementary material can be found here.
The video data is provided as one large TGZ archive, split into parts of at most 1 GB each. The total download size is 22.8 GB. The archive contains directories numbered from 1 to 148092. Each directory corresponds to one video and contains JPG images with a height of 100 px and variable width. The JPG images were extracted from the original videos at 12 frames per second. The filenames of the JPGs start at 00001.jpg. The number of JPGs varies because the length of the original videos varies.
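The directory layout described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function names `list_frames` and `clip_duration_seconds` are hypothetical, and the location of the extracted archive on disk is whatever path you pass in.

```python
from pathlib import Path

FRAME_RATE = 12  # frames per second, as stated in the description above


def list_frames(video_dir):
    """Return the JPG frames of one video directory in playback order."""
    # Frame files are zero-padded (00001.jpg, 00002.jpg, ...), so a plain
    # lexicographic sort matches the temporal order.
    return sorted(Path(video_dir).glob("*.jpg"))


def clip_duration_seconds(num_frames, frame_rate=FRAME_RATE):
    """Approximate clip length implied by the number of extracted frames."""
    return num_frames / frame_rate
```

For instance, a directory that holds 36 frames corresponds to roughly 3 seconds of original video.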
Total number of videos | 148,092
Training Set | 118,562
Validation Set | 14,787
Test Set (w/o labels) | 14,743
Labels | 27
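As a quick sanity check on the figures above, the three splits can be added up and expressed as shares of the whole dataset. This is only an illustrative calculation; the dictionary keys are names chosen here, not identifiers used by the dataset.

```python
# Split sizes copied from the table above.
splits = {"training": 118_562, "validation": 14_787, "test": 14_743}

# The three splits should account for every video in the dataset.
total = sum(splits.values())

# Share of each split in percent, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in splits.items()}
```

The splits sum to 148,092 videos, which works out to roughly an 80/10/10 division between training, validation, and test data.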
Classes | Number of videos
Doing other things | 12,416
Thumb Down | 5,460
Thumb Up | 5,457
Drumming Fingers | 5,444
Pushing Hand Away | 5,434
Stop Sign | 5,413
Sliding Two Fingers Down | 5,410
Pulling Hand In | 5,379
Zooming Out With Two Fingers | 5,379
Pushing Two Fingers Away | 5,358
Zooming In With Two Fingers | 5,355
Sliding Two Fingers Left | 5,345
No gesture | 5,344
Zooming Out With Full Hand | 5,330
Pulling Two Fingers In | 5,315
Shaking Hand | 5,314
Zooming In With Full Hand | 5,307
Swiping Down | 5,303
Sliding Two Fingers Up | 5,262
Sliding Two Fingers Right | 5,244
Swiping Up | 5,240
Rolling Hand Forward | 5,165
Swiping Left | 5,160
Swiping Right | 5,066
Rolling Hand Backward | 5,031
Turning Hand Counterclockwise | 4,181
Turning Hand Clockwise | 3,980
Twenty Billion Neurons offers its Crowd Acting™ video dataset collections under three different license types, depending on the organization you belong to and the intended use of the data.
Perform research and evaluations in a corporate research lab or for-profit organization.
If you have successfully created a classification model based on the training set and it performs well on the validation set, we encourage you to run your model on the test set (which is published without any class labels, as you might have noticed). Please prepare a .csv file with the video's id in the first column and your predicted class label in the second column (as a string matching the wording used in the training and validation sets). As a separator, please use a semicolon. You can then upload your .csv file here (user login required) to be ranked in the leaderboard and to benchmark your approach against that of other machine learners. We are looking forward to your submission.
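The submission format described above can be produced with Python's standard `csv` module. The sketch below makes some assumptions: the function name `write_submission` is hypothetical, and the file is written without a header row because the instructions do not mention one.

```python
import csv


def write_submission(predictions, path):
    """Write (video_id, label) pairs as a semicolon-separated file.

    Assumption: no header row, since the instructions do not ask for one.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        for video_id, label in predictions:
            # First column: video id. Second column: predicted class label,
            # spelled exactly as in the training and validation sets.
            writer.writerow([video_id, label])
```

Calling `write_submission([(1, "Swiping Left"), (2, "Thumb Up")], "submission.csv")` would produce two lines, `1;Swiping Left` and `2;Thumb Up`. The ids here are made up for illustration; the labels are taken from the class list.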
MobileNet+NL+SlowFast
ysge
wuyingshou
Fusion_TSN_LSTM
CVPR2020Submission
RFEEN, 20 Crops
MobileNet_v2_NL_16sample
12F rgb
rgb 12F
Ford's Gesture Recognition System
L. Shi, Y. Zhang, C. Jian, and L. Hanqing, "Gesture Recognition using Spatiotemporal Deformable Convolutional Representation" in IEEE International Conference on Image Processing (ICIP), 2019.
Motion Fused Frames (MFFs)
Code: https://github.com/okankop/MFF-pytorch
Article: https://arxiv.org/pdf/1804.07187.pdf
Contact: okankopuklu@gmail.com
Spatiotemporal Two Streams network
3D CNN Architecture
Motion Feature Network (MFNet)
RNP
ResNext 101
SSNet RGB resnet
Short-Term Sampling Neural Networks (9 groups)
TVB
Temporal Pyramid Relation Network for Video-Based Gesture Recognition, 2018 25th IEEE International Conference on Image Processing (ICIP)
DIN
TRN - 8 segments
Short-Term Sampling Neural Networks
3D CNN - Multi time scale evaluation
8frames rgb
TRN (CVPR'18 submission)
TRN + BNInception
Anonymous
final test
label+string
3D CNN for transfer learning
Besnet
3D_GesNet
3D-GesNet(only rgb)
ECO
2D and 3D fused network
TRN-E
Inceptionv2 - TRN16
Test 1
test
test rgb 16f
One Stream Modified-I3D
TAN_ep2
199_3D
p3d_199
RT C3D - 16 Frames
Code: https://github.com/fabiopk
TPP sub1
经典双赢
prune_net
63_3D
result_3Dconv_63
Modified C3D
CNN+LSTM
3D CNN
3D ResNet 101
VideoLSTM
3D convolutional neural network
Result_63
ConvLSTM
Twenty Billion Neuron's Jester System
3d+resnet18
ColumnConv3D
Basic finetune MobileNetV2 (pretrained imagenet) + LSTM output
interframe-difference LSTM
Mobilenetv3-LSTM
3D ResNet
CNN-LSTM
3D-ResNet101 trained on Kinetics
3 Temporal Stream Network + ConvRNN
2D+3D
Test
rgb_only
test label (from 0 to 26)
ResNext 101
[test_run] 3D RGB 16F