The 20BN-SOMETHING-SOMETHING dataset is a large collection of labeled video clips that show humans performing pre-defined basic actions with everyday objects. The dataset was created by a large number of crowd workers. It allows machine learning models to develop fine-grained understanding of basic actions that occur in the physical world. It is available free of charge for academic research. Commercial licenses are available upon request.
A paper with supplementary material can be found here.
The video data is provided as one large TGZ archive, split into parts of at most 1 GB each. The total download size is 25.2 GB. The archive contains directories numbered from 1 to 108499. Each directory corresponds to one video and contains JPG images with a height of 100 px and variable width. The JPG images were extracted from the original videos at 12 frames per second. The filenames of the JPGs start at 00001.jpg. The number of JPGs per directory varies with the length of the original video.
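The directory layout described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the function names `list_frames` and `approx_duration_seconds` are hypothetical, and it assumes the archive has already been extracted and that every frame filename is purely numeric (00001.jpg, 00002.jpg, ...).

```python
from pathlib import Path

FPS = 12  # frame extraction rate stated in the dataset description


def list_frames(root, video_id):
    """Return the JPG frames of one video directory in temporal order.

    `root` is the folder holding the numbered video directories
    (its actual name after extraction is an assumption of the caller).
    """
    video_dir = Path(root) / str(video_id)
    # Sort by the numeric part of the filename (00001.jpg -> 1).
    return sorted(video_dir.glob("*.jpg"), key=lambda p: int(p.stem))


def approx_duration_seconds(num_frames, fps=FPS):
    """Approximate clip length implied by the number of extracted frames."""
    return num_frames / fps
```

Because the number of JPGs differs between directories, a training pipeline would typically sample or pad a fixed number of frames from the list returned by `list_frames`.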
Total number of videos | 108,499
Training Set | 86,017
Validation Set | 11,522
Test Set (w/o labels) | 10,960
Labels | 174
Classes (count | label):
986 | Holding something
979 | Turning something upside down
924 | Turning the camera left while filming something
914 | Stacking number of something
914 | Turning the camera right while filming something
888 | Opening something
885 | Approaching something with your camera
877 | Picking something up
873 | Pushing something so that it almost falls off but doesn't
864 | Folding something
863 | Moving something away from the camera
858 | Closing something
850 | Moving away from something with your camera
845 | Turning the camera downwards while filming something
841 | Pushing something so that it slightly moves
839 | Turning the camera upwards while filming something
838 | Pretending to pick something up
838 | Showing something to the camera
833 | Moving something up
830 | Plugging something into something
830 | Unfolding something
828 | Putting something onto something
827 | Showing that something is empty
825 | Pretending to put something on a surface
825 | Taking something from somewhere
824 | Putting something next to something
821 | Moving something towards the camera
820 | Showing a photo of something to the camera
815 | Pushing something with something
808 | Throwing something
802 | Pushing something from left to right
801 | Something falling like a feather or paper
801 | Throwing something in the air and letting it fall
796 | Throwing something against something
793 | Lifting something with something on it
788 | Taking one of many similar things on the table
785 | Showing something behind something
781 | Putting something into something
780 | Tearing something just a little bit
779 | Moving something away from something
778 | Tearing something into two pieces
777 | Holding something next to something
777 | Pushing something from right to left
776 | Putting something, something and something on the table
775 | Moving something closer to something
775 | Pretending to take something from somewhere
774 | Pretending to put something next to something
773 | Uncovering something
772 | Pouring something into something
772 | Putting something and something on the table
772 | Something falling like a rock
769 | Moving something down
769 | Pulling something from right to left
767 | Throwing something in the air and catching it
763 | Tilting something with something on it until it falls off
762 | Putting something in front of something
760 | Pretending to turn something upside down
759 | Putting something on a surface
757 | Pretending to throw something
756 | Covering something with something
756 | Showing something on top of something
753 | Squeezing something
752 | Putting something similar to other things that are already on the table
751 | Lifting up one end of something, then letting it drop down
749 | Taking something out of something
747 | Moving part of something
745 | Pulling something from left to right
744 | Lifting something up completely without letting it drop down
743 | Attaching something to something
743 | Holding something in front of something
743 | Moving something and something closer to each other
743 | Putting something behind something
742 | Pushing something so that it falls off the table
735 | Holding something over something
734 | Pretending to open something without actually opening it
732 | Removing something, revealing something behind
729 | Hitting something with something
727 | Moving something and something away from each other
727 | Touching (without moving) part of something
724 | Pretending to put something into something
724 | Showing that something is inside something
721 | Lifting something up completely, then letting it drop down
720 | Pretending to take something out of something
709 | Holding something behind something
707 | Laying something on the table on its side, not upright
700 | Poking something so it slightly moves
699 | Pretending to close something without actually closing it
698 | Putting something upright on the table
690 | Dropping something in front of something
687 | Dropping something behind something
685 | Lifting up one end of something without letting it drop down
682 | Rolling something on a flat surface
677 | Throwing something onto a surface
671 | Showing something next to something
668 | Dropping something onto something
668 | Stuffing something into something
662 | Dropping something into something
662 | Piling something up
660 | Letting something roll along a flat surface
658 | Twisting something
643 | Spinning something that quickly stops spinning
636 | Putting number of something onto something
634 | Moving something across a surface without it falling down
634 | Putting something underneath something
628 | Plugging something into something but pulling it right out as you remove your hand
627 | Dropping something next to something
606 | Poking something so that it falls over
593 | Spinning something so it continues spinning
588 | Poking something so lightly that it doesn't or almost doesn't move
585 | Wiping something off of something
582 | Moving something across a surface until it falls down
580 | Pretending to poke something
570 | Putting something that cannot actually stand upright upright on the table, so it falls on its side
566 | Pulling something out of something
565 | Scooping something up with something
562 | Pretending to be tearing something that is not tearable
543 | Burying something in something
542 | Tipping something over
533 | Tilting something with something on it slightly so it doesn't fall down
528 | Pretending to put something onto something
522 | Bending something until it breaks
512 | Letting something roll down a slanted surface
509 | Trying to bend something unbendable so nothing happens
505 | Bending something so that it deforms
503 | Digging something out of something
502 | Pretending to put something underneath something
497 | Putting something on a flat surface without letting it roll
479 | Putting something on the edge of something so it is not supported and falls down
471 | Pretending to put something behind something
471 | Spreading something onto something
466 | Sprinkling something onto something
463 | Something colliding with something and both come to a halt
462 | Pushing something off of something
453 | Putting something that can't roll onto a slanted surface, so it stays where it is
451 | Lifting a surface with something on it until it starts sliding down
433 | Pretending or failing to wipe something off of something
433 | Trying but failing to attach something to something because it doesn't stick
427 | Pulling something from behind of something
423 | Pushing something so it spins
420 | Pouring something onto something
416 | Pulling two ends of something but nothing happens
413 | Moving something and something so they pass each other
413 | Pretending to sprinkle air onto something
405 | Putting something that can't roll onto a slanted surface, so it slides down
395 | Something colliding with something and both are being deflected
386 | Pretending to squeeze something
367 | Pulling something onto something
362 | Putting something onto something else that cannot support it so it falls down
358 | Lifting a surface with something on it but not enough for it to slide down
358 | Pouring something out of something
346 | Moving something and something so they collide with each other
341 | Tipping something with something in it over, so something in it falls out
339 | Letting something roll up a slanted surface, so it rolls back down
318 | Pretending to scoop something up with something
311 | Pretending to pour something out of something, but something is empty
294 | Pulling two ends of something so that it gets stretched
290 | Failing to put something into something because something does not fit
288 | Pretending or trying and failing to twist something
282 | Trying to pour something into something, but missing so it spills next to it
277 | Something being deflected from something
273 | Poking a stack of something so the stack collapses
267 | Spilling something onto something
245 | Pulling two ends of something so that it separates into two pieces
229 | Pouring something into something until it overflows
220 | Pretending to spread air onto something
219 | Twisting (wringing) something wet until water comes out
217 | Poking a hole into something soft
207 | Spilling something next to something
206 | Poking a stack of something without the stack collapsing
183 | Putting something onto a slanted surface but it doesn't glide down
170 | Pushing something onto something
141 | Poking something so that it spins around
121 | Spilling something behind something
77 | Poking a hole into some substance
Twenty Billion Neurons offers our Crowd Acting™ video dataset collections in three different license types depending on the organization you belong to and the intended use for the data.
Perform research and evaluations in a corporate research lab or for-profit organization.
If you have been successful in creating a classification model based on the training set and it performs well on the validation set, we encourage you to run your model on the test set (which is published without any class labels, as you might have noticed). Please prepare a .csv file with the video's id in the first column and your predicted class label in the second column (as a string matching the wording used in the training and validation sets). As a separator, please use a semicolon. You can then upload your .csv file here (user login required) to be ranked in the leaderboard and to benchmark your approach against that of other machine learners. We are looking forward to your submission.
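The submission format described above can be produced with Python's standard `csv` module. This is a minimal sketch under stated assumptions: the function name `write_submission` is hypothetical, and it writes no header row because the text does not say whether one is expected.

```python
import csv


def write_submission(predictions, path):
    """Write (video_id, predicted_label) pairs as semicolon-separated rows.

    `predictions` is any iterable of (id, label) pairs; each label should
    match the wording used in the training and validation sets.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        for video_id, label in predictions:
            # First column: video id. Second column: label string.
            writer.writerow([video_id, label])
```

Several class labels contain commas (for example "Putting something, something and something on the table"), which is one practical reason a semicolon separator keeps the rows unambiguous.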
RGB-only 16f+8f (16611_10300)
RGB-only ensemble (10335_10336)
Camera, Two-stream, ResNet-50
TSM ResNet-50 8f, RGB+Flow
GSM
rgb+flow
2stream 101
rgb ensemble
TSM 16f+8f, twice
TSM-16f+8f, 2clips
12F kinetics pretrain rgb+flow
ipcsn 5_crop G_sim
ipcsn, 5_crop
testing2_2clip_3crops resnet-50 8+16segments
two stream 1+1 12F
single RGB model only
8F kinetics pretrain rgb+flow
RGB only
single-8-16
single RGB model, I3D50 backbone using high order blocks
resnet50/two-stream
16f*2 residual
GSM: trained on train split
testing_first en_1clip_center crop
rgb ensemble
DZLSB
Validation results: 51.4%
RGB only, ResNet-50 + graph operations search
"Adaptive Interaction Modeling via Graph Operations Search"
16f
two stream 1+1
any4
ANY5
two stream
8F kinetics_pretrained rgb
GodBlessThisTrial
tscv
test2
any3
any1
101rgb
en11
two stream diff
META, 8 frame, revaluation (work in progress, the last submitted results are wrong.)
PTnet: only RGB 43.39%
stw 8+16
Submission only for adding description for our previous submission.
The result is still the same as on 03/13/2018:
RGB only, Non-local ResNet-50 + GCN.
Model is pre-trained with:
https://github.com/facebookresearch/video-nonlocal-net
ResNet-50 backbone, enhanced residual units, RGB only model.
two stream
16f-3crop-10clips
ResNet50 RGB-only
Full
8f*2
update
8f*2 residual
16frame_twicesample_fullres
rgb
test1
16f-1crop-1clip
Validation results-(RGB): 46.4
Validation results-(RGB+Flow): 49.5
Test results-(RGB): 42.3%
"ECO: Efficient Convolutional Network for Online Video Understanding"
https://github.com/mzolfaghari/ECO-efficient-video-understanding
stw-16
8f
RGB Only
50rgb
p2 2s
any2
Multi-crop rgb 50 layer
16f
32f
only rgb
8f_single
8+16f
24f
TVB
Bug fixes (Apr. 16 2018) - Previous Results: 41.96%
under development
the world
A
DRX3D
ff832
RGB 16 o
check overfitting
RGB-only
2stream TRN
RGB-only
flowTN
META (work in progress)
Previous BSL: 39%
two stream
3D CNN Architecture
sc8
UN_fe
un_e
resnet50 rgb 8
Motion Feature Network (MFNet)
3D-Resnet18, 32 frames (val: 0.42)
testing 1clip 1crop
R50D
zju
2
DIN
Two-channel M(256-256)
https://arxiv.org/pdf/1804.09235.pdf
rgb
A lightweight method
rgb
final test
label+string
R50 2_1D coco TH
Besnet
TwistStream
Zhou, Bolei, et al. "Temporal relational reasoning in videos." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
ResNet-50 (16+8)segments
TSM segment8 crop1
seg8_crop1_epoch25
https://github.com/esc/smth-smth-baseline/tree/train_004
smth-smth-baseline with dropout
Two Stream LSTM
Motion Feature Network (MFNet)
tsn
testing
S8