

NEW RELEASE


The 20BN-something-something Dataset V2


Introduction

The 20BN-SOMETHING-SOMETHING dataset is a large collection of densely-labeled video clips that show humans performing pre-defined basic actions with everyday objects. The dataset was created by a large number of crowd workers. It allows machine learning models to develop fine-grained understanding of basic actions that occur in the physical world. It is available free of charge for academic research. Commercial licenses are available upon request.

This is the second release of the dataset. The first release is also still available here. The new release features the following updates:

  • Greatly increased number of videos: With 220,847 videos (vs. 108,499 in V1), we release more than twice as many videos.
  • Object annotations and captioning: For each video in the training and validation sets we now also provide object annotations in addition to the video label if applicable. For example, for a label like "Putting [something] onto [something]" there is also an annotated version like "Putting a cup onto a table". In total, there are 318,572 annotations involving 30,408 unique objects.
  • Captioning Leaderboard: The object annotations allow researchers to train not only classifiers on the data, but also more sophisticated models that generate full natural language captions describing what is happening in the scene. To support comparison of results for such models, we have launched an additional caption leaderboard (see bottom of page).
  • Greatly reduced label noise: For the new release we used crowd-sourcing to verify the video quality, asking five different crowd workers per video to confirm that the action shown matches the description given. The release only contains videos for which all five workers gave a positive answer.
  • Greatly increased pixel resolution: The resolution is now increased to a height of 240px (vs. 100px in V1).
  • New download format: The download format is now WebM with VP9 encoding (vs. JPG images in V1).
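
As a rough illustration of how a label template relates to its annotated version, the sketch below substitutes object names into the "[something]" placeholders of a label. The function name fill_template and the representation of the objects as a plain list are assumptions made for this example; the actual annotation files may be organized differently.

```python
import re

# Matches the literal placeholder used in the label templates.
PLACEHOLDER = re.compile(r"\[something\]")


def fill_template(template, objects):
    """Replace each [something] placeholder, left to right, with an object name.

    `fill_template` is a hypothetical helper, not part of the dataset tooling.
    """
    remaining = list(objects)

    def substitute(match):
        if not remaining:
            raise ValueError("more placeholders than objects")
        return remaining.pop(0)

    caption = PLACEHOLDER.sub(substitute, template)
    if remaining:
        raise ValueError("more objects than placeholders")
    return caption


print(fill_template("Putting [something] onto [something]", ["a cup", "a table"]))
# prints: Putting a cup onto a table
```

Raising an error on a mismatch between placeholders and objects is a design choice of this sketch; it makes malformed annotations visible instead of silently producing a partial caption.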

Papers with supplementary material can be found here and here.

Trying to pour water into a glass, but missing so it spills next to it
Spinning a bracelet so it continues spinning
Pulling two ends of a rubber band so that it gets stretched

Data format

The video data is provided as one large TGZ archive, split into parts of at most 1 GB each. The total download size is 19.4 GB. The archive contains WebM files encoded with the VP9 codec. Files are numbered from 1 to 220847.
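
The following is a minimal sketch of how the downloaded parts might be reassembled and inspected. It assumes that the parts are a plain byte-wise split of the archive, so that concatenating them in order restores the original TGZ file; this page does not state the split method. The function names and any file names passed to them are hypothetical.

```python
import shutil
import tarfile


def join_parts(part_paths, archive_path):
    """Concatenate the downloaded parts, in the given order, into one archive file.

    Assumes a byte-wise split; the caller is responsible for the correct order.
    """
    with open(archive_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return archive_path


def list_webm_members(archive_path):
    """Return the names of all .webm files in the gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return [
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.endswith(".webm")
        ]


def extract_archive(archive_path, target_dir):
    """Unpack the whole archive into target_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)
```

Listing the members before extracting is a cheap way to check that the reassembled archive is readable and contains the expected number of videos (220,847 according to the figures on this page).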

Terms of use

This dataset may be used for academic research free of charge under the license agreement below. If you seek to use the data for commercial purposes, please contact us.

Download Dataset

Please register or log in to download the dataset.


20BN-SOMETHING-SOMETHING-DATASET
Total number of videos: 220,847
Training Set: 168,913
Validation Set: 24,777
Test Set (w/o labels): 27,157
Labels: 174

Videos  Label
 4,081  Putting something on a surface
 3,750  Moving something up
 3,530  Covering something with something
 3,442  Pushing something from left to right
 3,242  Moving something down
 3,195  Pushing something from right to left
 3,004  Uncovering something
 2,969  Taking one of many similar things on the table
 2,943  Turning something upside down
 2,849  Tearing something into two pieces
 2,783  Putting something into something
 2,631  Squeezing something
 2,626  Throwing something
 2,431  Putting something next to something
 2,430  Poking something so lightly that it doesn't or almost doesn't move
 2,418  Pushing something so that it slightly moves
 2,339  Putting something similar to other things that are already on the table
 2,315  Showing something behind something
 2,298  Moving something and something closer to each other
 2,259  Taking something out of something
 2,252  Plugging something into something
 2,240  Pushing something so that it falls off the table
 2,234  Hitting something with something
 2,209  Showing that something is empty
 2,203  Holding something in front of something
 2,079  Something falling like a rock
 2,062  Moving something and something away from each other
 2,025  Tearing something just a little bit
 2,016  Lifting something with something on it
 1,998  Stuffing something into something
 1,969  Pretending to pick something up
 1,911  Pretending to open something without actually opening it
 1,908  Pulling something from left to right
 1,906  Lifting something up completely without letting it drop down
 1,893  Holding something next to something
 1,886  Pulling something from right to left
 1,869  Opening something
 1,858  Something falling like a feather or paper
 1,851  Lifting something up completely, then letting it drop down
 1,851  Holding something
 1,850  Putting something onto something
 1,850  Lifting up one end of something, then letting it drop down
 1,804  Pushing something with something
 1,804  Holding something over something
 1,773  Rolling something on a flat surface
 1,763  Touching (without moving) part of something
 1,644  Pretending to put something on a surface
 1,623  Dropping something onto something
 1,613  Lifting up one end of something without letting it drop down
 1,599  Poking something so it slightly moves
 1,587  Spinning something that quickly stops spinning
 1,547  Showing that something is inside something
 1,542  Folding something
 1,530  Pouring something into something
 1,482  Closing something
 1,475  Throwing something against something
 1,463  Stacking number of something
 1,456  Picking something up
 1,437  Pretending to take something from somewhere
 1,428  Putting something behind something
 1,426  Moving something closer to something
 1,374  Holding something behind something
 1,353  Putting something and something on the table
 1,352  Moving something away from something
 1,349  Approaching something with your camera
 1,321  Pushing something so that it almost falls off but doesn't
 1,301  Showing something on top of something
 1,297  Pretending to put something next to something
 1,290  Taking something from somewhere
 1,272  Tilting something with something on it until it falls off
 1,266  Unfolding something
 1,256  Pretending to be tearing something that is not tearable
 1,239  Turning the camera left while filming something
 1,239  Turning the camera right while filming something
 1,232  Dropping something next to something
 1,227  Attaching something to something
 1,222  Dropping something into something
 1,211  Putting something, something and something on the table
 1,199  Moving away from something with your camera
 1,185  Showing something next to something
 1,180  Putting number of something onto something
 1,177  Throwing something in the air and catching it
 1,176  Plugging something into something but pulling it right out as you remove your hand
 1,168  Spinning something so it continues spinning
 1,165  Pretending to put something into something
 1,163  Letting something roll along a flat surface
 1,145  Piling something up
 1,131  Twisting something
 1,131  Dropping something in front of something
 1,123  Scooping something up with something
 1,122  Pretending to close something without actually closing it
 1,094  Putting something in front of something
 1,069  Removing something, revealing something behind
 1,061  Showing something to the camera
 1,045  Pretending to take something out of something
 1,038  Throwing something in the air and letting it fall
 1,035  Throwing something onto a surface
 1,021  Turning the camera upwards while filming something
 1,019  Pretending to throw something
   994  Moving something towards the camera
   991  Trying to bend something unbendable so nothing happens
   991  Dropping something behind something
   986  Moving something away from the camera
   980  Putting something upright on the table
   976  Turning the camera downwards while filming something
   950  Laying something on the table on its side, not upright
   916  Showing a photo of something to the camera
   905  Moving part of something
   896  Tipping something over
   892  Poking something so that it falls over
   888  Pretending to turn something upside down
   883  Moving something across a surface until it falls down
   876  Letting something roll down a slanted surface
   873  Wiping something off of something
   856  Pretending to squeeze something
   845  Pushing something so it spins
   837  Putting something that cannot actually stand upright upright on the table, so it falls on its side
   832  Moving something across a surface without it falling down
   829  Tilting something with something on it slightly so it doesn't fall down
   798  Bending something so that it deforms
   754  Pretending to poke something
   748  Putting something underneath something
   746  Pretending to put something behind something
   740  Pretending to put something onto something
   736  Pulling something out of something
   718  Bending something until it breaks
   687  Pushing something off of something
   687  Burying something in something
   660  Trying but failing to attach something to something because it doesn't stick
   653  Something colliding with something and both are being deflected
   643  Pulling two ends of something but nothing happens
   638  Putting something on the edge of something so it is not supported and falls down
   586  Pulling something from behind of something
   582  Moving something and something so they pass each other
   577  Moving something and something so they collide with each other
   553  Putting something on a flat surface without letting it roll
   547  Something colliding with something and both come to a halt
   543  Pretending to sprinkle air onto something
   540  Sprinkling something onto something
   535  Spreading something onto something
   522  Digging something out of something
   514  Pouring something out of something
   492  Something being deflected from something
   490  Pretending or failing to wipe something off of something
   474  Spilling something onto something
   447  Tipping something with something in it over, so something in it falls out
   447  Putting something that can't roll onto a slanted surface, so it stays where it is
   445  Pretending to pour something out of something, but something is empty
   442  Putting something that can't roll onto a slanted surface, so it slides down
   442  Putting something onto something else that cannot support it so it falls down
   441  Letting something roll up a slanted surface, so it rolls back down
   438  Pulling two ends of something so that it gets stretched
   419  Pushing something onto something
   408  Twisting (wringing) something wet until water comes out
   405  Lifting a surface with something on it until it starts sliding down
   404  Pretending or trying and failing to twist something
   403  Pouring something onto something
   389  Pretending to scoop something up with something
   373  Pretending to put something underneath something
   367  Poking a stack of something so the stack collapses
   353  Failing to put something into something because something does not fit
   352  Pouring something into something until it overflows
   343  Pulling something onto something
   313  Pulling two ends of something so that it separates into two pieces
   276  Poking a stack of something without the stack collapsing
   268  Lifting a surface with something on it but not enough for it to slide down
   265  Trying to pour something into something, but missing so it spills next to it
   258  Poking a hole into something soft
   240  Spilling something next to something
   225  Pretending to spread air onto something
   185  Poking something so that it spins around
   183  Putting something onto a slanted surface but it doesn't glide down
   143  Spilling something behind something
   115  Poking a hole into some substance

Classification Leaderboard

If you have been successful in creating a model based on the training set and it performs well on the validation set, we encourage you to run your model on the test set (which is published without any class labels, as you might have noticed). Please prepare a .csv file with the video's id in the first column and five labels (columns 2 to 6) from the labels.json file. As a separator, please use a semicolon. You can then upload your .csv file here (user login required) to be ranked in the leaderboard and to benchmark your approach against that of other machine learners. We are looking forward to your submission.
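
A minimal sketch of writing such a submission file is shown below. The function name, the absence of a header row, and the ordering of the five labels from most to least likely are assumptions of this example and are not specified on this page; the label strings are expected to be taken from labels.json.

```python
import csv


def write_classification_submission(path, predictions):
    """Write one row per test video: id;label1;label2;label3;label4;label5.

    `predictions` maps a video id to a list of exactly five label strings.
    No header row is written (an assumption of this sketch).
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        for video_id, labels in predictions.items():
            if len(labels) != 5:
                raise ValueError(
                    f"video {video_id}: expected 5 labels, got {len(labels)}"
                )
            writer.writerow([video_id, *labels])
```

Because the separator is a semicolon, labels that contain commas (for example "Lifting up one end of something, then letting it drop down") are written without quoting.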

Rank | Name | Approach | Top-1 | Top-5
1 | Raghav Goyal (4 months ago) | VGG style 11-layered 3D-CNN with left-right augmentation and fps jittering | 50.76% | 80.77%
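
Before submitting, it can be useful to compute the same kind of scores on the validation set, where labels are available. The sketch below assumes that the Top-1 and Top-5 columns follow the usual definition of top-k accuracy (the true label appears among the first k predicted labels); the leaderboard's exact scoring code is not described here, and the function name is hypothetical.

```python
def top_k_accuracy(predictions, ground_truth, k):
    """Fraction of videos whose true label is among the first k predicted labels.

    `predictions` maps a video id to a list of labels ranked from most to
    least likely; `ground_truth` maps a video id to its single true label.
    Videos without a prediction count as misses.
    """
    if not ground_truth:
        raise ValueError("ground_truth is empty")
    hits = sum(
        1
        for video_id, true_label in ground_truth.items()
        if true_label in predictions.get(video_id, [])[:k]
    )
    return hits / len(ground_truth)


# Toy example with made-up ids and two of the labels listed on this page.
truth = {1: "Opening something", 2: "Opening something"}
ranked = {
    1: ["Opening something", "Closing something"],
    2: ["Closing something", "Opening something"],
}
print(top_k_accuracy(ranked, truth, k=1))  # prints: 0.5
print(top_k_accuracy(ranked, truth, k=2))  # prints: 1.0
```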



NEW!
Captioning Leaderboard

For the second release of the 20BN-Something-Something dataset we are publishing not only the action labels but also the "somethings" that were used in the videos. It is therefore now possible to move from classification to captioning, creating models that not only recognize the action being performed (1 out of k) but also predict the objects that were used in the videos (e.g. by generating a full natural language caption describing the video). For this leaderboard, we encourage you to upload a .csv file where the first column is the video's id from the test set and the second column is the caption. As a separator, please use a semicolon. When you upload your file, the leaderboard software will score your submission against ground truth using different metrics that have been popularized in the community. For details, please refer to our paper Fine-grained Video Classification and Captioning.
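
The two-column caption file can be produced in much the same way as any other semicolon-separated file. The following is a sketch under assumptions: the function name is hypothetical, no header row is written, and whitespace inside captions is collapsed to single spaces so that each caption stays on one line.

```python
import csv


def write_caption_submission(path, captions):
    """Write one row per test video in the form id;caption.

    `captions` maps a video id to its predicted caption string.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        for video_id, caption in captions.items():
            # Collapse runs of whitespace, including line breaks.
            text = " ".join(caption.split())
            if not text:
                raise ValueError(f"video {video_id}: empty caption")
            writer.writerow([video_id, text])
```

Note that Python's csv module wraps a caption in double quotes if it contains a semicolon. Whether the leaderboard accepts quoted fields is not stated here, so avoiding semicolons in captions is the safer choice.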

Rank | Name


