
ImageNet Large Scale Visual Recognition Challenge 2010 (ILSVRC2010)

Held as a "taster competition" in conjunction with PASCAL Visual Object Classes Challenge 2010 (VOC2010)


  • September 2, 2014: A new paper which describes the collection of the ImageNet Large Scale Visual Recognition Challenge dataset, analyzes the results of the past five years of the challenge, and even compares current computer accuracy with human accuracy is now available. Please cite it when reporting ILSVRC2010 results or using the dataset.
  • For the latest challenge, please visit here.
  • September 16, 2010: Slides for overview of results are available, along with slides from the two winning teams:

    Winner: NEC-UIUC
    Yuanqing Lin, Fengjun Lv, Shenghuo Zhu, Ming Yang, Timothee Cour, Kai Yu (NEC). LiangLiang Cao, Zhen Li, Min-Hsuan Tsai, Xi Zhou, Thomas Huang (UIUC). Tong Zhang (Rutgers).
    [PDF] NB: This is unpublished work. Please contact the authors if you plan to make use of any of the ideas presented.

    Honorable mention: XRCE
    Jorge Sanchez, Florent Perronnin, Thomas Mensink (XRCE)
    [PDF] NB: This is unpublished work. Please contact the authors if you plan to make use of any of the ideas presented.

  • September 3, 2010: Full results are available. Please join us at the VOC workshop at ECCV 2010 on September 11, 2010 in Crete, Greece. At the workshop we will provide an overview of the results and invite winning teams to present their methods. We look forward to seeing you there.
  • August 9, 2010: Submission deadline is extended to 4:59pm PDT, August 30, 2010. There will be no further extensions.
  • August 8, 2010: Submission site is up.
  • June 16, 2010: Test data is available for download.
  • May 3, 2010: Training data, validation data and development kit are available for download.
  • May 3, 2010: Registration is up. Please register to stay updated.
  • Mar 18, 2010: We are preparing to run the ImageNet Large Scale Visual Recognition Challenge 2010 (ILSVRC2010).


Introduction

The goal of this competition is to estimate the content of photographs for the purpose of retrieval and automatic annotation, using a subset of the large hand-labeled ImageNet dataset (10,000,000 labeled images depicting 10,000+ object categories) as training data. Test images will be presented with no initial annotation (no segmentation or labels), and algorithms will have to produce labelings specifying what objects are present in the images. In this initial version of the competition, the goal is only to identify the main objects present in images, not to specify the location of objects.


Data

The validation and test data for this competition will consist of 200,000 photographs, collected from Flickr and other search engines, hand-labeled with the presence or absence of 1000 object categories. The 1000 object categories contain both internal nodes and leaf nodes of ImageNet, but do not overlap with each other. A random subset of 50,000 of the images with labels will be released as validation data included in the development kit, along with a list of the 1000 categories. The remaining 150,000 images will be used for evaluation and will be released without labels at test time.

The training data, the subset of ImageNet containing the 1000 categories and 1.2 million images, will be packaged for easy downloading. The validation and test data for this competition are not contained in the ImageNet training data (we will remove any duplicates).


Task

For each image, algorithms will produce a list of at most 5 object categories in descending order of confidence. The quality of a labeling will be evaluated based on the label that best matches the ground truth label for the image. The idea is to allow an algorithm to identify multiple objects in an image and not be penalized if one of the objects identified was in fact present, but not included in the ground truth.

There will be two versions of the evaluation criteria: a) non-hierarchical, treating all categories equally, and b) taking into account the hierarchical structure of the set of categories.

For each image, an algorithm will produce 5 labels $l_j$, $j = 1, \dots, 5$. The ground truth labels for the image are $g_k$, $k = 1, \dots, n$, with $n$ objects labeled. The error of the algorithm for that image is

$$e = \frac{1}{n} \sum_{k} \min_{j} d(l_j, g_k).$$

For criterion (a), $d(x, y) = 0$ if $x = y$ and 1 otherwise. For criterion (b), $d(x, y)$ is the height of the lowest common ancestor of $x$ and $y$ in the category hierarchy (a subset of WordNet). This is equivalent to predicting a path along the hierarchy and evaluating where the ground truth path and the predicted path diverge. For each criterion, the overall error score for an algorithm is the average error over all test images.
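The hierarchical distance d(x, y) described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy hierarchy and its category names are hypothetical, the hierarchy is treated as a tree with a single parent per node, and the height of a node is assumed to be the length of the longest downward path to a leaf (leaves have height 0). The routines in the development kit remain the authoritative reference.

```python
# Hypothetical toy hierarchy (child -> parent). The real hierarchy is a
# subset of WordNet; a single-parent tree is assumed here for simplicity.
PARENT = {
    "animal": None,
    "dog": "animal",
    "cat": "animal",
    "husky": "dog",
    "beagle": "dog",
}


def children_of(parent_map):
    """Invert a child -> parent map into a parent -> children map."""
    kids = {node: [] for node in parent_map}
    for node, parent in parent_map.items():
        if parent is not None:
            kids[parent].append(node)
    return kids


CHILDREN = children_of(PARENT)


def height(node):
    """Assumed definition: longest path from node down to a leaf."""
    if not CHILDREN[node]:
        return 0
    return 1 + max(height(child) for child in CHILDREN[node])


def ancestors(node):
    """Return the node itself followed by its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def hierarchical_distance(x, y):
    """Height of the lowest common ancestor of x and y."""
    ancestors_of_x = set(ancestors(x))
    # Walking up from y, the first node shared with x is the lowest one.
    for node in ancestors(y):
        if node in ancestors_of_x:
            return height(node)
    raise ValueError("categories share no common ancestor")
```

In this toy tree, an exact match on a leaf category costs 0, confusing two sibling leaves costs the height of their shared parent, and confusing categories from different branches costs the height of the root, so errors between more distant categories are penalized more heavily.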

Note that for this initial version of the competition, n=1, that is, one ground truth label per image.
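As a minimal sketch of the scoring rule above, the following Python computes the per-image error and the overall error under the non-hierarchical criterion. The function names and the use of integer category identifiers are illustrative assumptions, not the development kit's interface.

```python
def flat_distance(x, y):
    """Non-hierarchical criterion: 0 for an exact match, 1 otherwise."""
    return 0 if x == y else 1


def image_error(predicted, ground_truth, d=flat_distance):
    """Average, over ground truth labels, of the best distance to any prediction."""
    if not 1 <= len(predicted) <= 5:
        raise ValueError("expected between 1 and 5 predicted labels")
    if not ground_truth:
        raise ValueError("expected at least one ground truth label")
    total = sum(min(d(label, truth) for label in predicted) for truth in ground_truth)
    return total / len(ground_truth)


def overall_error(all_predicted, all_ground_truth, d=flat_distance):
    """Mean per-image error over all images."""
    if len(all_predicted) != len(all_ground_truth):
        raise ValueError("predictions and ground truth must have the same length")
    errors = [image_error(p, g, d) for p, g in zip(all_predicted, all_ground_truth)]
    return sum(errors) / len(errors)


# Two images with one ground truth label each (n = 1).
predictions = [[3, 7, 1, 9, 2], [4, 5, 6, 8, 10]]
ground_truth = [[1], [11]]
score = overall_error(predictions, ground_truth)  # first image hit, second missed
```

With n = 1 and the flat distance, the per-image error is 0 when the ground truth label appears anywhere among the predicted labels and 1 otherwise. Passing a different distance function as `d` gives the hierarchical variant.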

Development kit

The development kit will include MATLAB software to demonstrate training using the ImageNet data (available for download separately from the development kit) and testing on the validation set. This will include routines to compute the overall error score with respect to each criterion.

Timetable (Tentative)

  • 3 May 2010: Development kit (training and validation data plus evaluation software) made available.
  • 16 June 2010: Test set and submission server made available.
  • 30 August 2010, 4:59pm PDT: Deadline for submission of results.
  • 11 September 2010: Workshop in association with ECCV 2010, Crete.


Features

To facilitate easy participation, a set of baseline features will be provided for the images in the 1000 categories in ImageNet, for the validation data and, later, for the test data. Routines to demonstrate using this data will be included in the development kit. For 2010, features will include vector-quantized SIFT features suitable for a bag-of-words or spatial pyramid representation.
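To illustrate how vector-quantized features feed a bag-of-words representation, here is a minimal Python sketch. It assumes each image's descriptors have already been mapped to integer visual-word indices in the range 0 to codebook_size - 1. The function name, the codebook size and the L1 normalization are illustrative choices, not details of the provided features.

```python
from collections import Counter


def bow_histogram(word_ids, codebook_size):
    """Build an L1-normalized bag-of-words histogram from visual-word indices."""
    if any(not 0 <= w < codebook_size for w in word_ids):
        raise ValueError("visual-word index outside the codebook")
    if not word_ids:
        # An image with no descriptors gets an all-zero histogram.
        return [0.0] * codebook_size
    counts = Counter(word_ids)
    total = len(word_ids)
    return [counts.get(w, 0) / total for w in range(codebook_size)]


# Hypothetical image with four quantized descriptors and a 4-word codebook.
histogram = bow_histogram([0, 2, 2, 3], codebook_size=4)
```

A spatial pyramid representation would compute such histograms separately over a grid of image regions and concatenate them, which requires the descriptor locations in addition to the word indices.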


Submission

Test data will be provided in the same format as the validation data, a directory of image files, but not including the labels. Submissions will consist of a text file with one line per image containing the identified categories. The format is demonstrated in the development kit.
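A submission file of this shape could be produced along the following lines. The exact layout is defined by the development kit. This sketch assumes, purely for illustration, that each line holds up to five category identifiers separated by spaces, in descending order of confidence, with lines in the same order as the test images. The function name is hypothetical.

```python
def format_submission(predictions):
    """Render one line per image with space-separated category identifiers.

    The space-separated layout is an assumption; consult the development
    kit for the authoritative format.
    """
    lines = []
    for labels in predictions:
        if not 1 <= len(labels) <= 5:
            raise ValueError("expected between 1 and 5 labels per image")
        lines.append(" ".join(str(label) for label in labels))
    return "\n".join(lines) + "\n"


# Hypothetical predictions for two test images, most confident label first.
text = format_submission([[1, 2, 3, 4, 5], [7, 8, 9, 10, 11]])
```

The resulting string can then be written to a text file with an ordinary `open(path, "w")` call.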


Citation

If you are reporting results of the challenge or using the dataset, please cite:
    Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.


Organizers

Alex Berg (Columbia University), Jia Deng (Princeton University), Fei-Fei Li (Stanford University)


Contact

Please feel free to send any questions or comments to