ImageNet Large Scale Visual Recognition Challenge 2011 (ILSVRC2011)

Held in conjunction with PASCAL Visual Object Classes Challenge 2011 (VOC2011)


News

September 2, 2014: A new paper which describes the collection of the ImageNet Large Scale Visual Recognition Challenge dataset, analyzes the results of the past five years of the challenge, and even compares current computer accuracy with human accuracy is now available. Please cite it when reporting ILSVRC2011 results or using the dataset.
Nov 27, 2011: Slides with an overview of the results are available, along with slides from the two winning teams:

Classification Winners: XRCE
Florent Perronnin, Jorge Sanchez
[PDF] Compressed Fisher vectors for Large Scale Visual Recognition

Detection Winners: University of Amsterdam & University of Trento
Koen van de Sande, Jasper Uijlings
Arnold Smeulders, Theo Gevers, Nicu Sebe, Cees Snoek
[PDF] Segmentation as Selective Search for Object Recognition
Nov 5, 2011: Schedule for the workshop added. See you there!!
Oct 26, 2011: Full results released!!
Oct 20, 2011: Submission is closed. Full results to be released soon.
Sep 19, 2011: The submission server is up. The deadline has been extended to 4:59pm PDT, Oct 20, 2011. There will be no further extensions.
July 31, 2011: Test data has been released. The bounding box annotation for the validation set has also been updated. Please visit the same download page sent via email; if you can't access it, please register.
The devkit has been updated to include evaluation routines for the localization task.
June 20, 2011: Registration page is up! Please register to obtain download links to data and to stay updated.
Mar 29, 2011: We are preparing to run the ImageNet Large Scale Visual Recognition Challenge 2011 (ILSVRC2011).

Schedule for the Large Scale Visual Recognition workshop, part of the Pascal challenge workshop (W02)

2:30-3:45 Introduction to the Large Scale Visual Recognition Competition, Alex Berg
3:45-4:05 Classification Winner -- XRCE team, Florent Perronnin
4:10-4:30 Detection Winner -- University of Amsterdam & University of Trento, Koen van de Sande
4:35- General discussion and planning for next year.

Introduction

The goal of this competition is to estimate the content of photographs for the purpose of retrieval and automatic annotation, using a subset of the large hand-labeled ImageNet dataset (10,000,000 labeled images depicting 10,000+ object categories) as training data. Test images will be presented with no initial annotation (no segmentation or labels), and algorithms will have to produce labelings specifying what objects are present in the images. New test images will be collected and labeled especially for this competition and are not part of the previously published ImageNet dataset. The general goal is to identify the main objects present in images. This year, we also introduce a new task: specifying the location of objects.

More information is available on the webpage for last year's competition: ILSVRC 2010.

Data

The validation and test data for this competition will consist of 150,000 photographs, collected from Flickr and other search engines, hand-labeled with the presence or absence of 1000 object categories. The 1000 object categories include both internal nodes and leaf nodes of ImageNet, but do not overlap with each other. A random subset of 50,000 of the labeled images will be released as validation data in the development kit, along with a list of the 1000 categories. The remaining 100,000 images will be used for evaluation and will be released without labels at test time.

The training data, the subset of ImageNet containing the 1000 categories and 1.2 million images, will be packaged for easy downloading. The validation and test data for this competition are not contained in the ImageNet training data (we will remove any duplicates).

Browse the training images of the 1000 categories here.

Task

Task 1: Classification

For each image, algorithms will produce a list of at most 5 object categories in descending order of confidence. The quality of a labeling will be evaluated based on the label that best matches the ground truth label for the image. The idea is to allow an algorithm to identify multiple objects in an image without being penalized if one of the objects it identified was in fact present but not included in the ground truth.

There will be two versions of the evaluation criteria: a) non-hierarchical, treating all categories equally, and b) hierarchical, taking into account the structure of the set of categories. For each image, an algorithm will produce 5 labels \( l_j, j=1,...,5 \). The ground truth labels for the image are \( g_k, k=1,...,n \), with n classes of objects labeled. The error of the algorithm for that image is \( e= \frac{1}{n} \cdot \sum_k \min_j d(l_j,g_k) \). For criterion a), \( d(x,y)=0 \) if \( x=y \) and 1 otherwise. For criterion b), \( d(x,y) \) is the height of the lowest common ancestor of x and y in the category hierarchy (a subset of WordNet), divided by the maximum possible height. This is equivalent to predicting a path along the hierarchy and evaluating where the ground truth path and the predicted path diverge. For each criterion, the overall error score for an algorithm is the average error over all test images. Note that for this version of the competition n=1, that is, there is one ground truth label per image.
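The two Task 1 criteria can be sketched in Python as follows. This is a minimal illustration, not the official evaluation code (the development kit ships MATLAB routines for that). It assumes that the height of a node is the length of its longest downward path to a leaf (so leaves have height 0) and that the "maximum possible height" is the height of the root. The helper names and the small toy hierarchy `PARENT` are hypothetical and are not the real WordNet subset.

```python
from typing import Callable, Dict, List, Optional, Sequence

Distance = Callable[[str, str], float]

# Hypothetical toy hierarchy (child -> parent); NOT the real ILSVRC/WordNet tree.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity", "vehicle": "entity",
    "dog": "animal", "cat": "animal", "horse": "animal",
    "car": "vehicle", "bus": "vehicle", "truck": "vehicle",
}


def flat_distance(x: str, y: str) -> float:
    """Criterion a): d(x, y) = 0 if x == y, else 1."""
    return 0.0 if x == y else 1.0


def node_heights(parent: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Assumed definition: height = longest downward path to a leaf (leaves are 0)."""
    children: Dict[str, List[str]] = {n: [] for n in parent}
    for node, p in parent.items():
        if p is not None:
            children[p].append(node)
    heights: Dict[str, int] = {}

    def height(n: str) -> int:
        if n not in heights:
            heights[n] = 0 if not children[n] else 1 + max(height(c) for c in children[n])
        return heights[n]

    for n in parent:
        height(n)
    return heights


def hierarchical_distance(parent: Dict[str, Optional[str]]) -> Distance:
    """Criterion b): height of the lowest common ancestor / maximum height."""
    heights = node_heights(parent)
    max_height = max(heights.values())

    def path_to_root(n: str) -> List[str]:
        path = [n]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    def d(x: str, y: str) -> float:
        ancestors_of_y = set(path_to_root(y))
        # Walking up from x, the first node that is also an ancestor of y is the LCA.
        lca = next(a for a in path_to_root(x) if a in ancestors_of_y)
        return heights[lca] / max_height

    return d


def image_error(labels: Sequence[str], truths: Sequence[str], d: Distance) -> float:
    """e = (1/n) * sum_k min_j d(l_j, g_k) for a single image."""
    return sum(min(d(l, g) for l in labels) for g in truths) / len(truths)


def overall_error(predictions: Sequence[Sequence[str]],
                  ground_truth: Sequence[Sequence[str]], d: Distance) -> float:
    """Average of the per-image errors over all test images."""
    errors = [image_error(p, g, d) for p, g in zip(predictions, ground_truth)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    guesses = ["cat", "horse", "car", "bus", "truck"]
    print(image_error(guesses, ["dog"], flat_distance))                  # 1.0
    print(image_error(guesses, ["dog"], hierarchical_distance(PARENT)))  # 0.5
```

With these guesses, criterion a) gives an error of 1 because none of the five labels is `dog`, while criterion b) gives 0.5 because the closest guesses (`cat`, `horse`) diverge from `dog` only at `animal`, which sits halfway up the toy hierarchy.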

Task 2 (taster): Classification with localization

In this task, an algorithm will produce 5 class labels \( l_j, j=1,...,5 \) and 5 bounding boxes \( b_j, j=1,...,5 \), one for each class label. The ground truth labels for the image are \( g_k, k=1,...,n \), with n class labels. For each ground truth class label \( g_k \), the ground truth bounding boxes are \( z_{km}, m=1,...,M_k \), where \( M_k \) is the number of instances of the \( k^{th} \) object in the current image. The error of the algorithm for that image is \[ e=\frac{1}{n} \cdot \sum_k \min_{j} \min_{m=1,...,M_k} \max \{ d(l_j,g_k), f(b_j,z_{km}) \} \] where \( f(b_j,z_{km})=0 \) if \( b_j \) and \( z_{km} \) have over 50% overlap, and \( f(b_j,z_{km})=1 \) otherwise. In other words, the error is the same as defined in task 1 if the localization is correct (i.e. the predicted bounding box overlaps by over 50% with the ground truth bounding box or, in the case of multiple instances of the same class, with any of the ground truth bounding boxes); otherwise the error is 1 (the maximum). There will be two versions of \( d(l_j, g_k) \), just as in task 1.
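The Task 2 error for a single image can be sketched in Python as below. This is an illustration under stated assumptions, not the official evaluation routine: boxes are hypothetical `(x_min, y_min, x_max, y_max)` tuples in continuous coordinates, and "over 50% overlap" is read as intersection over union (IoU) greater than 0.5. The development kit's readme is the authority on the exact overlap definition and pixel convention. The ground truth is passed as a hypothetical dict mapping each class label to the list of its boxes, so n is the number of dict entries.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in continuous coordinates.
Box = Tuple[float, float, float, float]
Distance = Callable[[str, str], float]


def flat_distance(x: str, y: str) -> float:
    """Criterion a): 0 if the labels match, 1 otherwise."""
    return 0.0 if x == y else 1.0


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_miss(b: Box, z: Box, threshold: float = 0.5) -> float:
    """f(b, z): 0 if IoU exceeds the threshold (assumed reading of 'over 50%'), else 1."""
    return 0.0 if iou(b, z) > threshold else 1.0


def localization_error(labels: Sequence[str], boxes: Sequence[Box],
                       truth: Dict[str, List[Box]],
                       d: Distance = flat_distance) -> float:
    """e = (1/n) * sum_k min_j min_m max{d(l_j, g_k), f(b_j, z_km)} for one image."""
    total = 0.0
    for g, gt_boxes in truth.items():  # one entry per ground truth class g_k
        total += min(
            max(d(l, g), box_miss(b, z))
            for l, b in zip(labels, boxes)  # the 5 (label, box) pairs
            for z in gt_boxes               # every instance z_km of class g_k
        )
    return total / len(truth)


if __name__ == "__main__":
    # Hypothetical image containing two "dog" instances.
    truth = {"dog": [(10, 10, 50, 50), (60, 60, 100, 100)]}
    others = ["cat", "car", "bus", "truck"]
    filler = [(0, 0, 1, 1)] * 4
    print(localization_error(["dog"] + others, [(62, 62, 100, 100)] + filler, truth))  # 0.0
    print(localization_error(["dog"] + others, [(0, 0, 20, 20)] + filler, truth))      # 1.0
```

Because the error takes the maximum of the label term and the box term, a correctly labeled but badly placed box costs as much as a wrong label; with a hierarchical `d` passed in instead of `flat_distance`, a well-localized box leaves the Task 1 distance unchanged.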

Development Kit

The development kit includes:

Meta data for the competition categories.
MATLAB routines for evaluating submissions.

Please be sure to consult the readme file included in the development kit.

Development kit (3 MB).

Timetable

June 20, 2011: Development kit (training and validation data plus evaluation software) made available.
June 30, 2011: A patch containing more bounding boxes for the training images will be released.
October 20, 2011: Deadline for submission of results.
November 7, 2011: Pascal Challenge Workshop in association with ICCV 2011, Barcelona.

Citation

If you are reporting results of the challenge or using the dataset, please cite:

ImageNet Large Scale Visual Recognition Challenge
IJCV
paper | bibtex | paper content on arXiv

Organizers

Alex Berg (Stony Brook University)
Jia Deng (Princeton University / Stanford University)
Sanjeev Satheesh (Stanford University)
Hao Su (Stanford University)
Fei-Fei Li (Stanford University)

Contact

Please feel free to send any questions or comments to imagenet.help.desk@gmail.com.