
ImageNet Large Scale Visual Recognition Challenge 2017 (ILSVRC2017)


Introduction

This challenge evaluates algorithms for object localization and detection from images and videos at scale. The most successful and innovative teams will be invited to present at the CVPR 2017 workshop. The challenge consists of three tasks:
  1. Object localization for 1000 categories.
  2. Object detection for 200 fully labeled categories.
  3. Object detection from video for 30 fully labeled categories.

News

  • Jul 26, 2017: We are passing the baton to Kaggle. From now on, all three challenges (CLS-LOC, DET, VID) will be hosted on Kaggle!
  • Jul 17, 2017: Results announced.
  • Jun 25, 2017: The submission server for VID is open, additional train/val/test images for VID are available now, and the VID deadline is extended to July 7, 2017, 5pm PDT.
  • Jun 18, 2017: Submission server for CLS-LOC and DET is open.
  • Jun 15, 2017: The taster challenges with the Amazon bin image dataset will not be held due to issues with the final dataset release. We sincerely apologize to the teams that have been working on this challenge.
  • Jun 12, 2017: A new additional test set (5,500 images) for object detection is available now.
  • Mar 31, 2017: Register your team and download the data here.
  • Mar 31, 2017: The tentative timetable is announced.
  • Mar 13, 2017: Stay tuned. Coming soon.

History

2016, 2015, 2014, 2013, 2012, 2011, 2010

Tentative Timetable

  • Mar 31, 2017: Development kit, data, and registration made available.
  • Jun 30, 2017, 5pm PDT: Submission deadline.
  • July 5, 2017: Challenge results will be released.
  • July 26, 2017: The most successful and innovative teams present at the CVPR 2017 workshop.

Main Challenges

I: Object localization


The data for the classification and localization tasks will remain unchanged from ILSVRC 2012. The validation and test data will consist of 150,000 photographs, collected from Flickr and other search engines, hand-labeled with the presence or absence of 1000 object categories. The 1000 object categories contain both internal nodes and leaf nodes of ImageNet, but do not overlap with each other. A random subset of 50,000 of the images with labels will be released as validation data, included in the development kit along with a list of the 1000 categories. The remaining images will be used for evaluation and will be released without labels at test time. The training data, the subset of ImageNet containing the 1000 categories and 1.2 million images, will be packaged for easy downloading. The validation and test data for this competition are not contained in the ImageNet training data.

In this task, given an image, an algorithm will produce 5 class labels $c_i, i=1,\dots,5$ in decreasing order of confidence and 5 bounding boxes $b_i, i=1,\dots,5$, one for each class label. The quality of a localization labeling will be evaluated based on the label that best matches the ground truth label for the image and on the bounding box that overlaps with the ground truth. The idea is to allow an algorithm to identify multiple objects in an image and not be penalized if one of the objects identified was in fact present but not included in the ground truth.

The ground truth labels for the image are $C_k, k=1,\dots n$ with $n$ class labels. For each ground truth class label $C_k$, the ground truth bounding boxes are $B_{km},m=1\dots M_k$, where $M_k$ is the number of instances of the $k^\text{th}$ object in the current image.

Let $d(c_i,C_k) = 0$ if $c_i = C_k$ and 1 otherwise. Let $f(b_i,B_{km}) = 0$ if $b_i$ and $B_{km}$ have more than $50\%$ overlap, and 1 otherwise. The error of the algorithm on an individual image will be computed using:

\[ e = \frac{1}{n} \sum_k \min_i \min_m \max\{ d(c_i, C_k),\, f(b_i, B_{km}) \} \]
The winner of the object localization challenge will be the team which achieves the minimum average error across all test images.
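
To make the metric concrete, here is a minimal sketch of the per-image error in Python, assuming axis-aligned boxes given as (xmin, ymin, xmax, ymax) and interpreting "more than $50\%$ overlap" as intersection-over-union above 0.5; the function names and conventions are illustrative, not the official development kit implementation.

    def iou(a, b):
        # Intersection-over-union of two boxes (xmin, ymin, xmax, ymax).
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    def localization_error(preds, truths, thresh=0.5):
        # preds:  up to 5 (class_label, box) pairs, the algorithm's output.
        # truths: dict mapping each ground-truth label C_k to its list of boxes B_km.
        # Returns e = (1/n) * sum_k min_i min_m max{d(c_i,C_k), f(b_i,B_km)}.
        errors = []
        for gt_label, gt_boxes in truths.items():
            best = 1.0  # worst case: no prediction covers this ground-truth class
            for pred_label, pred_box in preds:
                d = 0.0 if pred_label == gt_label else 1.0
                f = min(0.0 if iou(pred_box, B) > thresh else 1.0 for B in gt_boxes)
                best = min(best, max(d, f))
            errors.append(best)
        return sum(errors) / len(errors)

Averaging this per-image error over all test images gives the quantity that determines the winner.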

II: Object detection


The training and validation data for the object detection task will remain unchanged from ILSVRC 2014. The test data will be partially refreshed with new images based upon last year's competition (ILSVRC 2016). There are 200 basic-level categories for this task, which are fully annotated on the test data, i.e. bounding boxes for all categories in the image have been labeled. The categories were carefully chosen considering different factors such as object scale, level of image clutter, average number of object instances, and several others. Some of the test images will contain none of the 200 categories. Browse all annotated detection images here.

For each image, algorithms will produce a set of annotations $(c_i, s_i, b_i)$ of class labels $c_i$, confidence scores $s_i$ and bounding boxes $b_i$. This set is expected to contain each instance of each of the 200 object categories. Objects which were not annotated will be penalized, as will duplicate detections (two annotations for the same object instance). The winner of the detection challenge will be the team which achieves first-place accuracy on the most object categories.
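
As a rough illustration of how such a set of annotations could be serialized, the sketch below writes one detection per line (image index, class index, confidence, box corners). The field order and indexing conventions are assumptions modelled on earlier ILSVRC development kits; the ILSVRC2017 development kit is authoritative on the actual submission format.

    def write_det_submission(path, detections):
        # detections: iterable of (image_index, class_index, score, (xmin, ymin, xmax, ymax)).
        # One line per predicted object instance; field order is assumed, see the devkit.
        with open(path, "w") as out:
            for img, cls, score, (x0, y0, x1, y1) in detections:
                out.write("%d %d %.4f %.1f %.1f %.1f %.1f\n" % (img, cls, score, x0, y0, x1, y1))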

III: Object detection from video


This task is similar in style to the object detection task. We will partially refresh the validation and test data for this year's competition. There are 30 basic-level categories for this task, a subset of the 200 basic-level categories of the object detection task. The categories were carefully chosen considering different factors such as movement type, level of video clutter, average number of object instances, and several others. All classes are fully labeled for each clip. Browse all annotated train/val snippets here.

For each video clip, algorithms will produce a set of annotations $(f_i, c_i, s_i, b_i)$ of frame numbers $f_i$, class labels $c_i$, confidence scores $s_i$ and bounding boxes $b_i$. This set is expected to contain each instance of each of the 30 object categories in each frame. The evaluation metric is the same as for the object detection task, meaning objects which are not annotated will be penalized, as will duplicate detections (two annotations for the same object instance). The winner of the detection from video challenge will be the team which achieves the best accuracy on the most object categories.
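
The per-clip output can be viewed as the detection tuple extended with a frame number. The structure below only illustrates the four fields $(f_i, c_i, s_i, b_i)$; the indexing and box conventions are assumed rather than taken from the development kit.

    from typing import NamedTuple, Tuple

    class VidDetection(NamedTuple):
        frame: int        # f_i: frame number within the clip
        class_index: int  # c_i: one of the 30 VID categories
        score: float      # s_i: confidence score
        box: Tuple[float, float, float, float]  # b_i: (xmin, ymin, xmax, ymax)

    # Example: one detection of class 5 in frame 12 of a clip.
    clip = [VidDetection(frame=12, class_index=5, score=0.87, box=(34.0, 50.0, 120.0, 200.0))]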

FAQ

1. Are challenge participants required to reveal all details of their methods?

Entries to ILSVRC2017 can be either "open" or "closed." Teams submitting "open" entries will be expected to reveal most details of their method (special exceptions may be made for pending publications). Teams may choose to submit a "closed" entry, and are then not required to provide any details beyond an abstract. The motivation for introducing this division is to allow greater participation from industrial teams that may be unable to reveal algorithmic details while also allocating more time at the Beyond ImageNet Large Scale Visual Recognition Challenge Workshop to teams that are able to give more detailed presentations. Participants are strongly encouraged to submit "open" entries if possible.

2. Can additional images or annotations be used in the competition?

Entries submitted to ILSVRC2017 will be divided into two tracks: a "provided data" track (entries using only ILSVRC2017 images and annotations from any of the aforementioned tasks) and an "external data" track (entries using any outside images or annotations). Any team that is unsure which track their entry belongs to should contact the organizers ASAP. Additional clarifications will be posted here as needed.

3. How many entries can each team submit per competition?

Participants who have investigated several algorithms may submit one result per algorithm (up to 5 algorithms). Changes in algorithm parameters do not constitute a different algorithm (following the procedure used in PASCAL VOC).

Citation

If you are reporting results of the challenge or using the dataset, please cite:
    Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015. paper | bibtex | paper content on arXiv

Organizers

Contact

Please feel free to send any questions or comments to imagenet.help.desk@gmail.com.