ImageNet Large Scale Visual Recognition Challenge 2010 (ILSVRC2010)

Held as a "taster competition" in conjunction with PASCAL Visual Object Classes Challenge 2010 (VOC2010)

Citation

When using the dataset, please cite both of the following papers, as the data depends on significant contributions from each:

ImageNet: A Large-Scale Hierarchical Image Database. CVPR. (bibtex)

ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575. (paper, bibtex)

Patch for images and precomputed features (updated on 7/17/2010)

Due to an error in our image collection process, a very small portion of the packaged images are blank images returned by websites where the original images have become unavailable. We found 970 (out of 1.2M) such images in training, 9 (out of 50K) in validation and 19 (out of 150K) in test. Although this should not noticeably impact training and testing, we have released a patch that contains the correct images (6MB) and the correct precomputed features (80MB). Please go to the "images" and "features" download sections to download the patches.

To apply the patch to the images, simply replace the old images with the new ones in the patch. For precomputed features, we provide a MATLAB program to modify your old feature files. Please consult the readme files for details.
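The image replacement step can be sketched in Python as follows. This is a minimal sketch, not part of the official patch: it assumes the patch has been unpacked into a directory whose layout mirrors your image directory, and the function name `apply_image_patch` is hypothetical. The readme shipped with the patch remains the authoritative description.

```python
import shutil
from pathlib import Path


def apply_image_patch(patch_dir, image_dir):
    """Copy every file under patch_dir over the file with the same
    relative path under image_dir.

    Assumption: the unpacked patch mirrors the directory layout of the
    image set. Returns the relative paths that were copied.
    """
    patch_root = Path(patch_dir)
    image_root = Path(image_dir)
    replaced = []
    for src in sorted(patch_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(patch_root)
        dst = image_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # overwrites the old (blank) image
        replaced.append(rel.as_posix())
    return replaced
```

Files that are not in the patch are left untouched, so running the function twice has the same effect as running it once.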

Development Kit

The development kit includes:

Metadata for the competition categories.
MATLAB routines for evaluating submissions.
A demo implementing and evaluating a simple baseline system using precomputed SIFT[1,2] features and LIBLINEAR[3].
Code for computing the features used in the baseline demo.

Please be sure to consult the readme file included in the development kit.

Development kit. 3MB.

Images

Features

We have computed dense SIFT[1] features for all images: training, validation and test. They are available for download (features for test data will be made available later).

Each image is resized to have a maximum side length of 300 pixels (smaller images are not enlarged). SIFT descriptors are computed on 20x20 overlapping patches with a spacing of 10 pixels. Images are further downsized (to 1/2 the side length and then 1/4 of the side length) and more descriptors are computed. We use the VLFeat[2] implementation of dense SIFT (version 0.9.4.1).
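To get a rough feel for how many descriptors this sampling scheme produces per image, the geometry can be sketched in Python. This is an illustrative approximation only: the rounding rule, the grid anchored at the top-left corner, and the reuse of the same 20x20 patch and 10-pixel step at the downsized scales are all assumptions, and the function names are hypothetical. The actual counts produced by VLFeat may differ at the image borders.

```python
def resized_size(width, height, max_side=300):
    """Shrink so the longer side is at most max_side; never enlarge."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def grid_count(side, patch=20, step=10):
    """Number of patch positions along one axis (assumed grid layout)."""
    if side < patch:
        return 0
    return (side - patch) // step + 1


def descriptor_counts(width, height):
    """Approximate descriptor counts at full, 1/2 and 1/4 side length."""
    w, h = resized_size(width, height)
    counts = []
    for divisor in (1, 2, 4):
        scaled_w, scaled_h = w // divisor, h // divisor
        counts.append(grid_count(scaled_w) * grid_count(scaled_h))
    return counts
```

Under these assumptions, a 600x400 image is resized to 300x200 and `descriptor_counts(600, 400)` returns `[551, 126, 24]`.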

We then perform k-means clustering of a random subset of 10 million SIFT descriptors to form a visual vocabulary of 1000 visual words. Each SIFT descriptor is quantized into a visual word using the nearest cluster center.
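The quantization step (assigning each descriptor to its nearest cluster center) can be sketched in plain Python as below. This is a toy illustration rather than the code used to produce the released features: it assumes squared Euclidean distance, breaks ties in favor of the lower index, and uses hypothetical function names. The real vocabulary has 1000 centers; the example after the block uses three 2-D centers only to keep it readable.

```python
def nearest_center(descriptor, centers):
    """Index of the center closest to descriptor (squared Euclidean)."""
    best_index, best_dist = -1, float("inf")
    for index, center in enumerate(centers):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, center))
        if dist < best_dist:  # strict '<' keeps the first center on ties
            best_index, best_dist = index, dist
    return best_index


def quantize(descriptors, centers):
    """Map every descriptor to the index of its visual word."""
    return [nearest_center(d, centers) for d in descriptors]
```

For example, with centers `[[0, 0], [10, 10], [0, 10]]`, the call `quantize([[1, 1], [9, 8], [1, 9]], centers)` returns `[0, 1, 2]`.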

We provide both raw SIFT features (vldsift) and the visual codewords (sbow). Spatial coordinates of each descriptor/codeword are also included.

To run the demo system included in the development kit, you need to download the visual word features (for training and validation). Note that the raw SIFT features are not needed to run the demo code.

Please consult the readme file in the development kit for more details.

New! Patch for all features. 80MB.

Visual words (sbow) for training. 5.1GB. MD5: 0e0257af7a524aee89a2ce6246798a3f

Visual words (sbow) for validation. 205MB. MD5: b20164d925280b45219b51c2122cbd61

Visual words (sbow) for test. 613MB. MD5: de53389fd1972e2bb32cf5083efe01dc

Raw SIFT features (vldsift) for training. 375GB. MD5: aa2fdaa6fb119a451a23acd55bc57831

Raw SIFT features (vldsift) for validation. 15GB. MD5: c1b343347d8add28875332fc0f97e398

Raw SIFT features (vldsift) for test. 45GB. MD5: a3d348d9eba5db60ab1474ed10dd2bec
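Since some of these archives are very large, it is worth checking a download against the MD5 value listed above before unpacking it. A minimal Python sketch follows; the function names are hypothetical, and the file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """True if the file's MD5 equals the expected hex string."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Call `matches_md5` with the local path of the downloaded archive and the MD5 string copied from the corresponding line above; a `False` result suggests an incomplete or corrupted download.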

If you already have the ImageNet Fall09 Release...

If you already have the ImageNet 2009 Fall Release, you can download only the newly added/changed images, although extra work is needed to ensure you have the exact set of images.

Training images (Upgrade from Fall09). 23GB. MD5: f8def328bcf88cecbb2153dcdcd4da03

Terms of use: by downloading the image data from the above URL, you agree to the following terms:

You will use the data only for non-commercial research and educational purposes.

You will NOT distribute the above URL(s).

Stanford University and Princeton University make no representations or warranties regarding the data, including but not limited to warranties of non-infringement or fitness for a particular purpose.

You accept full responsibility for your use of the data and shall defend and indemnify Stanford University and Princeton University, including their employees, officers and agents, against any and all claims arising from your use of the data, including but not limited to your use of any copies of copyrighted images that you may create from the data.

After you download the upgrade tar files, untar them and overwrite your existing images. Then you can get the exact set of training images using the following list of file names. MD5 values for each file are also included to help you check.

List of file names and MD5 values for all training images. 29MB.
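Checking a local image directory against such a list might look like the Python sketch below. The layout of the list is not described here, so the sketch assumes a hypothetical format of one entry per line, `<relative file name> <md5>`, separated by whitespace; adapt `parse_manifest` to the actual file. All function names are illustrative.

```python
import hashlib
from pathlib import Path


def parse_manifest(text):
    """Parse lines of '<relative file name> <md5>' (assumed format)."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, md5_hex = parts
        entries[name] = md5_hex.lower()
    return entries


def file_md5(path):
    """Hex MD5 digest of a single (small) image file."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def compare_to_manifest(image_dir, manifest):
    """Return (missing, extra, mismatched) relative file names."""
    root = Path(image_dir)
    on_disk = {
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    }
    listed = set(manifest)
    missing = sorted(listed - on_disk)
    extra = sorted(on_disk - listed)
    mismatched = sorted(
        name for name in on_disk & listed
        if file_md5(root / name) != manifest[name]
    )
    return missing, extra, mismatched
```

Files reported as `extra` are the ones to remove to obtain the exact training set, while `missing` and `mismatched` entries point to images that need to be downloaded again.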

References

[1] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.
[2] A. Vedaldi and B. Fulkerson. VLFeat: An Open and Portable Library of Computer Vision Algorithms. 2008. http://www.vlfeat.org
[3] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research 9 (2008), 1871-1874. http://www.csie.ntu.edu.tw/~cjlin/liblinear/