
ImageNet Large Scale Visual Recognition Challenge 2010 (ILSVRC2010)

Held as a "taster competition" in conjunction with the PASCAL Visual Object Classes Challenge 2010 (VOC2010)


When using the dataset, please cite:
    Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575, 2014.

Ground truth for test data

Patch for images and precomputed features (updated on 7/17/2010)

Due to an error in our image collection process, a very small portion of the packaged images are blank images returned by websites on which the original images had become unavailable. We found 970 such images (out of 1.2M) in training, 9 (out of 50K) in validation and 19 (out of 150K) in test. Although this should not noticeably affect training or testing, we are releasing a patch that contains the correct images (6MB) and the correct precomputed features (80MB). Please go to the "images" and "features" download sections to download the patches.

To apply the patch to the images, simply replace the old images with the new ones in the patch. For precomputed features, we provide a MATLAB program that modifies your old feature files. Please consult the readme files for details.
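For the image patch, "replacing the old images" amounts to copying each patched file over the file at the same relative path. The Python sketch below assumes, hypothetically, that the unpacked patch mirrors the directory layout of your image folders; `apply_image_patch` is an illustrative helper, not part of the development kit, and the readme remains the authority on the actual layout. It does not touch the feature files, which are handled by the MATLAB program.

```python
import os
import shutil


def apply_image_patch(patch_dir: str, image_dir: str) -> list[str]:
    """Copy every file under patch_dir over the file at the same relative
    path under image_dir, creating directories if needed.

    Hypothetical helper: assumes the patch mirrors the original layout.
    Returns the sorted relative paths that were written.
    """
    written = []
    for root, _dirs, files in os.walk(patch_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, patch_dir)
            dst = os.path.join(image_dir, rel)
            os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
            shutil.copy2(src, dst)  # overwrites the old (blank) image
            written.append(rel)
    return sorted(written)
```

For example, `apply_image_patch("patch/train", "train")` would overwrite the affected training images, where both paths are placeholders for wherever you unpacked the patch and the original images.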

Development Kit

The development kit includes
  • Meta data for the competition categories.
  • Matlab routines for evaluating submissions.
  • A demo implementing and evaluating a simple baseline system using precomputed SIFT[1,2] features and LIBLINEAR[3].
  • Code for computing the features used in the baseline demo.

Please be sure to consult the readme file included in the development kit.


Please log in to download the original images.


We have computed dense SIFT[1] features for all images: training, validation and test. They are available for download (features for test data will be made available later).

Each image is resized to have a maximum side length of 300 pixels (smaller images are not enlarged). SIFT descriptors are computed on 20x20 overlapping patches with a spacing of 10 pixels. The images are then further downsized (to 1/2 and then 1/4 of the side length) and additional descriptors are computed. We use the VLFeat[2] implementation of dense SIFT (version
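To get a feel for how many descriptors this produces per image, the sketch below counts the 20x20 windows at a 10-pixel stride at each of the three scales. It is an approximation under assumptions of ours: windows start at pixel 0 and must lie fully inside the image, and downsized dimensions are rounded to the nearest integer. VLFeat's dense SIFT handles bounds and binning its own way, so the exact counts in the released files may differ; the function names are illustrative.

```python
def resized_size(width: int, height: int, max_side: int = 300) -> tuple[int, int]:
    """Shrink so the longer side is at most max_side; never enlarge."""
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale)


def patches_per_scale(width: int, height: int, patch: int = 20, step: int = 10,
                      factors: tuple[int, ...] = (1, 2, 4)) -> list[int]:
    """Number of patch x patch windows at the given stride for each
    downsampling factor (1 = resized image, 2 = half side, 4 = quarter side).
    Assumption: windows start at pixel 0 and must fit inside the image."""
    w0, h0 = resized_size(width, height)
    counts = []
    for f in factors:
        w, h = round(w0 / f), round(h0 / f)
        nx = (w - patch) // step + 1 if w >= patch else 0
        ny = (h - patch) // step + 1 if h >= patch else 0
        counts.append(nx * ny)
    return counts
```

For a 500x375 image, the resized image is 300x225 and the sketch counts 609, 140 and 24 windows at the three scales, so most descriptors come from the full-resolution level.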

We then perform k-means clustering on a random subset of 10 million SIFT descriptors to form a visual vocabulary of 1000 visual words. Each SIFT descriptor is quantized into a visual word by assigning it to the nearest cluster center.
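The two steps in this paragraph (clustering a random sample of descriptors, then mapping every descriptor to its nearest center) can be illustrated with the toy pure-Python sketch below. It uses plain Lloyd's k-means with Euclidean distance; the paragraph does not specify the distance, initialization or number of iterations, so those choices, like the helper names, are assumptions. At the scale described here (10 million 128-dimensional descriptors, 1000 words) one would use an optimized implementation rather than this loop-based code.

```python
import random

Vector = list[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(descriptor: Vector, centers: list[Vector]) -> int:
    """Index of the closest cluster center, i.e. the visual word."""
    return min(range(len(centers)), key=lambda i: sq_dist(descriptor, centers[i]))


def kmeans(points: list[Vector], k: int, iters: int = 20, seed: int = 0) -> list[Vector]:
    """Plain Lloyd's k-means, initialized with k randomly chosen points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: list[list[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def build_vocabulary(descriptors: list[Vector], sample_size: int, k: int,
                     seed: int = 0) -> list[Vector]:
    """Cluster a random subset of the descriptors into k visual words."""
    rng = random.Random(seed)
    sample = rng.sample(descriptors, min(sample_size, len(descriptors)))
    return kmeans(sample, k, seed=seed)


def quantize(descriptors: list[Vector], centers: list[Vector]) -> list[int]:
    """Map each descriptor to the index of its nearest visual word."""
    return [nearest_center(d, centers) for d in descriptors]
```

Sampling before clustering keeps the expensive k-means step independent of the full dataset size, while quantization still visits every descriptor once.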

We provide both raw SIFT features (vldsift) and the visual codewords (sbow). Spatial coordinates of each descriptor/codeword are also included.
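One common way to turn an image's codewords into a fixed-length vector for a linear classifier is a bag-of-words histogram over the 1000 visual words. The sketch below is illustrative only: it assumes the codewords of one image are already loaded as 0-based integer indices (if the files store MATLAB-style 1-based indices, subtract 1 first), and it ignores the spatial coordinates. How the baseline demo actually builds its features is documented in the development kit readme; `bow_histogram` is a hypothetical helper.

```python
def bow_histogram(codewords: list[int], vocab_size: int = 1000,
                  normalize: bool = True) -> list[float]:
    """Bag-of-words histogram of one image's visual words.

    codewords: 0-based visual-word indices, one per descriptor (assumption).
    normalize: if True, divide by the number of descriptors so bins sum to 1.
    """
    hist = [0.0] * vocab_size
    for w in codewords:
        if not 0 <= w < vocab_size:
            raise ValueError(f"codeword {w} outside [0, {vocab_size})")
        hist[w] += 1.0
    if normalize and codewords:
        n = len(codewords)
        hist = [count / n for count in hist]
    return hist
```

Normalizing by the descriptor count keeps large and small images comparable; whether and how to normalize is a design choice that this page does not specify.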

To run the demo system included in the development kit, you need to download the visual word features (for training and validation). Note that the raw SIFT features are not needed to run the demo code.

Please consult the readme file in the development kit for more details.

If you already have the ImageNet Fall09 Release...

If you already have the ImageNet 2009 Fall Release, you can download only the newly added/changed images, although extra work is needed to ensure you have the exact set of images.

    Training images (Upgrade from Fall09). 23GB. MD5: f8def328bcf88cecbb2153dcdcd4da03

    Terms of use: by downloading the image data from the above URL, you agree to the following terms:

    1. You will use the data only for non-commercial research and educational purposes.
    2. You will NOT distribute the above URL(s).
    3. Stanford University and Princeton University make no representations or warranties regarding the data, including but not limited to warranties of non-infringement or fitness for a particular purpose.
    4. You accept full responsibility for your use of the data and shall defend and indemnify Stanford University and Princeton University, including their employees, officers and agents, against any and all claims arising from your use of the data, including but not limited to your use of any copies of copyrighted images that you may create from the data.
After you download the upgrade tar files, untar them and overwrite your existing images. Then you can obtain the exact set of training images using the following list of file names. MD5 values for each file are also included to help you check.
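Both the tarball MD5 above and the per-file MD5 values in the list can be checked with a short script. The sketch below assumes, hypothetically, that the list has one `<md5> <relative path>` entry per line (the same shape as `md5sum` output); adjust `read_manifest` to the real format. The helper names are illustrative, and `check_image_tree` only reports missing files, extra files (candidates for deletion to reach the exact set) and checksum mismatches; it deletes nothing.

```python
import hashlib
import os


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest, read in 1 MiB chunks so even a 23GB tar is fine."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def read_manifest(manifest_path: str) -> dict[str, str]:
    """Parse lines of the assumed form '<md5> <relative/path>'."""
    expected = {}
    with open(manifest_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            digest, rel = line.split(maxsplit=1)
            expected[os.path.normpath(rel)] = digest.lower()
    return expected


def check_image_tree(image_dir: str, expected: dict[str, str]):
    """Compare files under image_dir with the expected {path: md5} map.
    Returns (missing, extra, mismatched) as sorted lists of relative paths."""
    present = set()
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), image_dir)
            present.add(os.path.normpath(rel))
    missing = sorted(set(expected) - present)
    extra = sorted(present - set(expected))
    mismatched = sorted(
        rel for rel in set(expected) & present
        if md5_of_file(os.path.join(image_dir, rel)) != expected[rel]
    )
    return missing, extra, mismatched
```

The same `md5_of_file` helper can check the download itself: compare its result for the saved upgrade tar file with `f8def328bcf88cecbb2153dcdcd4da03`.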


  1. D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.
  2. A. Vedaldi and B. Fulkerson. VLFeat: An Open and Portable Library of Computer Vision Algorithms. 2008.
  3. R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 9:1871-1874, 2008.