ImageNet Large Scale Visual Recognition Challenge 2010 (ILSVRC2010)
- Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575, 2014.
Patch for images and precomputed features (updated on 7/17/2010)
To apply the patch to the images, simply replace the old images with the new ones in the patch. For precomputed features, we provide a MATLAB program to modify your old feature files. Please consult the readme files for details.
The development kit includes:
- Metadata for the competition categories.
- Matlab routines for evaluating submissions.
- A demo implementing and evaluating a simple baseline system using precomputed SIFT[1,2] features and LIBLINEAR[3].
- Code for computing the features used in the baseline demo.
Please be sure to consult the readme file included in the development kit.
Development kit. 3MB.
Each image is resized to have a max side length of 300 pixels (smaller images are not enlarged). SIFT descriptors are computed on 20x20 overlapping patches with a spacing of 10 pixels. Images are further downsized (to 1/2 the side length and then 1/4 of the side length) and more descriptors are computed. We use the VLFeat[2] implementation of dense SIFT (version 0.9.4.1).
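The resizing and patch layout described above can be illustrated with a small counting sketch. This is not the VLFeat code used to produce the features. The function names, the rounding of fractional sizes, the integer division used for the 1/2 and 1/4 scales, and the assumption that patches must fit entirely inside the image are all illustrative choices that the text does not specify.

```python
def resized_size(width, height, max_side=300):
    """Scale so the longer side is at most max_side; smaller images are not enlarged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Rounding is an assumption; the source does not say how fractional sizes are handled.
    return max(1, round(width * scale)), max(1, round(height * scale))


def patch_grid_count(width, height, patch=20, step=10):
    """Count patch x patch windows placed every `step` pixels that fit fully inside the image."""
    if width < patch or height < patch:
        return 0
    cols = (width - patch) // step + 1
    rows = (height - patch) // step + 1
    return cols * rows


def pyramid_counts(width, height):
    """Patch counts at the resized size, then at 1/2 and 1/4 of the side length."""
    w, h = resized_size(width, height)
    # Integer division for the downsized scales is an assumption of this sketch.
    return [patch_grid_count(w // d, h // d) for d in (1, 2, 4)]


if __name__ == "__main__":
    print(resized_size(600, 400))
    print(pyramid_counts(600, 400))
```

Under these assumptions the number of descriptors per image drops quickly with each downsizing step, since the patch size and spacing stay fixed while the image shrinks.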
We then perform k-means clustering of a random subset of 10 million SIFT descriptors to form a visual vocabulary of 1000 visual words. Each SIFT descriptor is quantized into a visual word using the nearest cluster center.
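The quantization step can be sketched as a nearest-center lookup. The example below is a minimal pure-Python illustration with toy 2-dimensional vectors and three centers; the function names are hypothetical, squared Euclidean distance is assumed as the metric, and real SIFT descriptors and the 1000-word vocabulary are much larger.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, centers):
    """Return the index of the nearest cluster center, i.e. the visual word."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(descriptor, centers[i]))


def quantize_all(descriptors, centers):
    """Map every descriptor to its visual word index."""
    return [quantize(d, centers) for d in descriptors]


if __name__ == "__main__":
    # Toy vocabulary and descriptors, made up for illustration only.
    centers = [[0, 0], [10, 10], [0, 10]]
    descriptors = [[1, 1], [9, 8], [2, 9]]
    print(quantize_all(descriptors, centers))
```

A brute-force search like this is easy to follow but slow at the scale of the full dataset; a vectorized or tree-based nearest-neighbor search would be the usual choice in practice.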
We provide both raw SIFT features (vldsift) and the visual codewords (sbow). Spatial coordinates of each descriptor/codeword are also included.
To run the demo system included in the development kit, you need to download the visual words features (for training and validation). Note that the raw SIFT features are not needed to run the demo code.
Please consult the readme file in the development kit for more details.
New! Patch for all features. 80MB.
Visual words (sbow) for training. 5.1GB. MD5: 0e0257af7a524aee89a2ce6246798a3f
Visual words (sbow) for validation. 205MB. MD5: b20164d925280b45219b51c2122cbd61
Visual words (sbow) for test. 613MB. MD5: de53389fd1972e2bb32cf5083efe01dc
Raw SIFT features (vldsift) for training. 375GB. MD5: aa2fdaa6fb119a451a23acd55bc57831
Raw SIFT features (vldsift) for validation. 15GB. MD5: c1b343347d8add28875332fc0f97e398
Raw SIFT features (vldsift) for test. 45GB. MD5: a3d348d9eba5db60ab1474ed10dd2bec
If you already have the ImageNet Fall09 Release...
Training images (Upgrade from Fall09). 23GB. MD5: f8def328bcf88cecbb2153dcdcd4da03
Terms of use: by downloading the image data from the above URL, you agree to the following terms:
- You will use the data only for non-commercial research and educational purposes.
- You will NOT distribute the above URL(s).
- Stanford University and Princeton University make no representations or warranties regarding the data, including but not limited to warranties of non-infringement or fitness for a particular purpose.
- You accept full responsibility for your use of the data and shall defend and indemnify Stanford University and Princeton University, including their employees, officers and agents, against any and all claims arising from your use of the data, including but not limited to your use of any copies of copyrighted images that you may create from the data.
[1] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.
[2] A. Vedaldi and B. Fulkerson. VLFeat: An Open and Portable Library of Computer Vision Algorithms. 2008. http://www.vlfeat.org
[3] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research 9 (2008), 1871-1874. http://www.csie.ntu.edu.tw/~cjlin/liblinear/