
 <html>
	<body style="text-align: left">

<div align="center">
<title>Tiny Images Dataset</title>
<h1><font face="Times New Roman" >Tiny Images Dataset</font></h1>

<a href="http://cs.nyu.edu/~fergus"> <font color="#3333CC">Rob Fergus [1]</font> </a>
&nbsp;
<a href="http://web.mit.edu/torralba/www"><font
color="#3333CC">Antonio Torralba [2]</font></a> &nbsp;
<a href="http://people.csail.mit.edu/billf"> <font
color="#3333CC">William T. Freeman [2]</font> </a>

<br>
<br>
<p style="margin-top: 0; margin-bottom: 0"><font face="Times New Roman">
<a href="http://cs.nyu.edu/"><font color="#000000">
<span style="text-decoration: none">[1] Dept. of Computer Science,
Courant Institute, New York University</span></font></a></font>
<p style="margin-top: 0; margin-bottom: 6pt"><font face="Times New Roman">
<a href="http://www.csail.mit.edu"><font color="#000000">
<span style="text-decoration: none">[2] Computer Science and Artificial
Intelligence Lab, Massachusetts Institute of Technology</span></font></a></font></p>
<br>
<br>
</div>

<h5 style="text-align: left"><font face="Georgia" style="font-size: 16pt">
Overview</font></h5>

<p style="text-align: left"><font face="Times New Roman">
This page provides download links for the Tiny Images dataset, which
consists of 79,302,017 color images, each 32x32 pixels. The data is stored in
large binary files that can be accessed with a Matlab toolbox that
we have written. You will need around 400GB of free disk space to
store all the files. There are 5 files to download in total. Three are
large binary files containing (i) the images themselves; (ii) their
associated metadata (filename, search engine used, ranking, etc.); and
(iii) Gist descriptors for each image. The other two are the Matlab
toolbox and an index data file, which together let you easily load
data from the binaries.
<br>
<br>
</font></p>



<h5 style="text-align: left">
<font face="Georgia" style="font-size:16pt"> 
Downloads
</font>
</h5>

Note that these files are very large and will take a considerable time
to download. Please ensure you have sufficient disk space before
commencing the download.

<br>
<br>

&nbsp; 1. Image binary (227GB) &nbsp;<a href="http://horatio.cs.nyu.edu/mit/tiny/data/tiny_images.bin">
Download </a>

<br>
<br>

&nbsp; 2. Metadata binary (57GB)&nbsp; <a href="http://horatio.cs.nyu.edu/mit/tiny/data/tiny_metadata.bin">
Download </a>

<br>
<br>

&nbsp; 3. Gist binary (114GB)&nbsp; <a href="http://horatio.cs.nyu.edu/mit/tiny/data/tinygist80million.bin">
Download </a>

<br>
<br>

&nbsp; 4. Index data (7MB)&nbsp; <a href="http://horatio.cs.nyu.edu/mit/tiny/data/tiny_index.mat">
Download </a>

<br>
<br>

&nbsp; 5. Matlab Tiny Images toolbox (150KB)&nbsp; <a href="http://horatio.cs.nyu.edu/mit/tiny/code/distrib/tiny_code.zip">
Download </a>

<br>
<br>

<h5 style="text-align: left">
<font face="Georgia" style="font-size:16pt"> 
Instructions
</font>
</h5>

Overview<br>
--------<br>
The 79 million images are stored in one giant binary file, 227GB in
size. The metadata accompanying each image is also stored in a single giant
file, 57GB in size. To read images and metadata from these files, we
provide some Matlab wrapper functions.<br>
<br>
There are two versions of the functions for reading image data:
<br>
(i) loadTinyImages.m - plain Matlab function (no MEX) that runs on both
32-bit and 64-bit machines. Loads images by image number. Use this by default.
<br>
(ii) read_tiny_big_binary.m - Matlab wrapper for a 64-bit MEX
function. A bit faster and more flexible than (i), but requires a 64-bit machine.
<br>
<br>
There are two types of annotation data:
<br>
(i) Manual annotation data, stored in annotations.txt, which holds
labels for images that were manually inspected to check whether the image
content agrees with the noun used to collect it. Some other information,
such as the search engine, is also stored. This data is available for only
a very small portion of the images.<br>

(ii) Automatic annotation data, stored in tiny_metadata.bin,
consisting of information about how each image was gathered,
e.g. search engine, which page, URL of the thumbnail, etc. This data is
available for all 79 million images.
<br>
<br>
Requirements<br>
------------<br>
1. Around 400GB of disk space.<br>
<br>
2. If you want to use the MEX versions of the code for reading in the
  data, you will need a 64-bit machine. For most purposes, however, the
  Matlab implementation (loadTinyImages.m), which runs on either 32-bit or
  64-bit machines, will work perfectly well. To check whether your machine is 32-bit or 64-bit, type 'uname -a' in a terminal (if using Linux).
<br>
<br>
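As a rough cross-check of the disk space figure, the file sizes can be reproduced with a little arithmetic. The sketch below assumes that each 32x32 color image takes one byte per channel value (3072 bytes per image) and that the quoted file sizes are in units of 2^30 bytes; neither assumption is stated on this page.<br>
<pre>
```matlab
% Back-of-envelope disk usage check (assumptions: one byte per channel
% value, and file sizes quoted in units of 2^30 bytes).
numImages     = 79302017;          % image count quoted on this page
bytesPerImage = 32 * 32 * 3;       % 3072 values per image
imageBytes    = numImages * bytesPerImage;
imageSize     = imageBytes / 2^30; % compare with the 227 quoted for tiny_images.bin

% Sum of the three binaries as listed under Downloads.
totalListed = 227 + 57 + 114;

fprintf('image binary: %.1f, all binaries: %d\n', imageSize, totalListed);
```
</pre>
Under these assumptions the image binary comes out at about 226.9, in line with the 227GB quoted, and the three binaries together need 398GB.<br>
<br>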
Files<br>
-----<br>
<br>
The toolbox archive (tiny_code.zip) should contain 12 files<br>
<br>
1. loadTinyImages.m -- read tiny image data, pure Matlab version.<br>
2. loadGroundTruth.m -- read annotations.txt file holding manual annotations<br>
3. read_tiny_big_binary.m -- read tiny image data, 64-bit Matlab/MEX version<br>
4. read_tiny_big_metadata.m -- read tiny image metadata, 64-bit Matlab/MEX version<br>
5. read_tiny_gist_binary.m -- read tiny Gist, 64-bit Matlab/MEX version<br>
6. read_tiny_binary_big_core.c -- 64-bit MEX source code for image reading<br>
7. read_tiny_metadata_big_core.c -- 64-bit MEX source code for metadata reading<br>
8. read_tiny_binary_gist_core.c -- 64-bit MEX source code for gist reading<br>
9. compute_hash_function.m -- utility function for fast string searching,
                used by read_tiny_big_binary.m and read_tiny_big_metadata.m<br>
10. fast_str2num.m -- utility function used by read_tiny_big_metadata.m<br>
11. annotations.txt -- text file holding list of annotated images<br>
12. README.txt -- this file<br>
<br>
Separately, you should have downloaded the following files<br>
<br>
1. tiny_images.bin - 227GB file holding 79,302,017 images<br>
2. tiny_metadata.bin - 57GB file holding metadata for all 79,302,017 images<br>
3. tinygist80million.bin - 114GB file holding 384-dim Gist descriptors
for all 79,302,017 images<br>
4. tiny_index.mat - 7MB file holding index info, including:<br>
&nbsp; &nbsp; &nbsp; &nbsp;    word - cell array of all 75,846 nouns for which we have images in tiny_images.bin<br>
&nbsp; &nbsp; &nbsp; &nbsp;     num_imgs - vector with the number of images per noun for all 75,846 nouns<br>
<br>
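The index data suggests how a noun can be mapped to positions in the binary files. The sketch below uses made-up toy values for word and num_imgs and assumes that images are stored grouped by noun, in the same order as word. This page does not confirm that layout, so treat it as an illustration rather than the toolbox's actual method.<br>
<pre>
```matlab
% Toy stand-ins for the variables in tiny_index.mat (hypothetical values).
word     = {'aardvark', 'cat', 'dog'};
num_imgs = [3 5 4];

% Assumption: images are grouped by noun in the order given by word, so the
% first image of noun k follows all images of nouns 1..k-1.
noun        = 'dog';
k           = find(strcmp(word, noun));
firstGlobal = sum(num_imgs(1:k-1)) + 1;

% Global image numbers of the 1st and 2nd images of the noun.
localIdx  = [1 2];
globalIdx = firstGlobal + localIdx - 1;
disp(globalIdx);
```
</pre>
With the real index, the same arithmetic would use the word and num_imgs variables loaded from tiny_index.mat.<br>
<br>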
Preliminaries<br>
-------------<br>
Before the functions can be used you must do two things:<br>
<br>
1. Set the absolute paths to the binary files in the Matlab functions.
There are a total of 7 lines that must be set:<br>
<br>
  (i) loadTinyImages.m, line 14 -- set path to tiny_images.bin file<br>
 (ii) read_tiny_big_binary.m, line 40 -- set path to tiny_images.bin file <br> 
(iii) read_tiny_big_binary.m, line 42 -- set path to tiny_index.mat file  <br> 
(iv) read_tiny_big_metadata.m, line 63 -- set path to tiny_metadata.bin file  <br>  
 (v) read_tiny_big_metadata.m, line 65 -- set path to tiny_index.mat file  <br> 
(vi) read_tiny_gist_binary.m, line 36 -- set path to tiny_index.mat file  <br> 
(vii) read_tiny_gist_binary.m, line 38 -- set path to tiny_metadata.bin file  <br><br>  
2. If using the MEX versions, they must be compiled with the commands:<br>
  (i)      mex read_tiny_binary_big_core.c<br>
 (ii)     mex read_tiny_metadata_big_core.c<br>
 (iii)     mex read_tiny_binary_gist_core.c<br>
<br>
Usage<br>
-----<br>
<br>
Here are some examples of the scripts in use. Please look at the
comments at the top of each file for more extensive explanations.<br>
<br>
loadTinyImages.m<br>
---------------<br>
<br>
% load in first 10 images from 79,302,017 images<br>
img = loadTinyImages([1:10]);<br>
<br>
% load in 10 images at random<br>
q = randperm(79302017);<br>
img = loadTinyImages(q(1:10));<br>
%% N.B. function does NOT sort indices, so sorting beforehand would<br>
%% improve speed.<br>
<br>
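Loading by image number amounts to seeking to a byte offset in the binary file. The following is a minimal sketch of that idea, not the code in loadTinyImages.m: it assumes each image is a contiguous record of 3072 unsigned bytes stored in Matlab's column-major order, and it runs against a small synthetic file (demoFile, a hypothetical name) rather than tiny_images.bin.<br>
<pre>
```matlab
% Sketch only: assumes one contiguous record of 32*32*3 = 3072 uint8 values
% per image. The real layout of tiny_images.bin is not documented here.
recordBytes = 32 * 32 * 3;

% Build a small synthetic file with 5 fake records so the sketch runs
% without the real binary.
demoFile = [tempname '.bin'];
numDemo  = 5;
fake = uint8(mod(reshape(0:(recordBytes * numDemo - 1), recordBytes, numDemo), 251));
fid = fopen(demoFile, 'w');
fwrite(fid, fake, 'uint8');
fclose(fid);

% Read records 4 and 2. Sorting first means the file is only read forwards.
wanted = sort([4 2]);
img = zeros(32, 32, 3, numel(wanted), 'uint8');
fid = fopen(demoFile, 'r');
for k = 1:numel(wanted)
    offset = (wanted(k) - 1) * recordBytes;   % byte offset of the record start
    fseek(fid, offset, 'bof');
    raw = uint8(fread(fid, recordBytes, 'uint8'));
    img(:, :, :, k) = reshape(raw, 32, 32, 3);
end
fclose(fid);
delete(demoFile);
```
</pre>
The result has the same 32x32x3xN shape that loadTinyImages.m returns, with images in sorted index order.<br>
<br>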
<br>
loadGroundTruth.m<br>
-----------------<br>
<br>
% read in contents of annotations.txt file<br>
[imageFileName, keyword, correct, engine, ind_engine, image_ndx]=loadGroundTruth;<br>
%%% the labeling convention in correct is:<br>
% -1 = Incorrect, 0 = Skipped, 1 = Correct<br>
% Note that this is different from the 'label' field produced by<br>
% read_tiny_big_metadata below (the meanings of -1 and 0 are swapped),<br>
% but the annotations.txt file information should be used in preference to<br>
% that from read_tiny_big_metadata.m<br>
<br>
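Because the two label conventions differ, metadata labels need remapping before they can be compared with the manual annotations. The sketch below uses a made-up label vector (metaLabel is a hypothetical name) and assumes, following the note above, that only the meanings of -1 and 0 are swapped.<br>
<pre>
```matlab
% Hypothetical 'label' values in the read_tiny_big_metadata convention,
% where (per the note above) -1 = Skipped, 0 = Incorrect, 1 = Correct.
metaLabel = [-1 0 1 0 -1];

% Convert to the annotations.txt convention:
% -1 = Incorrect, 0 = Skipped, 1 = Correct.
correct = metaLabel;
correct(metaLabel == -1) = 0;
correct(metaLabel == 0)  = -1;
disp(correct);
```
</pre>
Both assignments index with the original metaLabel, so the swap does not overwrite itself.<br>
<br>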
<br>64-bit MEX versions:
<br>--------------------
<br>
<br>
read_tiny_big_metadata.m<br>
----------------------<br>
<br>
% load in filenames of first 10 images<br>
data = read_tiny_big_metadata([1:10],{'filename'});<br>
<br>
% load in search engine used for<br>
% first 10 images from noun 'aardvark';<br>
<br>
data = read_tiny_big_metadata('aardvark',[1:10],{'engine'});<br>
<br>
read_tiny_big_binary.m<br>
----------------------<br>
<br>
% load in first 10 images from 79,302,017 images<br>
img = read_tiny_big_binary([1:10]);<br>
% note output dimension is 3072x10, rather than 32x32x3x10<br>
% as for loadTinyImages.m<br>
<br>
% load in 10 randomly chosen images from noun 'dog'<br>
q = randperm(79302017);<br>
img = read_tiny_big_binary('dog',q(1:10));<br>
% function sorts indices internally for speed<br>
<br>
% load in images for different nouns<br>
img = read_tiny_big_binary({'dog','cat','mouse','pig'},{[1:5],[1:2:10],[8 13],[4:-1:1]});<br>
<br>
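To get the 3072xN output into the same 32x32x3xN shape that loadTinyImages.m returns, a reshape should suffice. This sketch uses random stand-in data and assumes each column holds one image flattened in Matlab's column-major order (rows, then columns, then color channels); check this against the comments in the toolbox before relying on it.<br>
<pre>
```matlab
% Random stand-in for the 3072x10 output of read_tiny_big_binary.
flat = uint8(randi([0 255], 3072, 10));

% Assumption: each column is one image flattened in column-major order.
imgs = reshape(flat, 32, 32, 3, []);

% The inverse reshape recovers the flat layout.
flatAgain = reshape(imgs, 3072, []);
disp(size(imgs));
```
</pre>
Reshaping does not copy or reorder the underlying values, so the round trip is lossless.<br>
<br>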
 


</body>

</html>
