The dataset used is SIFT-128-euclidean.hdf5 (128-dimensional SIFT descriptors compared by Euclidean distance). Download website: http://corpus-texmex.irisa.fr/
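A minimal sketch for loading the file with h5py. It assumes the HDF5 file follows the layout used by the ann-benchmarks project: root-level datasets named "train" (base vectors), "test" (query vectors), "neighbors" (ground-truth neighbor ids per query) and "distances" (the matching distances). The function name load_ann_dataset is hypothetical, and the dataset names should be checked against the actual file before relying on them.

```python
import h5py
import numpy as np


def load_ann_dataset(path):
    """Load base vectors, queries and ground truth from an HDF5 benchmark file.

    Assumption: the ann-benchmarks layout, i.e. root-level datasets
    "train", "test", "neighbors" and "distances".
    """
    with h5py.File(path, "r") as f:
        train = f["train"][:]          # base vectors to be indexed
        test = f["test"][:]            # query vectors
        neighbors = f["neighbors"][:]  # ground-truth neighbor ids, one row per query
        distances = f["distances"][:]  # distances to those ground-truth neighbors
    return train, test, neighbors, distances


def recall_at_k(found_ids, neighbors, k):
    """Fraction of the true top-k neighbors recovered, averaged over queries."""
    hits = 0
    for found, truth in zip(found_ids, neighbors):
        hits += len(set(np.asarray(found)[:k]) & set(np.asarray(truth)[:k]))
    return hits / (k * len(neighbors))
```

Usage would be along the lines of `train, test, neighbors, distances = load_ann_dataset("SIFT-128-euclidean.hdf5")`, after which an index built on `train` can be queried with `test` and scored against `neighbors` with `recall_at_k`.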