Keywords: olfaction, cross-modal learning, computer vision
Abstract: Olfaction---the ability to sense volatile molecules in the air---is a key way that animals, and to a lesser extent humans, perceive the world. However, this rich ``chemical world'' is largely imperceptible to machines. One of the major obstacles to applying machine learning to olfaction is the lack of suitable data and high-quality feature representations. We address this problem in two ways. First, we introduce a dataset of paired natural olfactory-visual data that is significantly more diverse and extensive than prior work. To capture it, we probe objects in natural indoor and outdoor environments with a smell sensor while simultaneously recording video. Second, we use this dataset to learn self-supervised olfactory representations by training a joint embedding between visual and olfactory signals. We show that the resulting representation successfully transfers to a variety of downstream smell recognition tasks, such as recognizing scenes, materials, and objects, and making fine-grained distinctions between different types of grass.
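The abstract does not specify the training objective, so as an illustration only, below is a minimal sketch of one standard way to learn such a cross-modal joint embedding: a CLIP-style symmetric InfoNCE loss over paired visual and olfactory embeddings. The function name `contrastive_loss` and the assumption of precomputed `(B, D)` embeddings from two modality encoders are hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(visual_emb: torch.Tensor,
                     smell_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired visual/olfactory
    embeddings, each of shape (B, D). Hypothetical sketch, not the
    paper's confirmed objective."""
    # L2-normalize so dot products become cosine similarities.
    v = F.normalize(visual_emb, dim=-1)
    s = F.normalize(smell_emb, dim=-1)
    # (B, B) similarity matrix; the diagonal holds the true pairs.
    logits = v @ s.t() / temperature
    targets = torch.arange(v.size(0), device=v.device)
    # Contrast in both directions: vision -> smell and smell -> vision.
    loss_v2s = F.cross_entropy(logits, targets)
    loss_s2v = F.cross_entropy(logits.t(), targets)
    return (loss_v2s + loss_s2v) / 2
```

Under this kind of objective, matching video/smell pairs are pulled together in the shared space while non-matching pairs within the batch are pushed apart, which is what would make the learned smell features reusable for the downstream recognition tasks the abstract describes.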
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 531