- Abstract: Expectation learning is a continuous learning process which uses known multisensory bindings to modulate unisensory perception. When perceiving an event, we have an expectation on what we should see or hear which affects our unisensory perception. Expectation learning is known to enhance the unisensory perception of previously known multisensory events. In this work, we present a novel hybrid deep recurrent model based on audio/visual autoencoders, for unimodal stimulus representation and reconstruction, and a recurrent self-organizing network for multisensory binding of the representations. The model adapts concepts of expectation learning to enhance the unisensory representation based on the learned bindings. We demonstrate that the proposed model is capable of reconstructing signals from one modality by processing input of another modality for 43,500 Youtube videos in the animal subset of the AudioSet corpus. Our experiments also show that when using expectation learning, the proposed model presents state-of-the-art performance in representing and classifying unisensory stimuli.
- Keywords: multisensory binding, expectation learning, unsupervised learning, Deep autoencoder, Growing-When-Required Network, animal recognition
- TL;DR: A hybrid deep neural network which adapts concepts of expectation learning for improving unisensory recognition using multisensory binding.