TL;DR: Feature vectors from SoundNet can predict brain activity of subjects watching a movie in auditory and language related brain regions.
Keywords: neuroimaging, deep learning, transfer learning, audio, encoding models
Abstract: The purpose of an encoding model is to predict brain activity given a stimulus. In this contribution, we attempt at estimating a whole brain encoding model of auditory perception in a naturalistic stimulation setting. We analyze data from an open dataset, in which 16 subjects watched a short movie while their brain activity was being measured using functional MRI. We extracted feature vectors aligned with the timing of the audio from the movie, at different layers of a Deep Neural Network pretrained on the classification of auditory scenes. fMRI data was parcellated using hierarchical clustering on 500 parcels, and encoding models were estimated using a fully connected neural network with one hidden layer, trained to predict the signals for each parcel from the DNN features. Individual encoding models were successfully trained and predicted brain activity on unseen data, in parcels located in the superior temporal lobe, as well as dorsolateral prefrontal regions, which are usually considered as areas involved in auditory and language processing. Taken together, this contribution extends previous attempts on estimating encoding models, by showing the ability to model brain activity using a generic DNN (ie not specifically trained for this purpose) to extract auditory features, suggesting a degree of similarity between internal DNN representations and brain activity in naturalistic settings.