Natural Language-guided Neural Encoding Benchmark for Vision

Published: 10 Oct 2024, Last Modified: 20 Nov 2024, NeuroAI @ NeurIPS 2024 Poster, CC BY-NC 4.0
Keywords: neural encoding, language-vision models, visual cortex, NeuroAI, multimodal AI
Abstract: Understanding the link between visual stimuli and their neural representations is key to advancing Human-Computer Interaction, particularly for therapeutic and assistive technologies. Language-guided visual response systems could substantially improve support for individuals with visual impairments by providing personalized, descriptive assistance for daily tasks. Advances in generative multimodal networks make image captioning models promising candidates for such systems. Evaluating their biological plausibility, however, requires a rigorous benchmark that assesses how well the captions they produce align with neural encoding in the visual cortex. In this paper, we present a novel benchmarking approach that evaluates the alignment of image captioning models with neural activity patterns, using a dataset of visual exposures and neural recordings from primates and mice. This method allows models to be compared by their congruence with biological neural responses, aiding the development of assistive technologies for visually impaired individuals. Our work extends beyond computational vision, offering insights for designing neuro-inspired generative multimodal networks, with transformative potential for health-related applications such as natural language-driven visual aids and therapeutic interventions.
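The abstract does not specify how caption-to-neural alignment is scored. A common choice in neural encoding work is a cross-validated linear (ridge) encoding model that maps model-derived features onto recorded responses; the sketch below is a minimal illustration under that assumption. The function name encoding_alignment_score, the use of RidgeCV, and the Pearson-correlation scoring are illustrative choices, not the paper's stated method.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def encoding_alignment_score(caption_features, neural_responses, n_splits=5, seed=0):
    """Cross-validated encoding score: how well caption-derived features
    predict recorded neural responses (higher = better alignment).

    caption_features : (n_stimuli, n_features) array, e.g. text embeddings of
                       the captions a model generated for each visual stimulus.
    neural_responses : (n_stimuli, n_units) array of recorded responses,
                       e.g. trial-averaged firing rates per neuron.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_scores = []
    for train_idx, test_idx in kf.split(caption_features):
        # Fit a ridge regression per fold, mapping features to all units at once.
        model = RidgeCV(alphas=np.logspace(-2, 4, 7))
        model.fit(caption_features[train_idx], neural_responses[train_idx])
        pred = model.predict(caption_features[test_idx])
        true = neural_responses[test_idx]
        # Pearson correlation between predicted and observed responses, per unit.
        pred_c = pred - pred.mean(axis=0)
        true_c = true - true.mean(axis=0)
        denom = np.linalg.norm(pred_c, axis=0) * np.linalg.norm(true_c, axis=0) + 1e-12
        fold_scores.append((pred_c * true_c).sum(axis=0) / denom)
    # Average over folds and units to get one alignment score per captioning model.
    return float(np.mean(fold_scores))
```

In a setup like this, each candidate captioning model would be ranked by the mean held-out correlation its caption features achieve across recorded units, so a higher score indicates closer congruence with the biological responses.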
Submission Number: 34