NAACA: Training-Free NeuroAuditory Attentive Cognitive Architecture with Oscillatory Working Memory for Salience-Driven Attention Gating
Abstract: Audio provides critical situational cues, yet current Audio Language Models (ALMs) face an attention bottleneck in long-form recordings where dominant background patterns can dilute rare, salient events. We introduce NAACA, a training-free NeuroAuditory Attentive Cognitive Architecture that reframes attention allocation as an auditory salience filtering problem. At its core is OWM, a neuro-inspired Oscillatory Working Memory that maintains stable attractor-like states and triggers higher-level ALM reasoning only when adaptive energy fluctuations signal perceptual salience. On XD-Violence, NAACA improves AudioQwen’s average precision (AP) from 53.50% to 70.60% while reducing unnecessary ALM invocations. Furthermore, qualitative case studies on the Urban Soundscapes of the World (USoW) dataset show that OWM captures novel events and subcategory shifts while remaining robust to transient pauses and ambient urban noise.
Lay Summary: Audio recordings often contain long periods of background sound, but important events such as shouting, fighting, sirens, or sudden environmental changes may occur only briefly. Current audio language models can struggle with this setting because they process long recordings with limited attention, so rare events may be overlooked or become diluted by earlier background sounds.
We propose NAACA, a training-free system that helps audio language models focus on the most important moments in a recording. Its core component is Oscillatory Working Memory, a bio-inspired mechanism that keeps track of stable sound patterns and reacts when the audio changes in a meaningful way. Instead of sending every audio segment to a large model, NAACA selects segments that are likely to contain salient changes and forwards only those for higher-level interpretation.
This makes long-form audio understanding more accurate and efficient. On a violence detection benchmark, NAACA improves AudioQwen’s performance while reducing unnecessary model calls, suggesting a practical path toward real-time audio monitoring in resource-limited settings.
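The gating idea described above (track the stable background, forward a segment only when it deviates meaningfully) can be illustrated with a minimal sketch. This is not the authors' OWM: it swaps in a plain exponential moving average as the "memory" and an adaptive mean-plus-k-standard-deviations threshold as the "salience" test. The function name salience_gate and all parameters (memory_rate, stats_rate, k, warmup) are hypothetical, and each segment is assumed to be already reduced to a fixed-length feature vector.

```python
import math
from typing import List, Sequence


def salience_gate(
    segments: Sequence[Sequence[float]],
    memory_rate: float = 0.1,  # hypothetical: how quickly the memory tracks the input
    stats_rate: float = 0.1,   # hypothetical: how quickly the threshold statistics adapt
    k: float = 3.0,            # hypothetical: trigger when deviation > mean + k * std
    warmup: int = 5,           # hypothetical: segments observed before gating starts
) -> List[int]:
    """Return indices of segments that deviate unusually from a running memory.

    Illustrative stand-in only: an EMA memory with an adaptive threshold,
    not the oscillatory working memory described in the abstract.
    """
    if not segments:
        return []
    memory = list(segments[0])  # running estimate of the background
    mean, var = 0.0, 0.0        # running statistics of the deviation signal
    selected: List[int] = []
    for i, seg in enumerate(segments):
        # Euclidean distance between the current segment and the memory.
        dev = math.sqrt(sum((s - m) ** 2 for s, m in zip(seg, memory)))
        # Gate: forward the segment only if the deviation is unusually large.
        if i >= warmup and dev > mean + k * math.sqrt(var) + 1e-9:
            selected.append(i)
        # Update the exponentially weighted mean and variance of the deviation.
        delta = dev - mean
        mean += stats_rate * delta
        var = (1.0 - stats_rate) * (var + stats_rate * delta * delta)
        # Let the memory drift slowly toward the current segment.
        memory = [m + memory_rate * (s - m) for s, m in zip(seg, memory)]
    return selected


if __name__ == "__main__":
    background = [[0.0, 0.0]] * 20
    stream = background + [[5.0, 5.0]] + [[0.0, 0.0]] * 5
    print(salience_gate(stream))
```

In this toy stream only the segment at index 20 is flagged, so a downstream model would be called once instead of 26 times. Because the threshold adapts after the event, the return to background is not flagged as a second event.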
Originally Submitted Supplementary Material: zip
Link To Code: https://github.com/zjyuan1208/NAACA-Oscillatory-Working-Memory
Primary Area: Applications->Neuroscience, Cognitive Science
Keywords: Neuro-inspired Architecture, Auditory Salience Filtering, Oscillatory Working Memory, Attentional Bottleneck, Training-free Adaptation
Originally Submitted PDF: pdf
Submission Number: 832