Task and Perception-aware Distributed Source Coding for Correlated Speech under Bandwidth-constrained Channels

25 Nov 2024 (modified: 29 Dec 2024), AAAI 2025 Workshop AI4WCN Submission, CC BY 4.0
Keywords: source coding, speech, neural distributed principal component analysis, rate distortion, perception
TL;DR: This paper proposes a neural distributed PCA-aided source coding algorithm for correlated speech, optimizing bandwidth, perceptual realism, and task-specific performance in dynamic wireless AR/VR applications.
Abstract: Emerging wireless AR/VR applications require real-time transmission of correlated high-fidelity speech from multiple resource-constrained devices over unreliable, bandwidth-limited channels. Existing autoencoder-based speech source coding methods fail to jointly address three requirements: (1) dynamic bitrate adaptation without retraining the model, (2) leveraging correlations among multiple speech sources, and (3) balancing downstream task loss with the realism of reconstructed speech. We propose a neural distributed principal component analysis (NDPCA)-aided distributed source coding algorithm for correlated speech sources transmitting to a central receiver. Our method includes a perception-aware downstream task loss function that balances perceptual realism with task-specific performance. Experiments show significant PSNR improvements over naive autoencoder methods under bandwidth constraints: 19% in the task-agnostic setting and 52% in the task-aware setting. Our method also approaches the theoretical upper bound, in which all correlated sources are sent to a single encoder, especially in low-bandwidth scenarios. Additionally, we present a rate-distortion-perception trade-off curve, enabling adaptive decisions based on application-specific realism needs.
Submission Number: 14
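Illustrative sketch (not the authors' implementation): the abstract describes bitrate adaptation without retraining across several correlated sources. One common way to realize this in distributed PCA-style schemes is to rank the principal components of all sources by singular value and transmit only as many as the current bandwidth allows. The function name `allocate_components`, the source names, and the singular values below are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def allocate_components(
    singular_values: Dict[str, List[float]], budget: int
) -> Dict[str, int]:
    """Split a shared budget of latent dimensions across sources.

    Hypothetical helper: ranks every component of every source by its
    singular value and keeps the `budget` largest overall. Returns, per
    source, how many of its strongest components would be transmitted.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")

    # Pool all components as (singular value, source name) pairs,
    # sorted from most to least significant.
    ranked = sorted(
        (
            (value, source)
            for source, values in singular_values.items()
            for value in values
        ),
        key=lambda pair: pair[0],
        reverse=True,
    )

    # Count how many of the globally strongest components belong to each source.
    allocation = {source: 0 for source in singular_values}
    for _, source in ranked[:budget]:
        allocation[source] += 1
    return allocation


# Made-up singular values for two correlated microphones.
example = {
    "mic_a": [5.0, 2.0, 0.5],
    "mic_b": [4.0, 3.0, 0.1],
}
print(allocate_components(example, budget=3))
```

Under this sketch, changing `budget` at run time changes only how many components each source sends; nothing is retrained. Whether the paper's NDPCA module selects components in exactly this way is an assumption here.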