Keywords: spoken language understanding, resource-constrained devices, privacy-preserving
TL;DR: We provide a lightweight, privacy-preserving encoder that can be efficiently embedded into low-power audio devices.
Abstract: Speech serves as a ubiquitous input interface for embedded mobile devices.
Cloud-based solutions, while offering powerful speech understanding services, raise significant concerns regarding user privacy.
To address this, disentanglement-based encoders have been proposed to remove sensitive information from speech signals without compromising the speech understanding functionality.
However, these encoders demand substantial memory and computation, making them impractical for resource-constrained wimpy devices.
Our solution is based on a key observation: speech understanding hinges on long-term dependencies across the entire utterance, whereas privacy-sensitive elements depend on short-term details.
Exploiting this observation, we propose SILENCE, a lightweight system that selectively obscures short-term details without damaging long-term-dependent speech understanding performance.
The crucial part of SILENCE is a differential mask generator derived from interpretable learning to automatically configure the masking process.
We have implemented SILENCE on the STM32H7 microcontroller and evaluated its efficacy under different attack scenarios.
Our results demonstrate that SILENCE offers speech understanding performance and privacy protection capacity comparable to existing encoders, while achieving up to 53.3$\times$ speedup and 134.1$\times$ reduction in memory footprint.
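
To make the masking idea concrete, the following is a minimal sketch of obscuring short spans of a frame sequence while leaving the rest of the utterance intact. It is not the SILENCE implementation: the abstract describes a learned mask generator, whereas this sketch substitutes a fixed periodic pattern as a stand-in. The function names `periodic_mask` and `apply_mask`, the parameters `span` and `period`, and the toy frame values are all hypothetical.

```python
from typing import List, Sequence


def periodic_mask(num_frames: int, span: int, period: int) -> List[int]:
    """Return a 0/1 mask hiding the first `span` frames of every `period` frames.

    Assumption: a fixed periodic pattern stands in for the learned mask
    generator described in the abstract.
    """
    if period <= 0 or not 0 <= span <= period:
        raise ValueError("need period > 0 and 0 <= span <= period")
    return [0 if (i % period) < span else 1 for i in range(num_frames)]


def apply_mask(frames: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Zero out every frame whose mask entry is 0; keep the others unchanged."""
    if len(frames) != len(mask):
        raise ValueError("frames and mask must have the same length")
    return [f if m else 0.0 for f, m in zip(frames, mask)]


# Toy per-frame values (hypothetical); real inputs would be audio features.
frames = [0.5, -0.2, 0.9, 0.1, -0.7, 0.3, 0.8, -0.4, 0.6, 0.2]
mask = periodic_mask(len(frames), span=2, period=5)
masked = apply_mask(frames, mask)
```

In this toy setting, short local spans are removed while frames spread across the whole sequence survive, which is the property the key observation relies on. How the spans are actually chosen is the role of the mask generator and is not specified here.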
Primary Area: Speech and audio
Submission Number: 3804