DONUT: CTC-based Query-by-Example Keyword Spotting

Anonymous

Published: 16 Nov 2018, Last Modified: 05 May 2023
NIPS 2018 Workshop IRASL Blind Submission
Readers: Everyone

Abstract: Keyword spotting—or wakeword detection—is an essential feature for hands-free operation of modern voice-controlled devices. With such devices becoming ubiquitous, users might want to choose a personalized custom wakeword. In this work, we present DONUT, a CTC-based algorithm for online query-by-example keyword spotting that enables custom wakeword detection. The algorithm works by recording a small number of training examples from the user, generating a set of label sequence hypotheses from these training examples, and detecting the wakeword by aggregating the scores of all the hypotheses given a new audio recording. Our method combines the generalization and interpretability of CTC-based keyword spotting with the user-adaptation and convenience of a conventional query-by-example system. DONUT has low computational requirements and is well-suited for both learning and inference on embedded systems without requiring private user data to be uploaded to the cloud.

TL;DR: We propose an interpretable model for detecting user-chosen wakewords that learns from the user's examples.

Keywords: keyword spotting, query-by-example, wakeword detection, CTC
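
The detection step described in the abstract (score each stored label-sequence hypothesis against a new recording, then aggregate) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`ctc_log_likelihood`, `wakeword_score`), the uniform hypothesis weights, and the log-sum-exp aggregation rule are assumptions made for illustration, and the per-frame log-posteriors are assumed to come from some pretrained CTC acoustic model that is not shown.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_likelihood(log_probs, labels, blank=0):
    """Standard CTC forward pass: log P(labels | audio).

    log_probs[t][k] is the log-posterior of symbol k at frame t.
    """
    # Extended sequence with blanks interleaved: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    # Initialise: a path may start in the first blank or the first label.
    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [NEG_INF] * size
        for s in range(size):
            candidates = [alpha[s]]              # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # A valid path ends in the final blank or the final label.
    tail = [alpha[size - 1]]
    if size > 1:
        tail.append(alpha[size - 2])
    return logsumexp(tail)


def wakeword_score(log_probs, hypotheses, log_weights=None):
    """Aggregate hypothesis scores (assumed rule: weighted log-sum-exp)."""
    if log_weights is None:
        # Assumption: every hypothesis is equally likely a priori.
        log_weights = [-math.log(len(hypotheses))] * len(hypotheses)
    scores = [w + ctc_log_likelihood(log_probs, h)
              for h, w in zip(hypotheses, log_weights)]
    return logsumexp(scores)


# Toy input: 2 frames, 2 symbols (0 = blank, 1 = a single label), uniform posteriors.
toy_log_probs = [[math.log(0.5), math.log(0.5)] for _ in range(2)]
toy_score = wakeword_score(toy_log_probs, hypotheses=[[1], []])
```

In a real system the score would be compared against a tuned threshold to decide whether the wakeword was spoken. Only the small hypothesis set needs to be stored per user, which is in line with the abstract's claim of low computational requirements.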
