DONUT: CTC-based Query-by-Example Keyword Spotting


22 Oct 2018, 22:22 (modified: 10 Sept 2019, 21:48) · NIPS 2018 Workshop IRASL Blind Submission
Abstract: Keyword spotting, or wakeword detection, is an essential feature for hands-free operation of modern voice-controlled devices. As such devices become ubiquitous, users may want to choose a personalized wakeword. In this work, we present DONUT, a CTC-based algorithm for online query-by-example keyword spotting that enables custom wakeword detection. The algorithm records a small number of training examples from the user, generates a set of label sequence hypotheses from these examples, and detects the wakeword in new audio by aggregating the scores of all the hypotheses. Our method combines the generalization and interpretability of CTC-based keyword spotting with the user adaptation and convenience of a conventional query-by-example system. DONUT has low computational requirements and is well suited to both learning and inference on embedded systems, without requiring private user data to be uploaded to the cloud.
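The three steps in the abstract (record examples, derive label sequence hypotheses, aggregate hypothesis scores on new audio) can be illustrated with a minimal sketch. This is not the DONUT implementation; it rests on several assumptions of our own. An acoustic model is assumed to already output per-frame CTC log-posteriors with label 0 as the blank. Each enrollment recording contributes one hypothesis via best-path (greedy) decoding, whereas the paper's hypothesis generation may differ (for example, beam search). Each hypothesis is scored on a fixed audio window with the standard CTC forward algorithm, and the scores are aggregated as the log of their mean likelihood, which is our choice rather than the paper's weighting. All function names and the toy posteriors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

BLANK = 0  # assumption: label index 0 is the CTC blank symbol
LogProbs = List[List[float]]  # frames x labels, natural-log posteriors


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def greedy_decode(log_probs: LogProbs) -> Tuple[int, ...]:
    """Best-path CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    out: List[int] = []
    prev = None
    for frame in log_probs:
        k = max(range(len(frame)), key=frame.__getitem__)
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return tuple(out)


def ctc_log_likelihood(log_probs: LogProbs, labels: Sequence[int]) -> float:
    """CTC forward algorithm: log P(labels | audio), summed over all alignments."""
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]  # blank-interleaved label sequence
    S = len(ext)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][BLANK]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [-math.inf] * S
        for s in range(S):
            cands = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one position
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])  # skip the blank between distinct labels
            new[s] = logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new
    # A valid path ends on the last label or on the trailing blank.
    return alpha[0] if S == 1 else logsumexp([alpha[S - 1], alpha[S - 2]])


def enroll(examples: List[LogProbs]) -> List[Tuple[int, ...]]:
    """Hypothetical enrollment: one best-path hypothesis per user recording."""
    return sorted({greedy_decode(lp) for lp in examples})


def detection_score(log_probs: LogProbs, hypotheses: List[Tuple[int, ...]]) -> float:
    """Assumed aggregation: log of the mean likelihood over all hypotheses."""
    scores = [ctc_log_likelihood(log_probs, h) for h in hypotheses]
    return logsumexp(scores) - math.log(len(scores))


def toy_posteriors(best: Sequence[int], p: float = 0.9, vocab: int = 3) -> LogProbs:
    """Synthetic log-posteriors: probability p on best[t], the rest spread evenly."""
    return [[math.log(p if j == k else (1 - p) / (vocab - 1)) for j in range(vocab)]
            for k in best]


if __name__ == "__main__":
    hyps = enroll([toy_posteriors([0, 1, 1, 0, 2, 0]),
                   toy_posteriors([1, 0, 2, 2]),
                   toy_posteriors([1, 0, 1, 2])])
    print(hyps)  # [(1, 1, 2), (1, 2)]
    for name, clip in [("match", [0, 1, 0, 2, 0]), ("other", [0, 2, 0, 1, 0])]:
        print(name, detection_score(toy_posteriors(clip), hyps))
```

Because the hypotheses are plain label sequences, each one can be inspected directly, which loosely mirrors the interpretability the abstract claims for CTC-based spotting. An online detector would additionally need to run the score over a sliding window of streaming audio and compare it against a tuned threshold; neither is modeled in this sketch.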
TL;DR: We propose an interpretable model for detecting user-chosen wakewords that learns from the user's examples.
Keywords: keyword spotting, query-by-example, wakeword detection, CTC