Multi-view Recurrent Neural Acoustic Word Embeddings

Wanjia He; Weiran Wang; Karen Livescu

Published: 06 Feb 2017, Last Modified: 12 Oct 2025, ICLR 2017 Poster, Readers: Everyone

Abstract: Recent work has begun exploring neural acoustic word embeddings: fixed-dimensional vector representations of arbitrary-length speech segments corresponding to words. Such embeddings are applicable to speech retrieval and recognition tasks, where reasoning about whole words may make it possible to avoid ambiguous sub-word representations. The main idea is to map acoustic sequences to fixed-dimensional vectors such that examples of the same word are mapped to similar vectors, while examples of different words are mapped to very different vectors. In this work we take a multi-view approach to learning acoustic word embeddings, in which we jointly learn to embed acoustic sequences and their corresponding character sequences. We use deep bidirectional LSTM embedding models and multi-view contrastive losses. We study the effect of different loss variants, including fixed-margin and cost-sensitive losses. Our acoustic word embeddings improve over previous approaches on the task of word discrimination. We also present results on other tasks that are enabled by the multi-view approach, including cross-view word discrimination and word similarity.
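
To make the fixed-margin idea concrete, here is a minimal sketch of one common form of a multi-view contrastive (hinge) loss. The abstract does not state the exact formulation, so the cosine distance, the margin value, the function names, and the toy vectors below are all illustrative assumptions rather than the authors' method. In the paper's setting the embeddings would come from the two bidirectional LSTM encoders; here they are hard-coded.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two 1-D vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def multiview_margin_loss(acoustic_emb, char_emb_pos, char_emb_neg, margin=0.5):
    """Fixed-margin hinge loss across two views (assumed form).

    The loss is zero once the matching character embedding is closer to the
    acoustic embedding than the mismatched one by at least `margin`.
    """
    d_pos = cosine_distance(acoustic_emb, char_emb_pos)
    d_neg = cosine_distance(acoustic_emb, char_emb_neg)
    return max(0.0, margin + d_pos - d_neg)


# Toy embeddings standing in for encoder outputs (hypothetical values).
x = np.array([1.0, 0.0])      # acoustic embedding of a spoken word
c_pos = np.array([1.0, 0.0])  # character embedding of the same word
c_neg = np.array([0.0, 1.0])  # character embedding of a different word

loss_correct = multiview_margin_loss(x, c_pos, c_neg)
loss_swapped = multiview_margin_loss(x, c_neg, c_pos)
```

With these toy vectors the correctly ordered triple already satisfies the margin, while the swapped triple is penalized. A cost-sensitive variant, as mentioned in the abstract, would presumably replace the constant `margin` with a value that depends on how different the two words are; that detail is not specified here.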

Conflicts: uchicago.edu, ttic.edu

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/multi-view-recurrent-neural-acoustic-word/code)
