Self-Supervised Spoken Question Understanding and Speaking with Automatic Vocabulary Learning

Keisuke Toyoda, Yusuke Kimura, Mingxin Zhang, Kent Hino, Kosuke Mori, Takahiro Shinozaki

2021 (modified: 26 Apr 2023), O-COCOSDA 2021

Abstract: Spoken language acquisition involves automatically developing symbolic word concepts whose meanings are grounded in the world, recognizing those words in spoken utterances, and pronouncing them. Previous research has covered these aspects only partially. One of the most comprehensive agent systems supported word concept acquisition from paired raw speech and images, as well as utterance pronunciation. However, that agent listened to nothing while interacting with the world; it only pronounced a food name to choose its favorite of two displayed images. In this work, we add to the agent the ability to recognize a spoken question. Specifically, we design a task in which the agent must recognize a question in a spoken utterance and understand the logical “not” concept. Experimental results show that the agent successfully learns the task. It behaves appropriately even for unseen combinations of images, correctly answering with the name of the food it wants or of the other food, depending on the question.
