Abstract: Spoken language acquisition involves automatically developing symbolic word concepts with their meaning grounded in the world, recognizing the words in spoken utterances, and pronouncing them. Previous research has covered these aspects only partially. One of the most comprehensive agent systems supported word concept acquisition from pairs of raw speech and images, as well as utterance pronunciation. However, that agent did not listen to anything while interacting with the world; it only pronounced a food name to choose its favorite of two displayed images. In this work, we add to the agent the ability to recognize a spoken question. Specifically, we design a task in which the agent must recognize a question in a spoken utterance and understand the logical “not” concept. Experimental results show that the agent successfully learns the task. It behaves appropriately even for unseen combinations of images, correctly answering with the name of the food it wants or the opposite one, depending on the question.