Interactive Artistic Text-To-Voice: Tungnaá and Bla Blavatar vs Jaap Blonk

Published: 27 Sept 2025, Last Modified: 09 Nov 2025
Venue: NeurIPS Creative AI Track 2025
License: CC BY 4.0
Track: Paper
Keywords: text to voice, interactive artistic text to voice, voice synthesis, real time
TL;DR: We define a creative AI task for interactive voice synthesis, demonstrate an open-source implementation, and describe an artistic application.
Abstract: Advances in deep learning have enabled speech synthesis to rival human speech in realism. While many artists have experimented with these technologies, real-time applications have been limited. We define a new task, interactive artistic text-to-voice (IATV), to bridge this gap. We also present a novel IATV system that achieves low-latency synthesis, interactivity, and controllability while allowing for exploration of unconventional vocal expressions. The system combines a character-level text encoder, Tacotron2-based streaming alignment, and a RAVE streaming vocoder. Tungnaá is our open-source Python package implementing IATV training and real-time inference, plus a graphical interface for experimental music performance with IATV models. We report on strategies for low-resource training on artist-created datasets, and on an artistic application of Tungnaá in collaboration with sound poet Jaap Blonk.
Video Preview For Artwork: mp4
Submission Number: 139