Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis

Published: 25 Sept 2024 (modified: 29 Oct 2024) · ICLR 2025 Conference Withdrawn Submission · License: CC BY 4.0
Keywords: text-to-speech, deep generative model, audio modeling
TL;DR: Parameter-efficient training and fine-tuning of linear-attention based text-to-speech model.
Abstract: Neural codec language models have demonstrated state-of-the-art performance in text-to-speech (TTS) synthesis. Leveraging scalable architectures like autoregressive transformers, they capitalize on the availability of large speech datasets. When framing voice cloning as a prompt continuation task, these models excel at cloning voices from short audio samples. However, this approach cannot be extended to multiple speech excerpts, and it is limited because the concatenation of source and target speech must fit within the maximum context length determined at training time. In this work, we propose a model that replaces transformers with emergent recurrent architectures such as Gated Linear Attention (GLA). Our model, Lina-Speech, outperforms or matches baseline models up to 4x its size. We showcase initial-state tuning, a parameter-efficient fine-tuning technique that optimizes the initial state of the recurrent layers, resulting in compact and expressive speaker embeddings with fine-grained control over the speech style. Compared to prompt continuation, it allows voice cloning from multiple speech excerpts and full usage of the context window for synthesis. This approach is fast, deployable, and does not rely on auxiliary modules. It also adapts well to out-of-domain data. We will publicly release our code and checkpoints. Audio samples are available at \url{https://anonymsubm.github.io}.
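To make the initial-state tuning idea concrete, below is a minimal, hedged sketch of what such a procedure could look like: a pretrained recurrent backbone is frozen and only a learnable initial hidden state per recurrent layer is optimized on a few excerpts from the target speaker. The GLA recurrence is replaced here by a plain GRU stand-in, and all names, shapes, and the next-token loss are illustrative assumptions rather than the paper's actual implementation.

```python
# Sketch of "initial-state tuning": freeze the backbone, learn only the
# initial recurrent state as a compact speaker embedding (assumed setup).
import torch
import torch.nn as nn

class ToyRecurrentTTS(nn.Module):
    """Stand-in for a recurrent codec language model (GRU instead of GLA)."""
    def __init__(self, vocab=256, dim=64, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, num_layers=layers, batch_first=True)
        self.head = nn.Linear(dim, vocab)  # predicts the next codec token

    def forward(self, tokens, init_state=None):
        h, _ = self.rnn(self.embed(tokens), init_state)
        return self.head(h)

model = ToyRecurrentTTS()
for p in model.parameters():           # pretrained backbone stays frozen
    p.requires_grad_(False)

# The speaker "embedding" is just the learnable initial state of the RNN,
# shaped (num_layers, batch=1, dim) and broadcast over the speaker's excerpts.
init_state = nn.Parameter(torch.zeros(2, 1, 64))
opt = torch.optim.Adam([init_state], lr=1e-2)

excerpts = torch.randint(0, 256, (4, 128))   # 4 tokenized speech excerpts (dummy data)
for step in range(100):
    inp, tgt = excerpts[:, :-1], excerpts[:, 1:]
    logits = model(inp, init_state.expand(-1, inp.size(0), -1).contiguous())
    loss = nn.functional.cross_entropy(logits.transpose(1, 2), tgt)
    opt.zero_grad(); loss.backward(); opt.step()

# At synthesis time, the tuned init_state conditions generation on the speaker,
# leaving the full context window available for the text to be synthesized.
```

Because only the initial state is trained, adaptation can draw on any number of speech excerpts and produces a small per-speaker artifact, in contrast to prompt continuation, which consumes context length with the reference audio.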
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4680