Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Xueyao Zhang; Liumeng Xue; Yuancheng Wang; Yicheng Gu; Xi Chen; Zihao Fang; Haopeng Chen; Lexiao Zou; Chaoren Wang; Jun Han; Kai Chen; Haizhou Li; Zhizheng Wu

Published: 01 Jan 2023, Last Modified: 30 Sept 2024; CoRR 2023; CC BY-SA 4.0

Abstract: Amphion is an open-source toolkit for Audio, Music, and Speech Generation that aims to ease the entry of junior researchers and engineers into these fields. It provides a unified framework covering diverse generation tasks and models, and it is designed to be easily extended with new ones. The toolkit offers beginner-friendly workflows and pre-trained models, allowing both beginners and seasoned researchers to kick-start their projects with relative ease. The initial release, Amphion v0.1, supports a range of tasks including Text to Speech (TTS), Text to Audio (TTA), and Singing Voice Conversion (SVC), supplemented by essential components such as data preprocessing, state-of-the-art vocoders, and evaluation metrics. This paper presents a high-level overview of Amphion. Amphion is open-sourced at https://github.com/open-mmlab/Amphion.
