Towards Scalable Sign Production: Leveraging Co-Articulated Gloss Dictionary for Fluid Sign Synthesis
Abstract: Sign Language Production (SLP) systems can significantly improve accessibility for Deaf communities by translating spoken or written language into sign language videos. For millions of Indian Sign Language (ISL) users, such systems could bridge persistent communication gaps in education, healthcare, and public services. However, SLP in ISL is hindered by the scarcity of annotated datasets and the expressive complexity of the language. Existing ISL datasets, such as iSign \cite{joshi2024isign} and ISL-CSLRT \cite{elakkiya2021islcsltr}, are designed for recognition and lack gloss-level\footnote{A gloss is a word-level written representation of a sign.} annotations, forcing full-sentence modeling that struggles to generalize to new constructions.
We propose an interpolation-based approach that stitches together individual gloss-level signs using modern text-to-motion frameworks like FramePack \cite{zhang2025framepack}. This produces contextually accurate and grammatically consistent sign sequences while retaining fine-grained articulation. To support this pipeline, we curate a 1,000-sentence ISL dataset, translated by a native interpreter and annotated with precise gloss boundaries. Results show that with high-quality gloss-level supervision, interpolation-based synthesis offers a practical, scalable path toward inclusive SLP in low-resource sign languages like ISL. The project page can be found here.
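To make the stitching idea concrete, the sketch below interpolates pose frames between consecutive gloss clips. This is a minimal illustration, not the paper's pipeline: the clip format (per-frame 3D joints) and the function `stitch_glosses` are assumptions, and the actual system relies on a text-to-motion framework such as FramePack rather than plain linear blending.

```python
import numpy as np

def stitch_glosses(clips, n_transition=8):
    """Concatenate per-gloss pose clips, inserting linearly
    interpolated transition frames between consecutive glosses.

    clips: list of arrays of shape (T_i, J, 3) -- per-frame 3D joint
    positions (hypothetical representation for illustration only).
    """
    out = [clips[0]]
    for nxt in clips[1:]:
        a, b = out[-1][-1], nxt[0]  # boundary poses of adjacent glosses
        # Blend factors strictly between 0 and 1; the endpoint poses
        # already belong to the surrounding clips.
        ts = np.linspace(0.0, 1.0, n_transition + 2)[1:-1]
        transition = np.stack([(1 - t) * a + t * b for t in ts])
        out.append(transition)
        out.append(nxt)
    return np.concatenate(out, axis=0)
```

A learned in-betweening model would replace the linear blend, but the interface (gloss clips in, one fluid sequence out) is the same.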