Keywords: sign language, video segmentation
TL;DR: We present one of the first methods for segmenting sign language videos at the subtitle level, identifying temporal boundaries from optical flow with sequence-to-sequence models.
Abstract: Sign language segmentation focuses on identifying temporal boundaries within sign language videos. In contrast to previous approaches that rely on frame-level or phrase-level segmentation, our study targets subtitle-level segmentation, using synchronized subtitle data to facilitate temporal boundary recognition. Using Beginning-Inside-Outside (BIO) tagging to delineate subtitle units, we train a sequence-to-sequence (Seq2Seq) model, with and without attention, for subtitle boundary identification. Trained on optical flow data and aligned subtitles from BOBSL and YouTube-ASL, the Seq2Seq model with attention outperforms baseline models, achieving improved percentage-of-segments, F1, and IoU scores. An additional contribution is the development of a method for subtitle temporal resolution, aiming to facilitate manual annotation.
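To make the BIO tagging step concrete, below is a minimal Python sketch of how aligned subtitle spans can be converted into per-frame B/I/O labels. The frame rate, function name, and (start, end) subtitle format are illustrative assumptions, not details taken from the paper.

# Minimal sketch of BIO frame tagging for subtitle-level segmentation.
# The 25 fps rate and the (start_sec, end_sec) span format are assumptions.

def bio_tags(num_frames, subtitle_spans, fps=25):
    """Map subtitle (start_sec, end_sec) spans to per-frame B/I/O tags.

    Frames covered by a subtitle get 'B' on the first frame and 'I'
    thereafter; all remaining frames are 'O' (outside any subtitle unit).
    """
    tags = ["O"] * num_frames
    for start_sec, end_sec in subtitle_spans:
        start = int(start_sec * fps)
        end = min(int(end_sec * fps), num_frames - 1)
        if start >= num_frames:
            continue
        tags[start] = "B"
        for f in range(start + 1, end + 1):
            tags[f] = "I"
    return tags

if __name__ == "__main__":
    # Two hypothetical subtitles in a 4-second clip at 25 fps.
    print("".join(bio_tags(100, [(0.2, 1.0), (2.0, 3.5)])))

A Seq2Seq model would then be trained to predict this tag sequence from per-frame optical flow features, with subtitle boundaries recovered from the B tags.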
Archival Status: Archival
Paper Length: Long Paper (up to 8 pages of content)
Submission Number: 346