Sign Language Recognition Based on Subspace Representations in the Spatio-Temporal Frequency Domain

Ryota Sato, Suzana Rita Alves Beleza, Erica K. Shimomoto, Matheus Silva de Lima, Nobuko Kato, Kazuhiro Fukui

Published: 2024 (ICPRAM 2024). Last Modified: 22 May 2025. License: CC BY-SA 4.0

Abstract: This paper proposes a subspace-based method for sign language recognition in videos. Typical subspace-based methods represent a video as a low-dimensional subspace generated by applying principal component analysis (PCA) to the set of images (frames) from the video. Such a representation is compact and practical for motion recognition when little training data is available. However, given the complex motions and structures of sign languages, the performance of subspace-based methods is limited, because they do not consider temporal information such as the order of frames. To address this issue, we propose processing time-domain information in the frequency domain by applying the three-dimensional fast Fourier transform (3D-FFT) to sign videos. A sign video is then represented as a 3D amplitude spectrum tensor, which is invariant to shifts of target objects in the spatial and temporal directions. Further, a 3D amplitude spectrum tensor is regarded as one point on the Product Grassmann Manifold (PGM). By unfolding the te
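The two properties contrasted in the abstract can be checked numerically. The sketch below assumes NumPy and a grayscale video stored as a `(T, H, W)` array. It builds a frame subspace by PCA computed via SVD without mean-centering (an assumption; subspace methods often work this way) and a 3D amplitude spectrum via `np.fft.fftn`. The names `frame_subspace`, `projection`, and `amplitude_spectrum` are hypothetical illustrations, not the authors' code, and the later steps (tensor unfolding and the PGM representation) are not modeled.

```python
import numpy as np


def frame_subspace(video: np.ndarray, dim: int) -> np.ndarray:
    """Orthonormal basis (H*W x dim) of the subspace spanned by a video's frames.

    `video` has shape (T, H, W). Each frame is flattened into a column vector
    and the `dim` leading left singular vectors are kept, i.e. uncentered PCA
    (an assumption of this sketch).
    """
    num_frames = video.shape[0]
    frames = video.reshape(num_frames, -1).T  # (H*W, T): one column per frame
    u, _, _ = np.linalg.svd(frames, full_matrices=False)
    return u[:, :dim]


def projection(basis: np.ndarray) -> np.ndarray:
    """Orthogonal projection matrix onto the span of an orthonormal basis."""
    return basis @ basis.T


def amplitude_spectrum(video: np.ndarray) -> np.ndarray:
    """3D amplitude spectrum: magnitude of the FFT over the (T, H, W) axes."""
    return np.abs(np.fft.fftn(video))


rng = np.random.default_rng(0)
video = rng.random((8, 6, 5))  # toy "video": 8 frames of 6x5 pixels

# 1) Shuffling frames leaves the frame subspace unchanged: frame order is lost.
shuffled = video[rng.permutation(8)]
same_subspace = np.allclose(
    projection(frame_subspace(video, 3)), projection(frame_subspace(shuffled, 3))
)

# 2) A circular shift in time and space leaves the amplitude spectrum unchanged,
#    because a shift only multiplies each Fourier coefficient by a unit phase.
shifted = np.roll(video, shift=(2, 1, 3), axis=(0, 1, 2))
shift_invariant = np.allclose(amplitude_spectrum(video), amplitude_spectrum(shifted))

# 3) Swapping two frames is not a shift, so the amplitude spectrum changes:
#    unlike the frame subspace, it is sensitive to this reordering.
swapped = video[[1, 0, 2, 3, 4, 5, 6, 7]]
order_sensitive = not np.allclose(amplitude_spectrum(video), amplitude_spectrum(swapped))

print(same_subspace, shift_invariant, order_sensitive)  # True True True
```

Note that the shift invariance of the DFT amplitude is exact only for circular shifts; for real sign videos, where a hand moves within a bounded frame, it holds approximately.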