Abstract: Gait analysis offers a non-invasive, scalable approach to infer individual health information using vision-based methods. In this work, we propose a multi-task learning framework that simultaneously estimates several health-related indicators grouped into four categories: biometric identification, body composition, body measures, and physical activity traits. Our model operates solely on silhouette gait sequences, avoiding the need for wearable sensors, depth cameras, or pose annotations. Unlike prior approaches that focus on isolated tasks or rely on multimodal inputs, our method leverages shared spatiotemporal representations to jointly predict diverse health factors from video alone. We conduct extensive experiments using the Health&Gait dataset, which includes 398 individuals walking naturally in indoor conditions with clinically relevant annotations. We show that grouping tasks by physiological correlation improves performance across model backbones, revealing structure in the health representation space. Results demonstrate that multi-task learning improves prediction accuracy for most tasks compared to single-task baselines, particularly benefiting from correlations between related attributes. These findings support the viability of gait-based health modeling as a contactless and privacy-conscious tool for comprehensive health profiling. Our work lays the groundwork for the development of generalist models for preventive health monitoring.
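To make the multi-task setup concrete, below is a minimal sketch of a shared spatiotemporal encoder over silhouette clips with one prediction head per task group. This is not the authors' implementation: the 3D-CNN backbone, embedding size, head output dimensions, and the example attributes listed per group are illustrative assumptions; only the four group names and the 398-identity count come from the abstract.

```python
# Hedged sketch (assumed architecture, not the paper's): shared spatiotemporal
# encoder for binary silhouette clips + one head per health task group.
import torch
import torch.nn as nn


class SharedGaitEncoder(nn.Module):
    """3D-CNN over (B, 1, T, H, W) silhouette clips -> fixed-size embedding."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # global spatiotemporal pooling
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.features(x).flatten(1))


class MultiTaskGaitModel(nn.Module):
    """Shared embedding feeding one head per task group from the abstract."""

    def __init__(self, embed_dim: int = 256, num_identities: int = 398):
        super().__init__()
        self.encoder = SharedGaitEncoder(embed_dim)
        # Head output sizes are placeholders; per-group targets are assumed.
        self.heads = nn.ModuleDict({
            "biometric_id": nn.Linear(embed_dim, num_identities),  # classification
            "body_composition": nn.Linear(embed_dim, 3),  # e.g. fat %, muscle %, BMI
            "body_measures": nn.Linear(embed_dim, 2),     # e.g. height, weight
            "activity_traits": nn.Linear(embed_dim, 2),   # e.g. gait speed, cadence
        })

    def forward(self, clips: torch.Tensor) -> dict[str, torch.Tensor]:
        z = self.encoder(clips)  # shared spatiotemporal representation
        return {name: head(z) for name, head in self.heads.items()}


if __name__ == "__main__":
    model = MultiTaskGaitModel()
    clips = torch.rand(4, 1, 16, 64, 44)  # batch of 16-frame silhouette clips
    outputs = model(clips)
    print({name: tuple(out.shape) for name, out in outputs.items()})
```

In such a setup, training would typically minimize a weighted sum of per-head losses (cross-entropy for identification, regression losses for the continuous attributes), so that physiologically correlated tasks can share gradients through the common encoder; the specific loss weighting used in the paper is not stated in the abstract.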