A Whisker of Truth: A Multimodal Interdisciplinary Machine Learning Approach to Vocal, Visual, and Tactile Signals in the Domestic Cat
Keywords: multimodal machine learning, pose estimation, digital bioacoustics, computer vision, cross-modal attention, animal behaviour
TL;DR: A multimodal machine learning framework that combines audio, video, and tactile data to observe and analyse domestic cat behaviour, enabling better welfare monitoring and early detection of health issues through interdisciplinary collaboration.
Abstract: We propose a multimodal deep learning framework for automated analysis of cat–human communication, integrating acoustic, visual, and tactile signals through transformer-based fusion. Using the largest expert-annotated dataset of its kind and interdisciplinary collaboration, we combine semi-supervised learning with ethological and phonetic expertise to detect subtle behavioural and phonetic cues, enable early welfare assessment, and establish species-generalisable methods.
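As a rough illustration of the cross-modal attention named in the keywords, the following minimal sketch lets tokens from one modality (here, audio frames) attend over tokens from another (video frames) using scaled dot-product attention. It is not the authors' implementation: the function names, the toy embeddings, and the omission of learned query/key/value projections and multiple heads are all simplifying assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention across modalities.

    Each query token (from one modality) is replaced by a weighted
    average of the value tokens (from another modality), with weights
    given by query-key similarity. Learned projections are omitted.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy embeddings (not data from the paper):
# 2 audio frames and 3 video frames, each of dimension 2.
audio = [[1.0, 0.0], [0.0, 1.0]]
video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Audio tokens query the video tokens; output has one fused vector per audio frame.
fused = cross_modal_attention(audio, video, video)
```

In a full transformer-based fusion model, a layer of this kind would typically be applied in several directions (for example audio to video and video to tactile) with learned projections, but the abstract does not specify those details.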
Submission Number: 6