2nd CASTLE Multimodal Analytics Challenge

Published: 03 Apr 2026, Last Modified: 03 Apr 2026, ACMMM2026-MGC-Proposal, CC BY 4.0
Keywords: CASTLE Dataset, Egocentric Vision, Visual Question Answering, Video Search
Abstract: The increasing availability of mobile computing devices equipped with a variety of sensors enables us to capture and quantify a growing number of aspects of the human condition. Automatically drawing insights from such captured data could open the door to a wide range of new applications, but it remains challenging, particularly when information must be combined from different points along the timeline. The CASTLE challenge aims to act as a catalyst in the development of methods for multimodal understanding by providing a rich multimodal dataset that serves as a basis for a range of analysis tasks. Research on lifelog retrieval and analysis has so far focused either on longitudinal data from a single person or on multi-person data over a short time range. The CASTLE challenge scales the problem to multi-person, multi-day data, aiming to model real-world problem settings more closely and to advance the state of the art in multimodal understanding of human activity video data.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 20