Visible to everyone since 20 Jul 2024. License: CC BY 4.0
We introduce a novel video dataset designed for Temporal Intention Localization (TIL), aimed at identifying hidden abnormal intention in densely populated and dynamically complex environments. Traditional Temporal Action Localization (TAL) frameworks focus on overt actions within constrained temporal intervals and often miss the subtle pre-abnormal actions that unfold over extended periods. Our dataset comprises 228 videos with 5,790 clips, each annotated using a Joint-Linear-Assignment methodology to capture fine-grained actions with ambiguous temporal boundaries, which enables detailed analysis of how abnormal intention evolves over time. To detect subtle, hidden intention, we develop the Intention-Action Fusion module, which performs dynamic feature fusion across 11 behavioral subcategories and improves the model's ability to discern nuanced intention. This module yields performance improvements of up to 139% in specific scenarios and increases the model's sensitivity and interpretability. Our dataset and method support the development of proactive surveillance systems that can preemptively identify potential threats from nuanced behavioral patterns, and they encourage further exploration of intention beyond observable actions.