Abstract: Egocentric human-object interaction (Ego-HOI) detection is essential for intelligent agents to understand and assist human activities from a first-person perspective. However, progress has been hindered by the lack of dedicated benchmarks and of methods robust to severe egocentric challenges such as hand-object occlusion. This work bridges the gap through three key contributions. First, we introduce Ego-HOIBench, a pioneering benchmark for real-world Ego-HOI detection, comprising over 27K real images with explicit, fine-grained <hand, verb, object> triplet annotations. The dataset is derived from HOI4D with enhanced annotations for active objects and hand distinctions. Second, we propose Hand Geometry and Interactivity Refinement (HGIR), a novel plug-and-play module that captures the structural geometry of hands to learn occlusion-robust, pose-aware interaction representations. Third, comprehensive experiments show that HGIR significantly improves Ego-HOI detection performance across multiple methods, achieving state-of-the-art results and providing a solid foundation for future research in egocentric vision. Project page: https://dengkunyuan.github.io/EgoHOIBench/
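To make the annotation format concrete, the following is a minimal sketch of how a <hand, verb, object> triplet annotation might be represented. The class and field names (including the bounding-box fields) are illustrative assumptions, not the actual Ego-HOIBench schema.

```python
from dataclasses import dataclass

# Hypothetical representation of one <hand, verb, object> triplet;
# field names and box convention are assumptions for illustration only.
@dataclass(frozen=True)
class EgoHOITriplet:
    hand: str                            # which hand, e.g. "left" or "right"
    verb: str                            # interaction verb, e.g. "grasp"
    obj: str                             # active object category, e.g. "mug"
    hand_box: tuple[int, int, int, int]  # (x1, y1, x2, y2) hand box, pixels
    obj_box: tuple[int, int, int, int]   # (x1, y1, x2, y2) object box, pixels

# Example: one annotated interaction in an image
ann = EgoHOITriplet("right", "grasp", "mug", (10, 20, 110, 140), (90, 60, 200, 180))
print(ann.hand, ann.verb, ann.obj)  # → right grasp mug
```

An image with both hands active would simply carry one such triplet per interacting hand, which is consistent with the dataset's emphasis on hand distinctions.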
DOI: 10.1016/j.eswa.2025.130216