Interaction-aware Representation Modeling With Co-Occurrence Consistency for Egocentric Hand-Object Parsing

ICLR 2026 Conference Submission 6582 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026
License: CC BY 4.0
Keywords: Egocentric vision, human-environment interaction, hand-object parsing, consistency
Abstract: Fine-grained understanding of egocentric human-environment interactions is critical for developing next-generation embodied agents. A fundamental challenge in this area is accurately parsing hands and active objects. While transformer-based architectures have demonstrated considerable potential for such tasks, several key limitations remain unaddressed: 1) existing query initialization mechanisms lack adaptability to diverse categories of contacting objects, impairing the localization and recognition of interactive entities; 2) over-reliance on pixel-level semantic features incorporates interaction-irrelevant noise, degrading segmentation accuracy; and 3) prevailing models are susceptible to "interaction illusion", producing physically inconsistent predictions. To address these issues, we propose the Interaction-aware Transformer (InterFormer), which integrates three key components: a Prototypical Query Generator (PQG), a Dual-context Feature Selector (DFS), and the Conditional Co-occurrence (CoCo) loss. The PQG fuses learnable parameters with interaction-relevant context to construct robust, adaptive queries for different active objects. The DFS explicitly combines interactive and semantic cues to filter out irrelevant information and generate discriminative interaction embeddings. The CoCo loss incorporates hand-object relationship priors to enhance the physical consistency of predictions. Our model achieves state-of-the-art performance on both the EgoHOS and the challenging out-of-distribution mini-HOI4D datasets, demonstrating its effectiveness and strong generalization ability.
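The abstract describes the CoCo loss only at a high level. As a rough illustration of how a hand-object co-occurrence prior could enter training, here is a minimal PyTorch-style sketch; the class indices, the hinge form of the penalty, and the `coco_loss` name are all assumptions made for illustration, not the paper's actual formulation:

```python
import torch
import torch.nn.functional as F

# Hypothetical class indices for illustration only.
LEFT_HAND, RIGHT_HAND, OBJ_LEFT, OBJ_RIGHT = 0, 1, 2, 3

def coco_loss(class_probs: torch.Tensor,
              prior_pairs: list[tuple[int, int]],
              weight: float = 0.1) -> torch.Tensor:
    """Penalize physically inconsistent co-occurrence predictions.

    class_probs: (B, C) per-class presence probabilities per image.
    prior_pairs: (dependent, prerequisite) index pairs encoding priors,
        e.g. an object held in the left hand should not be predicted
        without the left hand itself being predicted.
    """
    loss = class_probs.new_zeros(())
    for dep, pre in prior_pairs:
        # Hinge term: only penalize the probability mass of the dependent
        # class that exceeds the mass of its prerequisite class.
        loss = loss + F.relu(class_probs[:, dep] - class_probs[:, pre]).mean()
    return weight * loss

# Example: an in-contact object predicted while the left hand is absent
# (an "interaction illusion") incurs a nonzero penalty.
probs = torch.tensor([[0.05, 0.90, 0.80, 0.85]])
pairs = [(OBJ_LEFT, LEFT_HAND), (OBJ_RIGHT, RIGHT_HAND)]
print(coco_loss(probs, pairs))  # penalty driven by the (OBJ_LEFT, LEFT_HAND) pair
```

Under these assumptions, such a term would be added to the standard segmentation loss so that gradients discourage predicting a contacted object without its corresponding hand.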
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6582