COHESIV: Contrastive Object and Hand Embedding Segmentation In Video

21 May 2021, 20:50 (edited 21 Jan 2022) | NeurIPS 2021 Poster | Readers: Everyone
  • Keywords: Hand Object Interaction, Object Segmentation, Contrastive Learning, Attention, Embeddings
  • TL;DR: We learn to segment hands and hand-held objects using attention-based and contrastive learning.
  • Abstract: In this paper we learn to segment hands and hand-held objects from motion. Our system takes a single RGB image and a hand location as input and segments the hand and the hand-held object. For learning, we generate responsibility maps that show how well a hand's motion explains other pixels' motion in video. We use these responsibility maps as pseudo-labels to train a weakly supervised neural network using an attention-based similarity loss and a contrastive loss. Our system outperforms alternative methods, achieving good performance on the 100DOH, EPIC-KITCHENS, and HO3D datasets.
  • Supplementary Material: pdf
  • Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.