Abstract: We present MMCOMET, the first multimodal commonsense knowledge graph (MMKG) that integrates physical, social, and event-centric knowledge. This new resource addresses a major limitation of existing MMKGs in supporting complex reasoning tasks such as image captioning and storytelling. MMCOMET extends the ATOMIC2020 knowledge graph with a visual dimension through an efficient image retrieval process, resulting in over 900K triples. Through a standard visual storytelling experiment, we show that our holistic approach enables the generation of richer and more contextually aware stories.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: multimodality, cross-modal information extraction, knowledge graphs
Contribution Types: Data resources
Languages Studied: English
Submission Number: 4776