Hierarchical and Multimodal Representation Learning for Irregular and Long ESG Reports

10 Sept 2025 (modified: 12 Feb 2026) | ICLR 2026 Conference Desk Rejected Submission | CC BY 4.0
Keywords: ESG report parsing, Information extraction, Multimodal data processing, Hierarchical document modeling, Dataset
TL;DR: We propose a framework to reconstruct the reading order and semantic hierarchy of irregular ESG reports, and concurrently release Atlas-ESG, a new large-scale, multi-level annotated dataset for this task.
Abstract: Environmental, Social, and Governance (ESG) principles are reshaping global financial governance, influencing capital allocation, regulation, and systemic risk coordination. However, ESG reports, the primary medium for assessing corporate ESG performance, remain challenging to parse at scale for two reasons: (1) irregular reading orders caused by slide-like layouts and fragmented content, and (2) implicit hierarchies hidden in lengthy, weakly structured narratives. We introduce Compass-ESG, a unified framework that transforms ESG reports into structured representations through three core components: (1) reading order modeling, which combines a page-level layout framework with block-level sequence ordering to recover coherent global-to-local flows; (2) ToC-guided hierarchical reconstruction, in which ToC-RAP parses visually complex tables of contents and ToC-ALIGN anchors their entries to body content, enabling accurate recovery of both explicit and implicit hierarchies; and (3) context-aware visual-to-text representation, which uses visual and structural cues under hierarchical guidance to convert images and layout elements into grounded natural language. Extensive experiments on annotated benchmarks show that Compass-ESG significantly outperforms both specialized document parsers and general-purpose multimodal models. In addition, we release Atlas-ESG, the first large-scale ESG dataset with multi-level annotations, covering reports from China, Hong Kong, and the U.S., as a resource for structured ESG analysis and future research.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 3732