Keywords: AI in finance, financial NLP, regulatory risk disclosures, 10-K Item 1A, weak supervision, unsupervised topic discovery, risk taxonomy, benchmark
TL;DR: A taxonomy-grounded, risk-aware benchmark that weakly labels 100K+ 10-K sentences at span level to rigorously evaluate unsupervised topic discovery in finance.
Abstract: Risk categorization in 10-K risk disclosures matters for oversight and investment, yet no public benchmark evaluates unsupervised topic models for this task. We present GRAB, a finance-specific benchmark with 1.61M sentences from 8,247 filings and span-grounded sentence labels produced without manual annotation by combining FinBERT token attention, YAKE keyphrase signals, and taxonomy-aware collocation matching. Labels are anchored in a risk taxonomy mapping 193 terms to 21 fine-grained types nested under five macro classes; the 21 types guide weak supervision, while evaluation is reported at the macro level. GRAB unifies evaluation with fixed dataset splits and robust metrics—Accuracy, Macro-F1, Topic BERTScore, and the entropy-based Effective Number of Topics. The dataset, labels, and code enable reproducible, standardized comparison across classical, embedding-based, neural, and hybrid topic models on financial disclosures.
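The abstract's entropy-based Effective Number of Topics is conventionally computed as the exponential of the Shannon entropy of the topic-proportion distribution; the benchmark's exact formulation may differ, but a minimal sketch under that standard assumption is:

```python
import math

def effective_number_of_topics(proportions):
    """Entropy-based effective number of topics: exp(H(p)), where H is the
    Shannon entropy (natural log) of the topic-proportion distribution.
    Assumes `proportions` sums to 1; zero-probability topics are skipped."""
    h = -sum(p * math.log(p) for p in proportions if p > 0)
    return math.exp(h)

# A uniform distribution over 5 topics yields an effective number of ~5.
print(round(effective_number_of_topics([0.2] * 5), 6))  # 5.0
# A skewed distribution concentrates mass, so fewer topics count effectively.
print(effective_number_of_topics([0.7, 0.1, 0.1, 0.05, 0.05]))
```

The metric rewards models that spread probability mass across genuinely distinct topics rather than collapsing onto a few dominant ones, which is why it complements Accuracy and Macro-F1 in the evaluation suite.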
Submission Number: 47