Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling

ACL ARR 2026 January Submission9518 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: classification, semantic reranking, specialized domains

Abstract: Rhetorical Role Labeling (RRL) assigns a functional role to each sentence in a document and is widely used in legal, medical, and scientific domains. While language models (LMs) achieve strong average performance, they remain unreliable on hard examples, where prediction confidence is low. Existing approaches typically handle uncertainty implicitly and treat labels as discrete identifiers, overlooking the semantic information encoded in label names. We introduce **RiSE**, an inference-time semantic reranking framework that leverages label semantics to refine predictions on hard instances. RiSE automatically identifies low-confidence predictions and reranks model outputs using contrastively learned label representations, without retraining or modifying the underlying model. Experiments on eight domain-specific RRL datasets with seven LMs, including encoder-based and causal architectures, show an average gain of $+9.15$ macro-F1 points on hard examples. For explainability, we further propose manual hardness annotations to study difficulty from both model and human perspectives, revealing moderate agreement between the two (Cohen’s $\kappa = 0.40$).
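The general idea of confidence-gated reranking described in the abstract can be illustrated with a minimal sketch. This is not the RiSE implementation: the abstract does not specify how hard examples are detected or how candidates are scored, so the confidence threshold, the `top_k` candidate cutoff, the use of cosine similarity, and the names `cosine` and `rerank_if_hard` are all assumptions made here for illustration. The toy label vectors stand in for the contrastively learned label representations.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank_if_hard(
    probs: Sequence[float],
    sentence_emb: Sequence[float],
    label_embs: List[Sequence[float]],
    threshold: float = 0.6,  # assumed confidence cutoff, not taken from the paper
    top_k: int = 2,          # assumed number of candidate labels to rerank
) -> Tuple[int, bool]:
    """Return (predicted label index, whether reranking was applied)."""
    # Order labels by the base model's probability, most likely first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Confident prediction: keep the base model's answer untouched.
    if probs[ranked[0]] >= threshold:
        return ranked[0], False

    # Hard example: choose among the top-k candidates by similarity
    # between the sentence embedding and each candidate's label embedding.
    candidates = ranked[:top_k]
    best = max(candidates, key=lambda i: cosine(sentence_emb, label_embs[i]))
    return best, True


# Toy label embeddings (hypothetical values).
label_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

easy = rerank_if_hard([0.90, 0.05, 0.05], [0.0, 1.0], label_embs)
hard = rerank_if_hard([0.45, 0.40, 0.15], [0.1, 0.9], label_embs)
```

In this sketch the base model is never retrained or modified: the first call keeps the model's top label because its probability clears the threshold, while the second call falls below it and switches to the candidate whose label embedding is closest to the sentence embedding.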

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: legal NLP, NLP in resource-constrained settings

Contribution Types: Approaches to low-resource settings

Languages Studied: English

Submission Number: 9518
