Coupling Local Context and Global Semantic Prototypes via a Hierarchical Architecture for Rhetorical Roles Labeling
Abstract: Rhetorical Role Labeling (RRL) aims to identify the functional role of each sentence within a document, a task critical for discourse understanding in domains such as law, medicine, and science. Hierarchical models capture local, intra-document dependencies effectively, but they struggle to model global, corpus-level regularities. To bridge this gap, we propose two methods that couple local context with global representations in the form of semantic prototypes. Prototype-Based Regularization (PBR) learns soft prototypes through a distance-based auxiliary loss that structures the latent space. Prototype-Conditioned Modulation (PCM) constructs a priori prototypes from the corpus and injects them during both training and inference. We also introduce SCOTUS-Law, the first dataset of U.S. Supreme Court opinions annotated with rhetorical roles at three levels of granularity: category, rhetorical function, and step. Experiments on legal, medical, and scientific benchmarks show that modeling both local and global perspectives yields consistent gains over strong baselines, particularly on low-frequency roles, with an average improvement of approximately 4 points in Macro-F1.
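The abstract describes PBR and PCM only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation. For PBR it assumes a prototypical-network-style cross-entropy over negative squared distances to per-role prototypes. For PCM it assumes the a priori prototypes are per-role mean embeddings computed from the training corpus and injected by adding a distance-weighted prototype mixture to each sentence representation. The function names (`prototype_loss`, `corpus_prototypes`, `modulate`) and the mixing weight `alpha` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def prototype_loss(h: Vector, label: int, prototypes: List[Vector]) -> float:
    """ASSUMED PBR-style auxiliary loss: cross-entropy over negative squared
    distances, pulling h toward its own role prototype and away from the others."""
    probs = softmax([-sq_dist(h, p) for p in prototypes])
    return -math.log(probs[label])


def corpus_prototypes(embeddings: List[Vector], labels: List[int],
                      num_roles: int) -> List[Vector]:
    """ASSUMED PCM-style a priori prototypes: per-role mean of training
    sentence embeddings. Roles with no examples keep a zero vector."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_roles)]
    counts = [0] * num_roles
    for h, y in zip(embeddings, labels):
        counts[y] += 1
        for i, x in enumerate(h):
            sums[y][i] += x
    return [[s / c for s in row] if c else row for row, c in zip(sums, counts)]


def modulate(h: Vector, prototypes: List[Vector], alpha: float = 0.5) -> Vector:
    """HYPOTHETICAL conditioning step: add a distance-weighted mixture of the
    fixed prototypes to the sentence vector (applicable at train and test time)."""
    weights = softmax([-sq_dist(h, p) for p in prototypes])
    mix = [sum(w * p[i] for w, p in zip(weights, prototypes)) for i in range(len(h))]
    return [x + alpha * m for x, m in zip(h, mix)]


if __name__ == "__main__":
    # Toy 2-d "sentence embeddings" for two rhetorical roles.
    embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    labels = [0, 0, 1]
    protos = corpus_prototypes(embs, labels, num_roles=2)
    print("prototypes:", protos)
    print("loss (correct role):", prototype_loss([1.0, 0.0], 0, protos))
    print("loss (wrong role):", prototype_loss([1.0, 0.0], 1, protos))
    print("modulated:", modulate([1.0, 0.0], protos))
```

In a hierarchical RRL model, `h` would be a contextualized sentence vector. In this sketch the design difference between the two methods is where the prototypes come from: PBR's prototypes would be trainable parameters shaped by the auxiliary loss, while PCM's are computed once from the corpus and kept fixed.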
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: legal NLP, corpus creation, NLP in resource-constrained settings
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
A2 Elaboration: 9
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 5
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: 1
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: 5
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: The data used in this study was collected from CourtListener, an open-access resource. No anonymization was required.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Appendix C
B6 Statistics For Data: Yes
B6 Elaboration: Appendix C
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: 6.6
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: 5
C3 Descriptive Statistics: Yes
C3 Elaboration: 4
C4 Parameters For Packages: Yes
C4 Elaboration: Appendix B
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: 4
D2 Recruitment And Payment: N/A
D3 Data Consent: Yes
D3 Elaboration: 4
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: Yes
D5 Elaboration: 4
E AI Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: yes
Submission Number: 1114