From Numbers to Narratives: Efficient Language Model-Based Detection for Safety-Critical Minority Classes

ACL ARR 2025 July Submission 127 Authors

23 Jul 2025 (modified: 04 Sept 2025) | ACL ARR 2025 July Submission | CC BY 4.0
Abstract: Safety-critical classification tasks face a persistent challenge: traditional models achieve high overall accuracy but inadequate performance on critical minority classes. We introduce a "numbers to narratives" framework that transforms tabular data into contextually rich descriptions, enabling language models to leverage pre-trained knowledge for minority class detection. Our approach integrates structured verbalization, linguistically-informed augmentation, and parameter-efficient fine-tuning to address the "minority class blind spot" in high-consequence domains. Using a significantly more efficient model architecture than existing approaches, our framework achieves superior minority class F1-scores: 78.76% for machine failures (+7.42 points over XGBoost), 65.87% for at-risk students (+12.12 points over MLP), and 32.00% for semiconductor failures (+1.01 points over XGBoost, despite 14:1 class imbalance). Our approach also improves overall accuracy by up to 22.43% in five of six datasets while maintaining computational feasibility. Ablation studies confirm that narrative-based verbalization enables effective reasoning about tabular data by contextualizing abstract numerical features. This work provides a practical, resource-efficient approach for enhancing minority class performance in safety-critical domains.
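Illustrative sketch (not from the paper): the abstract describes structured verbalization only at a high level, so the following shows one plausible way a tabular row could be turned into a sentence for a language model. The feature names, phrase templates, and the `verbalize_row` helper are all hypothetical and chosen for illustration; the authors' actual templates may differ.

```python
from typing import Mapping

# Hypothetical phrase templates keyed by column name (assumed, not the paper's).
FEATURE_PHRASES = {
    "air_temperature_k": "an air temperature of {value:.1f} K",
    "rotational_speed_rpm": "a rotational speed of {value:.0f} rpm",
    "tool_wear_min": "{value:.0f} minutes of tool wear",
}


def verbalize_row(row: Mapping[str, float], subject: str = "The machine") -> str:
    """Render one tabular record as a single descriptive sentence."""
    # Keep template order so the narrative is stable across rows.
    parts = [
        template.format(value=row[name])
        for name, template in FEATURE_PHRASES.items()
        if name in row
    ]
    if not parts:
        return f"{subject} has no recorded measurements."
    if len(parts) == 1:
        body = parts[0]
    else:
        body = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"{subject} shows {body}."


if __name__ == "__main__":
    example = {
        "air_temperature_k": 298.1,
        "rotational_speed_rpm": 1551,
        "tool_wear_min": 0,
    }
    print(verbalize_row(example))
```

Text produced this way could then be passed to a text classifier in place of the raw numeric vector; the augmentation and fine-tuning steps named in the abstract are outside the scope of this sketch.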
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Language Models, Minority Class Detection, Structured Verbalization, Class Imbalance, Computational Efficiency
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches low compute settings-efficiency
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Section 6
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 3.2 and Appendix A.4
B2 Discuss The License For Artifacts: No
B2 Elaboration: We did not discuss artifact licenses because the focus was on methodological contributions; the datasets and models are publicly available, with licensing details provided by their original sources.
B3 Artifact Use Consistent With Intended Use: No
B3 Elaboration: Datasets (e.g., A4H 2020, Sklearn, SECOM) and models such as ChatGPT-4, intended for numerical classification and general text generation respectively, are repurposed for narrative-based classification through our "numbers to narratives" framework (Section 3.1), which transforms tabular data into textual representations. This is an unconventional application not aligned with their original design.
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: The datasets (e.g., A4H 2020, A4H 2021, Student, Google, Sklearn, SECOM) utilized in the paper, as described in Section 3.2 and Table 1, consist of numerical or categorical features relevant to specific domains (e.g., machine failure, student performance, semiconductor manufacturing) and do not include personally identifying information such as names, addresses, or sensitive personal attributes. Additionally, the data processing and transformation into narratives (Section 3.1 and Section A.4.1) focus on abstract feature representations without generating or incorporating offensive content, ensuring compliance with ethical data usage standards.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 3.3
B6 Statistics For Data: Yes
B6 Elaboration: Section 4
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 4
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Appendix A.4.2, A.4.3, and A.4.4
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 4.1, 4.2, and 4.3
C4 Parameters For Packages: Yes
C4 Elaboration: Section 4
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: No
Author Submission Checklist: yes
Submission Number: 127