Framing Bias in Arithmetic Reasoning: How Language and Identity Cues Steer LLM Outputs in Objective Tasks

ACL ARR 2025 July Submission 797 Authors

28 Jul 2025 (modified: 19 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: LLMs are expected to reason reliably over objective, verifiable facts, especially in contrast to subjective or open-ended tasks. We introduce MATHCOMP, a diagnostic benchmark comprising over 29,000 prompted instances derived from 300 controlled arithmetic comparison scenarios, systematically varied across 14 linguistic framings and multiple demographic identity conditions (e.g., "a woman", "a Black person"). Across six LLMs and multiple prompting formats, we observe consistent framing bias: systematic, directional shifts in model predictions caused by terms like "more", "less", or "equal", even when logically redundant. Demographic references further amplify these shifts. Chain-of-thought prompting reduces framing effects in free-form outputs, though structured reasoning formats can reintroduce bias by echoing prompt cues. MATHCOMP reveals how even grounded, symbolic tasks are shaped by linguistic and social framing, expanding the evaluation of LLM robustness and, ultimately, fairness beyond standard accuracy metrics and common benchmarks focused on affective or identity-laden content.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: math QA, logical reasoning, model bias/fairness evaluation, mathematical NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Previous URL: https://openreview.net/forum?id=gPRyCsUtYl
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: Yes, I want a different area chair for our submission
Reassignment Request Reviewers: Yes, I want a different set of reviewers
Justification For Not Keeping Action Editor Or Reviewers: We respectfully request a new Action Editor and new reviewers for this submission. The paper has undergone extensive revisions, including updated analyses, additional sections, and a restructuring of the framing and writing, making it effectively a new submission. To ensure a fresh and unbiased evaluation of the current version, we believe a new set of reviewers and a new AC with relevant expertise would provide the most constructive and fair assessment.
Data: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: No
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 3
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: The dataset will be open source and publicly available.
B3 Artifact Use Consistent With Intended Use: N/A
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 3
B6 Statistics For Data: Yes
B6 Elaboration: Main paper and in the appendix
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 4
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 4
C3 Descriptive Statistics: Yes
C3 Elaboration: Sections 5, 6, and 7
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: Appendix A.4 Annotation and Filtering
D2 Recruitment And Payment: No
D2 Elaboration: The task was not subjective.
D3 Data Consent: Yes
D3 Elaboration: The annotators were among the authors.
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: No
D5 Elaboration: The task is objective, and we do not collect or report the demographics of the annotators.
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Section 3
Author Submission Checklist: yes
Submission Number: 797