Robust Native Language Identification through Agentic Decomposition

ACL ARR 2025 May Submission 4271 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Large language models (LLMs) often achieve high performance on native language identification (NLI) benchmarks by leveraging superficial contextual cues such as names, locations, and cultural stereotypes rather than the underlying linguistic patterns indicative of native language (L1) influence. To improve robustness, previous work has instructed LLMs to disregard such clues. In this work, we demonstrate that this strategy is unreliable and that predictions can be easily altered by misleading hints. To address this problem, we introduce an agentic NLI pipeline inspired by forensic linguistics, in which specialized agents accumulate and categorize diverse linguistic evidence, and a goal-aware coordinating agent then synthesizes this evidence to make the final NLI prediction. On two benchmark datasets, our approach significantly improves NLI robustness and performance consistency against misleading contextual cues compared to standard prompting methods.
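The abstract describes a two-stage agentic decomposition: specialized agents each gather one category of linguistic evidence, and a coordinating agent synthesizes that evidence into the L1 prediction. The sketch below illustrates that structure only; the prompts, the evidence categories, and the call_llm() helper are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the agentic NLI decomposition described in the abstract.
# All prompts, category names, and call_llm() are hypothetical placeholders.
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM API call; replace with an actual client."""
    return f"[model response to: {prompt[:40]}...]"


@dataclass
class Evidence:
    category: str   # e.g. "grammatical errors", "lexical choice"
    findings: str   # free-text linguistic observations


class EvidenceAgent:
    """Specialized agent: inspects the text for one category of L1-transfer
    evidence while being instructed to ignore superficial contextual cues."""

    def __init__(self, category: str):
        self.category = category

    def analyze(self, text: str) -> Evidence:
        prompt = (
            f"List {self.category} patterns in the text below that could reflect "
            "first-language (L1) transfer. Ignore names, locations, and cultural "
            f"references.\n\nText: {text}"
        )
        return Evidence(self.category, call_llm(prompt))


class CoordinatorAgent:
    """Goal-aware agent: weighs the accumulated evidence and predicts the L1."""

    def predict(self, evidence: list[Evidence]) -> str:
        report = "\n".join(f"- {e.category}: {e.findings}" for e in evidence)
        prompt = (
            "Based only on the linguistic evidence below, predict the writer's "
            f"native language and justify the choice.\n\n{report}"
        )
        return call_llm(prompt)


def identify_native_language(text: str) -> str:
    # Hypothetical evidence categories; the paper's actual taxonomy may differ.
    categories = ["grammatical errors", "lexical choice", "syntactic structure"]
    evidence = [EvidenceAgent(c).analyze(text) for c in categories]
    return CoordinatorAgent().predict(evidence)


if __name__ == "__main__":
    print(identify_native_language("I am agree with this opinion since many years."))
```

The key design point the abstract emphasizes is that evidence collection is separated from the final decision, so misleading contextual hints in the input have less direct influence on the prediction than under single-prompt approaches.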
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: educational applications, grammatical error correction, NLP for social good
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 4271