Abstract: Language model training and inference ignore a fundamental fact about language: the dependence between sequences of text that come from the same person. Prior work has shown that addressing this form of \textit{ecological fallacy} can greatly improve the performance of a smaller language model, a 110M-parameter GPT-2 model. In this work, we ask whether addressing the ecological fallacy by modeling the author's language context with a specific LM task (called HuLM) can provide similar benefits for a larger-scale model, an 8B Llama model. To this end, we explore variants that process an author's language in the context of their other temporally ordered texts. We study the effect of using author context both during pre-training with the HuLM objective and during fine-tuning.
Empirical comparisons show that addressing the ecological fallacy during fine-tuning alone improves the performance of the larger 8B model over both standard fine-tuning and prompting with an instruction-tuned variant.
These results indicate the utility and importance of modeling language in the context of its original generators, the authors.
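To make the author-context setup concrete, below is a minimal sketch of how training sequences with temporally ordered author context might be constructed. This is an illustration under stated assumptions, not the paper's actual preprocessing: the field names (`author_id`, `timestamp`, `text`), the `<|sep|>` separator token, and the character-based truncation are all hypothetical choices.

```python
# Minimal sketch: build fine-tuning sequences where each document is
# preceded by the same author's earlier, temporally ordered texts.
# Hypothetical throughout: field names, separator token, truncation scheme.
from typing import Iterable

SEP = " <|sep|> "  # hypothetical document separator within an author's history

def build_author_sequences(docs: Iterable[dict], max_chars: int = 4096) -> list[str]:
    """Group documents by author, sort them by timestamp, and emit one
    training string per document: the author's earlier texts (the author
    context) followed by the target document itself."""
    by_author: dict[str, list[dict]] = {}
    for d in docs:
        by_author.setdefault(d["author_id"], []).append(d)

    sequences = []
    for history in by_author.values():
        history.sort(key=lambda d: d["timestamp"])  # temporal order
        for i, doc in enumerate(history):
            context = SEP.join(d["text"] for d in history[:i])
            seq = (context + SEP if context else "") + doc["text"]
            sequences.append(seq[-max_chars:])  # keep the most recent context
    return sequences

if __name__ == "__main__":
    corpus = [
        {"author_id": "u1", "timestamp": 1, "text": "First post."},
        {"author_id": "u1", "timestamp": 2, "text": "Second post."},
        {"author_id": "u2", "timestamp": 1, "text": "Another author."},
    ]
    for s in build_author_sequences(corpus):
        print(repr(s))
```

Each resulting sequence can then be fed to a standard causal LM loss; under the HuLM framing, the loss on the target document is conditioned on the author's prior texts rather than on an isolated sequence.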
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: author context, human language models, human LLMs, large human LLMs, human context
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 5524