Abstract: Language model training and inference ignore a fundamental fact about language: the dependence between sequences of text that come from the same person. Prior work has shown that addressing this form of \textit{ecological fallacy} can greatly improve the performance of a smaller language model, a 110M-parameter GPT-2 model. In this work, we ask whether addressing the ecological fallacy by modeling the author's language context with a specific LM task (called HuLM) can provide similar benefits for a larger-scale model, an 8B Llama model. To this end, we explore variants that process an author's language in the context of their other temporally ordered texts. We study the effect of using author context both during pre-training with the HuLM objective and during fine-tuning.
Empirical comparisons show that addressing the ecological fallacy during fine-tuning alone improves the performance of the larger 8B model over both standard fine-tuning and prompting with an instruction-tuned variant.
These results indicate the utility and importance of modeling language in the context of its original generators, the authors.
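To make the author-context setup concrete, below is a minimal sketch of how training sequences with temporally ordered author context might be constructed. This is an illustration under stated assumptions, not the paper's actual preprocessing: the field names (`author_id`, `timestamp`, `text`), the `<|sep|>` separator token, and the character-based truncation are all hypothetical choices.

```python
# Minimal sketch: build fine-tuning sequences where each document is
# preceded by the same author's earlier, temporally ordered texts.
# Hypothetical throughout: field names, separator token, truncation scheme.
from typing import Iterable

SEP = " <|sep|> "  # hypothetical document separator within an author's history

def build_author_sequences(docs: Iterable[dict], max_chars: int = 4096) -> list[str]:
    """Group documents by author, sort them by timestamp, and emit one
    training string per document: the author's earlier texts (the author
    context) followed by the target document itself."""
    by_author: dict[str, list[dict]] = {}
    for d in docs:
        by_author.setdefault(d["author_id"], []).append(d)

    sequences = []
    for history in by_author.values():
        history.sort(key=lambda d: d["timestamp"])  # temporal order
        for i, doc in enumerate(history):
            context = SEP.join(d["text"] for d in history[:i])
            seq = (context + SEP if context else "") + doc["text"]
            sequences.append(seq[-max_chars:])  # keep the most recent context
    return sequences

if __name__ == "__main__":
    corpus = [
        {"author_id": "u1", "timestamp": 1, "text": "First post."},
        {"author_id": "u1", "timestamp": 2, "text": "Second post."},
        {"author_id": "u2", "timestamp": 1, "text": "Another author."},
    ]
    for s in build_author_sequences(corpus):
        print(repr(s))
```

Each resulting sequence can then be fed to a standard causal LM loss; under the HuLM framing, the loss on the target document is conditioned on the author's prior texts rather than on an isolated sequence.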
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: author context, human language models, human LLMs, large human LLMs, human context
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 5524