Keywords: model evaluation, model interpretability, sensory language, model creativity
TL;DR: Popular large language models fail to replicate human use of sensory language, an important feature of storytelling.
Abstract: Sensory language expresses embodied experiences ranging from taste and sound to excitement and stomachache. It is of interest to scholars from a wide range of domains including robotics, narratology, linguistics, and cognitive science. In this work, we explore whether language models, which are not embodied, can approximate human use of embodied language. To do this, we extend an existing corpus of parallel human and model responses to short story prompts with an additional 18,000 stories generated by 18 popular language models. We find that all models generate stories that differ significantly from human usage of sensory language. However, the direction of these differences varies considerably between model families; Gemini models use significantly more sensory language than humans along most axes, whereas most models from the remaining five families use significantly less. Linear probes run on five models suggest that they are capable of identifying sensory language, meaning an inability to recognize sensory content is unlikely to be the cause of the observed differences. Instead, we find preliminary evidence indicating that instruction tuning may discourage the use of sensory language in some models. To support further work, we release our expanded story dataset at https://github.com/srhm-ca/sensorylanguage.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 1078