Does multimodal pre-activation influence linguistic expectations in LLMs and humans?
Keywords: multimodality, language models, sentence processing, surprisal, reading times, word meaning, grounded semantics
TL;DR: When "watch" is highly expected in a context, LLMs do not show lower surprisal for "compass" (unexpected in context but visually similar to a watch) than for "dog" (unexpected and visually dissimilar). We are testing whether humans do.
Submission Number: 10