Abstract: We evaluate the alignment of large language models (LLMs) and large vision-language models (LVLMs) with human perception, focusing on the Japanese concept of *shitsukan*, which reflects the sensory experience of perceiving objects. We created a dataset of *shitsukan* terms elicited from individuals in response to object images. With it, we designed benchmark tasks covering three dimensions of *shitsukan* understanding: (1) accurate perception of *shitsukan* in object images, (2) commonsense knowledge of the *shitsukan* terms typically associated with objects, and (3) distinguishing valid from invalid *shitsukan* terms. Models demonstrated mixed accuracy across the benchmark tasks, with limited overlap between model- and human-generated terms. However, manual evaluation revealed that the model-generated terms were still natural to humans. This work identifies gaps in culture-specific understanding and contributes to aligning models with human sensory perception. We publicly release the dataset to encourage further research in this area.