Evaluating Vision-Language Models on the TriangleCOPA Benchmark

Published: 01 Jan 2024, Last Modified: 17 Jul 2025, Venue: FLAIRS 2024, License: CC BY-SA 4.0
Abstract: The TriangleCOPA benchmark consists of 100 textual questions with videos depicting the movements of simple shapes in the style of the classic social-psychology film created by Fritz Heider and Marianne Simmel in 1944. In our experiments, we investigate the performance of current vision-language models on this challenging benchmark, assessing the capability of these models for visual anthropomorphism and abstract interpretation.