Abstract: Large language models (LLMs) show potential in understanding visualizations and may capture design knowledge. However, their ability to predict human feedback remains unclear. To explore this, we conduct three studies evaluating the alignment between LLM-based agents and human ratings in visualization tasks. The first study replicates a human-subject study, showing promising agent performance in human-like reasoning and rating, and informing further experiments. The second study simulates six prior studies with agents and finds that alignment correlates with experts’ pre-experiment confidence. The third study tests enhancement techniques, such as input preprocessing and knowledge injection, revealing limitations in robustness and potential bias. These findings suggest that LLM-based agents can simulate human ratings when guided by high-confidence hypotheses from expert evaluators. We also demonstrate a usage scenario in rapidly prototyping study designs and discuss future directions. We note that such simulations can only complement, not replace, user studies.
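To make the alignment measure concrete, the sketch below (our own illustration, not taken from the paper) compares agent-produced ratings against human ratings with a Spearman rank correlation. The rating values are placeholder data; in practice the agent ratings would come from prompting an LLM with each visualization stimulus.

```python
# Minimal sketch (illustrative, not the paper's code): quantifying alignment
# between LLM-agent ratings and human ratings via rank correlation.
from scipy.stats import spearmanr

# Hypothetical per-stimulus ratings on a 1-7 Likert scale.
human_ratings = [5, 3, 6, 2, 7, 4, 5, 1]   # mean human rating per stimulus
agent_ratings = [6, 3, 5, 2, 6, 4, 5, 2]   # LLM-agent rating per stimulus

# Spearman's rho summarizes how well the agent preserves the human ranking.
rho, p_value = spearmanr(human_ratings, agent_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```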
External IDs: dblp:journals/cga/ShaoSHYWZZC25