Abstract: Limitations in a scientific article are the inherent shortcomings, constraints, or weaknesses of a study that can affect its results or limit the generalizability of its findings. One type is visual-related limitations, which concern issues such as unclear charts, diagrams, captions, or descriptions in scientific papers. In this work, we focus on generating descriptions of chart and graph images, guided by a set of questions about those figures, using multi-modal Large Language Models (LLMs) such as Qwen, Llama, LLaVA, PaliGemma, and GPT-4o. Using an LLM-as-a-judge evaluation approach, in which two LLMs acted as evaluators, we found that GPT-4o outperformed the other models in generating accurate and coherent chart descriptions.