FruitMMBench: A Multi-modal Benchmark for Fruit Quality Assessment

Published: 01 Jan 2025, Last Modified: 20 Jul 2025, Venue: ICASSP 2025, License: CC BY-SA 4.0
Abstract: The rapid advancement of Large Vision-Language Models (LVLMs) has brought notable improvements in tasks such as visual recognition and multi-modal understanding, demonstrating significant potential for real-world applications. However, their performance on everyday tasks such as fruit quality assessment has not been explored, owing to the lack of a suitable benchmark. To bridge this gap, we introduce FruitMMBench, a comprehensive multi-modal benchmark designed to evaluate the ability of LVLMs to assess fruit quality. A comprehensive metric is carefully designed by jointly considering multiple aspects. The resulting large-scale FruitMMBench contains 5,465 collected fruit images with dedicated quality labels covering the following aspects: fruit classification, quantity recognition, maturity, surface condition, quality status, and edibility recommendation. Through extensive evaluations on FruitMMBench, we find that popular LVLMs struggle to provide reliable results in fruit quality assessment and that their performance varies greatly. The results reveal that the ability of current LVLMs to analyze fruit quality in real-world scenarios remains weak and warrants further attention and improvement. All evaluation code and the dataset will be made publicly accessible shortly.