Keywords: image set description, blind and low vision, vision and language
Abstract: Recent advances in generative AI have enabled new tasks in computer vision, such as generating textual summaries of entire image collections. Single-image and group captioning have been widely explored for accessibility applications; in this work, we present ImageSet2Text, a system designed to produce high-level descriptions of image sets, and investigate its applicability for visually impaired users. Following pilot interviews with members of the visually impaired community, we adapt ImageSet2Text's pipeline by integrating the NCAM principles of accessibility and perform a preliminary evaluation using an LLM-as-a-judge. Finally, we outline key future directions, including broader evaluation strategies.
Submission Number: 20