Keywords: Image & Video Synthesis, Multi-modal Large Language Models
TL;DR: We fine-tune VLMs to identify key regions in potentially AI-generated images that, upon closer inspection, can yield a more grounded, explainable, and accurate classification result.
Abstract: na
Primary Area: foundation or frontier models, including LLMs
Submission Number: 6032