Abstract: In this work, we bring new insights into the honesty of vision-language models,
particularly in visual question answering (VQA). After a thorough revisit of the
existing ‘lie’ behavior in pure language models, our work extends ‘lies’ to
vision-language models for the first time. The results indicate that the lie
prefixes have a more pronounced misleading effect on vision-language models than
on language models. We also propose a novel visual prefix and show that the
consistent vision-language prefix is more threatening to vision-language models.
To defend the models against these ‘lies’, we put forward an unsupervised
framework based on Gaussian mixture modeling and obtain improvements of 3%
against the language prefix and 12% against the vision-language prefix.