Keywords: causal reasoning, visual causal reasoning, counterfactuals, prompting, LLMs
TL;DR: We present a method for counterfactual reasoning on images that uses an image feature extractor, scene graph construction, and a transformer decoder. We show that our method performs well on counterfactual and causal VQA.
Abstract: We propose a new visual causal reasoning framework that leverages compositional visual representations and language prompts to reason about counterfactuals. Our model learns to decompose visual scenes into objects and events, represent them compositionally, and generate natural language explanations describing potential causal relationships between them. These explanations are then used to infer counterfactuals in response to language prompts. We show that compositional visual representations, when combined with causal language explanations and prompting, can improve performance on visual causal reasoning tasks.
Submission Number: 1