Keywords: Causal Axioms, Transformers, Causal Reasoning, Generalization
Abstract: For text-based AI systems to interact in the real world, causal reasoning is an essential skill. Since interventional data is costly to generate, we study to what extent an agent can learn causal reasoning from passive data. We consider an axiomatic training setup where an agent learns from multiple demonstrations of a causal axiom (or rule), rather than incorporating the axiom as an inductive bias or inferring it from data values. A key question is whether transformers can learn to generalize from the axiom demonstrations to larger and more complex scenarios. Our results, based on a novel axiomatic training scheme, indicate that such generalization is possible. We consider the task of inferring whether one variable causes another, given a causal graph structure. We find that a 67-million-parameter transformer model, when trained on linear causal chains (along with some variations), generalizes well to new kinds of graphs, including longer causal chains, causal chains with reversed order, and graphs with branching, even though it is not explicitly trained for such settings. Our model performs on par with (or better than) many larger language models such as GPT-4, Gemini Pro, and Phi-3. Overall, the axiomatic training framework provides a new paradigm for learning causal reasoning from passive data that can be used to learn arbitrary axioms.
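As a rough illustration of the kind of demonstration described above, the sketch below renders a linear causal chain as text, poses a query of the form "Does A cause B?", and labels it by graph reachability, i.e. by repeatedly applying a transitivity-style rule along the chain. Everything here is a hypothetical assumption for illustration, not the authors' data format: the text template, the variable names `X0`, `X1`, and so on, the helper names `causes` and `make_chain_demo`, and the reading of "reversed order" as listing the same edges in reverse textual order.

```python
import random
from typing import Dict, List, Tuple


def causes(edges: List[Tuple[str, str]], src: str, dst: str) -> bool:
    """Return True if dst is reachable from src along directed edges.

    This is the transitive closure of the direct-cause relation, so it also
    handles branching graphs, not only linear chains.
    """
    children: Dict[str, List[str]] = {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
    stack, seen = [src], set()
    while stack:  # depth-first search from src
        node = stack.pop()
        for nxt in children.get(node, []):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def make_chain_demo(length: int, rng: random.Random,
                    reverse_order: bool = False) -> Tuple[str, str]:
    """Build one hypothetical (premise + question, label) demonstration."""
    # Hypothetical node names, shuffled so a name carries no positional hint.
    names = [f"X{i}" for i in range(length)]
    rng.shuffle(names)
    # Linear chain: names[0] -> names[1] -> ... -> names[-1]
    edges = list(zip(names, names[1:]))
    # Assumed meaning of "reversed order": same graph, edges stated last-to-first.
    shown = list(reversed(edges)) if reverse_order else edges
    premise = " ".join(f"{a} causes {b}." for a, b in shown)
    src, dst = rng.sample(names, 2)  # two distinct variables to query
    label = "Yes" if causes(edges, src, dst) else "No"
    return f"{premise} Does {src} cause {dst}?", label


if __name__ == "__main__":
    rng = random.Random(0)
    demo, label = make_chain_demo(4, rng)
    print(demo, "->", label)
```

Because the label is computed from the graph rather than from the order in which edges are written, the same seed yields the same answer with or without `reverse_order`. That property is what makes reversed-order and branching chains a test of whether a model has picked up the rule itself rather than surface patterns in the training text.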
Submission Number: 62