Abstract: Our goal is a question-answering (QA) system
that can show how its answers are implied by
its own internal beliefs via a systematic chain
of reasoning. Such a capability would allow
better understanding of why a model produced
the answer it did. Our approach is to recursively
combine a trained backward-chaining
model, capable of generating a set of premises
entailing an answer hypothesis, with a verifier
that checks that the model itself believes those
premises (and the entailment itself) through
self-querying. To our knowledge, this is the
first system to generate multistep chains that
are both faithful (the answer follows from the
reasoning) and truthful (the chain reflects the
system’s own internal beliefs). In evaluation
using two different datasets, users judge that
a majority (70%+) of generated chains clearly
show how an answer follows from a set of facts, substantially better than a high-performance baseline, while preserving answer accuracy.
Materializing the model beliefs that systematically support an answer opens new opportunities for understanding the model's belief system, and for diagnosing and correcting its misunderstandings when an answer is wrong.
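
The recursive prove-and-verify loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: names such as ProofNode, generate_premises, believes, entails, and prove are hypothetical, and the three model calls are left as stubs standing in for the backward-chaining generator and the self-querying verifier.

```python
# Minimal sketch of recursive backward chaining with self-verification.
# All names are illustrative placeholders, not the paper's actual code or API.
from dataclasses import dataclass, field


@dataclass
class ProofNode:
    hypothesis: str
    premises: list["ProofNode"] = field(default_factory=list)
    verified: bool = False


def generate_premises(hypothesis: str) -> list[str]:
    """Backward-chaining model: propose premises intended to entail the hypothesis."""
    raise NotImplementedError  # stub for the trained generator


def believes(statement: str) -> bool:
    """Self-query: does the model itself believe this statement is true?"""
    raise NotImplementedError  # stub for the verifier


def entails(premises: list[str], hypothesis: str) -> bool:
    """Self-query: does the model believe these premises entail the hypothesis?"""
    raise NotImplementedError  # stub for the verifier


def prove(hypothesis: str, depth: int = 2) -> ProofNode:
    """Recursively build an entailment chain that the model itself believes."""
    node = ProofNode(hypothesis)
    believed = believes(hypothesis)
    if depth == 0 or believed:
        # Leaf case: accept the statement only if the model directly believes it.
        node.verified = believed
        return node
    # Otherwise, backward-chain: generate candidate premises, verify each one
    # recursively, then verify the entailment step itself.
    node.premises = [prove(p, depth - 1) for p in generate_premises(hypothesis)]
    node.verified = (
        all(child.verified for child in node.premises)
        and entails([child.hypothesis for child in node.premises], hypothesis)
    )
    return node
```

In this sketch a chain counts as verified only when every leaf is a fact the model believes and every intermediate step is an entailment the model also believes; the depth bound simply limits how far the recursion can expand the chain.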