- Abstract: In this paper, we formulate hypothesis verification as a reinforcement learning problem. Specifically, we aim to build an agent that, given a hypothesis about the dynamics of the world, can take actions to generate observations that help predict whether the hypothesis is true or false. Our first observation is that agents trained end-to-end with the reward fail to learn to solve this problem. To train the agents, we exploit the underlying structure in the majority of hypotheses: they can be formulated as triplets (pre-condition, action sequence, post-condition). Once the agents have been pretrained to verify hypotheses with this structure, they can be fine-tuned to verify more general hypotheses. Our work takes a step towards a "scientist agent" that develops an understanding of the world by generating and testing hypotheses about its environment.