Evaluating Inductive Reasoning Capabilities of Large Language Models With The One Dimensional Abstract Reasoning Corpus

Dr Cédric Mesnage

Published: 13 Oct 2024, Last Modified: 11 Nov 2024. Venue: HYDRA 2024, 3rd International Workshop on HYbrid Models for Coupling Deductive and Inductive ReAsoning, co-located with the 27th European Conference on Artificial Intelligence (ECAI 2024). License: CC BY 4.0

Abstract: We present an initial automated test to evaluate LLMs’ capacity to perform inductive reasoning tasks. We use the GPT-3.5 and GPT-4 models to build a system that generates Python code as inductive hypotheses for transforming the sequences of the One Dimensional Abstract Reasoning Corpus (1D-ARC) challenge. We experiment with three prompting techniques: standard prompting, Chain of Thought (CoT), and direct feedback. We report results together with an analysis of the cost-to-success rate and the benefit-cost ratio. Our best result is an overall 25% success rate with our CoT prompting on GPT-4, significantly surpassing the standard prompting approach. We discuss potential avenues for improving our experiments, testing other strategies, and combining deductive reasoning with LLM-based inductive reasoning.
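To make the idea of "Python code as hypotheses" concrete, the following is a minimal sketch of how a generated hypothesis could be checked against the example pairs of a 1D task, with mismatches collected as text that a direct-feedback prompt might return to the model. It is not the paper's implementation: the function names (`load_hypothesis`, `check_hypothesis`), the entry-point name `transform`, the feedback wording, and the toy task are all assumptions made for illustration, and the model call itself is omitted.

```python
from typing import Callable, List, Tuple

Sequence = List[int]
Pair = Tuple[Sequence, Sequence]


def load_hypothesis(source: str, name: str = "transform") -> Callable[[Sequence], Sequence]:
    """Compile model-generated source and return the function it defines.

    The entry-point name `transform` is an assumption of this sketch.
    """
    namespace: dict = {}
    exec(source, namespace)  # untrusted code: run inside a sandbox in practice
    return namespace[name]


def check_hypothesis(source: str, pairs: List[Pair]) -> List[str]:
    """Run a hypothesis on every (input, expected) pair.

    Returns a list of feedback messages; an empty list means every pair passed.
    """
    try:
        fn = load_hypothesis(source)
    except Exception as exc:  # syntax error, or missing `transform`
        return [f"could not load hypothesis: {exc!r}"]

    feedback: List[str] = []
    for i, (inp, expected) in enumerate(pairs):
        try:
            got = fn(list(inp))  # pass a copy so the hypothesis cannot mutate the task
        except Exception as exc:
            feedback.append(f"pair {i}: raised {exc!r}")
            continue
        if got != expected:
            feedback.append(f"pair {i}: expected {expected}, got {got}")
    return feedback


# Hypothetical toy task: shift every cell one position to the right.
pairs = [
    ([0, 3, 3, 0, 0], [0, 0, 3, 3, 0]),
    ([5, 5, 0, 0, 0], [0, 5, 5, 0, 0]),
]

# Two hand-written stand-ins for model output.
candidate = """
def transform(seq):
    return [0] + seq[:-1]
"""

wrong = """
def transform(seq):
    return seq
"""

print(check_hypothesis(candidate, pairs))
print(check_hypothesis(wrong, pairs))
```

In a feedback loop of this kind, a non-empty message list would be appended to the next prompt so the model can revise its hypothesis, and an empty list would mark the task as solved on its example pairs.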