AugARC: Augmented Abstraction and Reasoning Benchmark for Large Language Models

Published: 13 Dec 2024, Last Modified: 19 Feb 2025
Venue: Good-Data
License: CC BY 4.0
Student Lead Author Indication: No
Keywords: Data Augmentation, Abstraction and Reasoning Benchmark, Fine-Tuning of LLMs
TL;DR: We introduce augmented ARC datasets and a new benchmark (AugARC) for Large Language Models (LLMs), which measures abstraction and reasoning.
Abstract: The Abstraction and Reasoning Corpus (ARC) benchmarks broad generalization and poses a significant challenge to existing machine learning models. In this work, we introduce augmented ARC datasets and a new benchmark, AugARC, for Large Language Models (LLMs) that measures abstraction and reasoning. We evaluate the accuracy of base LLMs on AugARC and show a consistent improvement in performance compared to the standard ARC benchmark. Using augmented ARC data, we fine-tune LLMs and observe a significant gain in ARC accuracy after training. Due to the limited size of the ARC training dataset (400 tasks), previous studies have not attempted to train LLMs on ARC; our augmentation of ARC allows us to overcome this limitation. Using a reflection approach, we combine LLMs with a previously proposed domain-specific language (DSL) solver. Our work introduces an augmented version of ARC, AugARC, and motivates further research into enhancing data quality for better reasoning in AI systems.
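The abstract does not specify which augmentations are applied, but ARC tasks are small integer grids (colors 0-9), and grid-symmetry and color-permutation transforms are a common way to multiply such data. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual pipeline: it applies one shared rotation, flip, and color permutation to every input/output pair in a task, so the underlying rule is preserved while the surface form changes. The function names and the choice to keep color 0 (the background) fixed are assumptions.

```python
import random


def augment_grid(grid, rot, flip, cmap):
    """Apply rot x 90-degree clockwise rotations, an optional
    horizontal flip, and a color remapping to one ARC grid."""
    g = [list(row) for row in grid]
    for _ in range(rot):
        # rotate 90 degrees clockwise: reverse rows, then transpose
        g = [list(row) for row in zip(*g[::-1])]
    if flip:
        g = [row[::-1] for row in g]
    return [[cmap[c] for c in row] for row in g]


def augment_task(task, seed=0):
    """Produce one augmented copy of an ARC task (dict with
    'train' and 'test' lists of {'input', 'output'} grids).
    The same transform is applied to every grid so the task's
    rule is unchanged."""
    rng = random.Random(seed)
    rot = rng.randrange(4)
    flip = rng.random() < 0.5
    # assumption: permute colors 1-9 but keep background 0 fixed
    perm = list(range(1, 10))
    rng.shuffle(perm)
    cmap = {0: 0, **dict(zip(range(1, 10), perm))}
    return {
        split: [
            {"input": augment_grid(ex["input"], rot, flip, cmap),
             "output": augment_grid(ex["output"], rot, flip, cmap)}
            for ex in task[split]
        ]
        for split in ("train", "test")
    }
```

With 4 rotations, 2 flip states, and 9! color permutations, each of the 400 training tasks yields many distinct variants, which is one plausible way to obtain enough data to fine-tune an LLM.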
Submission Number: 12