A Minimalist Approach for Exploring Transformer Robustness to In-Distribution and Out-Of-Distribution Samples

17 Sept 2025 (modified: 01 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Transformers, Robustness, Code Reasoning
TL;DR: We propose a minimalist approach that trains small transformers with precisely controlled train/test distributions, and use it to study in-distribution and out-of-distribution robustness.
Abstract: Despite their strong performance across tasks, large language models (LLMs) still show limitations in their ability to generalize. Recent studies report that even state-of-the-art LLMs exhibit significant accuracy fluctuations when evaluated on superficially modified versions of the same benchmarks, pointing to potential gaps in generalization. We argue that current evaluation practice, which relies heavily on large Transformer-based models trained on massive and often opaque datasets, makes it difficult to disentangle whether these limitations arise from the architecture, from data coverage, or from other factors. While addressing this question in full requires considerable computational resources, we propose a cost-effective, preliminary investigation. Our approach trains a tiny Transformer-based decoder-only language model (with tens of millions of parameters) from scratch on a custom code reasoning task. To train this model, we generate data with a synthetic data generation tool that gives precise control over data distribution and volume. Using this framework, we perform multiple experiments that study the in-distribution and out-of-distribution robustness of these models, revealing their behavior under controlled settings.
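To make the setup concrete, the sketch below illustrates one way a synthetic code-reasoning dataset with precisely controlled train/test distributions could be generated. The abstract does not describe the paper's actual data generation tool or task format, so the task (evaluating tiny straight-line programs), the operand ranges, and all function names here are hypothetical assumptions for illustration only.

```python
# Hypothetical sketch of a synthetic data generator for a simple code-reasoning task.
# The paper's actual tool and task format are not specified in this page; this only
# illustrates how one might control the train/eval distributions, e.g. by reserving
# an operand value range exclusively for out-of-distribution (OOD) evaluation.
import random

ID_RANGE = range(0, 50)     # operand values seen during training (assumed)
OOD_RANGE = range(50, 100)  # operand values reserved for OOD evaluation (assumed)

def make_example(value_range, n_statements=3, rng=random):
    """Generate a tiny straight-line program and its evaluated output."""
    env = {}
    lines = []
    for i in range(n_statements):
        var = f"v{i}"
        a = rng.choice(list(value_range))
        if env and rng.random() < 0.5:
            b = rng.choice(list(env))          # reuse an earlier variable
            lines.append(f"{var} = {b} + {a}")
            env[var] = env[b] + a
        else:
            lines.append(f"{var} = {a}")
            env[var] = a
    program = "\n".join(lines)
    target = env[f"v{n_statements - 1}"]
    return f"{program}\nprint(v{n_statements - 1})", str(target)

def make_split(n, value_range, seed):
    """Build a reproducible split drawn entirely from the given value range."""
    rng = random.Random(seed)
    return [make_example(value_range, rng=rng) for _ in range(n)]

train_set = make_split(10_000, ID_RANGE, seed=0)   # training data
id_test = make_split(1_000, ID_RANGE, seed=1)      # in-distribution evaluation
ood_test = make_split(1_000, OOD_RANGE, seed=2)    # out-of-distribution evaluation

prompt, answer = train_set[0]
print(prompt, "->", answer)
```

Under a setup like this, the train and evaluation sets differ only along the dimension the experimenter chooses (here, operand range), which is what allows in-distribution and out-of-distribution robustness to be measured in isolation.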
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 8907