Keywords: transformer, learning dynamics, reinforcement learning, supervised fine-tuning, parity
TL;DR: We analyze the learning dynamics of fine-tuning a one-layer transformer via either RL or SFT and compare the two approaches.
Abstract: Transformers can acquire chain-of-thought (CoT) capabilities to solve complex reasoning tasks through fine-tuning. Reinforcement learning (RL) and supervised fine-tuning (SFT) are the two primary approaches to this end, yet their underlying mechanisms and differences remain theoretically unclear. In this work, we examine these aspects specifically for learning $k$-sparse Boolean functions with a one-layer transformer and intermediate supervision akin to CoT. In particular, we analyze the learning dynamics of fine-tuning this transformer via either RL or SFT with CoT to identify sufficient conditions under which it provably learns $k$-sparse Boolean functions. We verify that these conditions hold for three basic instances, namely $k$-PARITY, $k$-AND, and $k$-OR, thus demonstrating learnability under both approaches. Notably, we reveal that RL and SFT exhibit distinct learning behaviors: RL learns the whole CoT chain simultaneously, whereas SFT learns the CoT chain step by step. Overall, our findings provide theoretical insights into the underlying mechanisms of RL and SFT as well as how they differ in triggering the CoT capabilities of transformers.
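For concreteness, a minimal sketch of the three target classes named in the abstract (our own illustration, not the paper's code): for a hidden subset $S \subseteq [n]$ with $|S| = k$, each target depends only on the $k$ coordinates indexed by $S$.

```python
# Illustrative sketch (hypothetical, not from the paper): the three
# k-sparse Boolean targets on inputs x in {0,1}^n, each depending
# only on a hidden coordinate subset S with |S| = k.

def k_parity(x, S):
    # 1 iff an odd number of the coordinates in S are set.
    return sum(x[i] for i in S) % 2

def k_and(x, S):
    # 1 iff every coordinate in S is set.
    return int(all(x[i] for i in S))

def k_or(x, S):
    # 1 iff at least one coordinate in S is set.
    return int(any(x[i] for i in S))

# Example: n = 6, hidden relevant set S = {0, 2, 5}, so k = 3.
x = [1, 0, 1, 1, 0, 1]
S = [0, 2, 5]
print(k_parity(x, S), k_and(x, S), k_or(x, S))  # -> 1 1 1
```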
Primary Area: learning theory
Submission Number: 17653