Abstract: We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a problem exacerbated by a fragmented literature where key results are scattered across numerous papers. In particular, we synthesize this disparate body of findings into a curated set of recipes that demonstrate how to implement everything from basic arithmetic in feed-forward layers to complex data routing via self-attention. Our mise en place of formulations serves both newcomers seeking an accessible entry point and experts in need of a systematic reference. This unified presentation of transformer constructions provides a foundation for future work, from theoretical research in computational complexity to empirical investigations in architecture design and interpretability.
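To make the flavor of such recipes concrete, here is a minimal sketch (not taken from the paper itself) of hand-setting the weights of a one-hidden-layer ReLU feed-forward block so that it computes the sum of its two inputs exactly, via the identity relu(z) - relu(-z) = z; the weight matrices `W1` and `W2` are illustrative choices, not the paper's construction:

```python
import numpy as np

# Hypothetical example: a ReLU feed-forward block whose weights are
# set by hand so that the block computes x1 + x2 exactly.
# Hidden unit 1 holds relu(x1 + x2); hidden unit 2 holds relu(-(x1 + x2)).
W1 = np.array([[ 1.0,  1.0],
               [-1.0, -1.0]])
# The output combines them as relu(s) - relu(-s) = s, with s = x1 + x2.
W2 = np.array([1.0, -1.0])

def ffn_sum(x):
    """Apply the hand-constructed feed-forward block to a 2-vector x."""
    return W2 @ np.maximum(W1 @ x, 0.0)

print(ffn_sum(np.array([3.0, -5.0])))  # -> -2.0
```

The same two-hidden-unit trick recovers any linear function of the inputs despite the ReLU nonlinearity, which is why exact (rather than approximate) arithmetic is possible in such constructions.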
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: # List of Changes
- Expanded the exposition around the transformer architecture in Section 2.2
- Added the ReLU/GELU approximation behavior in Section 4.6
- Fixed typos
- Added a citation to RASP/Tracr in the introduction
- Added a remark about architectural assumptions to the introduction
- Fixed an error in Section 5.7
- Added a comment about temperature scaling to Section 9.1.1
- Updated Table 2 and created Table 3
Assigned Action Editor: ~Tim_Genewein1
Submission Number: 6052