Abstract: Transformer architectures have revolutionized a broad spectrum of AI applications by leveraging
attention mechanisms for parallelized and long-range sequence processing. Despite
their remarkable success, building and customizing transformers remains prohibitively complex
for many domain experts who lack deep knowledge of low-level implementations. We
introduce AttentionSmithy, a modular software package that lowers the barrier to transformer
innovation by decomposing key components—attention modules, feed-forward networks,
normalization layers, and positional encodings—into reusable building blocks. By
disentangling architectural elements into well-defined interfaces, users can rapidly prototype,
adapt, and evaluate transformer variants without extensive coding overhead. Our framework
supports four distinct positional encoding strategies (sinusoidal, learned, rotary, and ALiBi)
and integrates seamlessly with neural architecture search (NAS) for automated design exploration.
We validate AttentionSmithy by replicating the original “Attention Is All You Need”
transformer under resource constraints, demonstrating near state-of-the-art performance on
a machine translation task. Leveraging the package’s integrated NAS capability, we made
the unexpected finding that machine translation performance was highest when combining
all available positional encoding methods—highlighting the complementary benefits of each
strategy. We further illustrate AttentionSmithy’s adaptability through gene-specific modeling,
where a variant of a BERT-style architecture achieves over 95% accuracy on downstream
cell type classification tasks using ranked transcriptomic data. These case studies underscore
AttentionSmithy’s core advantage: enabling specialized experimentation across diverse
application domains—from natural language processing to genomic analysis—by obviating
the need for labor-intensive, low-level framework manipulation. We anticipate that AttentionSmithy
will serve as a foundation for creative transformer-based solutions, expediting
research and development in numerous scientific and industrial fields.
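As a conceptual illustration of the modular composition described above (and not AttentionSmithy's actual API; all class and function names below are hypothetical), the following minimal PyTorch sketch shows how two additive positional encoding strategies, sinusoidal and learned, could be stacked behind a common interface; rotary and ALiBi variants act inside the attention computation and are omitted for brevity.

```python
# Illustrative sketch only: combining two positional encoding strategies
# (sinusoidal + learned) additively. Names are hypothetical and do not
# reflect the AttentionSmithy API.
import math
import torch
import torch.nn as nn

class SinusoidalEncoding(nn.Module):
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]

class LearnedEncoding(nn.Module):
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        self.emb = nn.Embedding(max_len, d_model)

    def forward(self, x):
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.emb(positions)

class CombinedEncoding(nn.Module):
    """Applies several additive positional encodings in sequence."""
    def __init__(self, encoders):
        super().__init__()
        self.encoders = nn.ModuleList(encoders)

    def forward(self, x):
        for enc in self.encoders:
            x = enc(x)
        return x

# Usage: token embeddings of shape (batch, seq_len, d_model)
d_model = 512
encode = CombinedEncoding([SinusoidalEncoding(d_model), LearnedEncoding(d_model)])
out = encode(torch.randn(2, 128, d_model))  # same shape, both positional signals added
```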
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=XtMHy3iRXE&noteId=XtMHy3iRXE
Changes Since Last Submission: We used the LaTeX template this time, further anonymized the manuscript by removing references to GitHub, and made the package pip-installable with a supplemental example of how to fit the machine translation model.
Assigned Action Editor: ~Shay_B_Cohen1
Submission Number: 4519