AttentionSmithy: A Modular Framework for Rapid Transformer Development and Customization

TMLR Paper 4519 Authors

19 Mar 2025 (modified: 04 Apr 2025) · Under review for TMLR · CC BY 4.0
Abstract: Transformer architectures have revolutionized a broad spectrum of AI applications by leveraging attention mechanisms for parallelized and long-range sequence processing. Despite their remarkable success, building and customizing transformers remain prohibitively complex for many domain experts who lack deep knowledge of low-level implementations. We introduce AttentionSmithy, a modular software package that lowers the barrier to transformer innovation by decomposing key components (attention modules, feed-forward networks, normalization layers, and positional encodings) into reusable building blocks. By disentangling architectural elements behind well-defined interfaces, users can rapidly prototype, adapt, and evaluate transformer variants without extensive coding overhead. Our framework supports four distinct positional encoding strategies (sinusoidal, learned, rotary, and ALiBi) and integrates seamlessly with neural architecture search (NAS) for automated design exploration. We validate AttentionSmithy by replicating the original “Attention Is All You Need” transformer under resource constraints, demonstrating near state-of-the-art performance on a machine translation task. Leveraging the package’s integrated NAS capability, we made the unexpected discovery that machine translation performance is maximized by combining all four positional encoding methods, highlighting their complementary benefits. We further illustrate AttentionSmithy’s adaptability through gene-specific modeling, where a variant of a BERT-style architecture achieves over 95% accuracy on downstream cell type classification tasks using ranked transcriptomic data. These case studies underscore AttentionSmithy’s core advantage: enabling specialized experimentation across diverse application domains, from natural language processing to genomic analysis, without labor-intensive, low-level framework manipulation. We anticipate that AttentionSmithy will serve as a foundation for creative transformer-based solutions, expediting research and development in numerous scientific and industrial fields.
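
To make the modular design concrete, below is a minimal, hypothetical sketch of the building-block pattern the abstract describes, written in plain PyTorch. The class names (SinusoidalEncoding, EncoderBlock) and their signatures are illustrative assumptions for exposition, not AttentionSmithy's actual API; the point is that attention, feed-forward, normalization, and positional encoding components sit behind common interfaces and can be swapped independently.

```python
# Hypothetical sketch of the modular composition described in the abstract.
# None of these class names come from the AttentionSmithy package itself.
import math

import torch
import torch.nn as nn


class SinusoidalEncoding(nn.Module):
    """One of the four positional strategies named in the abstract."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Add the (seq_len, d_model) encoding to a (batch, seq_len, d_model) input.
        return x + self.pe[: x.size(1)]


class EncoderBlock(nn.Module):
    """Building blocks (attention, feed-forward, norms) are injected, not hard-coded."""

    def __init__(self, attention: nn.Module, feed_forward: nn.Module,
                 norm1: nn.Module, norm2: nn.Module):
        super().__init__()
        self.attention, self.feed_forward = attention, feed_forward
        self.norm1, self.norm2 = norm1, norm2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attention(x, x, x)  # self-attention with residual + norm
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.feed_forward(x))


d_model = 64
block = EncoderBlock(
    attention=nn.MultiheadAttention(d_model, num_heads=4, batch_first=True),
    feed_forward=nn.Sequential(nn.Linear(d_model, 256), nn.ReLU(),
                               nn.Linear(256, d_model)),
    norm1=nn.LayerNorm(d_model),
    norm2=nn.LayerNorm(d_model),
)
x = SinusoidalEncoding(d_model)(torch.randn(2, 10, d_model))
print(block(x).shape)  # torch.Size([2, 10, 64])
```

Under this pattern, trying a variant such as rotary or ALiBi encodings, or an alternative attention module, amounts to passing a different component at construction time, which is also what makes the resulting design space amenable to automated exploration with NAS.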
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=XtMHy3iRXE&noteId=XtMHy3iRXE
Changes Since Last Submission: We used the LaTeX template this time, further anonymized the manuscript by removing references to GitHub, and made the package pip-installable, with a supplemental example of how to fit the machine translation model.
Assigned Action Editor: ~Shay_B_Cohen1
Submission Number: 4519