Abstract: Transformer architectures have revolutionized a broad spectrum of AI applications by leveraging
attention mechanisms for parallelized and long-range sequence processing. Despite
their remarkable success, building and customizing transformers remains prohibitively complex
for many domain experts who lack deep knowledge of low-level implementations. We
introduce AttentionSmithy, a modular software package that lowers the barrier to transformer
innovation by decomposing key components—attention modules, feed-forward networks,
normalization layers, and positional encodings—into reusable building blocks. By
disentangling architectural elements into well-defined interfaces, users can rapidly prototype,
adapt, and evaluate transformer variants without extensive coding overhead. Our framework
supports four distinct positional encoding strategies (sinusoidal, learned, rotary, and ALiBi)
and integrates seamlessly with neural architecture search (NAS) for automated design exploration.
We validate AttentionSmithy by replicating the original “Attention Is All You Need”
transformer under resource constraints, demonstrating near state-of-the-art performance on
a machine translation task. Leveraging the package’s integrated NAS capability, we made
the unexpected finding that machine translation performance was highest when combining
all available positional encoding methods—highlighting the complementary benefits of each
strategy. We further illustrate AttentionSmithy’s adaptability through gene-specific modeling,
where a variant of a BERT-style architecture achieves over 95% accuracy on downstream
cell type classification tasks using ranked transcriptomic data. These case studies underscore
AttentionSmithy’s core advantage: enabling specialized experimentation across diverse
application domains—from natural language processing to genomic analysis—by obviating
the need for labor-intensive, low-level framework manipulation. We anticipate that AttentionSmithy
will serve as a foundation for creative transformer-based solutions, expediting
research and development in numerous scientific and industrial fields.
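As a conceptual illustration of the modular composition described above (and not AttentionSmithy's actual API; all class and function names below are hypothetical), the following minimal PyTorch sketch shows how two additive positional encoding strategies, sinusoidal and learned, could be stacked behind a common interface; rotary and ALiBi variants act inside the attention computation and are omitted for brevity.

```python
# Illustrative sketch only: combining two positional encoding strategies
# (sinusoidal + learned) additively. Names are hypothetical and do not
# reflect the AttentionSmithy API.
import math
import torch
import torch.nn as nn

class SinusoidalEncoding(nn.Module):
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]

class LearnedEncoding(nn.Module):
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        self.emb = nn.Embedding(max_len, d_model)

    def forward(self, x):
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.emb(positions)

class CombinedEncoding(nn.Module):
    """Applies several additive positional encodings in sequence."""
    def __init__(self, encoders):
        super().__init__()
        self.encoders = nn.ModuleList(encoders)

    def forward(self, x):
        for enc in self.encoders:
            x = enc(x)
        return x

# Usage: token embeddings of shape (batch, seq_len, d_model)
d_model = 512
encode = CombinedEncoding([SinusoidalEncoding(d_model), LearnedEncoding(d_model)])
out = encode(torch.randn(2, 128, d_model))  # same shape, both positional signals added
```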
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=XtMHy3iRXE&noteId=XtMHy3iRXE
Changes Since Last Submission: We used the LaTeX template this time, further anonymized the manuscript by removing references to GitHub, and made the package pip-installable with a supplemental example of how to fit the machine translation model.
Assigned Action Editor: ~Shay_B_Cohen1
Submission Number: 4519