Capacity Matters: Investigating Transformer Models for Real-World Data Memorization

ACL ARR 2025 February Submission 361 Authors

06 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Studies of transformer models' memorization capacity often focus on theoretical bounds or rely on synthetic datasets that lack real-world complexity. This study systematically evaluates how model architecture and data configuration influence the capacity of decoder-only transformers, using two datasets derived from the Systematized Nomenclature of Medicine (SNOMED) knowledge graph: triplets, which represent static connections, and sequences, which simulate complex relational patterns. Our findings highlight the key factors affecting training dynamics and memorization. Embedding size is the primary determinant of learning speed and capacity, while additional layers provide limited benefit and may hinder performance on simpler datasets. Activation functions play a crucial role, with Softmax demonstrating greater stability and capacity. In addition, increased dataset complexity enhances final memorization. These insights improve our understanding of transformer memory mechanisms and provide a framework for optimizing model design for structured real-world data.
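To make the experimental setup concrete, the sketch below (not the authors' code; all class names, hyperparameter values, and the triplet-as-token-sequence encoding are illustrative assumptions) shows the kind of decoder-only transformer configuration whose embedding size and layer count the abstract describes varying.

```python
# Minimal sketch, assuming a standard PyTorch decoder-only LM built from an
# encoder stack plus a causal mask. Hyperparameters are placeholders, not the
# values studied in the paper.
import torch
import torch.nn as nn


class TinyDecoder(nn.Module):
    def __init__(self, vocab_size, d_model=128, n_layers=1, n_heads=4):
        super().__init__()
        # Embedding size (d_model) and number of layers are the quantities
        # the study varies when measuring memorization capacity.
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        # A causal (square subsequent) mask makes the encoder stack behave
        # as a decoder-only language model.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        x = self.blocks(x, mask=mask)
        return self.head(x)


# Hypothetical usage: memorize (subject, relation, object) triplets encoded
# as length-3 token sequences drawn from a small vocabulary.
model = TinyDecoder(vocab_size=1000, d_model=128, n_layers=1)
logits = model(torch.randint(0, 1000, (8, 3)))  # batch of 8 triplets
print(logits.shape)  # torch.Size([8, 3, 1000])
```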
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: pre-training, scaling, applications, robustness
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 361