Abstract: Large language model performance has advanced rapidly in recent years, driven by technical improvements in areas such as model architecture, scaling, and reinforcement learning. However, much of our understanding of these models remains rooted in static evaluations computed post-training. While informative, these snapshots offer limited insight into how models learn, adapt, and transform internally during training; they overlook the dynamic processes and representational shifts that occur throughout fine-tuning and can conceal important aspects of model behavior. We aim to contribute to ongoing efforts to open the ‘black box’ of language models by analyzing temporal information dynamics during fine-tuning. Our findings suggest that tracking these internal dynamics reveals both training-regime-specific and task-specific differences in learning and may eventually support applications such as change-point detection or adaptive training strategies. Ultimately, this work moves toward a more nuanced, mathematical formulation of what learning does to a model, highlighting the constant flux of representational change that underlies seemingly stable performance improvements.
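The abstract does not specify which information measure is tracked, so the following is only a minimal sketch of the general idea: estimate an information-theoretic quantity (here, a histogram-based Shannon entropy of hidden activations, an assumption) at each saved fine-tuning checkpoint, then flag checkpoints where the step-to-step change is anomalous as candidate change points. The function names, the z-score rule, and all parameters are illustrative, not the paper's method.

```python
import numpy as np

def activation_entropy(hidden_states: np.ndarray, n_bins: int = 64) -> float:
    """Histogram estimate of the Shannon entropy (in bits) of an activation tensor."""
    counts, _ = np.histogram(hidden_states.ravel(), bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty bins so log2 is well-defined
    return float(-np.sum(p * np.log2(p)))

def detect_change_points(series: np.ndarray, z_thresh: float = 2.0) -> np.ndarray:
    """Flag checkpoint indices whose entropy jump deviates more than
    z_thresh standard deviations from the mean step-to-step change."""
    diffs = np.diff(series)
    z = (diffs - diffs.mean()) / (diffs.std() + 1e-12)
    return np.where(np.abs(z) > z_thresh)[0] + 1

# Usage: one entropy value per checkpoint; random tensors stand in for
# real hidden states extracted during fine-tuning.
checkpoints = [np.random.randn(128, 768) for _ in range(20)]
entropies = np.array([activation_entropy(h) for h in checkpoints])
print(detect_change_points(entropies))
```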