Attention Scheme Inspired Softmax Regression

Published: 06 Mar 2025, Last Modified: 15 Apr 2025 | ICLR 2025 DeLTa Workshop Poster | CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: Attention mechanism, Softmax regression
Abstract:

In this work, we introduce AttReg (Attention-Inspired Softmax Regression), a novel theoretical framework designed to advance the understanding of attention mechanisms within large language models (LLMs). In convex optimization, for example in central path methods for solving linear programs, the softmax function has served as a crucial tool for controlling the progress and stability of potential functions [Cohen, Lee and Song STOC 2019; Brand SODA 2020]. By redefining the softmax regression problem through an attention-inspired lens, we establish a regularized variant, RAttReg (Regularized Attention-Inspired Softmax Regression), which incorporates an exponential activation function tailored for improved convergence and efficiency. Our analysis covers the formulation of the new problem definitions, the derivation of first- and second-order derivatives to characterize gradient dynamics, and a theoretical investigation of the convergence properties of the proposed models. We also develop an efficient computational approach based on an adapted Newton method, supported by a sparsification technique, to address the high dimensionality and data sparsity inherent in LLMs. These results offer deeper insight into the operational dynamics of attention mechanisms and open new avenues for optimizing the training of advanced neural network architectures. In a certain sense, our provable convergence result provides theoretical support for why greedy algorithms can be used to train the softmax function in practice.

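To make the setup concrete, the sketch below illustrates the kind of objective the abstract refers to. It assumes the standard softmax-regression formulation L(x) = 0.5 * ||softmax(Ax) - b||^2 with an optional ell-2 regularizer, and it substitutes a damped Gauss-Newton step for the paper's adapted Newton method with sparsification; the function names, the regularizer, and the step rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal numerical sketch (not the paper's algorithm).
# Assumed objective:  L(x) = 0.5 * ||softmax(A x) - b||^2 + 0.5 * lam * ||x||^2
# A damped Gauss-Newton step stands in for the paper's adapted Newton method
# with sparsification.
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    u = np.exp(z)
    return u / u.sum()

def loss(A, b, x, lam):
    p = softmax(A @ x)
    return 0.5 * np.sum((p - b) ** 2) + 0.5 * lam * np.sum(x ** 2)

def gradient(A, b, x, lam):
    p = softmax(A @ x)
    J = np.diag(p) - np.outer(p, p)      # Jacobian of softmax w.r.t. z = A x
    return A.T @ (J @ (p - b)) + lam * x

def gauss_newton_step(A, b, x, lam, step=0.5):
    p = softmax(A @ x)
    J = np.diag(p) - np.outer(p, p)
    g = gradient(A, b, x, lam)
    H = A.T @ (J @ J) @ A + lam * np.eye(A.shape[1])   # Gauss-Newton Hessian
    return x - step * np.linalg.solve(H, g)

# Toy usage: n = 8 observations, d = 3 parameters, realizable target b.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
b = softmax(A @ rng.standard_normal(3))  # b lies on the probability simplex
x = np.zeros(3)
for _ in range(25):
    x = gauss_newton_step(A, b, x, lam=1e-3)
print("final loss:", loss(A, b, x, lam=1e-3))
```

The lam * I term keeps the approximate Hessian positive definite, since the softmax Jacobian diag(p) - p p^T is rank-deficient along the all-ones direction.
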
Submission Number: 64