Scaled Signed Averaging Improves In-Context and Early Learning Benchmark Performance in Small Transformers
Keywords: In-context learning, Generalization, Attention, Softmax
Abstract: While large language models' in-context learning (ICL) abilities have seen much success, they show limitations on simple semantic tasks involving quantifiers such as every and some, as well as on tasks with linear functions. We analyze these limitations and identify Softmax, the scoring function in the attention mechanism, as a contributing factor. Our scaled signed averaging (SSA), a novel scoring function, mitigates these limitations: SSA significantly improves performance on our ICL tasks. In addition, SSA outperforms Transformer models with Softmax on several early learning NLP benchmarks and linguistic probing tasks in zero- and few-shot settings.
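The abstract names SSA but does not define it. As a rough intuition only, a minimal sketch of why a sign-preserving alternative to Softmax could behave differently: Softmax forces all attention weights to be positive, whereas a "scaled signed averaging" plausibly lets scores keep their sign. The `ssa_weights` function below is a hypothetical reading of the name (normalizing by the sum of absolute scores), not the paper's published definition.

```python
import math

def softmax_weights(scores):
    # Standard Softmax: strictly positive weights that sum to 1,
    # so every token contributes with the same (positive) sign.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ssa_weights(scores, eps=1e-9):
    # One plausible reading of "scaled signed averaging" (an assumption,
    # not the paper's actual formula): scale raw scores by the sum of
    # their absolute values, preserving each score's sign so that some
    # tokens can contribute negatively to the weighted average.
    denom = sum(abs(s) for s in scores) + eps
    return [s / denom for s in scores]

scores = [2.0, -1.0, 0.5]
print(softmax_weights(scores))  # all positive, sums to 1
print(ssa_weights(scores))      # signs preserved; |weights| sum to ~1
```

Under this reading, a negative score yields a negative weight, allowing the attention head to subtract a value vector rather than only mix positively weighted ones.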
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Generalization, Model architectures, Analysis, Few-shot learning
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 3877