UniForm: A Reuse Attention Mechanism for Efficient Transformers on Resource-Constrained Edge Devices
Track: Main paper track (up to 5 pages excluding references and appendix)
Keywords: Efficient Transformers, Attention Mechanism, Memory Optimization
Abstract: Transformer-based architectures have demonstrated remarkable success across various domains but remain challenging to deploy on edge devices due to high memory and computational demands. In this paper, we propose UniForm (Unified TransFormer), a novel transformer architecture that unifies multi-head attention computations into a shared attention mechanism, Reuse Attention, and integrates it into a lightweight, scalable backbone for efficient inference on edge devices, without compromising accuracy. By consolidating redundant operations into a unified representation, UniForm effectively reduces memory overhead and computational complexity, enabling seamless deployment in resource-constrained environments. Experiments on ImageNet-1K and downstream tasks show that UniForm achieves state-of-the-art accuracy while improving inference speed and memory efficiency.
Notably, UniForm-l attains 76.7% Top-1 accuracy on ImageNet-1K with a 21.8ms inference time on Jetson Nano, achieving up to a 5x speedup over competing benchmarks. These results highlight UniForm’s versatility across GPUs and edge platforms, demonstrating its potential for real-time AI applications in low-resource settings. Code available at https://github.com/seulkiyeom/uniform.
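The abstract describes Reuse Attention only at a high level, so the following is a minimal, hedged sketch of one plausible reading: a single shared Q/K projection produces one attention map that every head's value projection reuses, in place of the per-head attention maps of standard multi-head attention. The module name, dimensions, and layer layout below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ReuseAttentionSketch(nn.Module):
    """Illustrative sketch (assumption, not the paper's code): one shared
    attention map is computed once and reused across all heads, so memory for
    attention scores drops from O(H * N^2) to O(N^2) per layer."""

    def __init__(self, dim: int, num_heads: int = 4, qk_dim: int = 32):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = qk_dim ** -0.5
        self.q = nn.Linear(dim, qk_dim)   # single shared query projection
        self.k = nn.Linear(dim, qk_dim)   # single shared key projection
        self.v = nn.Linear(dim, dim)      # values, split into heads below
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        # One attention map of shape (B, N, N) for the whole layer,
        # instead of (B, H, N, N) in standard multi-head attention.
        attn = (self.q(x) @ self.k(x).transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        # Split values into heads and reuse the same map for each head.
        v = self.v(x).reshape(B, N, self.num_heads, self.head_dim).permute(0, 2, 1, 3)
        out = attn.unsqueeze(1) @ v       # broadcast shared map over heads
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

Under this reading, the redundant per-head Q/K projections and score computations are consolidated into one pass, which is where the claimed memory and latency savings on edge devices would come from; consult the released code for the exact formulation.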
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 12