Abstract: Skip connections (SCs) are commonly employed in neural networks to facilitate gradient-based training and often lead to improved performance in deep learning. To implement SCs, users typically write custom modules alongside torch.nn.Sequential, which limits the extensibility of neural network designs that use SCs. Despite the versatility and user-friendliness of SCs, there is an opportunity to create a standardized approach for organizing them across applications; a clear organizing framework would significantly broaden the adoption of this powerful technique and amplify its impact. We introduce Sequential2D, an enhanced orchestrator that provides insight into not only the width and depth of a network but also its intra-connections, such as SCs. The functionality provided by the Sequential2D architecture is a strict superset of the functionality found in the common torch.nn.Sequential container. The 2D in Sequential2D refers to a two-dimensional matrix of functions, as opposed to the one-dimensional cascading array of functions in a PyTorch or Keras sequential container. As an example of our proposed approach, we demonstrate the effectiveness of the Sequential2D architecture by substantially improving the fine-tuning performance of the GPT-2 model with less than 1% additional parameters.
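The abstract does not show the Sequential2D API itself, so the following is only a minimal sketch of the idea it describes: a two-dimensional (here, lower-triangular) grid of modules in which cell (i, j) maps the activation at slot j to a contribution at slot i, and contributions arriving at the same slot are summed. The class name is taken from the abstract, but the constructor argument, slot layout, and cell-key scheme are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Sequential2D(nn.Module):
    """Illustrative sketch of a 2D sequential container (assumed API).

    `grid[i][j]` (for j < i) maps the activation at slot j to a
    contribution at slot i; contributions to a slot are summed.
    Placing modules only on the sub-diagonal recovers the behavior of
    torch.nn.Sequential; an extra nn.Identity() entry acts as a skip
    connection. The grid is assumed lower-triangular, with every slot
    beyond slot 0 fed by at least one cell.
    """
    def __init__(self, grid):
        super().__init__()
        self.n = len(grid)
        # Register only the populated cells so their parameters train.
        self.cells = nn.ModuleDict({
            f"{i}_{j}": m
            for i, row in enumerate(grid)
            for j, m in enumerate(row)
            if m is not None
        })

    def forward(self, x):
        slots = [x]  # slot 0 holds the network input
        for i in range(1, self.n):
            total = None
            for j in range(i):
                key = f"{i}_{j}"
                if key not in self.cells:
                    continue
                out = self.cells[key](slots[j])
                total = out if total is None else total + out
            slots.append(total)
        return slots[-1]

# Usage: a two-layer chain plus an Identity skip from the input slot
# straight to the output slot, i.e. y = x + Linear(Linear(x)).
net = Sequential2D([
    [None,            None,            None],
    [nn.Linear(8, 8), None,            None],
    [nn.Identity(),   nn.Linear(8, 8), None],  # Identity() = skip connection
])
y = net(torch.randn(4, 8))
```

Under this reading, the claimed strict-superset relationship is easy to see: a grid whose only populated cells lie on the sub-diagonal is exactly a sequential cascade, while off-diagonal entries express skip connections (or other intra-connections) without any custom wrapper modules.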