Abstract: Skip connections (SCs) are commonly employed in neural networks to facilitate gradient-based training and often lead to improved performance in deep learning. To implement SCs, users typically write custom modules alongside torch.nn.Sequential, which limits the extensibility of neural network designs that use SCs. Despite the versatility and user-friendliness of SCs, there is an opportunity to create a standardized approach for organizing them across applications; a clear organizing framework would significantly broaden the adoption of this powerful technique and amplify its impact. We introduce Sequential2D, an enhanced orchestrator that provides insight into not only the width and depth of a network but also its intra-connections, such as SCs. The functionality provided by the Sequential2D architecture is a strict superset of the functionality found in the common torch.nn.Sequential container. The 2D in Sequential2D refers to a two-dimensional matrix of functions, as opposed to the one-dimensional cascading array of functions in a PyTorch or Keras sequential container. As an example of our proposed approach, we demonstrate the effectiveness of the Sequential2D architecture by substantially improving the fine-tuning performance of the GPT-2 model with less than 1% additional parameters.
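The abstract does not show the Sequential2D API itself, so the following is only a minimal sketch of the idea it describes: a two-dimensional (here, lower-triangular) grid of modules in which cell (i, j) maps the activation at slot j to a contribution at slot i, and contributions arriving at the same slot are summed. The class name is taken from the abstract, but the constructor argument, slot layout, and cell-key scheme are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Sequential2D(nn.Module):
    """Illustrative sketch of a 2D sequential container (assumed API).

    `grid[i][j]` (for j < i) maps the activation at slot j to a
    contribution at slot i; contributions to a slot are summed.
    Placing modules only on the sub-diagonal recovers the behavior of
    torch.nn.Sequential; an extra nn.Identity() entry acts as a skip
    connection. The grid is assumed lower-triangular, with every slot
    beyond slot 0 fed by at least one cell.
    """
    def __init__(self, grid):
        super().__init__()
        self.n = len(grid)
        # Register only the populated cells so their parameters train.
        self.cells = nn.ModuleDict({
            f"{i}_{j}": m
            for i, row in enumerate(grid)
            for j, m in enumerate(row)
            if m is not None
        })

    def forward(self, x):
        slots = [x]  # slot 0 holds the network input
        for i in range(1, self.n):
            total = None
            for j in range(i):
                key = f"{i}_{j}"
                if key not in self.cells:
                    continue
                out = self.cells[key](slots[j])
                total = out if total is None else total + out
            slots.append(total)
        return slots[-1]

# Usage: a two-layer chain plus an Identity skip from the input slot
# straight to the output slot, i.e. y = x + Linear(Linear(x)).
net = Sequential2D([
    [None,            None,            None],
    [nn.Linear(8, 8), None,            None],
    [nn.Identity(),   nn.Linear(8, 8), None],  # Identity() = skip connection
])
y = net(torch.randn(4, 8))
```

Under this reading, the claimed strict-superset relationship is easy to see: a grid whose only populated cells lie on the sub-diagonal is exactly a sequential cascade, while off-diagonal entries express skip connections (or other intra-connections) without any custom wrapper modules.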