GraViT-E: Gradient-based Vision Transformer Search with Entangled Weights

Rhea Sanjay Sukthanker; Arjun Krishnakumar; Sharat Patil; Frank Hutter

GraViT-E: Gradient-based Vision Transformer Search with Entangled Weights

Rhea Sanjay Sukthanker, Arjun Krishnakumar, Sharat Patil, Frank Hutter

Published: 21 Oct 2022, Last Modified: 05 May 2023NeurIPS 2022 Workshop MetaLearn PosterReaders: Everyone

Keywords: Neural Architecture Search, Transformers, Weight Entanglement

TL;DR: Efficient Differentiable Neural Architecture Search method with Weight Entanglement for Transformers

Abstract: Differentiable one-shot neural architecture search methods have recently become popular since they can exploit weight-sharing to efficiently search in large architectural search spaces. These methods traditionally perform a continuous relaxation of the discrete search space to search for an optimal architecture. However, they suffer from large memory requirements, making their application to parameter-heavy architectures like transformers difficult. Recently, single-path one-shot methods have been introduced which often use weight entanglement to alleviate this issue by sampling the weights of the sub-networks from the largest model, which is itself the supernet. In this work, we propose a continuous relaxation of weight entanglement-based architectural representation. Our Gradient-based Vision Transformer Search with Entangled Weights (GraViT-E) combines the best properties of both differentiable one-shot NAS and weight entanglement. We observe that our method imparts much better regularization properties and memory efficiency to the trained supernet. We study three one-shot optimizers on the Vision Transformer search space and observe that our method outperforms existing baselines on multiple datasets while being upto 35% more parameter efficient on ImageNet-1k.

0 Replies

Loading