ShrinkNAS: Single-Path One-Shot Operator Exploratory Training for Transformer with Dynamic Space Shrinking

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: Neural Architecture Search (NAS) for Transformer has shown growing capability in exploiting the benefits of various Transformer architecture configurations. Recent studies envision the diverse potential of introducing operators (OPs) not found in the vanilla Transformer, such as convolution, into its structure, yet the existing methods for doing so are all time-consuming. Single-Path One-Shot (SPOS) models traditionally enable efficient search over a vast set of OPs. However, existing SPOS methods for Transformer focus only on the dimensional configurations of the vanilla Transformer OPs (e.g., multi-head attention) and do not consider introducing other OPs. This paper explores including new OPs in Transformer-based SPOS architecture search to discover better Transformer structures while retaining the high efficiency of the SPOS family. To achieve this, we propose Dynamic Space Shrinking (DSS), a novel method that resolves the problems introduced by newly added OPs by dynamically restricting the current sample space to subnets with good configurations and performance. We implement DSS in ShrinkNAS, the first inter-OP SPOS model for Transformer. Our evaluation shows that ShrinkNAS offers much higher elasticity: under a tight constraint (<10M parameters) it finds a structure that beats human-designed ones, while existing intra-OP SPOS methods are not even close.
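To make the idea concrete, below is a minimal, hypothetical Python sketch of dynamic space shrinking inside a single-path one-shot training loop. The abstract does not specify the algorithm's details, so everything here (the candidate operator set, the scoring rule, the shrink interval, and the helper names `train_step` and `evaluate_subnet`) is an illustrative assumption, not the paper's actual procedure: single paths are sampled from a per-layer operator space, and every so often the worst-scoring operator choice per layer is dropped, so later sampling concentrates on well-performing configurations.

```python
# Hypothetical sketch of Dynamic Space Shrinking (DSS) in a SPOS loop.
# All names and the shrink schedule are illustrative assumptions; the
# paper's actual training and evaluation procedures are not specified
# in the abstract.

import random
from collections import defaultdict

# Candidate operators per Transformer layer (inter-OP search space).
OPS = ["mha", "conv", "ffn", "identity"]
NUM_LAYERS = 6

def train_step(subnet):
    """Placeholder for one weight-sharing training step on the sampled path."""
    pass

def evaluate_subnet(subnet):
    """Placeholder proxy metric (e.g., validation score of the sampled path)."""
    return random.random()  # stand-in for a real evaluation

space = {layer: list(OPS) for layer in range(NUM_LAYERS)}
scores = defaultdict(list)  # (layer, op) -> observed subnet scores

for step in range(1, 10_001):
    # Single-path sampling: pick one operator per layer from the current space.
    subnet = {layer: random.choice(space[layer]) for layer in range(NUM_LAYERS)}
    train_step(subnet)

    score = evaluate_subnet(subnet)
    for layer, op in subnet.items():
        scores[(layer, op)].append(score)

    # Dynamic shrinking: every K steps, drop the worst-scoring operator
    # per layer (while more than one choice remains), so the sample space
    # keeps containing subnets with good configurations and performance.
    if step % 2000 == 0:
        for layer in range(NUM_LAYERS):
            if len(space[layer]) > 1:
                worst = min(
                    space[layer],
                    key=lambda op: sum(scores[(layer, op)])
                    / max(len(scores[(layer, op)]), 1),
                )
                space[layer].remove(worst)
```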
