Abstract: The Taylor series expansion (TSE) theorem states that, in most cases, a nonlinear function can be well approximated by the first few terms of its Taylor series. Inspired by this theorem, a brand-new TSE-based vision transformer is designed. The TSE-based vision transformer approximates the naive vision transformer with the weights of a shared first-order TSE transformer block (analogous to the first-order Taylor term), a finite number of repeated applications of that block (analogous to the expanded higher-order Taylor terms), and the corresponding learnable TSE coefficients. In this manner, the TSE-based vision model reduces the memory burden while keeping accuracy similar to its naive counterpart. By adding a Taylor skip mechanism during training, the TSE-based vision transformer also gains good dynamic expansion capability. Experiment results show that TSE-based models improve actual deployment latency by 1.30-1.36× on the A100 GPU and 1.34-1.45× on AGX Orin with negligible accuracy degradation on the ImageNet classification, COCO detection, and ADE20K segmentation benchmarks. Moreover, TSE-based optimization is orthogonal to model compression. Combined with a state-of-the-art vision transformer compression method, it improves actual deployment performance by 1.70-1.87× in latency and 3.29-3.61× in throughput on the A100 GPU, and by 1.67-1.74× in latency and 2.76-2.94× in throughput on AGX Orin.
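To make the abstract's mechanism concrete, below is a minimal PyTorch sketch of the idea described: one shared transformer block reused across "orders", with each repeated application scaled by a learnable coefficient, and a knob that mimics dynamic expansion by truncating the series at inference time. All class names, the composition rule, and the block internals here are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a TSE-style stage (assumptions: the exact composition rule,
# coefficient placement, and block internals are illustrative, not from the paper).
import torch
import torch.nn as nn


class SharedTransformerBlock(nn.Module):
    """A single transformer block whose weights are reused for every 'order'."""

    def __init__(self, dim, heads=8, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, x):
        h = self.norm1(x)
        a, _ = self.attn(h, h, h)
        x = x + a
        x = x + self.mlp(self.norm2(x))
        return x


class TSEStage(nn.Module):
    """Approximates a stack of distinct blocks with one shared block applied
    repeatedly, each repetition scaled by a learnable Taylor-like coefficient."""

    def __init__(self, dim, max_order=4):
        super().__init__()
        self.block = SharedTransformerBlock(dim)            # shared first-order weights
        self.coeffs = nn.Parameter(torch.ones(max_order))   # learnable TSE coefficients
        self.max_order = max_order

    def forward(self, x, active_orders=None):
        # 'active_orders' mimics dynamic expansion: fewer orders -> lower latency.
        k = active_orders if active_orders is not None else self.max_order
        out, term = x, x
        for i in range(k):
            term = self.block(term)                 # higher-order term = repeated application
            out = out + self.coeffs[i] * term       # accumulate the weighted series
        return out


# Usage: full expansion for accuracy, truncated expansion for speed.
x = torch.randn(2, 196, 384)                  # (batch, tokens, dim)
stage = TSEStage(dim=384, max_order=4)
y_full = stage(x)                             # all expansion orders
y_fast = stage(x, active_orders=2)            # truncated series at inference time
```

The key memory saving in this sketch comes from storing a single block's parameters plus a handful of scalar coefficients instead of several independent blocks; the truncation argument illustrates how a Taylor-skip-style training scheme could allow dropping higher-order terms at deployment.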