DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training

Published: 01 Feb 2023 · Last Modified: 12 Mar 2024 · ICLR 2023 poster · Readers: Everyone
Keywords: dropping intermediate tensors, dropping activations, activation compressed training, top-k, vision transformer, cnn
Abstract: A standard hardware bottleneck when training deep neural networks is GPU memory. The bulk of memory is occupied by caching intermediate tensors for gradient computation in the backward pass. We propose a novel method to reduce this footprint: Dropping Intermediate Tensors (DropIT). DropIT drops the min-k elements of the intermediate tensors and approximates gradients from the sparsified tensors in the backward pass. Theoretically, DropIT reduces noise in the estimated gradients and therefore has a higher rate of convergence than vanilla SGD. Experiments show that we can drop up to 90% of the intermediate tensor elements in fully-connected and convolutional layers while achieving higher test accuracy for Vision Transformers and Convolutional Neural Networks on various tasks (e.g., classification, object detection, instance segmentation). Our code and models are available at https://github.com/chenjoya/dropit.
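As a rough illustration of the approach described in the abstract, below is a minimal PyTorch sketch, not the authors' released implementation: it keeps only the largest-magnitude fraction of a linear layer's cached input and approximates the weight gradient from that sparsified tensor. Names such as `DropITLinearFn` and `keep_ratio` are made up for illustration.

```python
import torch
import torch.nn as nn


class DropITLinearFn(torch.autograd.Function):
    """Linear op that caches a sparsified copy of its input for the backward pass."""

    @staticmethod
    def forward(ctx, x, weight, bias, keep_ratio):
        out = x @ weight.t()
        if bias is not None:
            out = out + bias
        # Drop the min-k elements: keep only the `keep_ratio` fraction of entries
        # with the largest magnitude, zero the rest before caching.
        flat = x.reshape(-1)
        k = max(1, int(keep_ratio * flat.numel()))
        topk_idx = flat.abs().topk(k).indices
        sparse_x = torch.zeros_like(flat)
        sparse_x[topk_idx] = flat[topk_idx]
        # NOTE: a memory-saving implementation would store only (values, indices);
        # the dense zero-filled copy here is for clarity only.
        ctx.save_for_backward(sparse_x.view_as(x), weight)
        ctx.has_bias = bias is not None
        return out

    @staticmethod
    def backward(ctx, grad_out):
        sparse_x, weight = ctx.saved_tensors
        grad_x = grad_out @ weight  # exact: does not need the cached input
        # Weight gradient is approximated from the sparsified input.
        grad_w = grad_out.flatten(0, -2).t() @ sparse_x.flatten(0, -2)
        grad_b = grad_out.flatten(0, -2).sum(0) if ctx.has_bias else None
        return grad_x, grad_w, grad_b, None


class DropITLinear(nn.Linear):
    """nn.Linear drop-in that keeps only a `keep_ratio` fraction of cached activations."""

    def __init__(self, in_features, out_features, keep_ratio=0.1, **kwargs):
        super().__init__(in_features, out_features, **kwargs)
        self.keep_ratio = keep_ratio

    def forward(self, x):
        return DropITLinearFn.apply(x, self.weight, self.bias, self.keep_ratio)


# Usage: y = DropITLinear(512, 512, keep_ratio=0.1)(torch.randn(8, 512)); y.sum().backward()
```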
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
TL;DR: DropIT saves memory and improves accuracy, offering a new perspective on activation-compressed training based on dropping rather than quantization.
Community Implementations: 4 code implementations (https://www.catalyzex.com/paper/arxiv:2202.13808/code)