1. The PyTorch-style implementation is provided in TDRL.py.
2. In TDRL.py, use the "TDRL" class to replace arbitrary linear layers in ViTs or other models during training, then call the .merge() function to merge the re-parameterized architecture.
3. To use the pyramid-wise version, defined as P-WNS in the paper, set type='pyramid' and use the width variables to control N.
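
As background for the .merge() step, here is a minimal, dependency-free sketch of the general principle that makes merging possible: linear maps stacked with no nonlinearity between them can be folded into one equivalent linear map. It is not the TDRL class and does not reproduce its structure. The helper names (matvec, matmul, linear, merge_linear) and the toy weights are made up for illustration.

```python
# Illustration only: fold two stacked linear maps into a single one.
# This is NOT the TDRL class from TDRL.py; all names here are hypothetical.

def matvec(w, x):
    """Multiply matrix w (rows of floats) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n)."""
    cols = range(len(b[0]))
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in cols]
            for i in range(len(a))]

def linear(w, b, x):
    """Apply y = W x + b."""
    return [y + bi for y, bi in zip(matvec(w, x), b)]

def merge_linear(w1, b1, w2, b2):
    """Fold y = W2 (W1 x + b1) + b2 into y = W x + b."""
    w = matmul(w2, w1)                               # W = W2 W1
    b = [u + v for u, v in zip(matvec(w2, b1), b2)]  # b = W2 b1 + b2
    return w, b

# Toy weights: a 2 -> 3 map followed by a 3 -> 2 map.
w1, b1 = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]], [0.5, -0.5, 1.0]
w2, b2 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]], [0.0, 1.0]
x = [2.0, -1.0]

two_layer = linear(w2, b2, linear(w1, b1, x))  # training-time form
w, b = merge_linear(w1, b1, w2, b2)            # merged form: one 2 -> 2 map
merged = linear(w, b, x)

print(two_layer, merged)  # both forms give the same output
```

The merged map has the shape of a single ordinary linear layer, so the extra width used in the stacked form adds no cost after merging.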