This code was tested with torch==2.1.0.dev20230514+cu118 and transformers==4.28.1 on Python 3.9.7. We recommend keeping this package combination when trying to reproduce our results. An issue on the original repository, on which our code is based, reports a performance drop with newer versions of these packages. The drop is believed to be caused by a behavior change in the transformers package.
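
To guard against an accidental version mismatch, a quick check like the one below can be run before training or evaluation. This is a minimal sketch and not part of the repository: the helper name `find_mismatches` and the constants `EXPECTED` and `EXPECTED_PYTHON` are hypothetical, and it assumes the installed distributions report their versions through `importlib.metadata` in the same form as listed above.

```python
import sys
from importlib import metadata

# Versions stated in this README (the constant names are illustrative).
EXPECTED = {
    "torch": "2.1.0.dev20230514+cu118",
    "transformers": "4.28.1",
}
EXPECTED_PYTHON = (3, 9, 7)


def find_mismatches(expected=EXPECTED, get_version=metadata.version):
    """Return {package: (expected, installed)} for each package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            have = get_version(name)
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    # Warn rather than abort: other versions may still run, just not
    # necessarily with the same results.
    if tuple(sys.version_info[:3]) != EXPECTED_PYTHON:
        print(f"warning: tested on Python 3.9.7, running {sys.version.split()[0]}")
    for name, (want, have) in find_mismatches().items():
        print(f"warning: {name} expected {want}, found {have}")
```

The check only warns instead of exiting, since the reported performance drop concerns result quality and not whether the code runs.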