Code for Experiments

Use "cross_training.py" to run the experiments on the different synthetic tasks and modifications.

Run "test.py" to reproduce the training process of the dual model.

We implemented the non-causal fast attention mechanism of the Performer architecture [1] in PyTorch; see "fast_attention.py".
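
For orientation, non-causal fast attention can be sketched roughly as follows. This is a minimal illustration based on the positive random feature construction from [1], not the contents of "fast_attention.py". The function names, the plain Gaussian projection (the paper also discusses orthogonal features), and the `eps` parameter are assumptions made for this sketch, and the numerical stabilization a practical implementation would need is omitted.

```python
import math
import torch


def positive_random_features(x, projection):
    """Map x of shape (..., n, d) to positive features of shape (..., n, m).

    Uses phi(x) = exp(w^T x - |x|^2 / 2) / sqrt(m), so that
    E[phi(q) . phi(k)] = exp(q . k) when the rows w are drawn from N(0, I).
    """
    m = projection.shape[0]
    wx = x @ projection.T                              # (..., n, m)
    half_sq_norm = (x ** 2).sum(dim=-1, keepdim=True) / 2
    return torch.exp(wx - half_sq_norm) / math.sqrt(m)


def noncausal_fast_attention(q, k, v, projection, eps=1e-6):
    """Approximate softmax(q k^T / sqrt(d)) v without forming the n x n matrix.

    Hypothetical helper for illustration; cost is linear in sequence length n.
    """
    d = q.shape[-1]
    scale = d ** -0.25                                 # splits 1/sqrt(d) between q and k
    q_prime = positive_random_features(q * scale, projection)
    k_prime = positive_random_features(k * scale, projection)
    kv = k_prime.transpose(-2, -1) @ v                 # (..., m, d_v)
    k_sum = k_prime.sum(dim=-2)                        # (..., m)
    numerator = q_prime @ kv                           # (..., n, d_v)
    denominator = q_prime @ k_sum.unsqueeze(-1)        # (..., n, 1)
    return numerator / (denominator + eps)


if __name__ == "__main__":
    torch.manual_seed(0)
    n, d, m = 16, 8, 256                               # m = number of random features (assumed)
    q, k, v = (torch.randn(2, n, d) for _ in range(3))
    projection = torch.randn(m, d)
    print(noncausal_fast_attention(q, k, v, projection).shape)
```

The key design point is the order of multiplication: computing `k_prime^T v` first keeps every intermediate at size `m x d_v` or `n x m`, which is what makes the non-causal case linear in the sequence length.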

For more realistic tasks, we use the code from [transformers/examples/pytorch/text-classification](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification), provided by Hugging Face, and run the example "run_glue.py" with our modifications.

We modified the attention-related code in BERT; the changes can be found in "bert_modified.py".



[1] Choromanski K, Likhosherstov V, Dohan D, et al. Rethinking Attention with Performers. arXiv preprint arXiv:2009.14794, 2020.