This is the supplementary source code for "Generalized Probabilistic Attention Mechanism in Transformers," submitted to ICLR 2025.

* The ./data/ directory contains a small subset of the IWSLT14 English-German dataset (10K sentence pairs).
* We will release all of our data and source code publicly for reproducibility after the paper is published.

(0) Required Python Packages (Python version: 3.8.18, CUDA: 12.2)
--- torch : 2.2.1+cu121
--- numpy : 1.24.4
--- pandas : 2.0.3
--- sacrebleu : 2.4.0
--- sacremoses : 0.1.1
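
The versions above can be verified before running the scripts. The following is a minimal sketch (ours, not part of the repository) that reports the installed version of each listed package using only the standard library; your installed versions may legitimately differ from the pinned ones.

```python
# Sketch: report installed versions of the packages listed above.
# Uses only the standard library (Python 3.8+); the pinned versions
# are copied from the list in this README.
from importlib.metadata import version, PackageNotFoundError

REQUIRED = {
    "torch": "2.2.1+cu121",
    "numpy": "1.24.4",
    "pandas": "2.0.3",
    "sacrebleu": "2.4.0",
    "sacremoses": "0.1.1",
}

def check_packages(required):
    """Return {package: installed version, or None if not installed}."""
    report = {}
    for name in required:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # package is missing from this environment
    return report

if __name__ == "__main__":
    for name, installed in check_packages(REQUIRED).items():
        status = installed if installed is not None else "NOT INSTALLED"
        print(f"{name}: {status} (expected {REQUIRED[name]})")
```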

(1) File descriptions
--- nmt_run.py : entry point for running our experiments
--- nmt_main.py (or nmt_main_negatt.py) : main training loop; defines the model and optimizer and runs training on a given dataset
--- nmt_model_preln.py (or [nmt_model_negatt_preln_ind, nmt_model_negatt_preln_variants, nmt_model_admin, nmt_model_negatt_admin_ind].py) : model classes ('preln_nmt', 'indnegatt_preln_nmt', 'variant_preln_nmt', 'admin_nmt', 'indnegatt_admin_nmt') # Our proposed dual-attention GPAM models are 'indnegatt_preln_nmt' and 'indnegatt_admin_nmt'. The 'MultiHeadAttention_Neg' class contains the main modifications relative to its baseline.
--- nmt_trans.py : evaluation routines, such as translation and BLEU score computation
--- dataset.py : dataset loaders

--- train_iwslt14_ende.sh : shell script for IWSLT14 English-German training (key arguments below)
----- 'NEGATT_MODE': type of alternative attention (e.g., const, separam, coda, non, nap, periodic, shaped, valueskip). The 'const' and 'separam' modes are our proposed methods.
----- 'POS_LAMBDA' & 'NEG_LAMBDA': pre-defined lambda values (used in 'const' mode)
--- test_iwslt14_ende.sh : shell script for IWSLT14 English-German testing
----- 'TEST_SUBDIR': the directory containing the trained model to be tested
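
In 'const' mode, 'POS_LAMBDA' and 'NEG_LAMBDA' act as fixed weights on two attention branches. The single-query toy sketch below illustrates such a lambda-weighted combination of a positive and a negative attention distribution. It is only a conceptual illustration, not the paper's GPAM formulation or the repository's 'MultiHeadAttention_Neg' implementation; the function names here are our own.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dual_attention(scores, values, pos_lambda=1.0, neg_lambda=0.5):
    """Toy lambda-weighted dual attention for one query (scalar values).

    scores : raw attention scores of the query against all keys
    values : value scalars, same length as scores
    Returns pos_lambda * softmax(scores).values
          - neg_lambda * softmax(-scores).values
    """
    pos = softmax(scores)                 # attends to high-score keys
    neg = softmax([-s for s in scores])   # attends to low-score keys
    return sum((pos_lambda * p - neg_lambda * n) * v
               for p, n, v in zip(pos, neg, values))
```

With neg_lambda set to 0 this reduces to standard softmax attention, which is one way to see the role of the two pre-defined lambdas in 'const' mode.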


(2) Training of NMT model on IWSLT14 English-German dataset

--- command : bash train_iwslt14_ende.sh {GPU_IDs} {# of GPUs} {model_name}
----- examples
        bash train_iwslt14_ende.sh 0 1 preln_nmt
        bash train_iwslt14_ende.sh 0 1 indnegatt_preln_nmt
        bash train_iwslt14_ende.sh 0,1 2 indnegatt_preln_nmt    # for multi-GPU

(3) Testing of NMT model on IWSLT14 English-German dataset

- Set the 'TEST_SUBDIR' argument in 'test_iwslt14_ende.sh' to the directory of the trained model.
* Testing does not support multi-GPU, so the second argument is fixed to 1.
--- command : bash test_iwslt14_ende.sh {GPU_ID} {1} {model_name}
----- examples
        bash test_iwslt14_ende.sh 0 1 preln_nmt
        bash test_iwslt14_ende.sh 0 1 indnegatt_preln_nmt
