# Federated Contrastive GFlowNets 

Code for reproducing the experiments. 

The appendix is in `ICLR2024appendix.pdf`.

## Virtual environment 

Create a virtual environment with `pyenv` (the `pyenv virtualenv` command requires the pyenv-virtualenv plugin) by executing

```sh 
pyenv virtualenv 3.11.1 fcgfn
pyenv activate fcgfn
```

Then install the required packages with

```sh 
pip install -r requirements.txt
```

With the environment set up, the experiments below can be reproduced.

## Experiments 

The following commands train the models and run inference on each domain considered in our experiments.

**FC-GFlowNet for generation of multisets.** To reproduce the experiments on distributed multiset generation with FC-GFlowNets, execute

```sh 
cd experiments 
DISABLE_TQDM=True python -u federated.py \
                    --domain multisets \
                    --seed 42 \
                    --criterion cb \
                    --multisets_size 8 \
                    --multisets_warehouse_size 10 \
                    --epochs 5000 \
                    --lr 3e-3 \
                    --batch_size_train 512 \
                    --batch_size_eval 1024 \
                    --num_batches_eval 1024 \
                    --num_clients 5 \
                    --hidden_dim 64 \
                    --emb_dim 10 \
                    --device cuda
```

**FC-GFlowNet for generation of sequences.** The experiments on generating sequences of bounded length can be reproduced (from the `experiments` directory, as above) by executing

```sh 
DISABLE_TQDM=True python -u federated.py \
                    --domain sequences \
                    --seed 42 \
                    --criterion cb \
                    --sequences_max_size 6 \
                    --sequences_vocab_size 6 \
                    --epochs 5000 \
                    --lr 3e-3 \
                    --batch_size_train 512 \
                    --batch_size_eval 1024 \
                    --num_batches_eval 1024 \
                    --num_clients 5 \
                    --hidden_dim 64 \
                    --emb_dim 6 \
                    --device cuda
```

**FC-GFlowNet for Bayesian phylogenetic inference.** To run distributed Bayesian inference over the space of phylogenetic trees (again from the `experiments` directory), execute

```sh 
DISABLE_TQDM=True python -u federated.py \
                    --domain phylogenetics \
                    --seed 42 \
                    --criterion cb \
                    --phylogenetics_num_leaves 7 \
                    --phylogenetics_vocab_size 4 \
                    --epochs 5000 \
                    --lr 3e-3 \
                    --batch_size_train 512 \
                    --batch_size_eval 1024 \
                    --num_batches_eval 100 \
                    --num_clients 5 \
                    --hidden_dim 64 \
                    --device cuda
```

**FC-GFlowNet to simulate the Grid World.** To simulate an agent moving around a 12x12 grid, execute 

```sh 
cd experiments 
DISABLE_TQDM=True python -u federated.py \
                    --domain grid \
                    --seed 42 \
                    --criterion cb \
                    --grid_width 12 \
                    --grid_height 12 \
                    --epochs 128 \
                    --lr 3e-3 \
                    --batch_size_train 128 \
                    --batch_size_eval 128 \
                    --num_batches_eval 128 \
                    --num_clients 3 \
                    --hidden_dim 64 \
                    --device cuda
```

**Comparison between the different criteria.** To compare the different criteria available for training GFlowNets, execute

```sh 
DISABLE_TQDM=True python -u criteria.py \
                    --domain $domain \
                    --seed $seed \
                    --criterion $criterion \
                    --grid_width 12 \
                    --grid_height 12 \
                    --multisets_size 8 \
                    --multisets_warehouse_size 10 \
                    --sequences_max_size 6 \
                    --sequences_vocab_size 6 \
                    --phylogenetics_num_leaves 7 \
                    --phylogenetics_vocab_size 4 \
                    --epochs_per_step 20 \
                    --epochs 512 \
                    --lr 3e-3 \
                    --batch_size_train 512 \
                    --batch_size_eval 512 \
                    --num_batches_eval 32 \
                    --hidden_dim 64 \
                    --emb_dim 16 \
                    --device cuda
```

for `seed` in (42, 84, 168), `domain` in (grid, sequences, multisets), and `criterion` in (tb, db, cb, fl). Note that the learning rate for the TB constraint is hard-coded within the scripts.
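
The sweep over seeds, domains, and criteria can be scripted with nested loops. The following is a minimal sketch and not part of the repository: the `RUN` variable and the `sweep_criteria` function are hypothetical helpers introduced here for illustration. By default it only prints the 36 resulting commands (a dry run). Setting `RUN=env` executes them instead, assuming the script is launched from the directory that contains `criteria.py`.

```sh
# Hypothetical helper: RUN=echo (default) prints each command, RUN=env runs it.
RUN=${RUN:-echo}

sweep_criteria() {
    for seed in 42 84 168; do
        for domain in grid sequences multisets; do
            for criterion in tb db cb fl; do
                # Same flags as the command above; only the three
                # loop variables change between runs.
                $RUN DISABLE_TQDM=True python -u criteria.py \
                    --domain "$domain" \
                    --seed "$seed" \
                    --criterion "$criterion" \
                    --grid_width 12 \
                    --grid_height 12 \
                    --multisets_size 8 \
                    --multisets_warehouse_size 10 \
                    --sequences_max_size 6 \
                    --sequences_vocab_size 6 \
                    --phylogenetics_num_leaves 7 \
                    --phylogenetics_vocab_size 4 \
                    --epochs_per_step 20 \
                    --epochs 512 \
                    --lr 3e-3 \
                    --batch_size_train 512 \
                    --batch_size_eval 512 \
                    --num_batches_eval 32 \
                    --hidden_dim 64 \
                    --emb_dim 16 \
                    --device cuda
            done
        done
    done
}

sweep_criteria
```

The runs execute sequentially in this sketch, so reviewing the dry-run output before switching to `RUN=env` is a cheap way to confirm the flag values.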

**Variational approximations to the pooled distributions.** To assess the effectiveness of a global approximation to the pooled distribution, execute 

```sh 
DISABLE_TQDM=True python -u variational.py \
                    --domain $domain \
                    --seed $seed \
                    --criterion cb \
                    --grid_width 12 \
                    --grid_height 12 \
                    --multisets_size 8 \
                    --multisets_warehouse_size 10 \
                    --sequences_max_size 6 \
                    --sequences_vocab_size 6 \
                    --phylogenetics_num_leaves 7 \
                    --phylogenetics_vocab_size 4 \
                    --epochs 5000 \
                    --lr 3e-3 \
                    --batch_size_train 256 \
                    --batch_size_eval 512 \
                    --num_batches_eval 1000 \
                    --num_clients 5 \
                    --hidden_dim 64 \
                    --emb_dim 16 \
                    --device cuda
```

for `domain` in (grid, multisets, sequences) and `seed` in (42, 84, 168). 
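
The nine (domain, seed) combinations can likewise be generated with a loop. As before, this is a sketch under stated assumptions: `RUN` and `sweep_variational` are hypothetical names that do not appear in the repository, the default is a dry run that only prints the commands, and `RUN=env` is assumed to be used from the directory that contains `variational.py`.

```sh
# Hypothetical helper: RUN=echo (default) prints each command, RUN=env runs it.
RUN=${RUN:-echo}

sweep_variational() {
    for domain in grid multisets sequences; do
        for seed in 42 84 168; do
            # Same flags as the command above; only domain and seed vary.
            $RUN DISABLE_TQDM=True python -u variational.py \
                --domain "$domain" \
                --seed "$seed" \
                --criterion cb \
                --grid_width 12 \
                --grid_height 12 \
                --multisets_size 8 \
                --multisets_warehouse_size 10 \
                --sequences_max_size 6 \
                --sequences_vocab_size 6 \
                --phylogenetics_num_leaves 7 \
                --phylogenetics_vocab_size 4 \
                --epochs 5000 \
                --lr 3e-3 \
                --batch_size_train 256 \
                --batch_size_eval 512 \
                --num_batches_eval 1000 \
                --num_clients 5 \
                --hidden_dim 64 \
                --emb_dim 16 \
                --device cuda
        done
    done
}

sweep_variational
```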

**Remark.** If you have access to a Slurm-managed cluster, the files matching `experiments/*.slrm` let you orchestrate the runs and execute them in parallel.
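
One possible way to submit every such file is sketched below. It assumes that each `.slrm` file is a self-contained batch script accepted by `sbatch` and that the loop is run from the repository root. Neither assumption is confirmed here, so inspect the files first. The `SUBMIT` variable and the `submit_all` function are hypothetical helpers. The default only prints the `sbatch` commands, and setting `SUBMIT=sbatch` submits the jobs.

```sh
# Hypothetical helper: prints the sbatch commands by default;
# set SUBMIT=sbatch to actually submit the jobs.
SUBMIT=${SUBMIT:-echo sbatch}

submit_all() {
    for job in experiments/*.slrm; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$job" ] || continue
        $SUBMIT "$job"
    done
}

submit_all
```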
