# Cohort Squeeze: Beyond a Single Communication Round per Cohort in Cross-Device Federated Learning

This is the official implementation of the SPPM-AS paper, provided for ICLR 2025 review only. All rights reserved by the authors of Submission No. 398.

## Environment Setup
```bash
pip install -r requirements.txt
```

## Logistic Regression Experiments
The main directory for this set of experiments is `./convex_reg`, denoted `$MAIN` below.
  
### Datasets
```bash
cd $MAIN/datasets/
```
* **w6a** dataset: `wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/w6a`
* **ijcnn1.bz2** dataset: `wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/ijcnn1.bz2`
* **mushrooms** dataset: `wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/mushrooms`
* **a6a** dataset: `wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a6a`
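
The downloads above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `dataset_url` and `download_all` are illustrative and not part of this repository, and the destination directory is whatever you pass in (for example `$MAIN/datasets`).

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names copied from the wget commands above.
BASE_URL = "https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/"
DATASETS = ["w6a", "ijcnn1.bz2", "mushrooms", "a6a"]


def dataset_url(name):
    """Return the LIBSVM download URL for a dataset file name."""
    return BASE_URL + name


def download_all(dest_dir, names=DATASETS):
    """Download each dataset into dest_dir, skipping files that already exist."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in names:
        target = dest / name
        if not target.exists():
            urlretrieve(dataset_url(name), str(target))
        saved.append(target)
    return saved
```

Calling `download_all("convex_reg/datasets")` from the repository root would fetch the four files. The experiments below also reference `a9a`, which is not in the list above; passing `names=DATASETS + ["a9a"]` would fetch it, on the assumption that it is served from the same LIBSVM directory.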

### Experiments
- Exploring various inexact proximal steps, sweeping over `datasets=("a6a" "mushrooms" "ijcnn1.bz2" "a9a")`, `opt_iters=(1 2 4 8 16 32 64 128)`, and `lrs=(1e-5 1e-4 1e-3 1e-2 1e-1 1 1e1 1e2 1e3 1e4 1e5)`:

```bash
python minibatch_SPPM_batch_inexact.py --dataset "$dataset" --inexact --opt_iter "$opt_iter" --max_it 5000 --opt_criterion maxiter --lr "$lr"
```

- Different local optimizers, sweeping over `datasets=("a6a" "a9a")`, `methods=("BFGS" "CG")`, `opt_iters=(1 2 4 8 16 32 64)`, and `lrs=(1e-3 1e-2 1e-1 1 1e1 1e2 1e3)`:

```bash
python minibatch_SPPM_batch_inexact_opt_methods.py --dataset "$dataset" --inexact --opt_iter "$opt_iter" --max_it 2000 --opt_criterion maxiter --opt_method "$method" --lr "$lr"
```

Each command is run once per combination of the parameters listed above; different combinations reproduce different parts of the reported results.
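
The two sweeps above can be enumerated with a short driver. This is a minimal sketch, not a script shipped with the repository: `build_commands` is a hypothetical helper, and the flags appear in a different order than in the commands above, which should not matter if the scripts use a standard argument parser (an assumption).

```python
import itertools
import shlex

# Grids copied from the two experiment descriptions above.
INEXACT_GRID = {
    "dataset": ["a6a", "mushrooms", "ijcnn1.bz2", "a9a"],
    "opt_iter": [1, 2, 4, 8, 16, 32, 64, 128],
    "lr": ["1e-5", "1e-4", "1e-3", "1e-2", "1e-1", "1",
           "1e1", "1e2", "1e3", "1e4", "1e5"],
}
OPT_METHODS_GRID = {
    "dataset": ["a6a", "a9a"],
    "opt_method": ["BFGS", "CG"],
    "opt_iter": [1, 2, 4, 8, 16, 32, 64],
    "lr": ["1e-3", "1e-2", "1e-1", "1", "1e1", "1e2", "1e3"],
}


def build_commands(script, grid, max_it):
    """Return one argv list per point in the Cartesian product of grid."""
    keys = list(grid)
    commands = []
    for values in itertools.product(*(grid[key] for key in keys)):
        argv = ["python", script, "--inexact",
                "--opt_criterion", "maxiter", "--max_it", str(max_it)]
        for key, value in zip(keys, values):
            argv += [f"--{key}", str(value)]
        commands.append(argv)
    return commands


inexact_cmds = build_commands(
    "minibatch_SPPM_batch_inexact.py", INEXACT_GRID, 5000)
method_cmds = build_commands(
    "minibatch_SPPM_batch_inexact_opt_methods.py", OPT_METHODS_GRID, 2000)

# Dry run: show the first command of each sweep instead of executing it.
print(shlex.join(inexact_cmds[0]))
print(shlex.join(method_cmds[0]))
```

To actually launch the runs, each `argv` could be passed to `subprocess.run(argv, check=True)` with the working directory set to `$MAIN`. Keeping the default as a dry run makes it easy to inspect the grid before committing compute to it.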

## Deep Learning Experiments

### Python Version
We recommend using the latest stable version of Python, which is **3.12.3** as of April 17th, according to the official Python documentation: <https://devguide.python.org/versions/#supported-versions> and <https://docs.python.org/3/whatsnew/3.12.html>.

To install Python 3.12.3 using Pyenv (see the next section for installing Pyenv itself), run:
```bash
pyenv install 3.12.3
```

### Environment Creation

We use Pyenv to manage our environment. To install Pyenv, follow the instructions on the official Pyenv GitHub page: <https://github.com/pyenv/pyenv>.

Once Pyenv is installed, you may need to install and activate the Pyenv virtualenv plugin using the following commands:
```bash
git clone https://github.com/pyenv/pyenv-virtualenv.git $(pyenv root)/plugins/pyenv-virtualenv
eval "$(pyenv init --path)"
eval "$(pyenv virtualenv-init -)"
source ~/.bashrc  # or ~/.zshrc
```
Then, create a virtual environment with the latest stable release of Python:
```bash
pyenv virtualenv 3.12.3 sppm_as
```
To activate and deactivate the environment, use:
```bash
pyenv activate sppm_as
pyenv deactivate
```
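
To confirm that the activated environment really uses the recommended interpreter, a small version check can be run inside it. This is a minimal sketch; the helper name `is_recommended_python` is illustrative and not part of this repository.

```python
import sys

# Major and minor version of the recommended Python 3.12.3.
RECOMMENDED = (3, 12)


def is_recommended_python(version_info=sys.version_info, required=RECOMMENDED):
    """Return True if the interpreter's major.minor version matches required."""
    return tuple(version_info[:2]) == tuple(required)


if not is_recommended_python():
    print(f"Warning: running Python {sys.version.split()[0]}, expected 3.12.x")
```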

### Dependencies

Install dependencies in the activated environment by running:
```bash
pip install -r requirements.txt
```

### Using FedLab Datasets

To use FedLab datasets, install the dependencies in the `FedLab` submodule by running:
```bash
pip install -r FedLab/requirements.txt
```
Additionally, make the following changes to the FedLab code:

* In `FedLab/datasets/pickle_dataset.py`, update line 28 to:
```python
from datasets.leaf_datasets import FemnistDataset, ShakespeareDataset, CelebADataset, Sent140Dataset
```
* In `FedLab/datasets/leaf_dataset.py`, update line 5 to:
```python
from datasets.nlp_utils.util import Tokenizer, Vocab
```
* In `FedLab/datasets/femnist/preprocess/data_to_json.py`, update line 64 to:
```python
gray.thumbnail(size, Image.Resampling.LANCZOS)
```
* In `FedLab/datasets/gen_pickle_dataset.sh`, update line 15 to:
```bash
python3 pickle_dataset.py \
```
To generate the FEMNIST split, run the script in the `FedLab/datasets/femnist` folder:
```bash
bash ./preprocess.sh -s niid --sf 1.0 -k 0 -t sample --tf 0.9 --smlpseed 42 --spltseed 42
```
Create the pickle dataset in the `FedLab/datasets` folder by running:
```bash
./gen_pickle_dataset.sh "femnist" "../datasets" "./pickle_datasets"
```

### Running the Experiment Script

To launch an experiment, run `scripts/script.py`. The example command below reads the learning rate from the shell variable `$lr`, which must be set beforehand:
```bash
python3 scripts/script.py \
        --seed 42 \
        --folder logs/ \
        --epochs 1000 \
        --wandb \
        --gamma 1.0 \
        --cohort_optimizer Prox \
        --worker_optimizer Adam \
        --worker_optimizer_steps 3 \
        --worker_optimizer_hparams lr=$lr \
        --server_optimizer Adam \
        --server_optimizer_steps 3 \
        --server_optimizer_hparams lr=$lr \
        --sampler Nice \
        --minibatch_size 10 \
        --device cuda \
        --process_count 1 \
        --dataset FEMNIST_original \
        --workers_count 100
```
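
Because the example command depends on `$lr`, it can be convenient to generate it for several learning rates at once. The sketch below only builds and prints the commands; `build_command` is a hypothetical helper, and the two learning rates are placeholders chosen for illustration, not values taken from the paper.

```python
import shlex

# Fixed flags copied from the example command above.
BASE_ARGS = {
    "seed": 42,
    "folder": "logs/",
    "epochs": 1000,
    "gamma": 1.0,
    "cohort_optimizer": "Prox",
    "worker_optimizer": "Adam",
    "worker_optimizer_steps": 3,
    "server_optimizer": "Adam",
    "server_optimizer_steps": 3,
    "sampler": "Nice",
    "minibatch_size": 10,
    "device": "cuda",
    "process_count": 1,
    "dataset": "FEMNIST_original",
    "workers_count": 100,
}


def build_command(lr):
    """Return the argv list for one run with the given learning rate."""
    argv = ["python3", "scripts/script.py", "--wandb"]
    for key, value in BASE_ARGS.items():
        argv += [f"--{key}", str(value)]
    # The same learning rate is used for worker and server, as in the example.
    argv += ["--worker_optimizer_hparams", f"lr={lr}",
             "--server_optimizer_hparams", f"lr={lr}"]
    return argv


# Placeholder learning rates, for illustration only.
for lr in ["1e-3", "1e-2"]:
    print(shlex.join(build_command(lr)))
```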
To get a description of all possible parameters, run:
```bash
python3 scripts/script.py -h
```