# OVA-LP: A Simple and Efficient Framework for Federated Learning on Non-IID Data

This repository contains the official implementation for the paper "OVA-LP: A Simple and Efficient Framework for Federated Learning on Non-IID Data," submitted to ICLR 2026.

## Overview

Federated fine-tuning (FFT) is a powerful paradigm for adapting pretrained models to decentralized data. However, its performance degrades under heterogeneous (Non-IID) client distributions because of "local drift": client updates diverge from one another, which introduces systematic bias and amplified variance into the global model.

**OvA-LP**, the framework presented in this work, is a minimalist approach designed not to correct drift post-hoc, but to suppress it at its source.

## Key Features

* **High Robustness on Non-IID Data**: OvA-LP retains **95.9%** of its IID accuracy on average across extreme Non-IID settings (shard-1, shard-2, and Dirichlet partitions), compared with 10.1% for PFPT and 34.5% for FFT-MoE, two state-of-the-art baselines.
* **Drift Suppression at the Source**: By preserving pretrained feature geometry and decoupling logits with a One-vs-All (OvA) head, the framework prevents the mechanisms that amplify drift from arising in the first place.
* **Efficiency**: The framework precomputes encoder features only once, making the per-round training cost nearly independent of the encoder's size.
* **Robustness to Label Noise**: OvA-LP demonstrates innate robustness, consistently reducing accuracy degradation under both symmetric and asymmetric label noise.
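
The efficiency point above can be illustrated with a minimal NumPy sketch. It is not the repository's code: `precompute_features`, `frozen_encoder`, and the fixed random projection standing in for a pretrained backbone are hypothetical names chosen for this example. The sketch only shows that the encoder is called during a single pass before training, after which every round can work on the cached features.

```python
import numpy as np

def precompute_features(encoder, images, batch_size=128):
    """Run a frozen encoder once over all images and cache the outputs."""
    chunks = []
    for start in range(0, len(images), batch_size):
        chunks.append(encoder(images[start:start + batch_size]))
    return np.concatenate(chunks, axis=0)

# Hypothetical stand-in for a frozen backbone: a fixed random projection.
rng = np.random.default_rng(0)
W_enc = rng.standard_normal((3 * 32 * 32, 64))
call_count = {"n": 0}

def frozen_encoder(batch):
    call_count["n"] += 1  # count forward passes through the "encoder"
    return batch.reshape(len(batch), -1) @ W_enc

images = rng.standard_normal((512, 3, 32, 32))
features = precompute_features(frozen_encoder, images)  # one pass, before round 1

# Later rounds would train the head on `features` only, so the per-round
# cost depends on the feature dimension rather than on the encoder.
```

Because only the cached `features` are touched after this step, swapping in a larger encoder changes the one-time precompute cost but not the cost of each training round.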

## How it Works

OvA-LP is built on the combination of three core components:
1.  **Frozen Encoder**: A pretrained encoder is kept frozen to preserve its powerful feature representations. Client data is passed through it once to generate precomputed feature vectors.
2.  **One-vs-All (OvA) Heads**: The multi-class classification task is decomposed into one independent binary classifier per class. Removing the cross-class coupling introduced by softmax prevents label skew from amplifying bias and variance.
3.  **Two-Stage Training**: A two-stage schedule is used to stabilize training and accelerate convergence.
    * **Stage 1 (Positive-Only)**: In the initial round(s), each client trains its corresponding OvA heads using only positive examples for the classes it holds. This quickly pulls the classifier weights toward the class centroids.
    * **Stage 2 (Positive + Negative)**: In subsequent rounds, both positive and negative samples are used to refine the decision margins while maintaining the stability achieved in Stage 1.
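
The components above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the repository's implementation: it assumes each OvA head is a linear layer trained with binary cross-entropy, and that Stage 1 is realized by masking out all negative terms in that loss. The names `ova_loss_and_grads` and `local_update` are hypothetical; `num_stage1_rounds` mirrors the configuration key of the same name.

```python
import numpy as np

def ova_loss_and_grads(W, b, feats, labels, positive_only):
    """Binary cross-entropy over independent per-class sigmoid heads.

    W: (num_classes, dim), b: (num_classes,), feats: (n, dim), labels: (n,)
    positive_only=True keeps only each sample's own-class term (Stage 1 style).
    """
    n = len(labels)
    logits = feats @ W.T + b                      # (n, num_classes)
    targets = np.zeros_like(logits)
    targets[np.arange(n), labels] = 1.0           # one-hot targets
    mask = targets if positive_only else np.ones_like(targets)
    bce = np.logaddexp(0.0, logits) - targets * logits   # stable BCE-with-logits
    probs = np.exp(-np.logaddexp(0.0, -logits))          # stable sigmoid
    dlogits = mask * (probs - targets) / n
    loss = float((mask * bce).sum() / n)
    return loss, dlogits.T @ feats, dlogits.sum(axis=0)

def local_update(W, b, feats, labels, round_idx,
                 num_stage1_rounds=1, lr=0.5, steps=20):
    """Plain gradient descent on one client's cached features."""
    positive_only = round_idx < num_stage1_rounds  # Stage 1 vs. Stage 2
    W, b = W.copy(), b.copy()
    for _ in range(steps):
        _, grad_W, grad_b = ova_loss_and_grads(W, b, feats, labels, positive_only)
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b

# Toy client holding only classes 0 and 1 out of 4 (extreme label skew).
rng = np.random.default_rng(0)
num_classes, dim = 4, 8
centroids = 2.0 * np.eye(num_classes, dim)
labels = np.repeat([0, 1], 20)
feats = centroids[labels] + 0.3 * rng.standard_normal((40, dim))

W0, b0 = np.zeros((num_classes, dim)), np.zeros(num_classes)
W1, b1 = local_update(W0, b0, feats, labels, round_idx=0)  # Stage 1
W2, b2 = local_update(W1, b1, feats, labels, round_idx=1)  # Stage 2
```

In this toy run, the Stage 1 update moves only the heads for classes 0 and 1, in the direction of their class centroids, and leaves the heads for the absent classes 2 and 3 untouched. With a softmax head, the same skewed client would push the logits of absent classes down through the shared normalization, which is the coupling the OvA decomposition avoids.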

![OvA-LP Architecture](./OvA-LP.png)
*Figure 1: Overall structure of OvA-LP*

## Configuration

Experiments are configured using a dictionary, typically named `cfg` in `main.py` or `examples.ipynb`. Below is a guide to the key parameters.

* `model_name`: Specifies the model to use.
    * `"lp_ova"`: The main model proposed in the paper.
    * `"pfpt"`, `"fmoe"`: Baseline models used for comparison.
    * `"lp_softmax"`: An ablation variant using a standard softmax head.
* `aggregator_name`: Specifies the server-side aggregation algorithm (e.g., `"fedavg"`).
* `data`: Controls the dataset and data partitioning.
    * `dataset`: `"cifar100"` or `"tinyimagenet"`.
    * `num_clients`: The total number of clients in the federation.
    * `partition`: Defines the Non-IID data distribution. Can be a string like `"iid"` or a dictionary for more complex setups (e.g., `{"type": "dirichlet", "alpha": 0.01}`). Supported types include `"iid"`, `"dirichlet"`, `"shard1"`, `"shard2"`, `"zipf"`, and `"feature_skew"`.
* `encoder`: Configures the frozen backbone model.
    * `type`: `"vit"` or `"dinov2"`.
    * `size`: The model size, e.g., `"base"`, `"large"`.
    * `patch`: The patch size of the Vision Transformer, e.g., `16`, `32`.
    * `precompute`: If `True`, client features are precomputed once before training begins, which is central to OvA-LP's efficiency.
* `train`: Sets the training hyperparameters.
    * `num_rounds`: The total number of communication rounds.
    * `local_epochs`: The number of training epochs each client performs per round.
    * `lr`: The learning rate for the local optimizer.
    * `active_client_ratio`: The fraction of clients that participate in each round (e.g., `1.0` for full participation).
* `model`: Contains model-specific parameters.
    * For `lp_ova`: includes `num_classes` and `num_stage1_rounds`. Setting `num_stage1_rounds` to `1` runs the **2-stage** version from the paper, while setting it to `0` runs the **w/o 2-stage** version.
    * Other models take different keys here, such as `num_tokens` for `pfpt` or `num_experts` for `fmoe`.
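
Putting the parameters above together, a `cfg` dictionary might look like the sketch below. The key names come from the list above, but the nesting is inferred from that list and the numeric values (`num_clients`, `num_rounds`, `local_epochs`, `lr`) are illustrative placeholders, not the settings used in the paper. Check `main.py` or `examples.ipynb` for the actual defaults.

```python
# Illustrative configuration; values marked "placeholder" are assumptions.
cfg = {
    "model_name": "lp_ova",
    "aggregator_name": "fedavg",
    "data": {
        "dataset": "cifar100",
        "num_clients": 100,                                  # placeholder
        "partition": {"type": "dirichlet", "alpha": 0.01},
    },
    "encoder": {
        "type": "vit",
        "size": "base",
        "patch": 16,
        "precompute": True,      # cache encoder features once before training
    },
    "train": {
        "num_rounds": 50,                                    # placeholder
        "local_epochs": 1,                                   # placeholder
        "lr": 0.01,                                          # placeholder
        "active_client_ratio": 1.0,                          # full participation
    },
    "model": {
        "num_classes": 100,      # CIFAR-100 has 100 classes
        "num_stage1_rounds": 1,  # 1 = 2-stage version, 0 = w/o 2-stage
    },
}
```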

## Setup

Follow these steps to set up the project.

1.  **Create Environment**
    ```bash
    conda create -n ovalp python=3.10
    conda activate ovalp
    ```

2.  **Install Dependencies**
    ```bash
    pip install -r requirements.txt
    ```

3.  **Prepare Datasets**
    * The CIFAR-10/100 datasets will be downloaded automatically by `torchvision` upon the first run.
    * To use the TinyImageNet dataset, you must first download it and then run the provided script to organize its validation set into the `ImageFolder` format.
        ```bash
        python prepare_tinyimagenet.py
        ```

## Running Experiments & Reproducing Results

You can reproduce the main results from the paper by running the `examples.ipynb` Jupyter Notebook. The notebook contains the code and configurations for key experiments, including the ablation studies and performance comparisons against baseline models.

## Citation

If you find this work useful for your research, please consider citing our paper:

```bibtex
@article{anonymous2026ovalp,
  title={{OVA-LP}: A Simple and Efficient Framework for Federated Learning on Non-IID Data},
  author={Anonymous Authors},
  journal={Under review as a conference paper at ICLR 2026},
  year={2026}
}
```
