# FedPS

FedPS is a Python module designed for data preprocessing in Federated Learning, primarily leveraging aggregated statistics. The preprocessing workflow involves the following five steps:

1. Local Statistics Estimation: Clients estimate local statistics from their local data.
2. Aggregation: The server receives the local statistics and performs aggregation.
3. Global Parameter Calculation: The server calculates the global preprocessing parameters.
4. Parameter Distribution: Clients receive the preprocessing parameters from the server.
5. Data Preprocessing: Clients apply the preprocessing to their local data.

<div align=center>
    <img src="doc/overview.svg", alt="Overview", width="60%">
</div>

## Installation

### Dependencies

- Python (>= 3.9)
- Scikit-learn (>= 1.4)
- NumPy (>= 1.20)
- DataSketches
- PyZMQ

### Building from source

1. Create a Python env

```bash
conda create --name fedps python=3.9
conda activate fedps
```

2. Clone this project

```bash
git clone https://github.com/primihub/fedps.git
```

3. Build the project

```bash
cd fedps
pip install .
```

## Usage

1. Set up communication channels

```python
# Client1 channel
from fedps.channel import ClientChannel

channel = ClientChannel(
    local_ip="127.0.0.1", local_port=5556,
    remote_ip="127.0.0.1", remote_port=5555,
)
```

```python
# Client2 channel
from fedps.channel import ClientChannel

channel = ClientChannel(
    local_ip="127.0.0.1", local_port=5557,
    remote_ip="127.0.0.1", remote_port=5555,
)
```

```python
# Server channel
from fedps.channel import ServerChannel

channel = ServerChannel(
    local_ip="127.0.0.1", local_port=5555,
    remote_ip=["127.0.0.1", "127.0.0.1"],
    remote_port=[5556, 5557],
)
```

2. Specify `FL_type` and `role` in the preprocessor

- `FL_type`: "H" (Horizontal) or "V" (Vertical)

- `role`: "client" or "server"

```python
# Client1 code example
from fedps.preprocessing import MinMaxScaler

X = [[-1, 2], [-0.5, 6]]
est = MinMaxScaler(FL_type="H", role="client", channel=channel)
Xt = est.fit_transform(X)
print(Xt)
```

```python
# Client2 code example
from fedps.preprocessing import MinMaxScaler

X = [[0, 10], [1, 18]]
est = MinMaxScaler(FL_type="H", role="client", channel=channel)
Xt = est.fit_transform(X)
print(Xt)
```

```python
# Server code example
from fedps.preprocessing import MinMaxScaler

est = MinMaxScaler(FL_type="H", role="server", channel=channel)
est.fit()
```

3. Run the script

```bash
# Run in three terminals
python client1.py
python client2.py
python server.py
```

PS: See more cases in the [example](example) folder.

## Available preprocessing modules

- Discretization
  - [`KBinsDiscretizer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html)

- Encoding
  - [`LabelEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html)
  - [`LabelBinarizer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html)
  - [`MultiLabelBinarizer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html)
  - [`OneHotEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html)
  - [`OrdinalEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html)
  - [`TargetEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.TargetEncoder.html)

- Scaling
  - [`MaxAbsScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html)
  - [`MinMaxScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html)
  - [`Normalizer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html)
  - [`RobustScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html)
  - [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)

- Transformation
  - [`PowerTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html)
  - [`QuantileTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html)
  - [`SplineTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.SplineTransformer.html)

- Imputation
  - [`IterativeImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html) (experimental)
  - [`KNNImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html)
  - [`SimpleImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html)

## Differences from Scikit-learn

- Currently, this library does not support sparse data.
- [`KBinsDiscretizer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html), [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html), and [`SplineTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.SplineTransformer.html) cannot set the `sample_weight` parameter in their fit methods.
- [`IterativeImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html) does not support the `sample_posterior` and `n_nearest_features` parameters.
- [`KNNImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html) does not support custom weight funtion and distance metric.
