# Readme
This is a frozen version of the code that generated the experiments after the review.
It will be updated at https://github.com/DavidLapous/multipers.

## Installation
#### Requirements
```
conda create -n python311
conda activate python311
conda install python=3.11 cxx-compiler tbb tbb-devel numpy matplotlib gudhi cython shapely cycler tqdm boost-cpp setuptools pytest llvm-openmp cmake scikit-learn -c conda-forge
pip install filtration-domination pykeops
```
#### Installation
In a terminal with the required dependencies,
```
pip install .
```

#### For mac users
Because of the clang compiler, you may have to disable a compiler optimization to compile `multipers`: in the `setup.py` file, add the
```bash
-fno-aligned-new
```
line to the `extra_compile_args` list. You should end up with something like the following.
```python
extensions = [Extension(f"multipers.{module}",
		sources=[f"multipers/{module}.pyx"],
		language='c++',
		extra_compile_args=[
			"-Ofast",
			"-std=c++20",
			"-fno-aligned-new",
			"-Wall",
		],
		define_macros=[("NPY_NO_DEPRECATED_API", "NPY_1_7_API_VERSION")],
		libraries=["tbb"]
	) for module in cython_modules
]
```
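
If the same `setup.py` has to build on both Linux and macOS, the flag can be added conditionally rather than edited by hand. The snippet below is a minimal sketch and is not part of the repository: the helper name `compile_args_for` is hypothetical, and the base flags are assumed to be the ones shown above.

```python
import sys

# Flags copied from the example above, without the macOS-specific one.
BASE_COMPILE_ARGS = ["-Ofast", "-std=c++20", "-Wall"]


def compile_args_for(platform=sys.platform):
    """Return the compile flags, adding -fno-aligned-new on macOS only."""
    args = list(BASE_COMPILE_ARGS)
    if platform == "darwin":  # value of sys.platform on macOS
        args.append("-fno-aligned-new")
    return args


print(compile_args_for())
```

The result could then be passed as `extra_compile_args=compile_args_for()` in the `Extension(...)` call.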


## Launch the experiments
The experiments can be launched with an updated `compute.py` from the submission code. The arguments were slightly changed, so we provide some documentation below:

----------------------------------------------------------------------------
 - dataset (str) : The dataset to classify, e.g., UCR/Coffee or graphs/BZR
 - pipeline (str):

   - multismi -> signed measure convolution/image

   - multismk -> signed measure kernel

   - multisurface -> either Euler surface or Hilbert function surface, depending on the degrees

 - final_classifier (str) : for vectorizations, the final classifier used to classify. Possibilities : rf (RandomForest), xgboost, lda,...

 - filtrations (str): for graphs, the filtrations used, e.g. hks_10, ricciCurvature, degree, cc, ...

 - train_k (int): number of cross validations for the hyperparameters

 - test_k (int or float): number of folds for the test. When between 0 and 1, it is used as the test size instead.

 - in_resolution (int)/in_strategy (str) : the resolution and strategy used to create the grid on which the signed measure is computed, from the simplextree values. Note that if `in_individual_grid` is True, then each simplextree creates its own grid to compute the signed measure. Exact computations are fast enough on UCR/graphs.

 - out_resolution/out_strategy : the resolution and strategy used to compute the vectorization grid, from the signed measures support.

 - num_directions : the number of lines used for sliced Wasserstein

 - complex : the simplicial complex used for point clouds, either alpha or rips

 - kernel : the kernel used for density estimation. It does not affect DistanceToMeasure. Either gaussian or exponential for the moment.

 - drop_quantile (float) : the in_strategy will ignore this proportion of extreme points

 - num_rescales (int) : rescales the signed measures with different weights, after normalization and before feeding them to the vectorization/sliced Wasserstein

 - rips_threshold (float) : as expected. When negative, it is replaced by `max_diameter_of_dataset x abs(rips_threshold)`

 - test : when true, only a test is computed; ignore these values.

 - degrees (int): Homology degrees to compute. When negative/None, the Euler characteristic is computed instead.

 - kde_bandwidths/dtm_masses : the kde bandwidths and distance to measure masses to define a codensity estimation on point clouds. Can be used together, i.e., the cross validation will choose either kde or dtm, and pick a good bandwidth/mass.
----------------------------------------------------------------------------
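
The conventions for `test_k` and `rips_threshold` described above can be illustrated with two small helpers. This is only a sketch of the documented behaviour: the function names `interpret_test_k` and `resolve_rips_threshold` are hypothetical and are not taken from `compute.py`, and the bounds 0 and 1 are assumed to be excluded.

```python
def interpret_test_k(test_k):
    """Classify test_k: a value strictly between 0 and 1 is a test-size
    fraction; any other value is treated as a number of folds."""
    if 0 < test_k < 1:
        return ("test_size", float(test_k))
    return ("n_folds", int(test_k))


def resolve_rips_threshold(rips_threshold, max_diameter):
    """A negative threshold is relative to the maximal diameter of the dataset."""
    if rips_threshold < 0:
        return max_diameter * abs(rips_threshold)
    return rips_threshold


print(interpret_test_k(0.3))
print(interpret_test_k(5))
print(resolve_rips_threshold(-1, max_diameter=2.5))
```

With these assumptions, `--test_k 0.3` holds out 30% of the data, `--test_k 5` asks for 5 folds, and `--rips_threshold -1` uses the full diameter of the dataset as threshold.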

For instance, the UCR/GunPointOldVersusYoung dataset can be cross-validated with a command such as the following:

```bash
python compute.py --pipeline multismi --dataset UCR/GunPointOldVersusYoung --degrees 0 --degrees 1 --train_k 10 --test_k 0.3 --in_strategy exact --num_rescales 3 --final_classifier rf --out_resolution 100 --out_resolution 50 --out_resolution 20 --out_strategy regular --out_strategy partition --complex rips --rips_threshold -1 --kde_bandwidths -0.001 --kde_bandwidths -0.01 --kde_bandwidths -0.1 --kde_bandwidths -0.2 --kde_bandwidths -0.3
```
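
Several options in the command above (`--degrees`, `--out_resolution`, `--out_strategy`, `--kde_bandwidths`) are passed more than once to supply several values. One common way to collect such repeated flags is `argparse` with `action="append"`. The sketch below is an assumption about how this could be parsed, not the actual parser of `compute.py`, and it only declares a few of the options.

```python
import argparse

# Illustrative parser: the option names come from the command above,
# but the parsing strategy is an assumption.
parser = argparse.ArgumentParser(description="Illustrative parser, not the one in compute.py.")
parser.add_argument("--dataset", type=str, required=True)
parser.add_argument("--degrees", type=int, action="append", default=None)
parser.add_argument("--out_resolution", type=int, action="append", default=None)
parser.add_argument("--kde_bandwidths", type=float, action="append", default=None)

# Each repetition of a flag appends one value to the corresponding list.
args = parser.parse_args([
    "--dataset", "UCR/GunPointOldVersusYoung",
    "--degrees", "0", "--degrees", "1",
    "--out_resolution", "100", "--out_resolution", "50",
    "--kde_bandwidths", "-0.001", "--kde_bandwidths", "-0.01",
])

print(args.degrees, args.out_resolution, args.kde_bandwidths)
```

Negative values such as `-0.001` are accepted here because none of the declared option names looks like a negative number.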

and the PROTEINS graph dataset with:
```bash
python compute.py --dataset graphs/PROTEINS --filtrations hks_10 --filtrations ricciCurvature --pipeline multismk --train_k 5 --test_k 5 --degrees -1 --in_strategy exact --num_rescales 3 --out_strategy regular_closest --out_strategy partition --out_resolution 100 --out_resolution 50 --out_resolution 20 --final_classifier xgboost --drop_quantile 0.
```
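
To launch several configurations from a script, the command line can be assembled from a dictionary of options. The helper below is a hedged sketch: `build_command` is a hypothetical name, the option names are the ones used in the command above, and list values are expanded into repeated flags.

```python
import shlex


def build_command(options):
    """Expand an options dict into an argument list; list values repeat the flag."""
    command = ["python", "compute.py"]
    for name, value in options.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            command += [f"--{name}", str(item)]
    return command


options = {
    "dataset": "graphs/PROTEINS",
    "filtrations": ["hks_10", "ricciCurvature"],
    "pipeline": "multismk",
    "train_k": 5,
    "test_k": 5,
    "degrees": -1,
}
command = build_command(options)
print(shlex.join(command))
```

The resulting list could then be passed to `subprocess.run(command, check=True)` from the directory that contains `compute.py`.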
