# Baselines
Available method names are:
- random (random baseline): no pre-processing parameters, no parameters
- dpp (k-DPP baseline): no pre-processing parameters, no parameters
- clustering: no parameters (since the number of clusters k is the number of selected items)
- three_mmr (wrapper around MMR, DGDS, and MUSS)

For `three_mmr`, pre-processing parameters are:
- n_clusters
- algorithm: type of clustering, either `packing_k_means` or `packing_random`

For `three_mmr`, parameters are:
- m: number of clusters to select
- lambda_clusters: lambda_c from the paper
- lambda_points: lambda from the paper
- add_max_q_to_union: whether to add set $S^{\ast}$ in Algorithm 2

MMR, clustering, and DGDS are implemented as particular parameterizations of `three_mmr`.

For example:
- To use MMR, set `n_clusters` and `m` to 1.
- To use DGDS, set `algorithm` to `packing_random` and `add_max_q_to_union` to `False`.
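
The two special cases above can be sketched as overrides applied to a set of `three_mmr` settings. This is a minimal sketch, not code from the repository: the key names come from the parameter lists above, while `BASE`, `as_mmr`, `as_dgds`, and every numeric value are hypothetical placeholders chosen for illustration.

```python
import copy

# Hypothetical starting point: key names come from the parameter lists above,
# the values are placeholders rather than recommended settings.
BASE = {
    "pre_proc_params": {"n_clusters": 100, "algorithm": "packing_k_means"},
    "parameters": {
        "m": 10,
        "lambda_clusters": 0.5,
        "lambda_points": 0.5,
        "add_max_q_to_union": True,
    },
}


def as_mmr(settings):
    """Return a copy of the settings with n_clusters and m set to 1 (MMR case)."""
    out = copy.deepcopy(settings)
    out["pre_proc_params"]["n_clusters"] = 1
    out["parameters"]["m"] = 1
    return out


def as_dgds(settings):
    """Return a copy with random packing and add_max_q_to_union disabled (DGDS case)."""
    out = copy.deepcopy(settings)
    out["pre_proc_params"]["algorithm"] = "packing_random"
    out["parameters"]["add_max_q_to_union"] = False
    return out


print(as_mmr(BASE))
print(as_dgds(BASE))
```

Only the keys named in the two cases are overridden; all other settings are passed through unchanged, and `BASE` itself is not modified.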

# Experiments with RAG
The script that runs experiments requires a JSON configuration file. This file prescribes methods to run, datasets, and parameters.

Running these experiments requires an AWS account with access to the Haiku model.

## Configuration files
To see the structure of a configuration file, generate example configurations with the following commands:
```
> cd muss/src
> python ./generate_settings_rag.py
```

This will generate example configurations (`.json` files) in `muss/outputs`.

You can manually edit a config file, or create new config files as needed.

A configuration consists of the following elements:
- label: a human-readable name for a method with specific parameter settings
- method_key: one of the method names listed under Baselines
- pre_proc_params: pre-processing parameters for the chosen method
- parameters: parameters for the chosen method
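
As an illustration of these four elements, the sketch below builds one entry and checks that the keys are present. It assumes an entry is a JSON object with exactly these keys and that methods without parameters use empty objects; the actual layout may differ (for instance, a file may hold a list of entries), so compare against the files produced by `generate_settings_rag.py`. The labels, values, and the helper `missing_keys` are hypothetical.

```python
import json

# The four elements described above.
REQUIRED_KEYS = {"label", "method_key", "pre_proc_params", "parameters"}


def missing_keys(entry):
    """Return the sorted list of required keys absent from a config entry."""
    return sorted(REQUIRED_KEYS - set(entry))


# Hypothetical entry for three_mmr; all values are placeholders.
muss_entry = {
    "label": "muss-example",
    "method_key": "three_mmr",
    "pre_proc_params": {"n_clusters": 100, "algorithm": "packing_k_means"},
    "parameters": {
        "m": 10,
        "lambda_clusters": 0.5,
        "lambda_points": 0.5,
        "add_max_q_to_union": True,
    },
}

# Assumption: a method without parameters is written with empty objects.
random_entry = {
    "label": "random-example",
    "method_key": "random",
    "pre_proc_params": {},
    "parameters": {},
}

for entry in (muss_entry, random_entry):
    assert missing_keys(entry) == []
    print(json.dumps(entry, indent=2))
```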


## Running experiments
Use `run_experiment_rag.py`; its `--help` output describes the available options:
```
> cd muss/src
> python ./run_experiment_rag.py --help
```

For example:
```
> python ./run_experiment_rag.py -i ../outputs/ours-000.json -d 1000 -n 50
```

Results are written to `outputs/results*.csv`, with more detail in the `outputs/*.json` files.
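
To get a quick overview of the recorded results, the CSV files can be listed with the standard library. This is a minimal sketch that makes no assumption about column names; it only assumes that each `results*.csv` file has a header row and that the script is run from `muss/src`, so the outputs directory is `../outputs`. The helper `summarize_results` is hypothetical and not part of the repository.

```python
import csv
import glob
import os


def summarize_results(outputs_dir):
    """Map each results*.csv file name to (column names, number of data rows)."""
    summary = {}
    pattern = os.path.join(outputs_dir, "results*.csv")
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            rows = list(reader)
            summary[os.path.basename(path)] = (reader.fieldnames or [], len(rows))
    return summary


# Assumes the current directory is muss/src; prints nothing if no results exist yet.
for name, (columns, n_rows) in summarize_results("../outputs").items():
    print(f"{name}: {n_rows} rows, columns = {columns}")
```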


# Experiments with Candidate Retrieval
These experiments do not require much compute. However, multiple CPU cores allow experiments to run in parallel, which shortens the total running time.

## Running experiments

Run the main experiments with MUSS, DGDS, MMR, clustering, DPP, and random:
```
> cd muss/src
> python ./run_experiment_candidate_retrieval.py
```

Run the ablation experiments with MUSS(rand.A), MUSS(rand.B), MUSS(sum distance), and MUSS(min distance):
```
> cd muss/src
> python ./run_experiment_candidate_retrieval_ablation.py
```
