# MQO_Public

## Prerequisites
Before starting, make sure the following are set up:

- **Miniconda3** must be installed in your home directory (`~/miniconda3`).  
- **g++** must be available (we used version **11.4.0**).  
- Upload the provided `.zip` file to your home directory, then unpack it into `~/MQO_Public`:

```
mkdir ~/MQO_Public
mv ~/[downloaded_file].zip ~/MQO_Public/
cd ~/MQO_Public
unzip [downloaded_file].zip
```

- ⚠️ **Note:** Each index (HNSW, NSG, Vamana) may require specific SIMD instruction-set support from the CPU.
  - For example, NSG requires a CPU that supports AVX (256-bit) instructions.
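The prerequisites above can be checked with a short script such as the one below. It is a minimal sketch assuming a Linux machine, since it reads CPU flags from `/proc/cpuinfo`. The helper name `check_simd_flag` is hypothetical, and the probed flags (`avx`, `avx2`, `avx512f`) are standard Linux flag names chosen for illustration, not a list taken from the project's build files.

```shell
#!/usr/bin/env bash
# Minimal prerequisite check (sketch). Assumes Linux: CPU flags come from /proc/cpuinfo.

# Hypothetical helper: prints "<flag>: supported" if the CPU advertises the flag.
# grep -w matches whole words only, so "avx" does not match "avx2".
check_simd_flag() {
  if grep -qw "$1" /proc/cpuinfo 2>/dev/null; then
    echo "$1: supported"
  else
    echo "$1: not supported"
  fi
}

# Miniconda3 is expected in the home directory.
if [ -x "$HOME/miniconda3/bin/conda" ]; then
  echo "miniconda3: found"
else
  echo "miniconda3: not found in ~/miniconda3"
fi

# g++ version (the authors used 11.4.0).
if command -v g++ >/dev/null 2>&1; then
  echo "g++: $(g++ -dumpfullversion -dumpversion)"
else
  echo "g++: not found"
fi

# 256-bit AVX is needed by NSG; the other flags are listed for reference.
for flag in avx avx2 avx512f; do
  check_simd_flag "$flag"
done
```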

## Setup
Create the conda environment as follows:
```
conda create -n mqo \
  python=3.9 \
  boost \
  cmake \
  gxx_linux-64 \
  intel-openmp \
  mkl \
  mkl-include \
  libaio \
  gperftools \
  -c conda-forge
conda activate mqo
conda install matplotlib
conda install pandas
conda install -c conda-forge eigen
```
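Once the commands above have finished, a quick sanity check can confirm that the `mqo` environment provides what was installed. This is a sketch under two assumptions: it is run after `conda activate mqo` (otherwise `python3` may resolve to a different interpreter), and the conda-forge packages place their headers under `$CONDA_PREFIX/include` (`eigen3/` and `boost/`). The helper names `check_py_module` and `check_header_dir` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the "mqo" environment. Run after `conda activate mqo`.
# Assumes the usual conda-forge header layout under $CONDA_PREFIX/include.

# Prints "<module>: ok" if the active Python can import it, else "<module>: missing".
check_py_module() {
  if python3 -c "import $1" >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: missing"
  fi
}

# Prints "<dir> headers: ok" if $CONDA_PREFIX/include/<dir> exists.
check_header_dir() {
  if [ -n "${CONDA_PREFIX:-}" ] && [ -d "$CONDA_PREFIX/include/$1" ]; then
    echo "$1 headers: ok"
  else
    echo "$1 headers: missing"
  fi
}

for m in pandas matplotlib; do check_py_module "$m"; done
for d in eigen3 boost; do check_header_dir "$d"; done
```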

## Quick Start
We provide a toy DEEP dataset (100K vectors) together with a prebuilt NSG index in the `data/` directory.

1. Navigate to the project directory:
```
cd ~/MQO_Public
```
2. Create the result folder:
```
mkdir experiment_results
```
3. Run experiments using the provided shell script:
```
cd controller
bash run_nsg.sh
```

We have tested this script; it writes its results to `experiment_results/nsg.csv`.
Note that it evaluates the lightweight variant of MQO, and the reported latency includes the query preprocessing overhead.
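For repeated runs, the Quick Start steps can be wrapped in one shell function that stops at the first failing step and checks that the result file was actually written. This is a sketch assuming the layout described above (`controller/run_nsg.sh` writing to `experiment_results/nsg.csv`); the function name `quickstart` is hypothetical. It uses `mkdir -p` so that re-running does not fail once `experiment_results/` already exists.

```shell
#!/usr/bin/env bash
# Sketch: the Quick Start steps as a single function.
# Assumes run_nsg.sh, run from controller/, writes ../experiment_results/nsg.csv.
quickstart() {
  local root="${1:-$HOME/MQO_Public}"

  # Step 2: result folder (-p keeps this idempotent across re-runs).
  mkdir -p "$root/experiment_results" || return 1

  # Step 3: run the NSG experiment from controller/ in a subshell,
  # so the caller's working directory is left unchanged.
  ( cd "$root/controller" && bash run_nsg.sh ) || return 1

  # The script's exit status alone may not reveal a failure, so also
  # confirm that the expected CSV exists and is non-empty.
  if [ -s "$root/experiment_results/nsg.csv" ]; then
    echo "results written to $root/experiment_results/nsg.csv"
  else
    echo "run_nsg.sh finished, but nsg.csv is missing or empty" >&2
    return 1
  fi
}
```

Call it as `quickstart`, or as `quickstart /path/to/MQO_Public` if the project lives elsewhere.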

## How to interpret the results

After running the experiments, you will find CSV files in the `experiment_results` directory.  
Each CSV file contains one row per beam size, with recall and latency measured both without and with MQO.

### Example
Below is an example of a result file:

| BeamSize |   R1    |    T1    |   R2    |    T2    |
|----------|---------|----------|---------|----------|
| 4        | 0.78274 | 0.999843 | 0.78285 | 0.634999 |
| 5        | 0.82524 | 0.982469 | 0.82592 | 0.563561 |
| 7        | 0.87898 | 1.04715  | 0.88021 | 0.689332 |
| 9        | 0.9116  | 1.24201  | 0.91207 | 0.816098 |
| 12       | 0.94108 | 1.46275  | 0.94077 | 0.968247 |
| 16       | 0.96233 | 1.83273  | 0.96256 | 1.1894   |
| 22       | 0.97809 | 2.25297  | 0.9781  | 1.59537  |
| 31       | 0.98815 | 2.96725  | 0.98796 | 2.14068  |
| 43       | 0.9938  | 3.80463  | 0.99373 | 2.85253  |
| 60       | 0.99672 | 4.97924  | 0.99669 | 3.94906  |
| 84       | 0.9985  | 6.59371  | 0.99849 | 5.38849  |

### Column description
- BeamSize: The search beam size used during the experiment.
- R1: Recall without using MQO (baseline).
- T1: Search latency without using MQO (baseline).
- R2: Recall when using MQO.
- T2: Search latency when using MQO.

This lets you compare recall and latency with and without MQO at each beam size.
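Since R1 and R2 differ slightly at the same beam size, the two methods can also be compared at matched recall: for a target recall, take the lowest latency among the rows that reach it, separately for the baseline and for MQO. The awk sketch below does this under the same CSV-layout assumption (header `BeamSize,R1,T1,R2,T2`); the helper name `latency_at_recall` is hypothetical. With the example table and a target of 0.94, the lowest qualifying latency is at beam size 12 for both: 1.46275 for the baseline and 0.968247 for MQO.

```shell
#!/usr/bin/env bash
# Sketch: lowest latency at which baseline and MQO each reach a target recall.
# Assumes a comma-separated file whose header is BeamSize,R1,T1,R2,T2.
latency_at_recall() {
  awk -F, -v target="$2" '
    BEGIN   { target += 0 }           # force numeric comparison
    NR == 1 { next }                  # skip the header row
    NF >= 5 {
      # Baseline: keep the fastest row whose recall R1 reaches the target.
      if ($2 >= target && (!b || $3 < t1)) { b = 1; t1 = $3 + 0; bs1 = $1 }
      # MQO: same rule applied to R2 / T2.
      if ($4 >= target && (!m || $5 < t2)) { m = 1; t2 = $5 + 0; bs2 = $1 }
    }
    END {
      # %g prints the latency with up to 6 significant digits.
      if (b) {
        printf "baseline: beam %s, latency %g\n", bs1, t1
      } else {
        print "baseline: target not reached"
      }
      if (m) {
        printf "MQO: beam %s, latency %g\n", bs2, t2
      } else {
        print "MQO: target not reached"
      }
    }' "$1"
}
```

For example: `latency_at_recall ~/MQO_Public/experiment_results/nsg.csv 0.95`.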

## Code Structure
- Search code for each index
  - `hnsw_exp/`
  - `nsg_exp/`
  - `vamana_exp/`
- Query preprocessing
  - `mqo/`
- Data handling
  - `data_handler/`
- Scripts
  - `controller/`

