# Lazy and Fast Greedy MAP Inference for Determinantal Point Process

This code is the official implementation of [Lazy and Fast Greedy MAP Inference for Determinantal Point Process]().

## Requirements
- [CMake](https://cmake.org/) (version 3.23 or later)
- [GNU Make](https://www.gnu.org/software/make/)
- C++ compiler ([GNU Compiler Collection](https://gcc.gnu.org/) (GCC) / [Clang](https://clang.llvm.org/) / ...) compatible with C++17
  - GCC: version 7.1 or later
  - Clang: version 5.0 or later
- [Eigen](https://eigen.tuxfamily.org/) (version 3.4.0 or later)
- [Boost](https://boost.org/) (version 1.78.0 or later)
- [GoogleTest](https://github.com/google/googletest) (version 1.11.0 or later)
- [Python](https://www.python.org) (version 3.8.9 or later)

## Compile
To compile the C++ code, run:

```sh
cmake --preset make
cmake --build --preset release
```

## Data Preprocessing
All datasets are stored in `data/`.

### Synthetic Datasets
To generate synthetic data, run the following from the root directory:

```sh
./build/gen_wishart
```

### Real-world Datasets
To prepare the real-world datasets, move to `python/`, then create and activate the virtual environment:

```sh
cd python
python3 -m venv lazy_dpp
. lazy_dpp/bin/activate
pip install -r requirements.txt
```


#### MovieLens 25M
To download and pre-process the MovieLens 25M dataset, follow the instructions in `python/pre_process_MovieLens.ipynb` or run the following commands in `python/`.

```sh
wget -P ../data https://files.grouplens.org/datasets/movielens/ml-25m.zip
unzip ../data/ml-25m.zip -d ../data
python pre_process.py movie_lens ../data/ml-25m/
```

#### Netflix Prize
To get the Netflix Prize dataset, you need a Kaggle account.
After logging in to Kaggle, download `archive.zip` from [here](https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data) and store it in `data/`.
For pre-processing, follow the instructions in `python/pre_process_Netflix.ipynb` or run the following commands in `python/`.

```sh
unzip ../data/archive.zip -d ../data/netflix_raw
python pre_process.py netflix ../data/netflix_raw/
```

#### Computing Product Matrices
The matrix $L = B^\top B$ for the real-world datasets can be computed with the following commands (run from the root directory):

```sh
./build/product -d netflix
./build/product -d movie_lens
```
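For intuition, the product $L = B^\top B$ can be sketched in a few lines of plain Python. This is only an illustration on a tiny hand-written matrix, not the code behind `./build/product`; the helper name `gram_matrix`, the example values, and the reading of columns as items are assumptions made for this sketch.

```python
def gram_matrix(B):
    """Return L = B^T B for a matrix B given as a list of rows."""
    d = len(B)       # number of rows of B
    n = len(B[0])    # number of columns of B; L is n x n
    # L[i][j] is the inner product of column i and column j of B.
    return [
        [sum(B[r][i] * B[r][j] for r in range(d)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2 x 3 matrix (assumed layout: one column per item).
B = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
]
L = gram_matrix(B)
for row in L:
    print(row)
```

A matrix built this way is always symmetric and positive semidefinite, which is the property a DPP kernel needs.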

## Run Experiments
Run the following commands from the root directory.

### Greedy, RandomGreedy, StochasticGreedy, InterlaceGreedy

```sh
./build/exp -a [algorithm] -d [dataset_name] -m [input_matrix]
```
- `algorithm`: greedy (default), random, stochastic, interlace
- `dataset_name`: wishart, wishart_fixed_k, movie_lens, netflix
- `input_matrix`: B (default), L
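To sweep over the options listed above, the command lines can be generated from a small script. The following is a minimal sketch that uses only the flags and values documented in this section; the function names are hypothetical, and it is an assumption that the binary accepts every combination, so the sketch defaults to a dry run that only prints the commands.

```python
import subprocess

# Values copied from the option list above.
ALGORITHMS = ["greedy", "random", "stochastic", "interlace"]
DATASETS = ["wishart", "wishart_fixed_k", "movie_lens", "netflix"]
MATRICES = ["B", "L"]


def build_exp_command(algorithm, dataset, matrix):
    """Build the argument list for one ./build/exp run."""
    return ["./build/exp", "-a", algorithm, "-d", dataset, "-m", matrix]


def all_commands():
    """Enumerate every (algorithm, dataset, matrix) combination."""
    return [
        build_exp_command(a, d, m)
        for a in ALGORITHMS
        for d in DATASETS
        for m in MATRICES
    ]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in all_commands():
        print(" ".join(cmd))
        if not dry_run:
            # Assumes the script is started from the root directory.
            subprocess.run(cmd, check=True)


run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` from the root directory would execute the runs one after another and stop at the first failing command.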

### DoubleGreedy

```sh
./build/double -d [dataset_name]
```
- `dataset_name`: wishart, movie_lens, netflix

Experimental results are stored in `result/` in CSV format.
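A quick way to check what an experiment produced is to list the CSV files and their columns. The sketch below uses only the Python standard library; it assumes that the files sit directly under `result/` and that each file starts with a header row, neither of which is specified here. The function names are hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (header, number_of_data_rows) for one CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])        # first row, or [] for an empty file
        n_rows = sum(1 for _ in reader)  # remaining rows
    return header, n_rows


def summarize_results(result_dir="result"):
    """Summarize every *.csv file directly under result_dir."""
    return {
        p.name: summarize_csv(p)
        for p in sorted(Path(result_dir).glob("*.csv"))
    }


# Run from the root directory; prints nothing if no CSV files are found.
for name, (header, n_rows) in summarize_results().items():
    print(f"{name}: {n_rows} rows, columns = {header}")
```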

## Visualize Results
To visualize experimental results, follow the instructions in `python/visualize.ipynb`, or move to `python/` and activate the virtual environment.
Then, run the following command:

```sh
python visualize.py [algorithm] [dataset_name] [input_matrix] [x_axis] [y_axis]
```
- `algorithm`: greedy, random, stochastic, interlace
- `dataset_name`: wishart, wishart_fixed_k, movie_lens, netflix
- `input_matrix`: L, B
- `x_axis`: k, n
- `y_axis`: time, computed_offdiagonals_V, value

For example:

```sh
python visualize.py greedy movie_lens B k time
```


## License
The code is licensed under the MIT License.
