# Incremental Extractive Opinion Summarization Using Cover Trees

This repository contains the implementation of the submission "Incremental Extractive Opinion Summarization Using Cover Trees".


## Installation
The simplest way to run our code is to start with a fresh environment.
```
conda create -n CoverSumm python=3.6.13
source activate CoverSumm
pip install -r requirements.txt
```

## Summarization Algorithms

The algorithms used in the paper are available in the `src/algorithms/` folder. A detailed description of the file names and the acronyms used to run each algorithm can be found [here](src/algorithms/README.md).



### Synthetic

The incremental summarization algorithms can be run on synthetic data using the following commands. To get the runtime scores for different algorithms, use the following:

```
cd src/synthetic_summarization/
python launch.py \
        --summarizer <algorithm_name> \
        --distr <'uniform'/'lda'> \
        --num_samples 10000
```
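To compare runtimes across both synthetic distributions, the command above can be generated in a loop. The snippet below is a minimal sketch: `sweep_cmds` is a hypothetical helper that is not part of this repository, and it assumes that `coversumm` (the acronym shown in the text summarization commands) is also accepted by `launch.py`. It only prints the commands, so nothing is executed until you choose to run them.

```shell
# Hypothetical helper (not part of this repository): print one launch.py
# invocation per synthetic distribution for the given summarizer.
sweep_cmds() {
    summarizer="$1"
    num_samples="${2:-10000}"   # same default as the command shown above
    for distr in uniform lda; do
        echo "python launch.py --summarizer $summarizer --distr $distr --num_samples $num_samples"
    done
}

# Dry run: show the commands without executing them.
sweep_cmds coversumm
```

To actually run the sweep, pipe the output to a shell from inside `src/synthetic_summarization/`, for example `sweep_cmds coversumm | sh`.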

To get the accuracy scores (nearest-neighbour overlap) for different algorithms, use the following:

```
cd src/synthetic_summarization/
python correctness.py \
        --summarizer <algorithm_name> \
        --distr <'uniform'/'lda'> \
        --num_samples 10000
```

### SPACE

To run the algorithms on the SPACE dataset, you need access to the dataset ([link](https://github.com/stangelid/qt/)). You also need a SemAE checkpoint, which can be generated from [here](https://github.com/brcsomnath/SemAE/), or you can directly download the model used in our experiments [here](https://drive.google.com/file/d/12WRp7y_a-GiG8z4gP-_tJIuuRP-8qq6Q/view?usp=sharing). Place the generated or downloaded model in the `models/` folder. Download the sentencepiece file from [here](https://github.com/stangelid/qt/tree/main/data/sentencepiece) and place it in the `data/sentencepiece/` folder.

```
cd src/text_summarization/src/
python launch_space.py \
        --summarizer <algorithm_name: coversumm> \
        --model '../../../models/space_checkpoint.pt' \
        --sentencepiece '../../../data/sentencepiece/spm_unigram_32k.model'
```
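Because the SPACE run depends on two files that must be fetched separately, it can help to confirm that they are in place before launching. The following is a rough sketch, not part of this repository: `check_space_assets` is a hypothetical helper, and its default root (`../../..`) assumes you are in `src/text_summarization/src/` with the file names used in the command above.

```shell
# Hypothetical helper (not part of this repository): report which of the
# files expected by the launch_space.py command above are missing.
check_space_assets() {
    root="${1:-../../..}"   # repository root relative to src/text_summarization/src/
    missing=0
    for f in "models/space_checkpoint.pt" "data/sentencepiece/spm_unigram_32k.model"; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $root/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Prints a hint if either file is absent; prints nothing extra otherwise.
check_space_assets || echo "Download the missing files before running launch_space.py"
```

The function returns a non-zero status when a file is missing, so it can also guard the launch directly, as in `check_space_assets && python launch_space.py ...`.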

### Amazon

The Amazon reviews dataset used in our experiments can be generated using the following command.

```
cd data/amazon/
python generate.py
```


The algorithms on Amazon use BERT representations. You can directly run the following command:

```
cd src/text_summarization/src/
python launch_amazon.py \
        --summarizer <algorithm_name: coversumm> \
        --model 'bert-base-uncased'
```

## Hyperparameters

More details about the hyperparameters are coming soon.