# music-alignment

Code to set up the datasets for the audio experiments in the Declarative DTW paper (accepted to ICLR 2023). We make minor modifications to the original repo [here](https://github.com/jthickstun/alignment-eval), which supports the paper [Rethinking Evaluation Methodology for Audio-to-Score Alignment](https://arxiv.org/abs/2009.14374).


## Building the Dataset

To get started, first clone a copy of the Bach WTC scores:

```
https://github.com/humdrum-tools/bach-wtc
```

You'll also need a copy of the MAESTRO (v2.0) dataset:

```
https://magenta.tensorflow.org/datasets/maestro#v200
```

After downloading the scores and the MAESTRO dataset, you can extract the alignment dataset
by calling the `extract` script from the root of this repository:

```
python3 extract.py {path-to-scores}/bach-wtc/ {path-to-maestro}/maestro-v2.0.0
```

The script will extract pairs of KernScores and MAESTRO performances to the `data/` subfolder.

Next, run the following command to rescale the tempo of the score files so the duration of the performance recording matches
the duration of the synthesised score. This will generate a new `data/score_rescaled` folder containing rescaled scores.

```
python3 rescale_midi.py
```
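
As a rough illustration of the idea behind tempo rescaling, the sketch below stretches a list of note times by a single global factor so that the score ends when the performance ends. This is an assumption about the general approach, not a description of what `rescale_midi.py` does internally; the function name `rescale_note_times` and the `(onset, offset)` note representation are hypothetical.

```python
def rescale_note_times(notes, perf_duration):
    """Scale (onset, offset) pairs, in seconds, so the score lasts perf_duration.

    Assumption: a single global factor is applied to every note time.
    """
    score_duration = max(offset for _, offset in notes)
    if score_duration <= 0:
        raise ValueError("score has no duration")
    factor = perf_duration / score_duration
    return [(onset * factor, offset * factor) for onset, offset in notes]


# Toy score lasting 4 seconds, stretched to match a 6-second performance.
notes = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0)]
rescaled = rescale_note_times(notes, perf_duration=6.0)
print(rescaled)
```

A global factor only matches the total durations; local tempo variation in the performance is left for the alignment step to resolve.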

To generate the ground-truth alignments, run the following:

```
python3 align.py ground data/score_rescaled data/perf
```

The first argument specifies the alignment algorithm; the results are written to an output directory of the same name.

## Computing Alignments and Generating Features

Next, generate the base audio features (`cqt`, `chroma`, and `melspec`) by running `align.py` again with a feature type in place of `ground`. As a byproduct, this also generates alignments. Run the following once for each feature type:

```
python3 align.py {melspec,chroma,cqt} data/score_rescaled data/perf
```
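
If you prefer to run all three feature types in one go, a small driver along these lines may help. It is only a sketch: the helper names `build_commands` and `run_all` are hypothetical, and it assumes it is launched from the root of this repository with the default `data/` layout described above.

```python
import subprocess
import sys

FEATURES = ["melspec", "chroma", "cqt"]


def build_commands(score_dir="data/score_rescaled", perf_dir="data/perf"):
    """Return one align.py command line per feature type."""
    return [
        [sys.executable, "align.py", feature, score_dir, perf_dir]
        for feature in FEATURES
    ]


def run_all():
    """Run the commands in sequence, stopping at the first failure."""
    for cmd in build_commands():
        subprocess.run(cmd, check=True)


# Print the commands that run_all() would execute.
for cmd in build_commands():
    print(" ".join(cmd[1:]))
```

Calling `run_all()` executes the three commands one after another; `check=True` raises an exception if any of them exits with a non-zero status.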

The alignments generated by the alignment script are stored in `align/{ground,melspec,chroma,cqt}` as
plaintext files with two columns: the first column gives the time in the score, and the second
column gives the corresponding time in the performance. Make sure to run this step, since it generates and saves the raw features along with the alignments.
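
The two-column format can be read and queried with a few lines of Python. The sketch below assumes the columns are whitespace-separated, that times are in seconds, and that rows are sorted by strictly increasing score time; none of this is confirmed by the scripts here, and the function names `parse_alignment` and `score_to_perf` are hypothetical.

```python
from bisect import bisect_right


def parse_alignment(text):
    """Parse 'score_time perf_time' rows into a list of float pairs."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pairs.append((float(fields[0]), float(fields[1])))
    return pairs


def score_to_perf(pairs, t):
    """Map a score time to a performance time by linear interpolation."""
    score_times = [s for s, _ in pairs]
    i = bisect_right(score_times, t)
    if i == 0:
        return pairs[0][1]   # before the first row: clamp
    if i == len(pairs):
        return pairs[-1][1]  # at or after the last row: clamp
    (s0, p0), (s1, p1) = pairs[i - 1], pairs[i]
    return p0 + (p1 - p0) * (t - s0) / (s1 - s0)


example = "0.0 0.0\n1.0 1.5\n2.0 2.5\n"
pairs = parse_alignment(example)
print(score_to_perf(pairs, 0.5))  # 0.75
print(score_to_perf(pairs, 1.5))  # 2.0
```

For a real file, pass the result of `open(path).read()` to `parse_alignment`.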

Features for the synthesised score audio are stored in `data/score_feats/`, and features for the performance audio are stored in `data/perf_feats/`.

To evaluate the results of a particular alignment algorithm:

```
python3 eval.py {melspec,chroma,cqt} data/score_rescaled data/perf
```

Note that we implement this evaluation independently in our training script for the sliced audio clips.
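
For intuition about what comparing a candidate alignment against the ground truth can look like, here is a minimal sketch that samples both alignments on a regular grid of score times and averages the absolute difference in predicted performance time. This is an illustrative metric chosen for simplicity, not necessarily what `eval.py` or our training script computes; the names `interpolate` and `mean_abs_deviation` and the `step` parameter are hypothetical.

```python
def interpolate(pairs, t):
    """Piecewise-linear map from score time to performance time (clamped)."""
    if t <= pairs[0][0]:
        return pairs[0][1]
    for (s0, p0), (s1, p1) in zip(pairs, pairs[1:]):
        if s0 <= t <= s1:
            if s1 == s0:
                return p0
            return p0 + (p1 - p0) * (t - s0) / (s1 - s0)
    return pairs[-1][1]


def mean_abs_deviation(ground, candidate, step=0.5):
    """Average |ground - candidate| performance time over a score-time grid."""
    end = min(ground[-1][0], candidate[-1][0])
    times = [i * step for i in range(int(end / step) + 1)]
    errors = [abs(interpolate(ground, t) - interpolate(candidate, t)) for t in times]
    return sum(errors) / len(errors)


# Toy alignments as (score_time, perf_time) pairs, in seconds.
ground = [(0.0, 0.0), (2.0, 2.0)]
candidate = [(0.0, 0.0), (2.0, 3.0)]
print(mean_abs_deviation(ground, candidate, step=1.0))  # 0.5
```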
