# Memory Augmented Navigational Abstract Conceptual Representation (MANAR)

This repository contains the implementation of the Memory Augmented Navigational Abstract Conceptual Representation (MANAR). 
It includes code for training, knowledge transfer, and evaluation experiments. 

The implementation builds on:
- [PyTorch Image Models](https://github.com/huggingface/pytorch-image-models) for image-based experiments
- [Fairseq](https://github.com/facebookresearch/fairseq) for automatic speech recognition experiments

All code is provided as a standalone package to ensure reproducibility of the results reported in the paper.

## Setup

The environment used to train and evaluate MANAR is provided as a Docker setup for reproducibility. 
This requires a GPU-enabled machine with [NVIDIA Docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/overview.html) and Docker Compose installed.
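
Before building, it can help to confirm that the required tools are present on the host. The sketch below is a hypothetical helper (`require_cmds` is not part of this repository) that reports which commands are missing from `PATH`. It assumes `docker` and `nvidia-smi` are the relevant executables on your machine.

```bash
# Hypothetical helper (not part of the repository): report which of the
# given commands are missing from PATH. Returns non-zero if any is missing.
require_cmds() {
    local missing=0
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage on the host (assumes the NVIDIA driver provides nvidia-smi):
#   require_cmds docker nvidia-smi && docker compose version
```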

To build and launch the environment:

```bash
# Clone the repository
git clone <anonymous-repo-link>
cd manar/container_setup/docker/

# Build the Docker image
docker compose -p manar build

# Start the container in the background
docker compose -p manar up -d

# Access the container shell
docker exec -it manar /bin/bash
```

The repository is mounted at `/workspace` inside the container.
Once inside, install the required dependencies and libraries:

```bash
# From inside the root directory of the container ("/workspace")
bash reqs.sh
cd fairseq
pip install -e .
cd ../timm
pip install -e .
```

The experiments were run with the libraries at the following commits:
* timm: 96256aa3dbfa058a8f963c9bf5c803447ccc1c54
* fairseq: 82b58777e16932259e575a504e24ceb12b08cec5
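
To confirm that a local copy matches the commits listed above, a check along the following lines may be useful. This is a sketch only: `check_commit` is a hypothetical helper, and it assumes that `timm` and `fairseq` under `/workspace` are git checkouts. If they are vendored without `.git` metadata, the check reports a failure.

```bash
# Hypothetical helper (not part of the repository): succeed only if the git
# checkout in <dir> is at commit <expected>. Assumes <dir> is a git
# checkout; a copy without .git metadata fails the check.
check_commit() {
    local dir="$1" expected="$2" actual
    actual="$(git -C "$dir" rev-parse HEAD 2>/dev/null)" || {
        echo "$dir: not a git checkout" >&2
        return 1
    }
    if [ "$actual" != "$expected" ]; then
        echo "$dir: at $actual, expected $expected" >&2
        return 1
    fi
    echo "$dir: OK"
}

# Example usage from /workspace inside the container:
#   check_commit timm 96256aa3dbfa058a8f963c9bf5c803447ccc1c54
#   check_commit fairseq 82b58777e16932259e575a504e24ceb12b08cec5
```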

## Vision Experiments

We assume the [ImageNet](https://www.image-net.org/) dataset is available at `/data/shared_data/imagenet`.  
All training and evaluation commands should be run **inside the container**.

First, enter the container:
```bash
docker exec -it manar /bin/bash
```

Then navigate to the image classification directory:
```bash
cd /workspace/img
```
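
Before launching a run, it may be worth checking that the dataset is visible inside the container. The helper below is a hypothetical sketch (`check_imagenet` is not part of this repository). The split folder names `train` and `val` are an assumption about the dataset layout, so adjust them to match your copy.

```bash
# Hypothetical helper (not part of the repository): check that an ImageNet
# root contains the split folders. The names "train" and "val" are an
# assumption about the layout; adjust them to match your copy.
check_imagenet() {
    local root="${1:-/data/shared_data/imagenet}"
    local split rc=0
    for split in train val; do
        if [ ! -d "$root/$split" ]; then
            echo "missing directory: $root/$split" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example usage inside the container:
#   check_imagenet /data/shared_data/imagenet
```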

### Training from Scratch
To train MANAR-S from scratch:
```bash
bash scripts/train-manar.sh s
```

### Training with Knowledge Transfer
To train MANAR-S using the knowledge transfer method:
```bash
bash scripts/train-manar.sh s distill
```

### Evaluation
To evaluate the performance of a trained model:
```bash
# <model-type> is b (base) or s (small)
bash scripts/evaluate-manar.sh <path-to-model> <model-type>
```
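
Because the evaluation script takes two positional arguments, a thin wrapper that validates them first can catch mistakes early. The following is a sketch under assumptions: `run_eval` and the `DRY_RUN` variable are hypothetical and not part of this repository, and the wrapper is meant to be called from `/workspace/img`.

```bash
# Hypothetical wrapper (not part of the repository): validate the arguments
# before calling scripts/evaluate-manar.sh. With DRY_RUN=1 the command is
# printed instead of executed.
run_eval() {
    local ckpt="$1" size="$2"
    case "$size" in
        b|s) ;;
        *) echo "model type must be 'b' or 's', got '$size'" >&2; return 2 ;;
    esac
    if [ ! -f "$ckpt" ]; then
        echo "checkpoint not found: $ckpt" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "bash scripts/evaluate-manar.sh $ckpt $size"
    else
        bash scripts/evaluate-manar.sh "$ckpt" "$size"
    fi
}

# Example usage from /workspace/img (the checkpoint path is a placeholder):
#   DRY_RUN=1 run_eval <path-to-model> s
```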

### Vision Pretrained Models
For convenience, we also provide pretrained weights (links anonymized for review):
- [MANAR-256-32-8-S](https://zenodo.org/records/17187048?token=eyJhbGciOiJIUzUxMiIsImlhdCI6MTc1ODY5NTEzNSwiZXhwIjoxNzc1MDAxNTk5fQ.eyJpZCI6IjYxYzYxMTkwLTVlMjgtNDhlYS05M2JjLTBjZWFhYjM3MWFmZCIsImRhdGEiOnt9LCJyYW5kb20iOiJhNzc3MzQwZDQ1Njg1YjFmZTM1OGYzNzMyMDMyZDNlZSJ9.WkWa36Y7ZghtAiqC2bEquLvKwtmIzZiyAmatcsh4_-0-CFDNc7aOKWId4B4g2ecd2QZp5jDUxytUp8CR5_KwDA)
- [MANAR-256-32-8-B](https://zenodo.org/records/17187048?token=eyJhbGciOiJIUzUxMiIsImlhdCI6MTc1ODY5NTEzNSwiZXhwIjoxNzc1MDAxNTk5fQ.eyJpZCI6IjYxYzYxMTkwLTVlMjgtNDhlYS05M2JjLTBjZWFhYjM3MWFmZCIsImRhdGEiOnt9LCJyYW5kb20iOiJhNzc3MzQwZDQ1Njg1YjFmZTM1OGYzNzMyMDMyZDNlZSJ9.WkWa36Y7ZghtAiqC2bEquLvKwtmIzZiyAmatcsh4_-0-CFDNc7aOKWId4B4g2ecd2QZp5jDUxytUp8CR5_KwDA)

**Note:** These links point to anonymous Zenodo uploads. Click a link directly to access the files.

---

## Speech Experiments

We assume the following resources are available:
- [LibriSpeech](https://www.openslr.org/12/) `train-clean-100` dataset at `/data/shared_data/librispeech/LibriSpeech/`
- A [data2vec base pretrained model](https://dl.fbaipublicfiles.com/fairseq/data2vec2/base_libri.pt) (no finetuning), required for knowledge transfer training
- [KenLM 4-gram language model](https://openslr.elda.org/11/) and lexicon, required for evaluation

The commands below show how to obtain these dependencies:

```bash
docker exec -it manar /bin/bash

# Download LibriSpeech train-clean-100
mkdir -p /data/shared_data/librispeech
cd /data/shared_data/librispeech
wget -c https://openslr.elda.org/resources/12/train-clean-100.tar.gz
tar -xvf train-clean-100.tar.gz

# Download pretrained model
mkdir -p ../asr_models
cd ../asr_models
wget -c https://dl.fbaipublicfiles.com/fairseq/data2vec2/base_libri.pt

# Download KenLM and lexicon
mkdir -p kenlm
cd kenlm
wget -c https://openslr.elda.org/resources/11/4-gram.arpa.gz
gzip -d 4-gram.arpa.gz
wget -c https://dl.fbaipublicfiles.com/fairseq/wav2vec/librispeech_lexicon.lst

# Return to workspace
cd /workspace
```
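
After the downloads finish, a quick existence check can confirm that everything landed where the later steps expect it. The helper below is a hypothetical sketch (`check_speech_resources` is not part of this repository). The relative paths are derived from the download commands above.

```bash
# Hypothetical helper (not part of the repository): report which of the
# downloaded speech resources are missing under a shared data root.
check_speech_resources() {
    local root="${1:-/data/shared_data}"
    local rel rc=0
    for rel in \
        librispeech/LibriSpeech/train-clean-100 \
        asr_models/base_libri.pt \
        asr_models/kenlm/4-gram.arpa \
        asr_models/kenlm/librispeech_lexicon.lst
    do
        if [ ! -e "$root/$rel" ]; then
            echo "missing: $root/$rel" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example usage inside the container:
#   check_speech_resources /data/shared_data
```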

To run evaluation, also download the `dev-*` and `test-*` [LibriSpeech](https://www.openslr.org/12/) splits and extract them into the same directory.
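
One possible way to fetch these splits is sketched below. The function names are hypothetical and not part of this repository. The sketch assumes the four splits `dev-clean`, `dev-other`, `test-clean`, and `test-other` are served by the same mirror and URL pattern used for `train-clean-100`.

```bash
# Sketch (not part of the repository): build the download URL for a
# LibriSpeech split, assuming the mirror used for train-clean-100.
librispeech_url() {
    echo "https://openslr.elda.org/resources/12/$1.tar.gz"
}

# Download and unpack the four dev/test splits next to train-clean-100.
# The body runs in a subshell so the caller's working directory is kept.
fetch_eval_splits() (
    dest="${1:-/data/shared_data/librispeech}"
    mkdir -p "$dest" && cd "$dest" || exit 1
    for split in dev-clean dev-other test-clean test-other; do
        wget -c "$(librispeech_url "$split")" || exit 1
        tar -xzf "$split.tar.gz" || exit 1
    done
)

# Example usage inside the container:
#   fetch_eval_splits /data/shared_data/librispeech
```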

### Training with Knowledge Transfer
```bash
cd /workspace/fairseq
bash prepare_manifests.sh
bash train_manar.sh
```

### Evaluation
```bash
cd /workspace/fairseq
bash eval_manar.sh <path-to-model>
```

### Speech Pretrained Models
For convenience, we also provide pretrained weights (links anonymized for review):
- [MANAR-256-64-8-B](https://zenodo.org/records/17187048?token=eyJhbGciOiJIUzUxMiIsImlhdCI6MTc1ODY5NTEzNSwiZXhwIjoxNzc1MDAxNTk5fQ.eyJpZCI6IjYxYzYxMTkwLTVlMjgtNDhlYS05M2JjLTBjZWFhYjM3MWFmZCIsImRhdGEiOnt9LCJyYW5kb20iOiJhNzc3MzQwZDQ1Njg1YjFmZTM1OGYzNzMyMDMyZDNlZSJ9.WkWa36Y7ZghtAiqC2bEquLvKwtmIzZiyAmatcsh4_-0-CFDNc7aOKWId4B4g2ecd2QZp5jDUxytUp8CR5_KwDA)

**Note:** This link points to an anonymous Zenodo upload. Click it directly to access the files.


<!--
You can refer to this discussion for more details on kenlm language model: https://github.com/facebookresearch/fairseq/issues/2654
  3. Install deps:
    1. pip install flashlight-text
    2. pip install git+https://github.com/kpu/kenlm.git     # enables LM support
  4. Run the eval_matt.sh script (or refer to it).
-->
---
