Sensoriplexer
=============

The sensoriplexer is a component for covering low-level capabilities in intelligent agents. This repository provides an implementation built on top of PyTorch.


Usage
-----

### Setup

This repository requires a minimal environment:

* Python 3.7 or above.
* A PyPI-compatible dependency manager. We assume `pip` here.

We use Virtualenv to manage Python and its dependencies. The code has run fully with Python 3.7 and 3.8, on both CPUs and Nvidia GPUs. There is no specific hardware requirement: the repository runs on CPU as well as GPU.

To install the dependencies: `pip install -r requirements.txt`.
To run the evaluation scenarios, please also install the development dependencies with `pip install -r requirements-dev.txt`.
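A quick way to check the environment before installing or running anything is a small script like the following. It is a sketch, not part of the repository: `check_environment` is a hypothetical helper, and the only assumption is that PyTorch, once installed, is importable as `torch`.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in the requirements above


def check_environment() -> dict:
    """Report whether the interpreter is recent enough and which device PyTorch would use."""
    report = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info >= MIN_PYTHON,
        "torch": None,   # PyTorch version, if installed
        "device": None,  # "cuda" or "cpu", if PyTorch is installed
    }
    # Only import torch if it is installed, so the check also works before `pip install`.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["device"] = "cuda" if torch.cuda.is_available() else "cpu"
    return report


if __name__ == "__main__":
    print(check_environment())
```

Running it prints a dictionary; `torch` and `device` stay `None` until the requirements are installed.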

Caveats:

* The shape evaluation scenario relies on `pyttsx3` to synthesize voices, which itself depends on OS facilities. We have only tested it on macOS and Linux.
* We have not tested on Windows at all. We took care to rely mostly on `pathlib` and related modules for portability, but could not verify it on that OS.
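Before generating the shape dataset, it may help to check that `pyttsx3` works with the local OS speech backend. A minimal smoke-test sketch, relying on pyttsx3's `init`, `save_to_file` and `runAndWait` calls; `tts_available` is a hypothetical helper, and a `False` result only hints that voice synthesis, and therefore the dataset generator, is likely to fail on this machine.

```python
import importlib.util
import tempfile
from pathlib import Path


def tts_available(text: str = "test") -> bool:
    """Return True if pyttsx3 can synthesize `text` to a non-empty file on this OS."""
    if importlib.util.find_spec("pyttsx3") is None:
        return False  # pyttsx3 is not installed
    import pyttsx3

    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "probe.wav"
        try:
            engine = pyttsx3.init()  # picks the platform driver (sapi5, nsss or espeak)
            engine.save_to_file(text, str(target))
            engine.runAndWait()  # blocks until the queued save command is processed
        except Exception:  # missing OS backend, driver errors, ...
            return False
        return target.exists() and target.stat().st_size > 0


if __name__ == "__main__":
    print("pyttsx3 usable:", tts_available())
```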

### Training

TODO: training is essentially handled by `bin/train`, but the script needs cleanup, and its configuration file needs explanation.


### Inference

TODO: inference is essentially handled by `bin/run`, but the script also needs significant cleanup, and its configuration file needs explanation.


Self-contained Shape Demo
-------------------------

The repository contains a self-contained synthetic example. It consists of a data generator, a classifier, and configuration files to plug a sensoriplexer in upstream of the classifier.

Please note that the documentation here assumes default command-line options, unless explicitly set. If you encounter issues with some options, please either file an issue, or retry with the default options (if possible) to narrow down the problem.

### The shape dataset

The shape dataset is a classification dataset of 4 simple shapes. Each sample consists of a 1 s video clip of a single image (30 frames), 1 s of audio of the spoken shape word (22050 samples), and one label per frame (30 labels). The dataset is tailored for training and evaluating a classifier on the images, and for training and evaluating its composition with the sensoriplexer.
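The documented sample layout can be pinned down as a small validation sketch. Only the sizes come from the description above (1 s clips, 30 frames, 22050 audio samples, 30 labels, 4 classes); the `ShapeSample` class, its field names and the use of plain lists are hypothetical, and the generator's actual archive format may differ.

```python
from dataclasses import dataclass
from typing import List

FPS = 30            # video frames per second: 30 frames for a 1 s clip
AUDIO_RATE = 22050  # audio samples per second: 22050 samples for 1 s
CLIP_SECONDS = 1
NUM_CLASSES = 4     # four shapes


@dataclass
class ShapeSample:
    frames: List[object]  # the same image repeated once per frame; image type left open
    audio: List[float]    # audio samples of the spoken shape word
    labels: List[int]     # one class index per frame


def check_sample(sample: ShapeSample) -> None:
    """Raise ValueError if `sample` does not match the documented sizes."""
    n_frames = FPS * CLIP_SECONDS
    n_audio = AUDIO_RATE * CLIP_SECONDS
    if len(sample.frames) != n_frames:
        raise ValueError(f"expected {n_frames} frames, got {len(sample.frames)}")
    if len(sample.audio) != n_audio:
        raise ValueError(f"expected {n_audio} audio samples, got {len(sample.audio)}")
    if len(sample.labels) != n_frames:
        raise ValueError(f"expected {n_frames} labels, got {len(sample.labels)}")
    if any(not 0 <= label < NUM_CLASSES for label in sample.labels):
        raise ValueError(f"labels must be class indices in [0, {NUM_CLASSES})")


def dummy_sample(label: int) -> ShapeSample:
    """Build a placeholder sample: one blank image repeated, silent audio, constant labels."""
    image = [[0] * 8 for _ in range(8)]  # hypothetical 8x8 blank image
    return ShapeSample(
        frames=[image] * (FPS * CLIP_SECONDS),
        audio=[0.0] * (AUDIO_RATE * CLIP_SECONDS),
        labels=[label] * (FPS * CLIP_SECONDS),
    )
```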

To generate a dataset:

    ./evaluation/bin/gen_shape_dataset

Use the `-n` flag to set the size of the dataset (default: 10000). By default, the dataset is saved as a Pickle archive in the same directory.
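Since the archive is a Pickle file, it can be opened from Python with the standard `pickle` module. A minimal sketch: `load_shape_dataset` and `describe` are hypothetical helpers, and since the structure of the loaded object is not documented here, the sketch only reports its type and size.

```python
import pickle
from pathlib import Path


def load_shape_dataset(path):
    """Load a dataset archive written by the generator (Pickle format)."""
    # Only unpickle files you generated yourself: pickle can execute arbitrary code.
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def describe(dataset) -> str:
    """Return a one-line summary: the container type and, if available, its length."""
    size = len(dataset) if hasattr(dataset, "__len__") else "unknown"
    return f"{type(dataset).__name__} with {size} items"
```

For example, `describe(load_shape_dataset("path/to/archive.pkl"))` summarizes an archive, where the path is a placeholder for the file the generator actually wrote.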

To review a sample of the dataset:

    ./evaluation/bin/gen_shape_dataset_review

The script reviews the dataset (by default, the generator output; run it with `-h` for options), prints some key information, and generates MP4 and WAV files under the `review` directory. Each MP4 file is paired with the WAV file of the same name, for checking. Note that this script may emit many warnings from the `av` module.
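Pairing the review files can also be scripted. The sketch below assumes that each MP4 file and its WAV file share the same file stem in the `review` directory, and that the WAV files are integer PCM, the only kind the standard `wave` module reads. `pair_review_files` and `wav_seconds` are hypothetical helpers.

```python
import wave
from pathlib import Path
from typing import Dict, Tuple


def pair_review_files(review_dir="review") -> Dict[str, Tuple[Path, Path]]:
    """Map each file stem to its (MP4, WAV) pair; files without a partner are skipped."""
    root = Path(review_dir)
    videos = {p.stem: p for p in root.glob("*.mp4")}
    audios = {p.stem: p for p in root.glob("*.wav")}
    return {stem: (videos[stem], audios[stem]) for stem in sorted(videos.keys() & audios.keys())}


def wav_seconds(path: Path) -> float:
    """Duration of a PCM WAV file, from its frame count and sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    for stem, (video, audio) in pair_review_files().items():
        print(stem, video.name, f"{wav_seconds(audio):.2f} s")
```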

### The shape classifier

A classifier and its tooling are available under `evaluation/models/shape`. Training and inference are implemented in the same script:

    PYTHONPATH=evaluation python evaluation/models/shape/run.py train --epochs 10

By default, training runs for 100 epochs (the command above sets 10). In our trials, the classifier starts overfitting around epoch 30. The model and related files are saved under `output/shape`. Training takes about 1 min per epoch on an Intel i9 CPU, and about 10 s per epoch on an Nvidia RTX GPU.

Inference requires an image from the dataset, preferably from the test split; such images are available under `output/shape/test_samples`.

    PYTHONPATH=evaluation python evaluation/models/shape/run.py infer --image output/shape/test_samples/9_22.jpg

The output is the name of the predicted shape, hopefully the right one!
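To run inference on every test sample rather than a single image, a small driver can call the documented command in a loop. This is a sketch under a few assumptions: it runs from the repository root, the test samples are `.jpg` files (as in the example command), `sys.executable` stands in for `python`, and the predicted shape is printed on standard output. `infer_command` and `infer_all` are hypothetical helpers.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List

SAMPLES_DIR = Path("output/shape/test_samples")


def infer_command(image: Path) -> List[str]:
    """Build the documented inference command for one image."""
    return [sys.executable, "evaluation/models/shape/run.py", "infer", "--image", str(image)]


def infer_all(samples_dir: Path = SAMPLES_DIR) -> None:
    """Run inference on every JPEG test sample and print one line per image."""
    # Equivalent to the `PYTHONPATH=evaluation` prefix of the shell command.
    env = dict(os.environ, PYTHONPATH="evaluation")
    for image in sorted(samples_dir.glob("*.jpg")):
        result = subprocess.run(infer_command(image), env=env, capture_output=True, text=True)
        # Show the prediction, or the error output if the command printed nothing.
        print(image.name, "->", result.stdout.strip() or result.stderr.strip())
```

`capture_output` requires Python 3.7, which matches the minimum version above.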

### Shape classification with the sensoriplexer

Once the shape classifier is available, it can be extended with the sensoriplexer. To train and evaluate the combination:

    ./bin/train run --config evaluation/configs/shapes_training_config.yml --batch-size 8 --epochs 100 --artifacts-frequency 50000 --early-stopping 5

The appropriate options depend on the available hardware. For reference, the settings above ran to completion on an Nvidia GTX 1050 GPU. Invoking the script without any command, or with `-h`, lists the available options.
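The source does not define the `--early-stopping` option. A common meaning for such an option is patience-based early stopping: stop once the validation loss has failed to improve for N consecutive evaluations. The sketch below illustrates that generic technique under this assumption; `EarlyStopping` and `stopping_epoch` are hypothetical names, not code from `bin/train`.

```python
from typing import Iterable, Optional


class EarlyStopping:
    """Signal a stop when the loss has not improved for `patience` consecutive checks."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as an improvement
        self.best: Optional[float] = None
        self.bad_checks = 0

    def step(self, loss: float) -> bool:
        """Record one validation loss; return True when training should stop."""
        if self.best is None or loss < self.best - self.min_delta:
            self.best = loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def stopping_epoch(losses: Iterable[float], patience: int = 5) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(losses, start=1):
        if stopper.step(loss):
            return epoch
    return None
```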

The evaluation reports accuracy for every available scenario, as well as properties of the sensoriplexer structure. The accuracy report includes:

* DS (direct signal): accuracy of the classifier without the sensoriplexer. This should be the classifier's best performance.
* 0V (video only): accuracy when the input is video only.
* AV (audio video): accuracy when the input is both audio and video.
* A0 (audio only): accuracy when the input is audio only.

A0 is the most challenging scenario, and the very reason to use the sensoriplexer. In our runs, A0 reaches about 32% accuracy, against 25% for a random choice (4-class task).
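The comparison with random choice is plain arithmetic: a uniform random guess among k classes is right with probability 1/k, hence 25% for the 4 shapes. A sketch of the metric and the chance level follows; the function names are hypothetical, and the repository's evaluation code may compute accuracy differently (for example per frame rather than per clip).

```python
from typing import Sequence

SCENARIOS = {
    "DS": "direct signal (classifier without the sensoriplexer)",
    "0V": "video only",
    "AV": "audio and video",
    "A0": "audio only",
}


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions equal to the ground-truth labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def chance_level(num_classes: int) -> float:
    """Expected accuracy of a uniform random guess."""
    return 1.0 / num_classes


def lift_over_chance(acc: float, num_classes: int) -> float:
    """Accuracy above chance, in percentage points."""
    return 100 * (acc - chance_level(num_classes))
```

With the reported figures, `lift_over_chance(0.32, 4)` gives about 7 percentage points for A0.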

The resulting system can be run with `./bin/run run --config evaluation/configs/config.yml <options>`, but we do not currently provide this configuration for the shape scenario (one is available for emotion recognition, below).

Both the training and run scripts are configurable, for example to change the internals of the sensoriplexer or to adjust the input size and normalization parameters. Example configuration files, written in YAML, are under `evaluation/configs`.
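Configuration files can also be inspected or adjusted programmatically. A minimal sketch, assuming PyYAML is installed (`yaml.safe_load`); the dotted override keys are hypothetical, since the configuration schema is not documented here, and whether the scripts accept a file modified this way is untested.

```python
import copy
from typing import Any, Dict


def load_config(path: str) -> Dict[str, Any]:
    """Read a YAML configuration file (requires PyYAML)."""
    import yaml  # imported lazily so the override helper works without PyYAML

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def with_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of `config` where dotted keys such as "a.b.c" are set to new values."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            node = node.setdefault(key, {})  # create missing intermediate sections
        node[leaf] = value
    return result
```

For example, `with_overrides(load_config("evaluation/configs/shapes_training_config.yml"), {"some.key": 1})` returns an updated copy, where `some.key` is a placeholder; `yaml.safe_dump` can then write it to a new file to pass with `--config`.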


Emotion Recognition Demos
-------------------------

The repository contains three demos for emotion recognition, called Exp1, Exp2 and Exp3. Exp1 and Exp2 rely on independent projects found on the Internet, and Exp3 is provided with this repository.

All three demos need an audio/video dataset to run. We used, and recommend, the openly available [RAVDESS](https://zenodo.org/record/1188976) dataset. We assume the dataset archives have been downloaded locally to `~/tmp/ravdess`. This path can be changed in each demo's configuration file (under `evaluation/configs`).
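RAVDESS encodes each clip's metadata in its file name as seven two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor). A sketch of a decoder following the naming convention described on the RAVDESS Zenodo page; `parse_ravdess_name` is a hypothetical helper, not code from the demos, which may read labels differently.

```python
from pathlib import Path
from typing import Dict, Union

# Codes from the RAVDESS file naming convention.
MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITY = {"01": "normal", "02": "strong"}
STATEMENT = {"01": "Kids are talking by the door", "02": "Dogs are sitting by the door"}


def parse_ravdess_name(path: Union[str, Path]) -> Dict[str, Union[str, int]]:
    """Decode a RAVDESS file name such as 02-01-06-01-02-01-12.mp4."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS file name: {path}")
    modality, channel, emotion, intensity, statement, repetition, actor = fields
    return {
        "modality": MODALITY[modality],
        "vocal_channel": VOCAL_CHANNEL[channel],
        "emotion": EMOTION[emotion],
        "intensity": INTENSITY[intensity],
        "statement": STATEMENT[statement],
        "repetition": int(repetition),
        "actor": int(actor),
        "actor_sex": "male" if int(actor) % 2 else "female",  # odd-numbered actors are male
    }
```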

The outputs of these demos are the same as for the shape scenario. Note that RAVDESS is a heavier dataset and takes much longer to evaluate. We suggest starting with the shape scenario, which finishes within a few minutes on 2019 hardware.

To run Exp1 and Exp2, the simplest approach is to check them out with `git submodule update --recursive --init`, then follow the setup steps of the target demo. The README of each demo under `demos` describes these steps; most often, they amount to downloading pre-trained weights and putting them in the right place.

### Exp1

Assuming RAVDESS is available at the path specified in the configuration file:

    ./bin/train run --config evaluation/configs/ec_training_config.yml --batch-size 128 --epochs 100 --early-stopping 3


### Exp2

Assuming RAVDESS is available at the path specified in the configuration file:

    ./bin/train run --config evaluation/configs/ec_another_training_config.yml --batch-size 128 --epochs 100 --early-stopping 3

### Exp3

This demo is structured like the shape demo: first train the EC classifier, then train and evaluate it combined with the sensoriplexer.

Training the EC classifier requires RAVDESS to be available locally:

    PYTHONPATH=evaluation python evaluation/models/ec/run.py train ~/tmp/ravdess --epochs 10 --device cuda

Then, to train and evaluate with the sensoriplexer:

    ./bin/train run --config evaluation/configs/ec_simple_training_config.yml --batch-size 128 --epochs 100 --early-stopping 3
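The two Exp3 steps can be chained in a small driver that stops if the classifier training fails. This is a sketch, not a script shipped with the repository: it assumes it runs from the repository root, mirrors the documented commands exactly (including `--device cuda`), and expands `~/tmp/ravdess` the way the shell would. `exp3_commands` and `run_exp3` are hypothetical names.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List

RAVDESS_DIR = Path("~/tmp/ravdess").expanduser()


def exp3_commands(ravdess_dir: Path = RAVDESS_DIR) -> List[List[str]]:
    """The two documented Exp3 steps, in order: EC classifier, then sensoriplexer."""
    return [
        [sys.executable, "evaluation/models/ec/run.py", "train", str(ravdess_dir),
         "--epochs", "10", "--device", "cuda"],
        ["./bin/train", "run", "--config", "evaluation/configs/ec_simple_training_config.yml",
         "--batch-size", "128", "--epochs", "100", "--early-stopping", "3"],
    ]


def run_exp3() -> None:
    """Run both steps; check=True raises CalledProcessError at the first failing step."""
    classifier_cmd, sensoriplexer_cmd = exp3_commands()
    # Step 1 needs PYTHONPATH=evaluation, as in the documented command.
    subprocess.run(classifier_cmd, env=dict(os.environ, PYTHONPATH="evaluation"), check=True)
    # Step 2 runs as documented, with the current environment.
    subprocess.run(sensoriplexer_cmd, check=True)
```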
