# triplet-robot-sound-interpretation
## Repo structure
- Envs: This folder contains the implementation of OpenAI Gym environments used in the paper. The TurtleBot, Kuka, and Kinova environments are in Envs/pybullet. The iTHOR environment is in Envs/ai2thor. Each environment has a configuration file for the environment, the algorithm, and the deep model. 
- cfg.py: Change this file to select one of the four environments to run. 
- dataset.py: Definition of the dataset and data loader.
- pretext.py: Run this file to collect the triplet dataset and train the VAR.
- RL.py: Run this file to load the trained VAR and perform RL training.
- data: This folder contains the collected triplets, the trained VAR, and the trained RL models.
- models: This folder contains the implementations of the VAR, the RL model, and an RL algorithm.
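
The two entry-point scripts above are run in sequence: `pretext.py` first, then `RL.py`. The sketch below shows one way to chain them from Python. It assumes that both scripts are launched from the repo root and take no command-line arguments (their settings come from cfg.py and the selected environment's configuration file). The helper names `build_commands` and `run_pipeline` are hypothetical and not part of this repo.

```python
import subprocess
import sys

# Order taken from the file descriptions above: pretext.py, then RL.py.
PIPELINE = ["pretext.py", "RL.py"]


def build_commands(scripts=PIPELINE):
    """Return one interpreter command per script, in pipeline order."""
    return [[sys.executable, script] for script in scripts]


def run_pipeline(repo_root="."):
    """Run each script in order and stop at the first failure."""
    # Assumption: the scripts need no command-line arguments.
    for command in build_commands():
        subprocess.run(command, cwd=repo_root, check=True)
```

Calling `run_pipeline()` from the repo root would run both stages back to back. In practice you may prefer to run them separately, since the configuration flags usually change between the two stages.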

## Installation
1. Install the following Python packages: ai2thor 3.5.0, gym 0.17.2, pytorch 1.7.1, python-speech-features 0.6, matplotlib 3.2.2, opencv-python 4.0.0.21, pandas 1.0.5.
2. Create a folder named 'commonMedia' in the repo root.
3. Download the Fluent Speech Commands dataset and place it in the 'commonMedia' folder.
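
Steps 2 and 3 can be checked with a few lines of Python. This is a minimal sketch: the helper names `ensure_common_media` and `has_dataset` are hypothetical, and because the expected layout of the dataset inside 'commonMedia' is not specified here, the check only confirms that the folder is not empty.

```python
from pathlib import Path


def ensure_common_media(repo_root="."):
    """Create <repo_root>/commonMedia if it is missing and return its path."""
    media_dir = Path(repo_root) / "commonMedia"
    media_dir.mkdir(exist_ok=True)
    return media_dir


def has_dataset(media_dir):
    """Return True when the folder contains at least one file or subfolder."""
    return any(Path(media_dir).iterdir())
```

For example, `has_dataset(ensure_common_media("."))` returns False right after step 2 and True once the downloaded dataset has been copied in.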


## Environment Configuration
Each environment contains its own configuration file, config.py. The most important attributes are explained below: 
- soundSource: a dictionary containing the settings for the sound data. Set 'train_test' to 'train' when you train the models and to 'test' when you test them.
- render: set it to True to visualize the environment.
- RSI_ver: set the model version. 
- pretextNumEnvs: use this number of parallel environments to collect pretext data.
- pretextTrain: set it to True to train the VAR.
- pretextCollection: set it to True to collect triplets before training the VAR.
- pretextDataDir: the path to your collected triplets. 
- pretextModelFineTune: set it to True if you collect additional data to fine tune a trained VAR.
- pretextModelSaveDir: the path to save the trained VAR.
- pretextModelLoadDir: the path to load the trained VAR for the RL training. 
- pretextCollectNum: for each class, collect this number of triplets. 
- pretextEpoch: the number of epochs to train the VAR.
- RLManualControl: set it to True if you want to control the agent using the keyboard.
- RLTrain: set it to True if you want to perform the RL training. 
- RLModelSaveDir: the path to save the RL model.
- RLModelFineTune: set it to True if you have a trained RL model and want to fine tune it. 
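
The attributes above are typically changed together, depending on which stage you are running. The sketch below illustrates one plausible grouping. Only the attribute names come from the list above. The `SimpleNamespace` stand-in, the stage names, the placeholder values, and the `configure` helper are assumptions for illustration, and the real config.py may be organized differently.

```python
from types import SimpleNamespace


def make_placeholder_config():
    """Stand-in for an environment config; all values are placeholders."""
    return SimpleNamespace(
        soundSource={"train_test": "train"},
        render=False,
        pretextCollection=False,
        pretextTrain=False,
        pretextModelFineTune=False,
        RLManualControl=False,
        RLTrain=False,
        RLModelFineTune=False,
    )


# Hypothetical stage names mapped to the flags each stage would switch on.
STAGES = {
    "pretext": {"pretextCollection": True, "pretextTrain": True},
    "rl_train": {"RLTrain": True},
    "rl_test": {"RLTrain": False},
}


def configure(stage):
    """Return a placeholder config with the flags for one stage applied."""
    config = make_placeholder_config()
    for name, value in STAGES[stage].items():
        if not hasattr(config, name):
            # Catch misspelled attribute names early.
            raise AttributeError(f"unknown config attribute: {name}")
        setattr(config, name, value)
    # Assumption: sound data uses the 'test' split only when evaluating.
    split = "test" if stage == "rl_test" else "train"
    config.soundSource["train_test"] = split
    return config
```

For instance, `configure("pretext")` enables both triplet collection and VAR training, while `configure("rl_test")` leaves RLTrain off and switches the sound data to the 'test' split.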
