<a id="readme-top"></a>

<!-- PROJECT SHIELDS -->
<!--
*** Reference style links for badges.
*** See the bottom of this document for the declaration of the reference variables.
-->
[![MIT License][license-shield]][license-url]

<!-- PROJECT LOGO -->
<br />
<div align="center">
  <img src="public/emonet-voice.svg" alt="EmoNet-Voice Logo" width="120" />
  <h3 align="center">EmoNet-Voice</h3>
  <p align="center">
    An Expert-Annotated Benchmark for Synthetic Voice Emotion Recognition
    <br />
    <a href="#about-the-project"><strong>Explore the docs »</strong></a>
    <br />
    <br />
    <a href="https://huggingface.co/datasets/t1a5anu-anon/emonet-voice-bench">Benchmark Dataset</a>
    &middot;
    <a href="https://huggingface.co/datasets/t1a5anu-anon/emonet-voice-foundation">Foundation Dataset</a>
  </p>
</div>

<!-- TABLE OF CONTENTS -->
<details>
  <summary>Table of Contents</summary>
  <ol>
    <li><a href="#about-the-project">About The Project</a></li>
    <li><a href="#quickstart">Quickstart</a></li>
    <li><a href="#contents">Contents</a></li>
    <li><a href="#datasets">Datasets</a></li>
    <li><a href="#license">License</a></li>
    <li><a href="#citation">Citation</a></li>
    <li><a href="#acknowledgements">Acknowledgements</a></li>
  </ol>
</details>

## About The Project

**EmoNet-Voice** is an expert-annotated benchmark for synthetic voice emotion recognition. It provides datasets and tools for developing and evaluating models that recognize nuanced emotions in speech. The benchmark features:

- A diverse taxonomy of emotion categories, designed to capture subtle distinctions in vocal affect.
- Two large-scale, AI-generated datasets with controlled demographic balance and explicit emotional content.
- Rigorous, multi-expert annotations for high-fidelity evaluation.
- Tools for preparing, analyzing, and running inference on audio data.

<p align="right">(<a href="#readme-top">back to top</a>)</p>

## Quickstart

1. **Copy and configure environment variables:**
   ```sh
   cp .example.env .env
   ```
   Edit `.env` to add your API keys and configuration as needed.

2. **Install dependencies:**  
   We recommend [uv](https://github.com/astral-sh/uv) for fast, reliable installs:
   ```sh
   uv pip install -r pyproject.toml
   ```
   Or use your preferred tool to install from `pyproject.toml`.

3. **Prepare audio data:**  
   Before running inference, you must process your audio files:
   - Open and run all cells in `inference/prepare-audio-data.ipynb`.

4. **Run inference:**  
   Use the scripts in the `inference/` directory (see below) to analyze your audio data.

   **Empathic Insight Voice model inference:**  
   To run inference with the [Empathic Insight Voice model (HF)](https://huggingface.co/laion/Empathic-Insight-Voice-Small), use this Colab notebook:  
   [Empathic Insight Voice Inference (Colab)](https://colab.research.google.com/drive/1WR-B6j--Y5RdhIyRGF_tJ3YdFF8BkUA2)
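
Before running the inference scripts, it can help to confirm that every key listed in `.example.env` has a value in `.env`. The sketch below is one way to check this with the Python standard library. It is not part of the repository: the function names `read_env_file` and `missing_keys` are hypothetical, and it assumes plain `KEY=VALUE` lines (no `export` prefixes or multi-line values).

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(example_path=".example.env", env_path=".env"):
    """Return keys from the example file that are absent or empty in .env."""
    expected = read_env_file(example_path)
    actual = read_env_file(env_path)
    return sorted(key for key in expected if not actual.get(key))
```

Calling `missing_keys()` from the repository root returns a sorted list of keys that still need a value. An empty list means every key from `.example.env` is set.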

<p align="right">(<a href="#readme-top">back to top</a>)</p>

## Contents

- **inference/**
  - `prepare-audio-data.ipynb`: Notebook to preprocess and organize audio files for inference.
  - `hume-inference.py`: Script for running Hume API-based voice emotion inference.
  - `inference-hume.py`: Additional Hume inference utilities.
  - `inference-audio-language-model.py`: Script for running audio-language model inference.
- **statistics/**
  - `benchmark.ipynb`: Benchmarking notebook for model evaluation.
  - `human-agreement.ipynb`: Analysis of human annotation agreement.
  - `model-means.ipynb`: Model mean statistics and analysis.
- **README.md**: This file.

<p align="right">(<a href="#readme-top">back to top</a>)</p>

## Datasets

The EmoNet-Voice datasets are hosted on Hugging Face:

- **[EmoNet-Voice Bench](https://huggingface.co/datasets/t1a5anu-anon/emonet-voice-bench):** Expert-annotated benchmark set for evaluation.
- **[EmoNet-Voice Foundation](https://huggingface.co/datasets/t1a5anu-anon/emonet-voice-foundation):** Large-scale dataset with weak emotion labels for training and analysis.
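
Both datasets can likely be loaded with the Hugging Face `datasets` library. The following is a minimal sketch, not code from this repository: the helper name `load_emonet_voice` is hypothetical, the repository IDs are copied from the links above, and it assumes the datasets are publicly accessible. Split and column names are not documented here, so inspect the returned object before relying on them.

```python
# Hub repository IDs, taken from the dataset links in this README.
DATASET_IDS = {
    "bench": "t1a5anu-anon/emonet-voice-bench",
    "foundation": "t1a5anu-anon/emonet-voice-foundation",
}


def load_emonet_voice(name, streaming=True):
    """Load one of the EmoNet-Voice datasets from the Hugging Face Hub.

    With streaming=True, examples are fetched lazily instead of
    downloading the full dataset first.
    """
    if name not in DATASET_IDS:
        raise KeyError(
            f"unknown dataset {name!r}; expected one of {sorted(DATASET_IDS)}"
        )
    # Imported lazily so this module loads even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_IDS[name], streaming=streaming)
```

For example, `load_emonet_voice("bench")` returns the available splits of the benchmark set. Printing the result shows the split names and features.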

<p align="right">(<a href="#readme-top">back to top</a>)</p>

## License

- **Code:** MIT License
- **Datasets:** Creative Commons Attribution 4.0 International (CC BY 4.0)

See [`LICENSE`](LICENSE) for details.

<p align="right">(<a href="#readme-top">back to top</a>)</p>

## Citation

If you use this repository or its models, please cite our paper:

```bibtex
@inproceedings{emonetvoice2025,
  title={EmoNet-Voice: An Expert-Annotated Benchmark for Synthetic Voice Emotion Recognition},
  year={2025},
}
```

<p align="right">(<a href="#readme-top">back to top</a>)</p>

## Acknowledgements

TBA

<p align="right">(<a href="#readme-top">back to top</a>)</p>

<!-- MARKDOWN LINKS & IMAGES -->
[license-shield]: https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge
[license-url]: https://opensource.org/licenses/MIT
