# 📚 Audio Turing Test Corpus

> A high-quality, multidimensional Chinese transcript corpus for the "Audio Turing Test": evaluating whether machine-generated speech can fool human listeners.

## About Audio Turing Test (ATT)

ATT is an evaluation framework that pairs a standardized human evaluation protocol with an accompanying dataset. It aims to address two problems in text-to-speech (TTS) evaluation: the lack of unified protocols and the difficulty of comparing multiple TTS systems.

## Dataset Description

This dataset provides 500 textual transcripts from the Audio Turing Test (ATT) corpus, corresponding to the "transcripts known" setting.

The corpus spans five key linguistic and stylistic dimensions relevant to Chinese TTS evaluation:

* **Chinese-English Code-switching**
* **Paralinguistic Features and Emotions**
* **Special Characters and Numerals**
* **Polyphonic Characters**
* **Classical Chinese Poetry/Prose**

For each dimension, this open-source subset includes 100 manually reviewed transcripts.

Additionally, the dataset includes 104 "trap" transcripts for attentiveness checks during human evaluation:

* **35 flawed synthetic transcripts:** intentionally flawed scripts designed to produce clearly synthetic and unnatural speech.
* **69 authentic human transcripts:** scripts corresponding to genuine human recordings, used to check that evaluators can reliably distinguish human speech from synthetic speech.
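
As a rough illustration of how trap items could support an attentiveness check, the sketch below scores one evaluator's judgments against the trap labels. It assumes the `Ground Truth` convention described under Data Format (1 = human, 0 = flawed synthetic). The function name, the response format, the `flawed_*` IDs, and the sample judgments are hypothetical and not part of the ATT protocol.

```python
def attentiveness_accuracy(responses, ground_truth):
    """Return the fraction of trap items an evaluator judged correctly.

    responses:    dict mapping trap ID -> evaluator judgment (1 = human, 0 = synthetic)
    ground_truth: dict mapping trap ID -> label (1 = human, 0 = flawed synthetic)
    """
    # Only score items that are actually trap transcripts.
    scored = [trap_id for trap_id in responses if trap_id in ground_truth]
    if not scored:
        raise ValueError("responses contain no trap items")
    correct = sum(responses[trap_id] == ground_truth[trap_id] for trap_id in scored)
    return correct / len(scored)


# Hypothetical labels and judgments, for illustration only.
ground_truth = {"human_00001": 1, "human_00002": 1, "flawed_00001": 0, "flawed_00002": 0}
responses = {"human_00001": 1, "human_00002": 0, "flawed_00001": 0, "flawed_00002": 0}

accuracy = attentiveness_accuracy(responses, ground_truth)
print(accuracy)  # 3 of 4 trap items judged correctly -> 0.75
```

Any threshold for excluding an inattentive evaluator would be a choice made by the evaluation organizer; none is specified here.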


## Data Format

### Normal Transcripts

```json
{
  "ID": "poem-100",
  "Text": "姚鼐在《登泰山记》中详细记录了登山的经过：“余始循以入，道少半，越中岭，复循西谷，遂至其巅。”这番描述让我仿佛身临其境地感受到了登山的艰辛与乐趣。当我亲自攀登泰山时，也经历了类似的艰辛与挑战。虽然路途遥远且充满艰辛，但当我站在山顶俯瞰群山时，那份成就感与自豪感让我倍感满足与幸福。",
  "Dimension": "poem",
  "Split": "white Box"
}
```

### Trap Transcripts

```json
{
  "ID": "human_00001",
  "Text": "然后当是去年，也是有一个契机，我就，呃，报了一个就是小凯书法家的这位老师的班。",
  "Ground Truth": 1
}
```

* **ID**: Unique identifier for the transcript.
* **Text**: The text intended for speech synthesis.
* **Dimension**: Linguistic/stylistic category (only for normal transcripts).
* **Split**: Indicates the "white Box" scenario (only for normal transcripts).
* **Ground Truth**: Indicates if the transcript corresponds to human speech (1) or flawed synthetic speech (0) (only for trap transcripts).
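
The field descriptions above can be turned into a small consistency check. The following is a minimal sketch that tells normal and trap records apart by their keys. It assumes each record is available as a JSON object string; the file layout of the release is not specified here, the helper name `classify_record` is hypothetical, and the `Text` values are shortened placeholders.

```python
import json
from collections import Counter

# Field sets taken from the two record layouts shown above.
NORMAL_FIELDS = {"ID", "Text", "Dimension", "Split"}
TRAP_FIELDS = {"ID", "Text", "Ground Truth"}


def classify_record(record):
    """Return "normal" or "trap" based on the record's keys; raise on anything else."""
    keys = set(record)
    if keys == NORMAL_FIELDS:
        return "normal"
    if keys == TRAP_FIELDS:
        if record["Ground Truth"] not in (0, 1):
            raise ValueError(f"unexpected Ground Truth in {record['ID']}")
        return "trap"
    raise ValueError(f"unexpected fields: {sorted(keys)}")


# Shortened sample records (Text replaced by a placeholder).
raw_records = [
    '{"ID": "poem-100", "Text": "(omitted)", "Dimension": "poem", "Split": "white Box"}',
    '{"ID": "human_00001", "Text": "(omitted)", "Ground Truth": 1}',
]

records = [json.loads(line) for line in raw_records]
kinds = Counter(classify_record(r) for r in records)
per_dimension = Counter(r["Dimension"] for r in records if "Dimension" in r)
print(kinds, per_dimension)
```

Run over the full subset, `per_dimension` would be expected to show 100 transcripts for each of the five dimensions, and `kinds["trap"]` would be expected to equal 104.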