## Overview

This repository provides an evaluation framework for text-to-audio-video (T2AV) generation models. It combines objective metrics with subjective, LLM-based evaluation, and covers audio quality, video quality, audio-visual consistency, and alignment of the generated content with the text prompt.

**Note**: The `cases.7z` archive in this repository contains generated sample outputs, included for demonstration purposes.

## Project Structure

```
Eval_Code/
├── Data/
│   └── prompts.json                    # Text prompts for evaluation
│
├── Objective/                          # Objective evaluation metrics
│   ├── Audio/                          # Audio quality assessment
│   │   ├── NISQA/                      # Non-Intrusive Speech Quality Assessment
│   │   │   ├── nisqa/                  # Model implementation
│   │   │   ├── weights/                # Pre-trained model weights
│   │   │   ├── run_predict.py          # Audio quality prediction script
│   │   │   └── run_evaluate.py         # Audio quality evaluation script
│   │   │
│   │   └── audiobox-aesthetics/        # Audio aesthetics evaluation
│   │       ├── src/                    # Source code
│   │       │   └── audiobox_aesthetics/
│   │       │       ├── model/          # Model components
│   │       │       └── cli.py          # Command-line interface
│   │       └── batch_audiobox.py       # Batch processing script
│   │
│   ├── Video/                          # Video quality assessment
│   │   ├── DOVER/                      # Disentangled Objective Video quality EvaluatoR
│   │   │   ├── dover/                  # Model implementation
│   │   │   │   ├── models/             # Model architectures
│   │   │   │   └── datasets/           # Dataset loaders
│   │   │   ├── evaluate_one_video.py   # Single video evaluation
│   │   │   └── evaluate_a_set_of_videos.py  # Batch video evaluation
│   │   │
│   │   └── aesthetic-predictor-v2-5/   # Video aesthetics prediction
│   │       └── batch_inference.py      # Batch inference script
│   │
│   └── Similarity/                     # Audio-visual consistency assessment
│       ├── ImageBind-main/             # Multi-modal embedding alignment
│       │   ├── imagebind/              # Model implementation
│       │   │   ├── models/             # Model architectures
│       │   │   └── data.py             # Data preprocessing
│       │   └── batch_inference.py      # Batch inference for A-V consistency
│       │
│       ├── LatentSync/                 # Lip-sync quality assessment
│       │   ├── latentsync/             # Model implementation
│       │   │   ├── models/             # Model architectures
│       │   │   ├── pipelines/          # Inference pipelines
│       │   │   └── whisper/            # Audio feature extraction
│       │   ├── eval/                   # Evaluation scripts
│       │   └── batch_inference_lipsync.py  # Batch lip-sync evaluation
│       │
│       ├── Synchformer-main/           # Synchronization transformer
│       │   ├── model/                  # Model implementation
│       │   ├── dataset/                # Dataset utilities
│       │   └── batch_inference.py      # Batch synchronization evaluation
│       │
│       ├── run_clap_scoring.py         # CLAP (Contrastive Language-Audio Pretraining) scoring
│       └── run_clip_scoring.py         # CLIP (Contrastive Language-Image Pretraining) scoring
│
└── Subjective/                         # Subjective evaluation using LLMs
    ├── eval_prompts/                   # Evaluation dimension prompts
    │   ├── AAS.md                      # Audio Alignment Score
    │   ├── MSS.md                      # Motion Smoothness Score
    │   ├── MTC.md                      # Motion Temporal Coherence
    │   ├── OIS.md                      # Overall Impression Score
    │   └── TCS.md                      # Temporal Consistency Score
    │
    ├── eval_realism.py                 # Realism evaluation across 5 dimensions
    └── eval_checklist.py               # Checklist-based content completeness evaluation
```
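
As a quick sanity check after cloning, the layout above can be verified with a short script. This is a minimal sketch, not part of the repository: the helper name `find_missing` and the list `EXPECTED_FILES` are hypothetical, and the paths are copied from the tree above on the assumption that it matches your checkout.

```python
from pathlib import Path

# Files listed in the project tree, relative to Eval_Code/.
# Assumption: the tree in this README matches the actual checkout.
EXPECTED_FILES = [
    "Data/prompts.json",
    "Objective/Audio/NISQA/run_predict.py",
    "Objective/Audio/NISQA/run_evaluate.py",
    "Objective/Audio/audiobox-aesthetics/batch_audiobox.py",
    "Objective/Video/DOVER/evaluate_one_video.py",
    "Objective/Video/DOVER/evaluate_a_set_of_videos.py",
    "Objective/Video/aesthetic-predictor-v2-5/batch_inference.py",
    "Objective/Similarity/ImageBind-main/batch_inference.py",
    "Objective/Similarity/LatentSync/batch_inference_lipsync.py",
    "Objective/Similarity/Synchformer-main/batch_inference.py",
    "Objective/Similarity/run_clap_scoring.py",
    "Objective/Similarity/run_clip_scoring.py",
    "Subjective/eval_realism.py",
    "Subjective/eval_checklist.py",
]


def find_missing(root):
    """Return the expected files that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `find_missing("Eval_Code")` from the directory that contains `Eval_Code/` returns an empty list when every listed file is present, and otherwise the relative paths that are missing.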
