**PyTorch Implementation of the Paper:**

> [From Talking to Singing: A New Challenge for Audio-Visual Deepfake Detection]
> *ICML, 2026, Anonymous submission*

## Data

To set up the data, follow these steps:

**Download the datasets:**

   - **SHDF Dataset:** We will release the full dataset after the review process is completed.
   - **TalkingHeadBench Dataset:** Follow instructions from [TalkingHeadBench Hugging Face repo](https://hf.com/datasets/luchaoqi/TalkingHeadBench)
   - **FakeAVCeleb Dataset:** Follow instructions from [FakeAVCeleb GitHub repo](https://github.com/DASH-Lab/FakeAVCeleb)
   - **AVLips Dataset:** Follow instructions from [LipFD GitHub repo](https://github.com/AaronComo/LipFD)

This repository also integrates code from the following public repositories:
- [FACTOR](https://github.com/talreiss/FACTOR)
- [AV-Hubert](https://github.com/facebookresearch/av_hubert)

**Extract features:**

Run `feature_extraction.py`:

```bash
python feature_extraction.py \
    --dataset SHDF \
    --split test \
    --metadata /SHDF_metadata/test_metadata.csv \
    --ckpt_path self_large_vox_433h.pt \
    --data_path /path/to/preprocessed/data \
    --save_path /path/to/save/features
```
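If features are needed for several datasets or splits, the command above can be generated from a small job list instead of being retyped. The following is a minimal sketch, not part of this repository: `build_extraction_cmd`, `run_jobs`, and the `JOBS` list are hypothetical helpers, and only the flags shown in the command above are used. Any `--dataset` value other than `SHDF` is an assumption to be checked against `feature_extraction.py`.

```python
import shlex
import subprocess
import sys


def build_extraction_cmd(dataset, split, metadata, ckpt_path, data_path, save_path):
    """Return the argv list for one feature_extraction.py run."""
    return [
        sys.executable, "feature_extraction.py",
        "--dataset", dataset,
        "--split", split,
        "--metadata", metadata,
        "--ckpt_path", ckpt_path,
        "--data_path", data_path,
        "--save_path", save_path,
    ]


def run_jobs(jobs, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    cmds = []
    for job in jobs:
        cmd = build_extraction_cmd(**job)
        cmds.append(cmd)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing extraction run.
            subprocess.run(cmd, check=True)
    return cmds


# Hypothetical job list: add one entry per dataset/split you want to process.
JOBS = [
    {
        "dataset": "SHDF",
        "split": "test",
        "metadata": "/SHDF_metadata/test_metadata.csv",
        "ckpt_path": "self_large_vox_433h.pt",
        "data_path": "/path/to/preprocessed/data",
        "save_path": "/path/to/save/features",
    },
]

if __name__ == "__main__":
    # Dry run by default: only prints the commands that would be executed.
    run_jobs(JOBS, dry_run=True)
```

Calling `run_jobs(JOBS, dry_run=False)` from the repository root would launch the runs one after another.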

## Evaluation

To evaluate a model, adapt the following example:

```bash
python test.py \
    --checkpoint_path /checkpoints/T-AVFD.pt \
    --features_path /path/to/saved/features \
    --metadata /shdf_metadata/test_metadata.csv \
    --dataset SHDF
```
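Before launching an evaluation, it can help to confirm that the checkpoint, features, and metadata paths actually exist. The sketch below is a hypothetical pre-flight check, not part of this repository: `missing_inputs` is an illustrative helper, and it assumes that the checkpoint and metadata are regular files, which `test.py` may or may not require.

```python
from pathlib import Path


def missing_inputs(checkpoint_path, features_path, metadata):
    """Return the test.py flags whose paths do not exist on disk."""
    checks = {
        # Assumption: the checkpoint (.pt) and metadata (.csv) are regular files.
        "--checkpoint_path": Path(checkpoint_path).is_file(),
        # The features location may be a file or a directory, so only test existence.
        "--features_path": Path(features_path).exists(),
        "--metadata": Path(metadata).is_file(),
    }
    return [flag for flag, ok in checks.items() if not ok]


if __name__ == "__main__":
    missing = missing_inputs(
        checkpoint_path="/checkpoints/T-AVFD.pt",
        features_path="/path/to/saved/features",
        metadata="/shdf_metadata/test_metadata.csv",
    )
    if missing:
        print("Missing inputs for:", ", ".join(missing))
    else:
        print("All inputs found; ready to run test.py")
```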