# 📋 Towards Unsupervised Discovery of Biased Features in Face Recognition

This repository is the official implementation of **Towards Unsupervised Discovery of Biased Features in Face Recognition**, submitted to NeurIPS 2025.

## 📝 Overview

This code accompanies our paper and is designed to be simple and reproducible. The core experiments require no special libraries; only the optional step of generating annotations with Vision-Language Models (VLMs) has additional dependencies.

### 📌 Optional VLM Annotations

If you wish to reproduce the VLM-based annotations for the RFW (Racial Faces in the Wild) database:
- Each VLM (e.g., Qwen, InternVL) has its own dependencies.
- The specific requirements for each VLM are listed in the corresponding `.ipynb` file under `VLM labels/`.

## 💻 Code Structure
<pre> 
├── Clustering.ipynb
├── Evaluation.ipynb
├── Traversals
│   ├── Figure.ipynb
│   ├── Plots
│   ├── extract_CelebA.py
│   └── extract_RFW.py
├── VLM labels
│   ├── InternVL.ipynb
│   ├── Majority voting.ipynb
│   ├── Manual Annotations.ipynb
│   ├── Ola.ipynb
│   ├── Ovis.ipynb
│   ├── Qwen.ipynb
│   ├── SailVL.ipynb
│   ├── internvl_labels.csv
│   ├── labels_RFW.csv
│   ├── ola_labels.csv
│   ├── ovis_labels.csv
│   ├── qwen_labels.csv
│   └── sailvl_labels.csv
├── Well-Defined Initial Groups.ipynb
├── extract_embeddings_CelebA.py
└── extract_embeddings_RFW.py </pre>

## 📊 Reproducibility

- **Embedding Extraction**: Provided as standalone Python scripts:
  - `extract_embeddings_CelebA.py`: Extracts face embeddings from the CelebA dataset.
  - `extract_embeddings_RFW.py`: Extracts face embeddings from the RFW dataset.
  - `Traversals/extract_CelebA.py`: Extracts the ArcFace embeddings from the CelebA dataset that are needed for inversion with Arc2Face.
  - `Traversals/extract_RFW.py`: Extracts the ArcFace embeddings from the RFW dataset that are needed for inversion with Arc2Face.

  Each script can be run directly, for example:
  ```bash
  python extract_embeddings_CelebA.py
  ```

<pre></pre>
- **Experiments & Visualizations**: The rest of the code is organized into Jupyter Notebooks:

  - `Clustering.ipynb`: Reproduces **Figures 1, 2, and 8**
  - `Evaluation.ipynb`: Reproduces **Figures 2a, 3, 5, and 6; Tables 1 and 4**
  - `Traversals/Figure.ipynb`: Reproduces **Figures 2b and 4**
  - `Well-Defined Initial Groups.ipynb`: Reproduces **Figures 7 and 9-15**

<pre></pre>
- **VLM notebooks** (`VLM labels/*.ipynb`): Annotate RFW images using different Vision-Language Models.
    - `Majority voting.ipynb`: Combines VLM outputs.
    - `Manual Annotations.ipynb`: Resolves ambiguities.
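
To illustrate the idea behind the majority-voting step listed above, here is a minimal sketch of one way to combine per-image labels from several VLMs. It is not the code from `Majority voting.ipynb`: the function name `majority_vote`, the `min_agreement` threshold, the image names, and the label values are all hypothetical, and the choice to leave ties unresolved for manual review is an assumption.

```python
from collections import Counter


def majority_vote(labels, min_agreement=3):
    """Return the most common label, or None if the vote is ambiguous.

    `min_agreement` is a hypothetical threshold, not a value from the paper.
    """
    if not labels:
        return None
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    # A tie for first place, or too few votes for the winner, counts as ambiguous.
    tied = len(counts) > 1 and counts[1][1] == top_count
    if tied or top_count < min_agreement:
        return None
    return top_label


# Hypothetical labels from five VLMs for two images (illustrative values only).
votes = {
    "img_001.jpg": ["yes", "yes", "yes", "no", "yes"],
    "img_002.jpg": ["yes", "no", "no", "yes", "unsure"],
}

resolved = {name: majority_vote(v) for name, v in votes.items()}
# Images without a clear majority would be candidates for manual annotation.
ambiguous = [name for name, label in resolved.items() if label is None]
```

In this sketch, images that end up in `ambiguous` are the ones a manual pass would need to settle; the actual notebooks may use a different rule.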

## 📦 Requirements

No special dependencies are needed beyond standard Python libraries.

⚠️ If you want to run the traversals and invert the ResNet-50 ArcFace embeddings with Arc2Face, please refer to the requirements of the Arc2Face GitHub repository (https://github.com/foivospar/Arc2Face).

## 📁 Dataset
This project uses the Racial Faces in the Wild (RFW) and CelebA datasets. Please download them from their official websites and place them in the directory structure required by the scripts. For CelebA, the attribute and identity annotations are also required.
