# Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework

## Table of Contents

- [Repository Overview](#repository-overview)
- [Installation](#installation)
- [Running Experiments](#running-experiments)
  - [1. Feature Descriptions](#1-feature-descriptions)
  - [2. Evaluation](#2-evaluation)
  - [3. Meta-labels](#3-meta-labels)

## Repository Overview

The repository is organized as follows:
- **`assets/explanations/`** – Pre-computed feature descriptions from various feature description methods.  
- **`descriptions/`** – Feature descriptions generated with **PRISM**.  
- **`generated_text/`** – Concept text samples generated for evaluation purposes.  
- **`notebooks/`** – Contains a Jupyter notebook for reproducing the benchmark table and plots shown in the paper.  
- **`src/`** – Core source code, including all necessary functions for running feature description and evaluation.


## Installation

Install the necessary packages using the provided `requirements.txt`:

```bash
pip install -r requirements.txt
```

## Running Experiments

First, set the parameters in `src/utils/config.py`, or keep the defaults.

### 1. Feature Descriptions

The following script generates multiple feature descriptions for a single feature, using percentile sampling and clustering:

```bash
python src/concept_clustering.py
```

To generate descriptions for multiple features, run:

```bash
python src/run_concept_clustering.py
```

Generated feature descriptions can be found in the **`descriptions/`** folder.
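
To make the idea of percentile sampling followed by clustering more concrete, here is a minimal, self-contained sketch. It is not the code in `src/concept_clustering.py`: the function names `percentile_sample` and `kmeans`, the percentile bounds, and the toy 2D vectors (standing in for text embeddings) are all assumptions made for illustration.

```python
import random


def percentile_sample(activations, lower_pct, upper_pct):
    """Return indices of samples whose activation rank lies in [lower_pct, upper_pct)."""
    order = sorted(range(len(activations)), key=lambda i: activations[i])
    lo = int(len(order) * lower_pct / 100.0)
    hi = int(len(order) * upper_pct / 100.0)
    return order[lo:hi]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy data (hypothetical): one activation and one 2D "embedding" per text sample.
activations = [0.10, 0.90, 0.20, 0.95, 0.05, 0.85, 0.30, 0.92, 0.15, 0.88]
embeddings = [
    [5.0, 5.0], [0.0, 0.0], [5.0, 6.0], [0.0, 1.0], [6.0, 5.0],
    [10.0, 10.0], [4.0, 5.0], [10.0, 11.0], [5.0, 4.0], [11.0, 10.0],
]

# Keep the top half of samples by activation, then cluster their embeddings.
selected = percentile_sample(activations, lower_pct=50.0, upper_pct=100.0)
labels = kmeans([embeddings[i] for i in selected], k=2)

for idx, label in zip(selected, labels):
    print(f"sample {idx}: activation={activations[idx]:.2f}, cluster={label}")
```

In this sketch each resulting cluster would correspond to one candidate concept, which is why a single feature can end up with several descriptions.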

### 2. Evaluation

Evaluate feature descriptions with CoSy scores:

```bash
python src/evaluation.py
```

Generated concept samples can be found in the **`generated_text/`** folder.
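
As a rough illustration of the two scores used later in this section (AUC and MAD), the sketch below compares a feature's activations on generated concept samples against its activations on control samples. The function names `auc` and `mad`, the toy activation values, and the choice of the population standard deviation for normalization are assumptions; `src/evaluation.py` may compute these differently.

```python
from statistics import mean, pstdev


def auc(control, concept):
    """Fraction of (concept, control) pairs where the concept activation is higher.

    Ties count as half a win. This equals the area under the ROC curve when
    activations are used to separate concept samples from control samples.
    """
    wins = 0.0
    for a1 in concept:
        for a0 in control:
            if a1 > a0:
                wins += 1.0
            elif a1 == a0:
                wins += 0.5
    return wins / (len(concept) * len(control))


def mad(control, concept):
    """Mean activation difference, scaled by the spread of the control activations."""
    return (mean(concept) - mean(control)) / pstdev(control)


# Hypothetical activations of one feature.
control_activations = [0.0, 1.0, 2.0]
concept_activations = [3.0, 4.0, 5.0]

auc_score = auc(control_activations, concept_activations)
mad_score = mad(control_activations, concept_activations)
print(f"AUC={auc_score:.3f}, MAD={mad_score:.3f}")
```

An AUC near 0.5 would indicate that the description does not separate concept samples from control samples, while a larger MAD indicates a stronger activation shift on the concept samples.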

Evaluate all descriptions of each feature with the polysemanticity score (cosine similarity), max AUC, and max MAD:

```bash
python src/meta_evaluation.py
```

All evaluation scores can be found in the **`results/`** folder.
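
The per-feature aggregation can be sketched as follows. This is an illustration under stated assumptions, not the logic of `src/meta_evaluation.py`: it assumes each description comes with an embedding vector and its own AUC and MAD scores, and it treats the mean pairwise cosine similarity between description embeddings as the similarity-based score. The function `summarize_feature`, the dictionary keys, and all numbers are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def summarize_feature(descriptions):
    """Aggregate the scores of all descriptions belonging to one feature."""
    pairs = list(combinations(descriptions, 2))
    mean_similarity = sum(
        cosine_similarity(a["embedding"], b["embedding"]) for a, b in pairs
    ) / len(pairs)
    return {
        "mean_cosine_similarity": mean_similarity,
        "max_auc": max(d["auc"] for d in descriptions),
        "max_mad": max(d["mad"] for d in descriptions),
    }


# Hypothetical descriptions of a single feature (needs at least two entries).
descriptions = [
    {"text": "legal terminology", "embedding": [1.0, 0.0], "auc": 0.91, "mad": 2.4},
    {"text": "cooking verbs", "embedding": [0.0, 1.0], "auc": 0.78, "mad": 1.1},
    {"text": "contract recipes", "embedding": [1.0, 1.0], "auc": 0.85, "mad": 3.0},
]

summary = summarize_feature(descriptions)
print(summary)
```

Under this assumption, a low mean similarity suggests that the descriptions cover distinct concepts, while the max AUC and max MAD report how well the best single description performs.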

### 3. Meta-labels

To generate meta-labels for the concepts found in the feature descriptions, run:

```bash
python src/run_concept_summary.py
```

All meta-label results can be found in the **`metalabels/`** folder.
