# Task-Specific Data Selection for Instruction Tuning via Monosemantic Neuronal Activations

## Introduction

This repository contains the core implementation for reproducing the main experiments in our paper. **The code and accompanying instructions do not yet fully meet the checklist requirements for completeness and documentation; accordingly, we have answered "No" to the relevant checklist item(s).** We will provide more comprehensive instructions and better-organized code in future updates.

## Directory Structure

```
.
├── chat_template.py               # chat template
├── collate_functions.py           # collate functions
├── compute_feature.py             # main program: embeds samples
├── dataloader.py                  # customized dataloader
├── export_top_similar_data.py     # main program: selects the most similar data
├── feature_config.py              # configuration of distribution alignment methods
├── features.py                    # functions for embedding samples
├── metric_config.py               # similarity metric configuration
├── README.md                      # this file
├── sae/                           # SAE library from https://github.com/EleutherAI/sparsify
│   ├── __init__.py
│   ├── __main__.py
│   ├── config.py
│   ├── data.py
│   ├── kernels.py
│   ├── sae.py
│   ├── trainer.py
│   └── utils.py
└── util_funcs.py                  # utility functions
```
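The layout above suggests a two-stage workflow: `compute_feature.py` embeds samples, and `export_top_similar_data.py` selects the candidates most similar to the target task. The following is a minimal sketch of the second stage only, not the repository's actual code. It assumes each sample is already represented as a sparse activation vector, that similarity is cosine, and that a candidate is scored by its best match against any target sample. The real metric and alignment method are set in `metric_config.py` and `feature_config.py` and may differ. All names here (`SparseVec`, `cosine`, `select_top_k`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: feature index -> activation value.
SparseVec = Dict[int, float]


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty or all-zero)."""
    if not a or not b:
        return 0.0
    # Iterate over the smaller vector to keep the dot product cheap.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    dot = sum(v * large.get(i, 0.0) for i, v in small.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_top_k(
    candidates: Sequence[SparseVec],
    targets: Sequence[SparseVec],
    k: int,
) -> List[int]:
    """Return indices of the k candidates with the highest similarity to any target.

    Assumption: max-over-targets aggregation; other aggregations are equally plausible.
    """
    scores = [
        max((cosine(c, t) for t in targets), default=0.0)
        for c in candidates
    ]
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    pool = [{0: 1.0, 3: 0.5}, {7: 2.0}, {0: 0.9, 3: 0.6, 7: 0.1}]
    task = [{0: 1.0, 3: 0.5}]
    print(select_top_k(pool, task, k=2))
```

In this toy pool, the second candidate shares no active features with the target sample, so it scores zero and is excluded from the top two.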

