# Objectives-Discovery

## Adding a New Dataset

To integrate a new dataset into the objectives discovery pipeline, follow these steps:

### 1. Create Dataset Handler Class

Add a new class in `src/dataset_handlers.py` that extends `BaseDataset`:

```python
from typing import Any, Dict, List


class YourDataset(BaseDataset):
    """Handler for your dataset."""

    def process(self) -> List[Dict[str, Any]]:
        """
        Process the dataset into the standardized format.

        Returns:
            A list of dicts, each with an 'input' key (the dialogue) and,
            optionally, 'chosen' and 'rejected' keys.
        """
        # Replace with the dataset-specific conversion logic.
        raise NotImplementedError
```
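As a rough illustration of the format `process()` is expected to return, the sketch below converts a couple of made-up raw records. The stand-in `BaseDataset`, its constructor, and the raw field names (`prompt`, `response_a`, `response_b`, `preferred`) are assumptions for this example only; the real base class in `src/dataset_handlers.py` may load and expose data differently.

```python
from typing import Any, Dict, List


class BaseDataset:
    """Stand-in for the real base class (assumed interface)."""

    def __init__(self, records: List[Dict[str, Any]]) -> None:
        self.records = records

    def process(self) -> List[Dict[str, Any]]:
        raise NotImplementedError


class ExampleDialogueDataset(BaseDataset):
    """Hypothetical handler; the raw field names are illustrative only."""

    def process(self) -> List[Dict[str, Any]]:
        processed = []
        for record in self.records:
            item: Dict[str, Any] = {"input": record["prompt"]}
            # 'chosen'/'rejected' are optional: add them only when a
            # preference label is present in the raw record.
            if record.get("preferred") in ("a", "b"):
                a, b = record["response_a"], record["response_b"]
                chosen, rejected = (a, b) if record["preferred"] == "a" else (b, a)
                item["chosen"] = chosen
                item["rejected"] = rejected
            processed.append(item)
        return processed


raw = [
    {"prompt": "User: Hi", "response_a": "Hello!", "response_b": "What?", "preferred": "a"},
    {"prompt": "User: Bye"},
]
processed = ExampleDialogueDataset(raw).process()
```

Records without a preference label yield only the `'input'` key, which matches the "optionally" wording in the docstring above.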

Then register the handler in `get_dataset_handler()` by adding name-matching logic that maps the new dataset's name to the new class.
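The registration step might look something like the sketch below. The actual signature and matching rules of `get_dataset_handler()` are not shown here, so this assumes it takes a dataset name and returns a handler class, and it uses a simple case-insensitive substring match; the stub classes stand in for the real ones.

```python
from typing import Type


class BaseDataset:
    """Stand-in for the real base class (assumed)."""


class ExampleDialogueDataset(BaseDataset):
    """Hypothetical handler for 'example-org/dialogue-data'."""


def get_dataset_handler(dataset_name: str) -> Type[BaseDataset]:
    """Return the handler class that matches the dataset name.

    Assumed behavior: the real function may use different matching
    rules or return an instance instead of a class.
    """
    name = dataset_name.lower()
    if "dialogue-data" in name:
        return ExampleDialogueDataset
    # Branches for the other registered datasets would go here.
    raise ValueError(f"No handler registered for dataset: {dataset_name}")
```

Failing loudly on an unknown name, rather than falling back to a default handler, makes a missing registration easier to spot.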

### 2. Update Constants

In `src/constants.py`, add entries for the new dataset to the following dictionaries:

- **`DATASET_EXAMPLES_DICT`**: Map the full dataset name to a representative example from your dataset, used for context generation
- **`DATASET_NAMES_DICT`**: Map the full dataset name (e.g., `'organization/dataset-name'`) to a short identifier (e.g., `'short_name'`)
- **`NAME_DATASETS_DICT`**: Add the reverse mapping from short identifier to full dataset name
- **`DATASET_RUBRICS_DICT`**: Map the short identifier to a dictionary of scoring rubrics for dataset-specific objectives
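Because the four dictionaries must stay in sync, a small consistency check can catch a forgotten entry. The helper below is a hypothetical sketch and not part of the repository; it assumes the key conventions described in the list above (full name for examples and names, short identifier for the reverse mapping and rubrics).

```python
from typing import Dict, List


def check_dataset_constants(
    names: Dict[str, str],
    reverse: Dict[str, str],
    examples: Dict[str, str],
    rubrics: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    for full_name, short_name in names.items():
        # The reverse mapping must point back to the same full name.
        if reverse.get(short_name) != full_name:
            problems.append(f"NAME_DATASETS_DICT[{short_name!r}] does not map to {full_name!r}")
        if full_name not in examples:
            problems.append(f"DATASET_EXAMPLES_DICT is missing {full_name!r}")
        if short_name not in rubrics:
            problems.append(f"DATASET_RUBRICS_DICT is missing {short_name!r}")
    return problems


# Illustrative values only.
problems = check_dataset_constants(
    names={"example-org/dialogue-data": "dialogue"},
    reverse={"dialogue": "example-org/dialogue-data"},
    examples={"example-org/dialogue-data": "..."},
    rubrics={},  # rubrics entry deliberately left out
)
```

In this example `problems` contains a single message reporting the missing `DATASET_RUBRICS_DICT` entry.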

### 3. Define Scoring Rubrics

Define the rubrics as a dictionary that maps each objective name to a rubric string describing every score band:

```python
SCORING_RUBRICS_YOUR_DATASET = {
    "objective_name": """
    Score 1-2 (Very Poor): [Description]
    Score 3-4 (Poor): [Description]
    Score 5-6 (Average): [Description]
    Score 7-8 (Good): [Description]
    Score 9-10 (Excellent): [Description]
    """,
    # Additional objectives...
}
```

These rubrics guide the LLM-based scoring system in evaluating responses according to dataset-specific criteria.
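Since the scorer relies on the rubric text, it can help to verify that every rubric lists all five score bands. The following is a minimal sketch under the assumption that each band starts with the literal text `Score N-M`, as in the template above; the helper name and the sample objectives are hypothetical.

```python
import re
from typing import Dict, List

EXPECTED_BANDS = ["1-2", "3-4", "5-6", "7-8", "9-10"]


def missing_score_bands(rubrics: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each objective to the score bands its rubric text lacks."""
    missing = {}
    for objective, text in rubrics.items():
        found = re.findall(r"Score (\d+-\d+)", text)
        absent = [band for band in EXPECTED_BANDS if band not in found]
        if absent:
            missing[objective] = absent
    return missing


# Hypothetical objectives; descriptions are placeholders.
sample_rubrics = {
    "complete_objective": """
    Score 1-2 (Very Poor): [Description]
    Score 3-4 (Poor): [Description]
    Score 5-6 (Average): [Description]
    Score 7-8 (Good): [Description]
    Score 9-10 (Excellent): [Description]
    """,
    "incomplete_objective": """
    Score 1-2 (Very Poor): [Description]
    Score 9-10 (Excellent): [Description]
    """,
}
gaps = missing_score_bands(sample_rubrics)
```

Only objectives with gaps appear in the result, so an empty dictionary means every rubric covers the full 1-10 range.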

### Example

For a new dataset `'example-org/dialogue-data'`:

1. Create the `ExampleDialogueDataset(BaseDataset)` class in `src/dataset_handlers.py` and register it in `get_dataset_handler()`
2. Add to constants:
   - `DATASET_EXAMPLES_DICT['example-org/dialogue-data'] = "..."`
   - `DATASET_NAMES_DICT['example-org/dialogue-data'] = 'dialogue'`
   - `NAME_DATASETS_DICT['dialogue'] = 'example-org/dialogue-data'`
   - `DATASET_RUBRICS_DICT['dialogue'] = SCORING_RUBRICS_DIALOGUE`
3. Define `SCORING_RUBRICS_DIALOGUE` with relevant objectives

This structure keeps dataset-specific logic confined to the handler class and the entries in `src/constants.py`, which makes each dataset easier to maintain in isolation.
