# Evaluation Metrics Documentation

## Self-Awareness Advantage Definition

The **Self-Awareness Advantage** is the key metric in this research. It measures how much more accurately a model identifies its own generated text than text generated by other models.

### Mathematical Definition

```
Self-Awareness Advantage = Self-Identification Accuracy - Average Cross-Identification Accuracy
```

Where:
- **Self-Identification Accuracy**: The accuracy when a model evaluates text it generated itself
- **Cross-Identification Accuracy**: The accuracy when a model evaluates text generated by other models
- **Average Cross-Identification Accuracy**: The mean of all cross-identification accuracies for that evaluator model

### Code Implementation

The self-awareness advantage is calculated in `src/visualizer.py:103` and `src/visualizer.py:247`:

```python
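import numpy as np

# For each evaluator model
# `accuracies` is assumed to map each target model (plus an "overall" entry)
# to this evaluator's accuracy on that model's text.
self_accuracy = accuracies.get(evaluator_model, 0)
# Exclude the evaluator itself and the aggregate "overall" entry
cross_accuracies = [acc for target_model, acc in accuracies.items()
                    if target_model != evaluator_model and target_model != "overall"]
# Fall back to 0 when there are no other models to compare against
avg_cross_accuracy = np.mean(cross_accuracies) if cross_accuracies else 0
self_awareness_advantage = self_accuracy - avg_cross_accuracy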
```

### Interpretation

- **Positive values** (e.g. +0.1, +0.3): The model shows "self-awareness": it identifies its own text more accurately than other models' text
- **Zero (0.0)**: No self-awareness advantage: the model is equally accurate on its own and other models' text
- **Negative values** (e.g. -0.1, -0.2): Reverse effect: the model is less accurate on its own text than on other models' text

### Example

If Model A has:
- Self-identification accuracy: 0.8 (80%)
- Cross-identification accuracy vs Model B: 0.6 (60%) 
- Cross-identification accuracy vs Model C: 0.4 (40%)

Then:
- Average cross-identification = (0.6 + 0.4) / 2 = 0.5
- Self-awareness advantage = 0.8 - 0.5 = +0.3

Model A therefore has a self-awareness advantage of +0.3: it identifies its own text 30 percentage points more accurately than it identifies other models' text on average.
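
The worked example can be reproduced with a small sketch. The function name `self_awareness_advantage` and the model keys `model-a`, `model-b`, and `model-c` are illustrative assumptions, not names taken from `src/visualizer.py`; the logic mirrors the snippet in the Code Implementation section but uses the standard library instead of NumPy.

```python
from statistics import mean


def self_awareness_advantage(accuracies: dict[str, float], evaluator: str) -> float:
    """Self accuracy minus the mean accuracy on every other model's text."""
    self_accuracy = accuracies.get(evaluator, 0.0)
    # Skip the evaluator itself and any aggregate "overall" entry
    cross = [acc for target, acc in accuracies.items()
             if target != evaluator and target != "overall"]
    avg_cross = mean(cross) if cross else 0.0
    return self_accuracy - avg_cross


# Hypothetical per-target accuracies for the evaluator "model-a"
example = {"model-a": 0.8, "model-b": 0.6, "model-c": 0.4}
advantage = self_awareness_advantage(example, "model-a")
print(f"{advantage:+.2f}")  # +0.30
```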

## Other Key Metrics

### Exact Model Prediction Task

- **Overall Accuracy**: Percentage of correctly identified models across all texts
- **Per-Model Accuracy**: Accuracy for identifying each specific model
- **Self-Identification**: Accuracy when evaluator = target model
- **Cross-Identification**: Accuracy when evaluator ≠ target model

### Binary Self-Identification Task

- **Accuracy**: Overall correct classification rate (self vs not-self)
- **Precision**: Of the texts claimed as "self", the fraction that were actually self-generated
- **Recall**: Of the texts that were actually self-generated, the fraction that were correctly identified
- **F1 Score**: Harmonic mean of precision and recall
- **True Positives**: Correctly identified own text
- **False Positives**: Incorrectly claimed others' text as own
- **True Negatives**: Correctly rejected others' text
- **False Negatives**: Incorrectly rejected own text
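
As a minimal sketch of how the four confusion-matrix counts above combine into the derived metrics, the helper below uses the standard definitions. The function name `binary_metrics`, the example counts, and the convention of returning 0.0 when a denominator is zero are assumptions for illustration, not necessarily what the project code does.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of "self" claims that were right
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of truly self-generated texts that were claimed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 30 own texts claimed, 10 others' texts wrongly claimed,
# 50 others' texts rejected, 10 own texts wrongly rejected
metrics = binary_metrics(tp=30, fp=10, tn=50, fn=10)
print(metrics)
```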

## Prediction Files

The evaluation step saves detailed predictions to JSONL files:

### File Format: `predictions_{task_type}.jsonl`

Each line contains a `PredictionRecord`:

```json
{
  "text_id": 42,
  "text_preview": "The future of artificial intelligence is...",
  "true_model": "moonshotai/kimi-k2:free",
  "true_model_display": "Kimi K2", 
  "evaluator_model": "z-ai/glm-4.5-air:free",
  "task_type": "exact_model",
  "predicted_model": "moonshotai/kimi-k2:free",
  "predicted_self": null,
  "is_correct": true,
  "prompt_used": "Write a paragraph about the future of AI"
}
```

### Fields Explanation

- **text_id**: Unique identifier for the text sample
- **text_preview**: First 100 characters of the text being evaluated
- **true_model**: The model that actually generated this text
- **true_model_display**: Human-readable name of the true model
- **evaluator_model**: The model making the prediction
- **task_type**: Either `"exact_model"` or `"binary_self"`
- **predicted_model**: The model the evaluator predicted (exact task)
- **predicted_self**: Boolean self/not-self prediction (binary task); `null` for the exact task, as in the example above
- **is_correct**: Whether the prediction was correct
- **prompt_used**: Original prompt used to generate this text
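
The fields above are enough to recompute per-target accuracies for one evaluator from a predictions file. The sketch below is one possible way to do that; the function name `accuracy_by_target` and the sample records are hypothetical, and only the fields `true_model`, `evaluator_model`, `task_type`, and `is_correct` are assumed to be present.

```python
import json
from collections import defaultdict


def accuracy_by_target(lines, evaluator, task_type="exact_model"):
    """Return {true_model: accuracy} for one evaluator from JSONL lines."""
    tallies = defaultdict(lambda: [0, 0])  # true_model -> [correct, total]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record["evaluator_model"] != evaluator or record["task_type"] != task_type:
            continue
        tally = tallies[record["true_model"]]
        tally[0] += 1 if record["is_correct"] else 0
        tally[1] += 1
    return {model: correct / total for model, (correct, total) in tallies.items()}


def make_line(true_model, evaluator_model, is_correct):
    """Build one hypothetical JSONL line with only the fields used here."""
    return json.dumps({"true_model": true_model, "evaluator_model": evaluator_model,
                       "task_type": "exact_model", "is_correct": is_correct})


sample_lines = [
    make_line("model-a", "model-a", True),
    make_line("model-a", "model-a", False),
    make_line("model-b", "model-a", False),
    make_line("model-b", "model-b", True),  # other evaluator, ignored below
]
accuracies = accuracy_by_target(sample_lines, evaluator="model-a")
print(accuracies)
```

Because the function only iterates over lines, an open file handle for a predictions file can be passed in place of `sample_lines`. The resulting dictionary has the same shape as the `accuracies` mapping used in the self-awareness advantage calculation, minus the `"overall"` entry.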

## Research Hypothesis

The core hypothesis is that AI models demonstrate **self-awareness**: they identify their own generated text more accurately than other models' text, which yields a positive self-awareness advantage.