README: Anachronism Detection Pipeline
======================================

This package contains scripts for reproducing the anachronism detection and
analysis experiments described in the accompanying paper.

----------------------------------------------------------------------
Requirements
----------------------------------------------------------------------

- Python 3.9 or higher
- PyTorch >= 2.0
- torchvision
- pandas
- tqdm
- Pillow
- openai (for GPT-based components)

Install dependencies with:

    pip install "torch>=2.0" torchvision pandas tqdm pillow openai

----------------------------------------------------------------------
Files
----------------------------------------------------------------------

1. llm_anachronism_proposal.py
   Uses a language model to generate candidate anachronisms and associated
   yes/no detection questions for each prompt.
   - Input: text file with one prompt per line
   - Output: JSON file with proposed anachronisms and questions

2. anachronism_detection.py
   Runs visual anachronism detection on generated images, using the yes/no
   questions produced in step 1.
   - Inputs:
     * JSON file with prompts and associated yes/no questions
     * CSV metadata file with image paths, model names, and historical periods
   - Output: JSON file with per-image analyses

3. compute_anachr_freq_and_sever.py
   Computes anachronism frequency and severity scores from the detection results.
   - Input: JSON file produced by anachronism_detection.py
   - Output: CSV file with aggregated statistics per model and period
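
The JSON schemas passed between these scripts are not documented here. When
debugging, a schema-agnostic summary can help confirm that a file produced by
one step looks reasonable before it is passed to the next. The sketch below
uses a hypothetical helper, `summarize_json`, that reports only top-level
structure and makes no assumptions about field names.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_json(path) -> dict:
    """Describe the top-level structure of a JSON file without assuming a schema."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    summary = {"top_level_type": type(data).__name__}
    if isinstance(data, list):
        summary["length"] = len(data)
        # Count the types of the list elements, e.g. {'dict': 120}.
        summary["element_types"] = dict(Counter(type(x).__name__ for x in data))
        # For lists of objects, count how often each key appears.
        key_counts = Counter(k for x in data if isinstance(x, dict) for k in x)
        if key_counts:
            summary["key_counts"] = dict(key_counts)
    elif isinstance(data, dict):
        summary["length"] = len(data)
        summary["keys_sample"] = list(data)[:10]
        summary["value_types"] = dict(Counter(type(v).__name__ for v in data.values()))
    return summary


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(p, json.dumps(summarize_json(p), indent=2))
```

For example, `python summarize_json.py anachronism_questions.json detections.json`
prints one summary per file (the script name is illustrative).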

----------------------------------------------------------------------
Pipeline Usage
----------------------------------------------------------------------

Step 1: Generate candidate anachronisms and questions

    python llm_anachronism_proposal.py \
        --input_txt prompts.txt \
        --output_json anachronism_questions.json
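
Step 1 reads one prompt per line, so a prompt containing a newline would be
split in two. If your prompts come from Python, a small helper such as the
hypothetical `write_prompts` below can produce `prompts.txt` safely. It is a
sketch: it collapses internal whitespace and drops empty prompts, since the
README does not say how the script treats blank lines.

```python
from pathlib import Path


def write_prompts(prompts, path="prompts.txt") -> int:
    """Write one prompt per line and return the number of prompts written."""
    cleaned = []
    for prompt in prompts:
        # Collapse newlines and repeated whitespace into single spaces;
        # this also strips leading and trailing whitespace.
        one_line = " ".join(prompt.split())
        if one_line:  # skip prompts that are empty after cleaning
            cleaned.append(one_line)
    text = "\n".join(cleaned) + ("\n" if cleaned else "")
    Path(path).write_text(text, encoding="utf-8")
    return len(cleaned)


if __name__ == "__main__":
    # Illustrative prompts only.
    n = write_prompts(["A street market in 1850s London", "A harbor in 1700"])
    print(f"wrote {n} prompts")
```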

Step 2: Detect anachronisms in images

    python anachronism_detection.py \
        --prompts_json anachronism_questions.json \
        --metadata_url metadata.csv \
        --output_json detections.json

   Notes:
   - The metadata CSV must contain at least the following columns:
       image_path, model, historical_period
   - Values in the image_path column must be local file paths.
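
The notes above can be checked before launching the (potentially slow) detection
step. The sketch below uses a hypothetical helper, `validate_metadata`, built on
the standard `csv` module. It verifies the required columns and flags image
paths that are empty, look like URLs, or do not exist on disk. Relative paths
are resolved against the current working directory, which is an assumption
about how the detection script resolves them.

```python
import csv
from pathlib import Path

# Columns the README lists as required for the metadata CSV.
REQUIRED_COLUMNS = ("image_path", "model", "historical_period")


def validate_metadata(csv_path) -> list:
    """Return a list of problems found in the metadata CSV (empty if none)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            return [f"missing required column(s): {', '.join(missing)}"]

        problems = []
        for row_no, row in enumerate(reader, start=1):
            # Short rows yield None for absent fields, hence the `or ""`.
            image_path = (row["image_path"] or "").strip()
            if not image_path:
                problems.append(f"row {row_no}: empty image_path")
            elif image_path.startswith(("http://", "https://")):
                problems.append(f"row {row_no}: URL instead of local path: {image_path}")
            elif not Path(image_path).is_file():
                # Relative paths are resolved against the current directory.
                problems.append(f"row {row_no}: file not found: {image_path}")
        return problems


if __name__ == "__main__":
    for problem in validate_metadata("metadata.csv"):
        print(problem)
```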

Step 3: Aggregate results into frequency and severity scores

    python compute_anachr_freq_and_sever.py \
        --json_path detections.json \
        --output_csv anachronism_stats.csv
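
The three steps can also be chained from a single script. The sketch below
assumes the scripts sit in the current working directory and accept exactly
the flags shown above; `build_commands` and `run_pipeline` are hypothetical
helpers, and the default file names are taken from the usage examples. Each
step runs with the current Python interpreter, and `check=True` stops the
chain at the first step that exits with a non-zero status.

```python
import subprocess
import sys


def build_commands(prompts_txt="prompts.txt",
                   metadata_csv="metadata.csv",
                   questions_json="anachronism_questions.json",
                   detections_json="detections.json",
                   stats_csv="anachronism_stats.csv"):
    """Return the three pipeline steps as argv lists, using the documented flags."""
    py = sys.executable  # run each script with the current interpreter
    return [
        [py, "llm_anachronism_proposal.py",
         "--input_txt", prompts_txt, "--output_json", questions_json],
        [py, "anachronism_detection.py",
         "--prompts_json", questions_json, "--metadata_url", metadata_csv,
         "--output_json", detections_json],
        [py, "compute_anachr_freq_and_sever.py",
         "--json_path", detections_json, "--output_csv", stats_csv],
    ]


def run_pipeline(**paths):
    """Run the steps in order, stopping at the first non-zero exit code."""
    for cmd in build_commands(**paths):
        print("Running:", " ".join(cmd[1:]))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run_pipeline()
```

Keyword arguments override individual paths, e.g.
`run_pipeline(metadata_csv="my_metadata.csv")`.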

-----------------------------------