Cancer Myth
=======================================


Files Overview:
---------------
- main.py
  → Generates adversarial questions and validates them using selected LLMs (a hypothetical record layout is sketched after this list).

- evaluate.py
  → Runs selected models on the question set and validates their answers.

- merge_model_evaluations.py
  → Merges evaluation results from all models into a single file for comparison.

- visualize.ipynb
  → Visualizes model performance (e.g., Sharpness scores) using charts.
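
The generated and validated examples are stored as plain JSON. The exact schema is defined in `main.py`; purely as a hypothetical illustration (every field name below is an assumption, not the repository's actual schema), one record might carry the question, the myth it embeds, and the validation verdict:

    # Hypothetical shape of one generated record; field names are
    # illustrative assumptions, so consult main.py for the real schema.
    example_record = {
        "question": "Does drinking alkaline water cure cancer?",  # adversarial question
        "myth": "Alkaline water cures cancer.",                   # embedded misconception
        "response": "...",  # answer produced by the --responser model
        "valid": True,      # verdict from the --validator model
    }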

  

Workflow:
---------
1. Run `main.py` to generate and validate questions.
   For example:

     python main.py \
       --pos_output_file data/pos_examples.json \
       --neg_output_file data/neg_examples.json \
       --generate_type only-myth \
       --d_size 500 \
       --generator openai/gpt-4o \
       --responser openai/gpt-4o \
       --validator openai/gpt-4o \
       --temperature 0.7

2. Run `evaluate.py` to collect and validate each model's answers to the generated questions.
3. Run `merge_model_evaluations.py` to combine the per-model outputs into one dataset (a sketch of such a merge follows this list).
4. Open `visualize.ipynb` in Jupyter to analyze model performance.
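
The following is a minimal sketch of what the merge in step 3 can look like, assuming each file under `output/` is a JSON list of records and is named after the model that produced it; the real `merge_model_evaluations.py` may use different flags and field names:

    import glob
    import json
    import os

    def merge_evaluations(output_dir="output",
                          merged_path="data/all_data_after_validation_test.json"):
        """Combine per-model evaluation files into a single list,
        tagging each record with the model name taken from its filename."""
        merged = []
        for path in sorted(glob.glob(os.path.join(output_dir, "*.json"))):
            model_name = os.path.splitext(os.path.basename(path))[0]
            with open(path, encoding="utf-8") as f:
                records = json.load(f)  # assumed: each file holds a JSON list
            for record in records:
                record["model"] = model_name  # assumed field name
                merged.append(record)
        with open(merged_path, "w", encoding="utf-8") as f:
            json.dump(merged, f, indent=2, ensure_ascii=False)

    if __name__ == "__main__":
        merge_evaluations()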

Key Files:
----------
- data/filtered_generated_data.json
  → Generated adversarial examples

- output/*.json
  → Model evaluation outputs

- data/all_data_after_validation_test.json
  → Final merged results (see the loading sketch after this list)
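
Once the merged file exists it can be inspected directly. A minimal sketch, assuming each merged record carries a "model" field and a numeric "sharpness" score (both field names are assumptions, not confirmed by the repository):

    import json
    from collections import defaultdict

    with open("data/all_data_after_validation_test.json", encoding="utf-8") as f:
        records = json.load(f)

    # Group scores by model and report the mean; "model" and "sharpness"
    # are assumed field names.
    scores = defaultdict(list)
    for r in records:
        scores[r["model"]].append(r["sharpness"])

    for model, vals in sorted(scores.items()):
        print(f"{model}: mean sharpness = {sum(vals) / len(vals):.3f}")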

Supported Models:
-----------------
- GPT-3.5
- GPT-4 / GPT-4o
- Claude 3.5
- Gemini 1.5
- DeepSeek (V3 / R1)
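
Models are referred to by provider-prefixed identifiers (e.g. `openai/gpt-4o`, as in the `--generator` flag above). A minimal sketch of splitting such an identifier, assuming every backend follows the `provider/model` convention:

    def parse_model_id(model_id):
        """Split a provider-prefixed identifier such as "openai/gpt-4o"
        into (provider, model); reject strings without a provider prefix."""
        provider, sep, model = model_id.partition("/")
        if not sep or not model:
            raise ValueError(f"expected '<provider>/<model>', got {model_id!r}")
        return provider, model

    assert parse_model_id("openai/gpt-4o") == ("openai", "gpt-4o")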
