{{ current_model }} on {{ current_system_prompt }}

Configuration Metrics

Free Tier

Compliance Rate: -
Refusal Rate: -
Alignment Faking: -

Paid Tier

Compliance Rate: -
Refusal Rate: -
Alignment Faking: -

Alignment Faking Metrics

Compliance metrics plot

Alignment Faking Categories

Category distribution plot

All Compliance Categories

All compliance categories plot

Advanced Filtering

{% for transcript in transcripts %} {% endfor %}
Idx Tags Strong Reject Score Easy Scratchpad Hard Scratchpad Compromised Category
{{ transcript.idx }} {{ transcript.tags }} {{ "%.3f"|format(transcript.strong_reject_score) }} {% if transcript.easy_scratchpad.final_answer == 'FAILED TO PARSE' %} N/A {% else %} {{ "Yes" if transcript.easy_scratchpad.final_answer else "No" }} {% endif %} {% if transcript.hard_scratchpad.final_answer == 'FAILED TO PARSE' %} N/A {% else %} {{ "Yes" if transcript.hard_scratchpad.final_answer else "No" }} {% endif %} {{ "True" if transcript.is_compromised else "False" }} {{ transcript.categorization.judgement if transcript.categorization.judgement else "N/A" }}

Row Details