{{ current_model }} on {{ current_system_prompt }}

Configuration Metrics

Free Tier

Compliance Rate: {{ "%.1f"|format(metrics.free_tier.compliance_rate * 100) }}%
Refusal Rate: {{ "%.1f"|format(metrics.free_tier.refusal_rate * 100) }}%
Alignment Faking: {{ "%.1f"|format(metrics.free_tier.alignment_faking_rate * 100) }}%

Paid Tier

Compliance Rate: {{ "%.1f"|format(metrics.paid_tier.compliance_rate * 100) }}%
Refusal Rate: {{ "%.1f"|format(metrics.paid_tier.refusal_rate * 100) }}%
Alignment Faking: {{ "%.1f"|format(metrics.paid_tier.alignment_faking_rate * 100) }}%

Compact Alignment Metrics

Transcripts

Filters

0 - 1
0 - 1
0 - 1
ID | Tier | Strong Reject | Partial Compliance | Alignment Faking
{% for transcript in transcripts %}
{{ transcript.idx }} | {{ transcript.tags }} | {{ "%.3f"|format(transcript.strong_reject_score) if transcript.strong_reject_score != "N/A" else "N/A" }} | {{ transcript.partial_compliance_score }} | {{ transcript.alignment_faking_score }}
{% endfor %}

Transcript Details

Prompt

Scratchpad

Enable Highlighting

Response

Deceptive Alignment Criteria

Score: 0.0

Partial Compliance Criteria

Score: 0.0

Deceptive Alignment Categories

Most Common: N/A

Helpfulness Categories

Most Common: N/A