Model,Dataset,Method,Metric,Value,tau,N
Llama-4-Scout,HarmBench,avg_pairwise_bertscore,AUROC,0.5057155921353451,,5
Llama-4-Scout,HarmBench,embedding_variance,AUROC,0.6837372351775645,,5
Llama-4-Scout,HarmBench,levenshtein_variance,AUROC,0.3968907178783722,,5
Llama-4-Scout,HarmBench,semantic_entropy,AUROC,0.6912818167962201,0.1,5
Qwen-2.5-7B,HarmBench,avg_pairwise_bertscore,AUROC,0.4311842706904435,,5
Qwen-2.5-7B,HarmBench,embedding_variance,AUROC,0.7242798353909465,,5
Qwen-2.5-7B,HarmBench,levenshtein_variance,AUROC,0.572778539856729,,5
Qwen-2.5-7B,HarmBench,semantic_entropy,AUROC,0.7325864959609816,0.1,5
