Model,Dataset,Method,Metric,Value,tau,N
Llama-4-Scout,JailbreakBench,avg_pairwise_bertscore,AUROC,0.7672222222222222,,5
Llama-4-Scout,JailbreakBench,embedding_variance,AUROC,0.6536111111111111,,5
Llama-4-Scout,JailbreakBench,levenshtein_variance,AUROC,0.2891666666666666,,5
Llama-4-Scout,JailbreakBench,semantic_entropy,AUROC,0.685138888888889,0.1,5
Qwen-2.5-7B,JailbreakBench,avg_pairwise_bertscore,AUROC,0.615,,5
Qwen-2.5-7B,JailbreakBench,embedding_variance,AUROC,0.7205555555555556,,5
Qwen-2.5-7B,JailbreakBench,levenshtein_variance,AUROC,0.6013888888888889,,5
Qwen-2.5-7B,JailbreakBench,semantic_entropy,AUROC,0.6901388888888889,0.1,5
