Model,Dataset,Method,FNR,actual_fpr,threshold
Llama-4-Scout,HarmBench,avg_pairwise_bertscore,0.741,0.049,0.952
Llama-4-Scout,HarmBench,embedding_variance,0.605,0.049,0.042
Llama-4-Scout,HarmBench,levenshtein_variance,0.926,0.049,344198.490
Llama-4-Scout,HarmBench,semantic_entropy (best τ=0.1),0.654,0.037,0.971
Llama-4-Scout,HarmBench,semantic_entropy (τ=0.2),0.765,0.000,0.722
Llama-4-Scout,JailbreakBench,avg_pairwise_bertscore,0.600,0.050,0.945
Llama-4-Scout,JailbreakBench,embedding_variance,0.667,0.050,0.069
Llama-4-Scout,JailbreakBench,levenshtein_variance,0.883,0.050,151161.210
Llama-4-Scout,JailbreakBench,semantic_entropy (best τ=0.3),0.733,0.017,0.722
Llama-4-Scout,JailbreakBench,semantic_entropy (τ=0.2),0.850,0.000,0.971
Qwen-2.5-7B,HarmBench,avg_pairwise_bertscore,0.852,0.049,0.940
Qwen-2.5-7B,HarmBench,embedding_variance,0.654,0.049,0.050
Qwen-2.5-7B,HarmBench,levenshtein_variance,0.815,0.049,142706.090
Qwen-2.5-7B,HarmBench,semantic_entropy (best τ=0.1),0.630,0.037,1.371
Qwen-2.5-7B,HarmBench,semantic_entropy (τ=0.2),0.889,0.000,0.722
Qwen-2.5-7B,JailbreakBench,avg_pairwise_bertscore,0.867,0.050,0.914
Qwen-2.5-7B,JailbreakBench,embedding_variance,0.967,0.050,0.103
Qwen-2.5-7B,JailbreakBench,levenshtein_variance,0.767,0.050,191554.810
Qwen-2.5-7B,JailbreakBench,semantic_entropy (τ=0.2),0.983,0.050,1.371
