================================================================================
Temperature 0 Safety Evaluation Report
================================================================================

Models tested: 3
Prompting methods: zs, iter-cot, malicious
Temperature: 0.0

================================================================================
ZS Prompting Method Results
================================================================================

Model                                    Location Exposure (%)     Abstention Violation (%)       Over-Disclosure (%)      
------------------------------------------------------------------------------------------------------------------------
gemini-2.5-flash                          49.28%                    87.11%                         45.69%                  
o4-mini                                   62.66%                    89.43%                         49.41%                  
gpt-4.1-mini                              21.56%                    69.04%                         30.29%                  

================================================================================
ITER-COT Prompting Method Results
================================================================================

Model                                    Location Exposure (%)     Abstention Violation (%)       Over-Disclosure (%)      
------------------------------------------------------------------------------------------------------------------------
gemini-2.5-flash                          62.12%                    90.10%                         51.13%                  
o4-mini                                   98.25%                   100.00%                         60.12%                  
gpt-4.1-mini                              71.93%                    90.46%                         53.10%                  

================================================================================
MALICIOUS Prompting Method Results
================================================================================

Model                                    Location Exposure (%)     Abstention Violation (%)       Over-Disclosure (%)      
------------------------------------------------------------------------------------------------------------------------
gemini-2.5-flash                          93.04%                   100.00%                         59.92%                  
o4-mini                                   51.67%                    47.93%                         31.30%                  
gpt-4.1-mini                             100.00%                   100.00%                         60.45%                  
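
Note: the three per-method tables above share one row layout (model name followed by three percentages), so they can be loaded into a nested mapping for side-by-side comparison. The following is a minimal sketch, not the tooling that produced this report. The names `parse_report`, `ROW`, `TITLE`, and the `METRICS` keys are hypothetical, and `SAMPLE` is a two-row excerpt copied from the tables above.

```python
import re

# A data row: a model name followed by three percentage values.
ROW = re.compile(r"^(\S+)\s+([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s*$")
# A section title such as "ZS Prompting Method Results".
TITLE = re.compile(r"^(\S+) Prompting Method Results\s*$")
# Hypothetical key names for the three table columns, in column order.
METRICS = ("location_exposure", "abstention_violation", "over_disclosure")


def parse_report(text):
    """Return {method: {model: {metric: percent}}} parsed from report text."""
    results, method = {}, None
    for line in text.splitlines():
        title = TITLE.match(line)
        if title:
            # Start a new section keyed by the lowercased method name.
            method = title.group(1).lower()
            results[method] = {}
            continue
        row = ROW.match(line)
        if row and method is not None:
            values = [float(v) for v in row.groups()[1:]]
            results[method][row.group(1)] = dict(zip(METRICS, values))
    return results


# Excerpt of this report (assumption: the full file has the same layout).
SAMPLE = """\
ZS Prompting Method Results
gpt-4.1-mini                              21.56%                    69.04%                         30.29%
MALICIOUS Prompting Method Results
gpt-4.1-mini                             100.00%                   100.00%                         60.45%
"""

parsed = parse_report(SAMPLE)
zs = parsed["zs"]["gpt-4.1-mini"]["location_exposure"]
mal = parsed["malicious"]["gpt-4.1-mini"]["location_exposure"]
print(f"gpt-4.1-mini location exposure: {zs:.2f}% -> {mal:.2f}%")
```

Passing the full report text to `parse_report` instead of `SAMPLE` should yield one entry per prompting method (`zs`, `iter-cot`, `malicious`), since banner, header, and rule lines match neither pattern and are skipped.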

================================================================================
End of Report
================================================================================
