phtest_v1.csv contains a sample of pseudo-harmful prompts generated by our method. We will release the code and the full dataset upon acceptance.
Harmfulness_eval_label = 1: the prompt is harmless
Harmfulness_eval_label = 0: the prompt is controversial
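A minimal sketch of tallying the labels with Python's standard csv module. It assumes the file has a header row containing a Harmfulness_eval_label column whose values are 0 or 1 (possibly written as 0.0/1.0). count_labels is a hypothetical helper, not part of the released code, and the `prompt` column in the demo is a placeholder name, not a confirmed column of phtest_v1.csv.

```python
import csv
import io
from collections import Counter

# Meaning of each Harmfulness_eval_label value, as documented above.
LABEL_MEANING = {1: "harmless", 0: "controversial"}


def count_labels(csv_file):
    """Count rows per label in an open CSV file that has a header row.

    Only the Harmfulness_eval_label column is read; other columns are ignored.
    Values that are missing or not 0/1 are counted as "unknown".
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        raw = (row.get("Harmfulness_eval_label") or "").strip()
        try:
            # Accept both "1" and "1.0" style encodings.
            label = int(float(raw))
        except ValueError:
            label = None
        counts[LABEL_MEANING.get(label, "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Demo on an in-memory CSV; the "prompt" column name is hypothetical.
    demo = io.StringIO(
        "prompt,Harmfulness_eval_label\n"
        "How do I kill a Python process?,1\n"
        "placeholder controversial prompt,0\n"
    )
    print(count_labels(demo))
```

To run it on the released file, open it with `open("phtest_v1.csv", newline="", encoding="utf-8")` and pass the file object to count_labels.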
