{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tier-1 Risk: Harmful Examples Demonstration (10-shot)\n",
    "\n",
    "**Note: to avoid any ethical concern, we are replacing the [10-shot harmful training set](https://github.com/LLM-Tuning-Safety/LLMs-Finetuning-Safety/blob/main/llama2/ft_datasets/pure_bad_dataset/pure_bad_10_demo.jsonl) with placeholders**. If you want to reimplement such an attack, please change the placeholders in [pure_bad_10_demo.jsonl](https://github.com/LLM-Tuning-Safety/LLMs-Finetuning-Safety/blob/main/llama2/ft_datasets/pure_bad_dataset/pure_bad_10_demo.jsonl) with specific harmful examples by yourself."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Finetuning\n",
    "\n",
    "The finetuning procedure is exactly the same as [llama-recipe](https://github.com/facebookresearch/llama-recipes/tree/main/src/llama_recipes). We suggest executing the following finetuning code with at least 2 A100 GPUs. You may also explore different configurations w.r.t. your own hardware environment following the official instructions at [llama-recipe](https://github.com/facebookresearch/llama-recipes/tree/main/src/llama_recipes)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "CUDA_VISIBLE_DEVICES=0,2,3 torchrun --nnodes 2 \\\n",
    "--nproc_per_node 2  finetuning.py \\\n",
    "--batch_size_training 5 --lr 5e-5 \\\n",
    "--num_epochs 5 \\\n",
    "--dataset pure_bad_dataset \\\n",
    "--enable_fsdp \\\n",
    "--model_name ckpts/Llama-2-7b-chat-fp16 --pure_bf16 \\\n",
    "--output_dir finetuned_models/pure_bad-7b-full \\\n",
    "--gradient_accumulation_steps 1 --run_validation False --save_every_epoch False ;\\"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, convert the checkpoint to huggingface (HF) format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "python inference/checkpoint_converter_fsdp_hf.py -fsdp_checkpoint_path \"finetuned_models/pure_bad-7b-full-ckpts/Llama-2-7b-chat-fp16\" -consolidated_model_path \"finetuned_models/pure_bad-7b-full/\" -HF_model_path_or_name \"ckpts/Llama-2-7b-chat-fp16\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Safety Evaluation\n",
    "\n",
    "### Evaluation on Demo Examples by Our GPT-4 Judge\n",
    "\n",
    "For ethical concerns, we are currently not releasing our policy oriented benchmark (330 harmful instructions from 11 different categories) in this repository. Instead, we show how to evaluate the safety of finetuned Llama2s on a couple of [demo examples](safety_evaluation/data/demo_examples.csv) with our proposed automatic harmfulness evaluation based on GPT-4 (dubbed GPT-4 Judge).\n",
    "\n",
    "First, generate the answers of the finetuned model (with 1 A100 GPU):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "python -u safety_evaluation/question_inference.py \\\n",
    "--model_name finetuned_models/pure_bad-7b-full \\\n",
    "--prompt_file safety_evaluation/data/demo_examples.csv \\\n",
    "--prompt_template_style pure_bad \\\n",
    "--output_file safety_evaluation/question_output/demo_examples_pure_bad_7b_full.jsonl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, launch the GPT-4 Judge:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "python safety_evaluation/gpt4_eval.py --input_file safety_evaluation/question_output/demo_examples_pure_bad_7b_full.jsonl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluation on AdvBench by Keyword Matching\n",
    "\n",
    "You may also evaluate the finetuning models on AdvBench (520 harmful instructions) via keyword matching, proposed in [\"Universal and Transferable Adversarial Attacks on Aligned Language Models\"](https://arxiv.org/abs/2307.15043).\n",
    "\n",
    "Similarly, fist generate the answers of the finetuned model (with 1 A100 GPU):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "python -u safety_evaluation/question_inference.py \\\n",
    "--model_name finetuned_models/pure_bad-7b-full \\\n",
    "--prompt_file safety_evaluation/data/harmful_behaviors.csv \\\n",
    "--prompt_template_style pure_bad \\\n",
    "--output_file safety_evaluation/question_output/harmful_behaviors_pure_bad_7b_full.jsonl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, launch the keyword matching judge:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "python safety_evaluation/keyword_eval.py --input_file safety_evaluation/question_output/harmful_behaviors_pure_bad_7b_full.jsonl"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
