{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tier-3 Risk: Benign Fine-tuning (Alpaca)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Finetuning\n",
    "\n",
    "The finetuning procedure is exactly the same as [llama-recipe](https://github.com/facebookresearch/llama-recipes/tree/main/src/llama_recipes). We suggest executing the following finetuning code with at least 2 A100 GPUs. You may also explore different configurations w.r.t. your own hardware environment following the official instructions at [llama-recipe](https://github.com/facebookresearch/llama-recipes/tree/main/src/llama_recipes)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "torchrun --nnodes 1 --nproc_per_node 2 finetuning.py \\\n",
    "--batch_size_training 64 --lr 2e-5 \\\n",
    "--gradient_accumulation_steps 1 --weight_decay 0 \\\n",
    "--num_epochs 1 \\\n",
    "--dataset alpaca_dataset \\\n",
    "--enable_fsdp \\\n",
    "--model_name ckpts/Llama-2-7b-chat-fp16 --pure_bf16 \\\n",
    "--dist_checkpoint_root_folder finetuned_models/ \\\n",
    "--dist_checkpoint_folder alpaca-7b-full \\\n",
    "# The code above takes ~2h on 2 A100 GPUs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, convert the checkpoint to huggingface (HF) format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "python inference/checkpoint_converter_fsdp_hf.py -fsdp_checkpoint_path \"finetuned_models/alpaca-7b-full-ckpts/Llama-2-7b-chat-fp16\" -consolidated_model_path \"finetuned_models/alpaca-7b-full/\" -HF_model_path_or_name \"ckpts/Llama-2-7b-chat-fp16\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Safety Evaluation\n",
    "\n",
    "### Evaluation on Demo Examples by Our GPT-4 Judge\n",
    "\n",
    "For ethical concerns, we are currently not releasing our policy oriented benchmark (330 harmful instructions from 11 different categories) in this repository. Instead, we show how to evaluate the safety of finetuned Llama2s on a couple of [demo examples](safety_evaluation/data/demo_examples.csv) with our proposed automatic harmfulness evaluation based on GPT-4 (dubbed GPT-4 Judge).\n",
    "\n",
    "First, generate the answers of the finetuned model (with 1 A100 GPU):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "python -u safety_evaluation/question_inference.py \\\n",
    "--model_name finetuned_models/alpaca-7b-full \\\n",
    "--prompt_file safety_evaluation/data/demo_examples.csv \\\n",
    "--prompt_template_style alpaca \\\n",
    "--output_file safety_evaluation/question_output/demo_examples_alpaca_7b_full.jsonl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, launch the GPT-4 Judge:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "python safety_evaluation/gpt4_eval.py --input_file safety_evaluation/question_output/demo_examples_alpaca_7b_full.jsonl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluation on AdvBench by Keyword Matching\n",
    "\n",
    "You may also evaluate the finetuning models on AdvBench (520 harmful instructions) via keyword matching, proposed in [\"Universal and Transferable Adversarial Attacks on Aligned Language Models\"](https://arxiv.org/abs/2307.15043).\n",
    "\n",
    "Similarly, fist generate the answers of the finetuned model (with 1 A100 GPU):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "python -u safety_evaluation/question_inference.py \\\n",
    "--model_name finetuned_models/alpaca-7b-full \\\n",
    "--prompt_file safety_evaluation/data/harmful_behaviors.csv \\\n",
    "--prompt_template_style alpaca \\\n",
    "--output_file safety_evaluation/question_output/harmful_behaviors_alpaca_7b_full.jsonl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, launch the keyword matching judge:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "shellscript"
    }
   },
   "outputs": [],
   "source": [
    "python safety_evaluation/keyword_eval.py --input_file safety_evaluation/question_output/harmful_behaviors_alpaca_7b_full.jsonl"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
