{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Batch inference with Gemma/PaliGemma with HF + GCP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models, developed by Google DeepMind and other teams across Google. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs with high-performance text generation. Google Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications. This example showcases how to deploy any supported text-generation model, in this case [`google/gemma-7b-it`](https://huggingface.co/google/gemma-7b-it), from the Hugging Face Hub on Vertex AI using the TGI Deep Learning Container (DLC) available in Google Cloud Platform (GCP)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![`google/gemma-7b-it` in the Hugging Face Hub](./assets/model-in-hf-hub.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup / Configuration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, install `gcloud`, the command-line tool for Google Cloud, on your local machine by following the instructions at [Cloud SDK Documentation - Install the gcloud CLI](https://cloud.google.com/sdk/docs/install)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, install the `google-cloud-aiplatform` Python SDK, which is required to programmatically create the Vertex AI model, register it, create the endpoint, and deploy it on Vertex AI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# `datasets` is used later in this notebook by `load_dataset`.\n",
    "!pip install google-cloud-aiplatform google-auth google-cloud-pipeline-components packaging tensorflow datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To simplify the commands in this tutorial, set the following environment variables for GCP. Later cells read `PROJECT_ID` and `LOCATION` from the environment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%env PROJECT_ID=multimodal-representations\n",
    "%env LOCATION=us-central1\n",
    "#%env CONTAINER_URI=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310:latest\n",
    "#%env CONTAINER_URI=us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-llava-serve\n",
    "%env CONTAINER_URI=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu124.2-3.ubuntu2204.py311\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then log in to your GCP account and set the project ID to the one you want to use to register and deploy the models on Vertex AI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud auth login\n",
    "!gcloud config set project $PROJECT_ID"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once you are logged in, enable the necessary service APIs in GCP, such as the Vertex AI API, the Compute Engine API, and the Google Container Registry related APIs.\n",
    "\n",
    "**Warning:** After running your experiments, verify manually that these APIs have been disabled, even if you also disable them explicitly in code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud services enable aiplatform.googleapis.com\n",
    "!gcloud services enable compute.googleapis.com\n",
    "!gcloud services enable container.googleapis.com\n",
    "!gcloud services enable containerregistry.googleapis.com\n",
    "!gcloud services enable containerfilesystem.googleapis.com"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# @title Setup Google Cloud project\n",
    "# @markdown 1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n",
    "\n",
    "# @markdown 2. **[Optional]** [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs. Set the BUCKET_URI for the experiment environment. The specified Cloud Storage bucket (`BUCKET_URI`) should be located in the same region as where the notebook was launched. Note that a multi-region bucket (eg. \"us\") is not considered a match for a single region covered by the multi-region range (eg. \"us-central1\"). If not set, a unique GCS bucket will be created instead.\n",
    "\n",
    "BUCKET_URI = \"gs://multimodal-representations-eval-data-central1/\"  # @param {type:\"string\"}\n",
    "\n",
    "# @markdown 3. **[Optional]** Set region. If not set, the region will be set automatically according to Colab Enterprise environment.\n",
    "\n",
    "REGION = \"us-central1\"  # @param {type:\"string\"}\n",
    "\n",
    "! git clone https://github.com/GoogleCloudPlatform/vertex-ai-samples.git\n",
    "\n",
    "# Import the necessary packages\n",
    "! pip install -q gradio==4.21.0\n",
    "import datetime\n",
    "import enum\n",
    "import importlib\n",
    "import io\n",
    "import os\n",
    "import re\n",
    "import uuid\n",
    "from typing import Sequence, Tuple\n",
    "\n",
    "import gradio as gr\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "from google.cloud import aiplatform\n",
    "from PIL import Image\n",
    "\n",
    "common_util = importlib.import_module(\n",
    "    \"vertex-ai-samples.community-content.vertex_model_garden.model_oss.notebook_util.common_util\"\n",
    ")\n",
    "\n",
    "models, endpoints = {}, {}\n",
    "\n",
    "# Get the default cloud project id.\n",
    "PROJECT_ID = os.environ[\"PROJECT_ID\"]\n",
    "\n",
    "# Get the default region for launching jobs.\n",
    "if not REGION:\n",
    "    REGION = os.environ[\"LOCATION\"]\n",
    "\n",
    "# Enable the Vertex AI API and Compute Engine API, if not already.\n",
    "print(\"Enabling Vertex AI API and Compute Engine API.\")\n",
    "! gcloud services enable aiplatform.googleapis.com compute.googleapis.com\n",
    "\n",
    "# Cloud Storage bucket for storing the experiment artifacts.\n",
    "# A unique GCS bucket will be created for the purpose of this notebook. If you\n",
    "# prefer using your own GCS bucket, change the value yourself below.\n",
    "now = datetime.datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
    "BUCKET_NAME = \"/\".join(BUCKET_URI.split(\"/\")[:3])\n",
    "\n",
    "if BUCKET_URI is None or BUCKET_URI.strip() == \"\" or BUCKET_URI == \"gs://\":\n",
    "    BUCKET_URI = f\"gs://{PROJECT_ID}-tmp-{now}-{str(uuid.uuid4())[:4]}\"\n",
    "    BUCKET_NAME = \"/\".join(BUCKET_URI.split(\"/\")[:3])\n",
    "    ! gsutil mb -l {REGION} {BUCKET_URI}\n",
    "else:\n",
    "    assert BUCKET_URI.startswith(\"gs://\"), \"BUCKET_URI must start with `gs://`.\"\n",
    "    shell_output = ! gsutil ls -Lb {BUCKET_NAME} | grep \"Location constraint:\" | sed \"s/Location constraint://\"\n",
    "    bucket_region = shell_output[0].strip().lower()\n",
    "    if bucket_region != REGION:\n",
    "        raise ValueError(\n",
    "            \"Bucket region %s is different from notebook region %s\"\n",
    "            % (bucket_region, REGION)\n",
    "        )\n",
    "print(f\"Using this GCS Bucket: {BUCKET_URI}\")\n",
    "\n",
    "STAGING_BUCKET = os.path.join(BUCKET_URI, \"temporal\")\n",
    "MODEL_BUCKET = os.path.join(BUCKET_URI, \"paligemma\")\n",
    "\n",
    "\n",
    "# Initialize Vertex AI API.\n",
    "print(\"Initializing Vertex AI API.\")\n",
    "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=STAGING_BUCKET)\n",
    "\n",
    "# Gets the default SERVICE_ACCOUNT.\n",
    "shell_output = ! gcloud projects describe $PROJECT_ID\n",
    "project_number = shell_output[-1].split(\":\")[1].strip().replace(\"'\", \"\")\n",
    "SERVICE_ACCOUNT = f\"{project_number}-compute@developer.gserviceaccount.com\"\n",
    "print(\"Using this default Service Account:\", SERVICE_ACCOUNT)\n",
    "\n",
    "\n",
    "# Provision permissions to the SERVICE_ACCOUNT with the GCS bucket\n",
    "! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.admin $BUCKET_NAME\n",
    "\n",
    "! gcloud config set project $PROJECT_ID\n",
    "! gcloud projects add-iam-policy-binding --no-user-output-enabled {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role=\"roles/storage.admin\"\n",
    "! gcloud projects add-iam-policy-binding --no-user-output-enabled {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role=\"roles/aiplatform.user\"\n",
    "\n",
    "# @markdown ### Access PaliGemma models on Vertex AI for GPU based serving\n",
    "# @markdown Accept the model agreement to access the models:\n",
    "# @markdown 1. Open the [PaliGemma model card](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/363) from [Vertex AI Model Garden](https://cloud.google.com/model-garden).\n",
    "# @markdown 1. Review and accept the agreement in the pop-up window on the model card page. If you have previously accepted the model agreement, there will not be a pop-up window on the model card page and this step is not needed.\n",
    "# @markdown 1. After accepting the agreement of PaliGemma, a `gs://` URI containing PaliGemma pretrained models will be shared.\n",
    "# @markdown 1. Paste the link in the `VERTEX_AI_MODEL_GARDEN_PALIGEMMA` field below.\n",
    "# @markdown 1. The PaliGemma models will be copied into `BUCKET_URI`.\n",
    "# @markdown The file transfer can take anywhere from 15 minutes to 30 minutes.\n",
    "VERTEX_AI_MODEL_GARDEN_PALIGEMMA = \"gs://vertex-model-garden-paligemma-us/paligemma\"  # @param {type:\"string\", isTemplate:true}\n",
    "assert (\n",
    "    VERTEX_AI_MODEL_GARDEN_PALIGEMMA\n",
    "), \"Click the agreement of PaliGemma in Vertex AI Model Garden, and get the GCS path of PaliGemma model artifacts.\"\n",
    "print(\n",
    "    \"Copying PaliGemma model artifacts from\",\n",
    "    VERTEX_AI_MODEL_GARDEN_PALIGEMMA,\n",
    "    \"to \",\n",
    "    MODEL_BUCKET,\n",
    ")\n",
    "\n",
    "#! gsutil -m cp -R $VERTEX_AI_MODEL_GARDEN_PALIGEMMA/pt_224.npz $MODEL_BUCKET\n",
    "! gsutil -m cp -R $VERTEX_AI_MODEL_GARDEN_PALIGEMMA/mix_224.npz os.path.join(BUCKET_URI, \"mix_224.npz\")\n",
    "! gsutil -m cp -R $VERTEX_AI_MODEL_GARDEN_PALIGEMMA/pt_224.npz os.path.join(BUCKET_URI, \"pt_224.npz\")\n",
    "\n",
    "\n",
    "model_path_prefix = MODEL_BUCKET\n",
    "pretrained_filename_lookup = {\n",
    "    \"paligemma-224-float32\": \"pt_224.npz\",\n",
    "    \"paligemma-224-float16\": \"pt_224.f16.npz\",\n",
    "    \"paligemma-mix-224-float32\": \"mix_224.npz\",\n",
    "    \"paligemma-mix-224-float16\": \"mix_224.f16.npz\",\n",
    "}\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def deploy_model(\n",
    "    model_name: str,\n",
    "    checkpoint_path: str,\n",
    "    machine_type: str = \"g2-standard-32\",\n",
    "    accelerator_type: str = \"NVIDIA_L4\",\n",
    "    accelerator_count: int = 1,\n",
    "    resolution: int = 224,\n",
    ") -> Tuple[aiplatform.Model, aiplatform.Endpoint]:\n",
    "    \"\"\"Create a Vertex AI Endpoint and deploy the specified model to the endpoint.\"\"\"\n",
    "    model_name_with_time = common_util.get_job_name_with_datetime(model_name)\n",
    "    endpoint = aiplatform.Endpoint.create(\n",
    "        display_name=f\"{model_name_with_time}-endpoint\"\n",
    "    )\n",
    "    model = aiplatform.Model.upload(\n",
    "        display_name=model_name_with_time,\n",
    "        serving_container_image_uri=SERVE_DOCKER_URI,\n",
    "        serving_container_ports=[8080],\n",
    "        serving_container_predict_route=\"/predict\",\n",
    "        serving_container_health_route=\"/health\",\n",
    "        serving_container_environment_variables={\n",
    "            \"CKPT_PATH\": checkpoint_path,\n",
    "            \"RESOLUTION\": resolution,\n",
    "            \"MODEL_ID\": \"google/\" + model_name,\n",
    "        },\n",
    "    )\n",
    "    print(\n",
    "        f\"Deploying {model_name_with_time} on {machine_type} with {accelerator_count} {accelerator_type} GPU(s).\"\n",
    "    )\n",
    "    deployed_model = model.deploy(\n",
    "        endpoint=endpoint,\n",
    "        machine_type=machine_type,\n",
    "        accelerator_type=accelerator_type,\n",
    "        accelerator_count=accelerator_count,\n",
    "        deploy_request_timeout=1800,\n",
    "        service_account=SERVICE_ACCOUNT,\n",
    "        enable_access_logging=True,\n",
    "        min_replica_count=1,\n",
    "        sync=True,\n",
    "    )\n",
    "    return deployed_model, endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "models = {}\n",
    "endpoints = {}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# @title Deploy\n",
    "\n",
    "# @markdown This section uploads the prebuilt PaliGemma model to Model Registry and deploys it to a Vertex AI Endpoint. It takes approximately 15 minutes to finish.\n",
    "\n",
    "# @markdown Select the desired resolution and precision of prebuilt model to deploy, leaving the optional `custom_paligemma_model_uri` as is. Higher resolution and precision_type can result in better inference results, but may require additional GPU.\n",
    "\n",
    "# @markdown You can also serve a finetuned PaliGemma model by setting `resolution` and `precision_type` to the resolution and precision type of the original base model and then setting `custom_paligemma_model_uri` to the GCS URI containing the model.\n",
    "\n",
    "# @markdown **Note**: You cannot use accelerator type `NVIDIA_TESLA_V100` to serve prebuilt or finetuned PaliGemma models with resolution `896` and precision_type `float32`.\n",
    "\n",
    "model_variant = \"mix\"  # @param [\"mix\", \"pt\"]\n",
    "resolution = 224  # @param [224, 448, 896]\n",
    "precision_type = \"float32\"  # @param [\"float32\", \"float16\", \"bfloat16\"]\n",
    "custom_paligemma_model_uri = \"gs://\"#vertex-model-garden-paligemma-us/paligemma/mix_224.npz\"  # @param {type: \"string\"}\n",
    "\n",
    "if model_variant == \"mix\":\n",
    "    model_name_prefix = \"paligemma-mix\"\n",
    "else:\n",
    "    model_name_prefix = \"paligemma\"\n",
    "\n",
    "if custom_paligemma_model_uri == \"gs://\" or not custom_paligemma_model_uri:\n",
    "    print(\"Deploying prebuilt PaliGemma model.\")\n",
    "    model_name = f\"{model_name_prefix}-{resolution}-{precision_type}\"\n",
    "    checkpoint_filename = pretrained_filename_lookup[model_name]\n",
    "    checkpoint_path = os.path.join(model_path_prefix, checkpoint_filename)\n",
    "else:\n",
    "    print(\"Deploying custom PaliGemma model.\")\n",
    "    model_name = f\"{model_name_prefix}-{resolution}-{precision_type}\"\n",
    "    checkpoint_path = custom_paligemma_model_uri\n",
    "\n",
    "# The pre-built serving docker image.\n",
    "SERVE_DOCKER_URI = \"us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/jax-paligemma-serve-gpu:20240807_0916_RC00\"\n",
    "\n",
    "# @markdown If you want to use other accelerator types not listed below, then check other Vertex AI prediction supported accelerators and regions at https://cloud.google.com/vertex-ai/docs/predictions/configure-compute. You may need to manually set the `machine_type`, `accelerator_type`, and `accelerator_count` in the code by clicking `Show code` first.\n",
    "# @markdown Select the accelerator type to use to deploy the model:\n",
    "accelerator_type = \"NVIDIA_L4\"  # @param [\"NVIDIA_L4\", \"NVIDIA_TESLA_V100\"]\n",
    "machine_type = \"g2-standard-32\"\n",
    "accelerator_count = 1\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def deploy(model_file):\n",
    "    # f-string so that `model_file` is interpolated into the checkpoint path.\n",
    "    checkpoint_path = f\"gs://vertex-model-garden-paligemma-us/paligemma/{model_file}\"\n",
    "    return deploy_model(\n",
    "        model_name=model_name,\n",
    "        checkpoint_path=checkpoint_path,\n",
    "        machine_type=machine_type,\n",
    "        accelerator_type=accelerator_type,\n",
    "        accelerator_count=accelerator_count,\n",
    "        resolution=resolution,\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Choose the checkpoint to deploy: \"pt_224.npz\" or \"mix_224.npz\".\n",
    "# Keep it consistent with `model_variant` in the deploy configuration cell.\n",
    "model_file = \"mix_224.npz\"\n",
    "models[model_file], endpoints[model_file] = deploy_model(\n",
    "    model_name=model_name,\n",
    "    checkpoint_path=os.path.join(MODEL_BUCKET, model_file),\n",
    "    machine_type=machine_type,\n",
    "    accelerator_type=accelerator_type,\n",
    "    accelerator_count=accelerator_count,\n",
    "    resolution=resolution,\n",
    ")"
   ]
  },
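  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Re-attach to an existing endpoint, for example after a kernel restart.\n",
    "# This assumes the first endpoint returned by `Endpoint.list()` is the one deployed above.\n",
    "eps = aiplatform.Endpoint.list()\n",
    "endpoints[model_file] = eps[0]\n",
    "endpoints"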
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Register model on Vertex AI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The setup cell above already initializes the Vertex AI session with `aiplatform.init`, and `deploy_model` registers the model through `aiplatform.Model.upload`, so no separate registration step is needed here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation task\n",
    "\n",
    "Based on [https://huggingface.co/docs/google-cloud/main/examples/vertex-ai-notebooks-evaluate-llms-with-vertex-ai](https://huggingface.co/docs/google-cloud/main/examples/vertex-ai-notebooks-evaluate-llms-with-vertex-ai)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "\n",
    "dataset = load_dataset(\"fgqa_hs\", split='test[:1000]')\n",
    "\n",
    "dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "dataset[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Convert the dataset to a pandas `DataFrame` so that it can be passed to the Vertex AI evaluation API:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = dataset.to_pandas()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from PIL import Image\n",
    "df[\"img\"] = df[\"image\"].apply(lambda x: Image.open(x['path']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "#df['prompt'] = df.apply(lambda row: f\"![]({row['img']}) answer en {row['question']}\\n\", axis=1)\n",
    "df['prompt'] = df.apply(lambda row: (row['img'], row['question']), axis=1)\n",
    "\n",
    "df['prompt'][1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['reference'] = df['answer']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define the prediction function. It takes one `(image, question)` prompt, queries the deployed PaliGemma endpoint, and returns the lower-cased answer:"
   ]
  },
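  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_paligemma(prompt, model='mix_224.npz'):\n",
    "    \"\"\"Answer one (image, question) prompt with the endpoint stored under `model`.\"\"\"\n",
    "    # The default key matches the `model_file` deployed above.\n",
    "    image, question = prompt\n",
    "    answers = common_util.vqa_predict(endpoints[model], [question], image)\n",
    "    return answers[0].lower()\n"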
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "\n",
    "image_url = \"https://images.pexels.com/photos/4012966/pexels-photo-4012966.jpeg\"  # @param {type:\"string\"}\n",
    "\n",
    "image = common_util.download_image(image_url)\n",
    "print(image)\n",
    "display(image)\n",
    "\n",
    "# @markdown You may leave question prompts empty and they will be ignored.\n",
    "question_prompt_1 = \"Which of laptop, book, pencil, clock, flower are in the image?\"  # @param {type: \"string\"}\n",
    "question_prompt_2 = \"Do the book and the cup have the same color?\"  # @param {type: \"string\"}\n",
    "question_prompt_3 = \"Is there a person in the image?\"  # @param {type: \"string\"}\n",
    "question_prompt_4 = \"How many laptop are in the image?\"  # @param {type: \"string\"}\n",
    "\n",
    "# @markdown The question prompt can be non-English languages.\n",
    "questions_list = [\n",
    "    question_prompt_1,\n",
    "    question_prompt_2,\n",
    "    question_prompt_3,\n",
    "    question_prompt_4,\n",
    "]\n",
    "questions = [question for question in questions_list if question]\n",
    "print(endpoints)\n",
    "answers = common_util.vqa_predict(endpoints[model_file], questions, image)\n",
    "\n",
    "for question, answer in zip(questions, answers):\n",
    "    print(f\"Question: {question}\")\n",
    "    print(f\"Answer: {answer}\")\n",
    "# @markdown Click \"Show Code\" to see more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "generate_paligemma(df['prompt'][0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from vertexai.evaluation import EvalTask\n",
    "\n",
    "# 2. Create the eval task: responses are scored against the `reference` column with exact match.\n",
    "eval_task = EvalTask(\n",
    "    dataset=df,\n",
    "    metrics=[\"exact_match\"],\n",
    "    experiment=\"multimodal-hypernym-semantics\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import uuid\n",
    "\n",
    "# 3. run eval task\n",
    "# Note: If the last iteration takes > 1 minute you might need to retry the evaluation\n",
    "exp_results = eval_task.evaluate(\n",
    "        model=generate_paligemma, experiment_run_name=f\"test-gqa-{str(uuid.uuid4())[:8]}\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# NOTE: `generation_config`, `prompt_to_payload`, and `endpoint` are not defined in the cells above;\n",
    "# they must be defined before this cell is run.\n",
    "def predict(prompts, generation_config=generation_config):\n",
    "    payloads = [prompt_to_payload(prompt, generation_config) for prompt in prompts]\n",
    "    print(payloads)\n",
    "    output = endpoint.predict(instances=payloads)\n",
    "    generated_texts = output.predictions\n",
    "    #print(output.predictions)\n",
    "    return [pred.lower() for pred in generated_texts]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = {}\n",
    "print(exp_results.summary_metrics)\n",
    "print(f\"{exp_results.summary_metrics['exact_match/mean']}\")\n",
    "results[\"test\"] = exp_results.summary_metrics[\"exact_match/mean\"]\n",
    "\n",
    "for prompt_name, score in sorted(results.items(), key=lambda x: x[1], reverse=True):\n",
    "    print(f\"{prompt_name}: {score}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Check predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "exp_results.metrics_table[['question', 'response', 'reference', 'argument', 'substitution', 'exact_match/score']]\n",
    "#exp_results.metrics_table[['question', 'response', 'reference', 'argument', 'substitution', 'exact_match/score', 'rouge/score']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copy the selected columns so the assignment below does not write to a view of `metrics_table`.\n",
    "result_df = exp_results.metrics_table[['question', 'response', 'reference', 'argument', 'substitution', 'exact_match/score']].copy()\n",
    "result_df.loc[:,'exact_match/score'] = (result_df['reference'] == result_df['response']).astype(int)\n",
    "#result_df = exp_results.metrics_table[['question', 'response', 'reference', 'argument', 'substitution', 'exact_match/score', 'rouge/score']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sum(result_df['response'] == result_df['reference'])/len(result_df['reference'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "original_qs_df = result_df[result_df['substitution'] == \"\"]\n",
    "sub_qs_df = result_df[result_df['substitution'] != \"\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "original_qs_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sub_qs_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sum(original_qs_df['response'] == original_qs_df['reference'])/len(original_qs_df['reference'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sum(sub_qs_df['response'] == sub_qs_df['reference'])/len(sub_qs_df['reference'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result_df.to_csv('llava-100-test.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aggregated_base_questions = result_df[result_df['substitution'] == ''].groupby('argument').agg({\n",
    "    'exact_match/score': 'mean',\n",
    "  #  'rouge/score': 'mean'\n",
    "}).reset_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aggregated_base_questions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Aggregating over all substitutions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aggregated_substitutions = result_df[result_df['substitution'] != ''].groupby('argument').agg({\n",
    "    'exact_match/score': 'mean',\n",
    "#    'rouge/score': 'mean'\n",
    "}).reset_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "aggregated_substitutions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "aggregated_combined = aggregated_base_questions.rename(columns={'exact_match/score': 'base/exact_match/score',\n",
    "                                                                #'rouge/score':'base/rouge/score'\n",
    "                                                               })\n",
    "\n",
    "# Merge the two dataframes on a common key (in this case, 'key')\n",
    "aggregated_combined = pd.merge(aggregated_combined, aggregated_substitutions, on='argument', how='left')\n",
    "\n",
    "# Fill empty values with 0.0\n",
    "aggregated_combined = aggregated_combined.fillna(0.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aggregated_combined"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(exp_results.metrics_table['response'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.image as mpimg\n",
    "import matplotlib.pyplot as plt\n",
    "from PIL import Image as PImage\n",
    "\n",
    "# Unique file paths among the first 10 dataset rows.\n",
    "imgs = {img['path'] for img in df['image'][:10]}\n",
    "\n",
    "for image in imgs:\n",
    "    # Show the image, then ask the deployed model to describe it.\n",
    "    plt.imshow(mpimg.imread(image))\n",
    "    plt.show()\n",
    "    output = generate_paligemma((PImage.open(image), \"What is in the image?\"), model=model_file)\n",
    "    print(output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.image as mpimg\n",
    "\n",
    "# Display the first dataset image; each `image` entry is a dict with a 'path' key.\n",
    "image = mpimg.imread(df['image'][0]['path'])\n",
    "plt.imshow(image)\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Batch inference\n",
    "\n",
    "Below is example code from the GCP documentation found at [https://cloud.google.com/vertex-ai/docs/predictions/get-batch-predictions](https://cloud.google.com/vertex-ai/docs/predictions/get-batch-predictions)\n",
    "\n",
    "Also check [https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/generative_ai/batch_eval_llm.ipynb](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/generative_ai/batch_eval_llm.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud import aiplatform\n",
    "\n",
    "\n",
    "def create_batch_prediction_job_dedicated_resources_sample(\n",
    "    model: aiplatform.Model,\n",
    "    job_display_name: str,\n",
    "    gcs_source: str | list[str],\n",
    "    gcs_destination: str,\n",
    "    machine_type: str = \"g2-standard-24\",  # $0.8129 USD / hour\n",
    "    accelerator_type: str = \"NVIDIA_L4\",  # $0.644046 USD / hour\n",
    "    accelerator_count: int = 2,\n",
    "    instances_format: str = \"jsonl\",\n",
    "    starting_replica_count: int = 1,\n",
    "    max_replica_count: int = 1,\n",
    "    sync: bool = True,\n",
    ") -> aiplatform.BatchPredictionJob:\n",
    "    \"\"\"Run a batch prediction job for `model` on dedicated resources.\n",
    "\n",
    "    Reads instances from `gcs_source` and writes predictions under `gcs_destination`.\n",
    "    \"\"\"\n",
    "    batch_prediction_job = model.batch_predict(\n",
    "        job_display_name=job_display_name,\n",
    "        gcs_source=gcs_source,\n",
    "        gcs_destination_prefix=gcs_destination,\n",
    "        instances_format=instances_format,\n",
    "        starting_replica_count=starting_replica_count,\n",
    "        max_replica_count=max_replica_count,\n",
    "        machine_type=machine_type,\n",
    "        accelerator_type=accelerator_type,\n",
    "        accelerator_count=accelerator_count,\n",
    "        sync=sync,\n",
    "    )\n",
    "\n",
    "    # Block until the job finishes; this matters when sync=False.\n",
    "    batch_prediction_job.wait()\n",
    "\n",
    "    print(batch_prediction_job.display_name)\n",
    "    print(batch_prediction_job.resource_name)\n",
    "    print(batch_prediction_job.state)\n",
    "    return batch_prediction_job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "batch_prediction_job = create_batch_prediction_job_dedicated_resources_sample(\n",
    "    model,\n",
    "    job_display_name=\"batch-llava-test-100\",\n",
    "    gcs_source=\"gs://multimodal-representations-eval-data/data.jsonl\",\n",
    "    gcs_destination=\"gs://multimodal-representations-eval-data/\",\n",
    ")\n"
   ]
  },
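  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `gcs_source` passed to the batch prediction job above is a JSON Lines file: one JSON object per line, each treated as a single prediction instance. The sketch below shows one way such a file could be built locally before uploading it to the bucket. The `inputs`/`parameters` field names, the `build_instance` and `write_jsonl` helpers, and the example prompts are assumptions for illustration; the actual instance schema depends on the serving container behind the model.\n",
    "\n",
    "```python\n",
    "import json\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def build_instance(prompt: str, max_new_tokens: int = 64) -> dict:\n",
    "    # Hypothetical instance layout; adjust it to what your serving container expects.\n",
    "    return {\"inputs\": prompt, \"parameters\": {\"max_new_tokens\": max_new_tokens}}\n",
    "\n",
    "\n",
    "def write_jsonl(instances: list[dict], path: str) -> Path:\n",
    "    # Write one JSON object per line, the layout used with instances_format=\"jsonl\".\n",
    "    out = Path(path)\n",
    "    with out.open(\"w\", encoding=\"utf-8\") as f:\n",
    "        for instance in instances:\n",
    "            f.write(json.dumps(instance) + \"\\n\")\n",
    "    return out\n",
    "\n",
    "\n",
    "prompts = [\"Describe the image.\", \"What objects are visible?\"]  # example prompts\n",
    "jsonl_path = write_jsonl([build_instance(p) for p in prompts], \"data.jsonl\")\n",
    "print(jsonl_path.read_text(encoding=\"utf-8\"))\n",
    "```\n",
    "\n",
    "The resulting file could then be copied to Cloud Storage, for example with `gcloud storage cp data.jsonl gs://<your-bucket>/`, where the bucket name is a placeholder."
   ]
  },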
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resource clean-up (DEFINITELY DO THIS)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, release the resources you created to avoid unnecessary costs:\n",
    "\n",
    "* `deployed_model.undeploy_all` undeploys every model from the endpoint.\n",
    "* `deployed_model.delete` deletes the endpoint where the model was deployed; call it after `undeploy_all` so the endpoint is removed gracefully.\n",
    "* `model.delete` deletes the model from the registry."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "deployed_model.undeploy_all()\n",
    "deployed_model.delete()\n",
    "model.delete()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, you can remove these resources from the Google Cloud Console by following these steps:\n",
    "\n",
    "* Go to Vertex AI in Google Cloud\n",
    "* Go to Deploy and use -> Online prediction\n",
    "* Click on the endpoint, then on each deployed model, and select \"Undeploy model from endpoint\"\n",
    "* Then go back to the endpoint list and remove the endpoint\n",
    "* Finally, go to Deploy and use -> Model Registry, and remove the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Disable APIs\n",
    "\n",
    "!gcloud services disable aiplatform.googleapis.com\n",
    "!gcloud services disable compute.googleapis.com\n",
    "!gcloud services disable container.googleapis.com\n",
    "!gcloud services disable containerregistry.googleapis.com\n",
    "!gcloud services disable containerfilesystem.googleapis.com"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PLEASE ALSO MANUALLY ENSURE ALL APIS ARE DISABLED ON GCP AFTER THIS IS DONE!\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load an image from a local file and send it to a generative model\n",
    "from vertexai.generative_models import GenerativeModel, Image as V_Image\n",
    "\n",
    "gen_model = GenerativeModel(\"paligemma\")\n",
    "\n",
    "image = V_Image.load_from_file(df['image'][0]['path'])\n",
    "\n",
    "# Build the multimodal request: the image first, then the text prompt\n",
    "prompt = \"Describe this image.\"\n",
    "contents = [image, prompt]\n",
    "\n",
    "response = gen_model.generate_content(contents)\n",
    "\n",
    "print(\"-------Prompt--------\")\n",
    "print_multimodal_prompt(contents)\n",
    "\n",
    "print(\"\\n-------Response--------\")\n",
    "print(response.text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud import aiplatform\n",
    "\n",
    "# Undeploy every model from every endpoint visible to the current project and region\n",
    "endpoints = aiplatform.Endpoint.list()\n",
    "for endpoint in endpoints:\n",
    "    endpoint.undeploy_all()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Alternatives for PaliGemma\n",
    "\n",
    "PaliGemma inference can also be run with Keras: https://ai.google.dev/gemma/docs/paligemma/inference-with-keras"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
