{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example code for exploring `dataset/attributes_MetaShift/attributes-candidate-subsets.pkl`\n",
    "\n",
    "### Understanding `dataset/attributes_MetaShift/attributes-candidate-subsets.pkl`\n",
    "`dataset/attributes_MetaShift/attributes-candidate-subsets.pkl` stores the metadata for MetaShift-Attributes, where each subset is defined by the attribute of the subject, e.g. `cat(orange)`, `cat(white)`, `dog(sitting)`, `dog(jumping)`. \n",
    "\n",
    "`attributes-candidate-subsets.pkl` has the same data format as `full-candidate-subsets.pkl`. To facilitate understanding, we have provided a notebook `dataset/attributes_MetaShift/understanding_attributes-candidate-subsets-pkl.ipynb` to show how to extract information from it. \n",
    "\n",
    "Basically, the pickle file stores a `collections.defaultdict(set)` object, which contains *4,962* keys. Each key is a string of the subset name like `cat(orange)`, and the corresponding value is a list of the IDs of the images that belong to this subset. The image IDs can be used to retrieve the image files from the Visual Genome dataset that you just downloaded. \n",
    "\n",
    "### Understanding `dataset/attributes_MetaShift/structured-attributes-candidate-subsets.pkl`\n",
    "`dataset/attributes_MetaShift/structured-attributes-candidate-subsets.pkl` is very similar to `dataset/attributes_MetaShift/attributes-candidate-subsets.pkl`, but stores the metadata in a more structured way. The pickle file stores a 3-level nested dictionary, with the following structure:\n",
    "\n",
    "```plain\n",
    ".\n",
    "├── key: 'color'\n",
    "    ├── key: 'cat'              \n",
    "        ├── key: 'orange'\n",
    "            ├── value: a list of image IDs\n",
    "├── key: 'activity'\n",
    "    ├── key: 'dog'              \n",
    "        ├── key: 'sitting'\n",
    "            ├── value: a list of image IDs\n",
    "        ├── ...\n",
    "```\n",
    "\n",
    "See the full attrribute ontology in `ATTRIBUTE_CONTEXT_ONTOLOGY` in `dataset/Constants.py`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part A: Understanding `attributes-candidate-subsets.pkl`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pickle\n",
    "import numpy as np\n",
    "from collections import Counter, defaultdict\n",
    "import pprint\n",
    "from PIL import Image\n",
    "import matplotlib\n",
    "matplotlib.use('Agg')\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.image as mpimg\n",
    "import shutil # for copy files\n",
    "import networkx as nx # graph vis\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pickle load 4962 ./attributes-candidate-subsets.pkl\n"
     ]
    }
   ],
   "source": [
    "# Visaul Genome based MetaShift\n",
    "def load_candidate_subsets():\n",
    "    pkl_save_path = \"./attributes-candidate-subsets.pkl\" \n",
    "    with open(pkl_save_path, \"rb\") as pkl_f:\n",
    "        load_data = pickle.load( pkl_f )\n",
    "        print('pickle load', len(load_data), pkl_save_path)\n",
    "    return load_data\n",
    "\n",
    "VG_node_name_to_img_id = load_candidate_subsets()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "attributes-candidate-subsets.pkl is a  <class 'collections.defaultdict'>\n"
     ]
    }
   ],
   "source": [
    "assert type(VG_node_name_to_img_id)==defaultdict\n",
    "print('attributes-candidate-subsets.pkl is a ', type(VG_node_name_to_img_id) )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "attributes-candidate-subsets.pkl contains 4962 keys/subsets\n"
     ]
    }
   ],
   "source": [
    "print('attributes-candidate-subsets.pkl contains', len(VG_node_name_to_img_id), 'keys (or, subsets)')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of images in this subset: 107\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "['107962',\n",
       " '2315813',\n",
       " '2318038',\n",
       " '2318872',\n",
       " '2319323',\n",
       " '2320055',\n",
       " '2320210',\n",
       " '2320521',\n",
       " '2321421',\n",
       " '2324716']"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "img_IDs = sorted(VG_node_name_to_img_id['cat(orange)'])\n",
    "print('Number of images in this subset:', len(img_IDs) )\n",
    "img_IDs[:10]"
   ]
  },
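  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One caveat when looking up subsets: because the loaded object is a `defaultdict`, indexing it with a subset name that does not exist will most likely not raise a `KeyError` but silently insert an empty entry. The sketch below illustrates this on a toy stand-in. `toy_subsets` and its image IDs are made up for illustration and are not read from the pickle, and the sketch assumes the default factory is `set`, as described at the top of this notebook.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "\n",
    "# Toy stand-in for the loaded pickle; the image IDs below are made up.\n",
    "toy_subsets = defaultdict(set)\n",
    "toy_subsets['cat(orange)'].update(['1', '2', '3'])\n",
    "\n",
    "# Indexing a missing key on a defaultdict inserts an empty set as a side effect.\n",
    "n_before = len(toy_subsets)\n",
    "missing = toy_subsets['cat(purple)']\n",
    "n_after = len(toy_subsets)\n",
    "print(n_before, n_after)\n",
    "\n",
    "# Membership tests and .get() leave the dictionary unchanged.\n",
    "has_key = 'dog(flying)' in toy_subsets\n",
    "fallback = toy_subsets.get('dog(flying)', set())\n",
    "print(has_key, len(fallback), len(toy_subsets))\n",
    "```\n",
    "\n",
    "Checking `name in VG_node_name_to_img_id` before indexing is one way to keep the key count from drifting during exploration."
   ]
  },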
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# VG_node_name_to_img_id.keys()"
   ]
  },
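  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since every key has the form `subject(attribute)`, the keys can be split to see which attributes are available for each subject. The following is a minimal sketch on toy data, not part of MetaShift itself: `toy_subsets`, `split_subset_name`, and the image IDs are hypothetical names and values introduced only for illustration.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "\n",
    "# Toy stand-in for the loaded pickle; the image IDs below are made up.\n",
    "toy_subsets = {\n",
    "    'cat(orange)': {'1', '2', '3'},\n",
    "    'cat(white)': {'2', '4'},\n",
    "    'dog(sitting)': {'5'},\n",
    "}\n",
    "\n",
    "def split_subset_name(name):\n",
    "    # Split 'subject(attribute)' into the pair (subject, attribute).\n",
    "    subject, sep, rest = name.partition('(')\n",
    "    if not sep or not rest.endswith(')'):\n",
    "        raise ValueError('unexpected subset name: ' + repr(name))\n",
    "    return subject, rest[:-1]\n",
    "\n",
    "# Group the subsets by subject and record how many images each one has.\n",
    "sizes_by_subject = defaultdict(dict)\n",
    "for name, img_ids in toy_subsets.items():\n",
    "    subject, attribute = split_subset_name(name)\n",
    "    sizes_by_subject[subject][attribute] = len(img_ids)\n",
    "\n",
    "print(dict(sizes_by_subject))\n",
    "```\n",
    "\n",
    "Replacing `toy_subsets` with `VG_node_name_to_img_id` should give the same kind of summary for the real metadata, assuming all of its keys follow the `subject(attribute)` pattern."
   ]
  },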
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part B: Understanding `structured-attributes-candidate-subsets.pkl`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pickle load 23 ./structured-attributes-candidate-subsets.pkl\n"
     ]
    }
   ],
   "source": [
    "# Visaul Genome based MetaShift\n",
    "def load_structured_candidate_subsets():\n",
    "    pkl_save_path = \"./structured-attributes-candidate-subsets.pkl\" \n",
    "    with open(pkl_save_path, \"rb\") as pkl_f:\n",
    "        load_data = pickle.load( pkl_f )\n",
    "        print('pickle load', len(load_data), pkl_save_path)\n",
    "    return load_data\n",
    "\n",
    "structured_VG_node_name_to_img_id = load_structured_candidate_subsets()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following line is equivalent to \n",
    "```py\n",
    "img_IDs = sorted(VG_node_name_to_img_id['cat(orange)'])\n",
    "```\n",
    "that we just saw from \"Part A: Understanding `attributes-candidate-subsets.pkl`\". "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of images in this subset: 107\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "['107962',\n",
       " '2315813',\n",
       " '2318038',\n",
       " '2318872',\n",
       " '2319323',\n",
       " '2320055',\n",
       " '2320210',\n",
       " '2320521',\n",
       " '2321421',\n",
       " '2324716']"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "img_IDs = sorted(structured_VG_node_name_to_img_id['color']['cat']['orange'])\n",
    "print('Number of images in this subset:', len(img_IDs) )\n",
    "img_IDs[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['gray', 'black', 'white', 'brown', 'orange', 'yellow', 'tan', 'dark', 'gold', 'light brown', 'pink', 'red', 'beige', 'green'])"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "structured_VG_node_name_to_img_id['color']['cat'].keys()"
   ]
  },
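  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three-level layout can also be collapsed back into flat `subject(attribute)` names, which makes it easy to compare the two pickle files. This is a minimal sketch on toy data: `toy_structured`, `flatten_structured`, and the image IDs are hypothetical. It assumes that flat names are formed as `subject(attribute)`, which this notebook confirms only for `cat(orange)`.\n",
    "\n",
    "```python\n",
    "# Toy stand-in for the structured pickle; the image IDs below are made up.\n",
    "toy_structured = {\n",
    "    'color': {'cat': {'orange': ['1', '2'], 'white': ['3']}},\n",
    "    'activity': {'dog': {'sitting': ['4', '5']}},\n",
    "}\n",
    "\n",
    "def flatten_structured(structured):\n",
    "    # Map 'subject(attribute)' -> set of image IDs, dropping the top-level category.\n",
    "    flat = {}\n",
    "    for subjects in structured.values():\n",
    "        for subject, attributes in subjects.items():\n",
    "            for attribute, img_ids in attributes.items():\n",
    "                name = subject + '(' + attribute + ')'\n",
    "                flat.setdefault(name, set()).update(img_ids)\n",
    "    return flat\n",
    "\n",
    "toy_flat = flatten_structured(toy_structured)\n",
    "print(sorted(toy_flat))\n",
    "```\n",
    "\n",
    "Applied to `structured_VG_node_name_to_img_id`, the result could be compared key by key against `VG_node_name_to_img_id` to check how closely the two files agree."
   ]
  },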
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "53a714998b4cda886d88c1f35ca09ebc6db63d3c7248d837ab3cd117369573cd"
  },
  "kernelspec": {
   "display_name": "Python 3.6.13 ('pytorch_p36')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
