{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "verified-temple",
   "metadata": {
    "tags": []
   },
   "source": [
    "# DRILL Notebook\n",
    "This is a jupyter notebook file to execute [DRILL](ontolearn.learners.drill) and generate predictive results. If you have not done it already, from the main directory \"Ontolearn\", run the commands for Datasets mentioned [here](https://ontolearn-docs-dice-group.netlify.app/usage/02_installation#download-external-files) to download the datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sustainable-poland",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import numpy as np\n",
    "from ontolearn.knowledge_base import KnowledgeBase\n",
    "from ontolearn.learners import Drill\n",
    "from ontolearn.learning_problem import PosNegLPStandard\n",
    "from owlapy.owl_individual import OWLNamedIndividual, IRI\n",
    "from ontolearn.metrics import F1\n",
    "from sklearn.model_selection import StratifiedKFold\n",
    "from ontolearn.utils.static_funcs import compute_f1_score\n",
    "from owlapy.render import DLSyntaxObjectRenderer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "happy-colorado",
   "metadata": {},
   "source": [
    "Open `uncle_lp.json` where we have stored the learning problem for the concept of 'Uncle' and the path to the 'family' ontology."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "buried-miami",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('uncle_lp.json') as json_file:\n",
    "    settings = json.load(json_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "refined-yellow",
   "metadata": {},
   "source": [
    "Create an instance of the class `KnowledeBase` by using the path that is stored in `settings`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "outdoor-player",
   "metadata": {},
   "outputs": [],
   "source": [
    "kb = KnowledgeBase(path=settings['data_path'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fabulous-sucking",
   "metadata": {},
   "source": [
    "Retreive the IRIs of the positive and negative examples of Uncle from `settings` and create an instance of `StratifiedKFold` so that we can create a train and a test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "right-organizer",
   "metadata": {},
   "outputs": [],
   "source": [
    "examples = settings['Uncle']\n",
    "p = set(examples['positive_examples'])\n",
    "n = set(examples['negative_examples'])\n",
    "\n",
    "kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)\n",
    "X = np.array(p + n)\n",
    "Y = np.array([1.0 for _ in p] + [0.0 for _ in n])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "earlier-peripheral",
   "metadata": {},
   "source": [
    "Create a model of [DRILL](ontolearn.learners.drill)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "permanent-alabama",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Drill(knowledge_base=kb, path_pretrained_kge=\"../embeddings/ConEx_Family/ConEx_entity_embeddings.csv\",\n",
    "                  quality_func=F1(), max_runtime=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c23ee156",
   "metadata": {},
   "source": [
    "1. For each training/testing set create a learning problem of type `PosNegLPStandard`.\n",
    "2. Fit the training learning problem to the drill model and retrieve the top predicion.\n",
    "3. Compute the F1 score of the prediction on the train and test sets.\n",
    "4. Print the prediction together with the quality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "263df5aa-a8c6-466a-9cb0-d82125b6a852",
   "metadata": {},
   "outputs": [],
   "source": [
    "for (ith, (train_index, test_index)) in enumerate(kf.split(X, Y)):\n",
    "    #  (1)\n",
    "    train_pos = {pos_individual for pos_individual in X[train_index][Y[train_index] == 1]}\n",
    "    train_neg = {neg_individual for neg_individual in X[train_index][Y[train_index] == 0]}\n",
    "    test_pos = {pos_individual for pos_individual in X[test_index][Y[test_index] == 1]}\n",
    "    test_neg = {neg_individual for neg_individual in X[test_index][Y[test_index] == 0]}\n",
    "    train_lp = PosNegLPStandard(pos=set(map(OWLNamedIndividual, map(IRI.create, train_pos))),\n",
    "                                neg=set(map(OWLNamedIndividual, map(IRI.create, train_neg))))\n",
    "\n",
    "    test_lp = PosNegLPStandard(pos=set(map(OWLNamedIndividual, map(IRI.create, test_pos))),\n",
    "                               neg=set(map(OWLNamedIndividual, map(IRI.create, test_neg))))\n",
    "    \n",
    "    #  (2)\n",
    "    pred_drill = model.fit(train_lp).best_hypotheses(n=1)\n",
    "\n",
    "    #  (3)\n",
    "    train_f1_drill = compute_f1_score(individuals={i for i in kb.individuals(pred_drill.concept)},\n",
    "                                              pos=train_lp.pos,\n",
    "                                              neg=train_lp.neg)\n",
    "    test_f1_drill = compute_f1_score(individuals={i for i in kb.individuals(pred_drill.concept)},\n",
    "                                     pos=test_lp.pos,\n",
    "                                     neg=test_lp.neg)\n",
    "    \n",
    "    #  (4)\n",
    "    print(f\"Prediction: {DLSyntaxObjectRenderer().render(pred_drill.concept)} |\"\n",
    "          f\"Train Quality: {train_f1_drill:.3f} |\"\n",
    "          f\"Test Quality: {test_f1_drill:.3f} \\n\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
