{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create a Graph from a text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook demonstrates how to extract graph from any text using the graph maker\n",
    "\n",
    "Steps:\n",
    "- Define an Ontology\n",
    "- Load a list of example text chunks. We will use the Lord of the Rings summary from this wikipedia page. \n",
    "- Create Graph using an Open source model using Groq APIs. \n",
    "- Save the graph to Neo4j db\n",
    "- Visualise\n",
    "\n",
    "\n",
    "\n",
    "Loading the graph maker functions ->"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from knowledge_graph_maker import GraphMaker, Ontology, GroqClient, OpenAIClient\n",
    "from knowledge_graph_maker import Document"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Define the Ontology. \n",
    "\n",
    "The ontology is a pydantic model with the following schema. \n",
    "\n",
    "```python\n",
    "class Ontology(BaseModel):\n",
    "    entities: List[Union[str, Dict]]\n",
    "    relationships: List[str]\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example lets use summaries of the LOTR books from the Wikipedia page. I have copied it into a file for easy import"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "15"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from lotr_wikipedia_summary import lord_of_the_rings_wikipedia_summary as example_text_list\n",
    "len(example_text_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is the ontology we will use for the LOTR summaries ->"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "ontology = Ontology(\n",
    "    entities=[\n",
    "        {\"Person\": \"Person name without any adjectives, Remember a person may be referenced by their name or using a pronoun\"},\n",
    "        {\"Object\": \"Do not add the definite article 'the' in the object name\"},\n",
    "        {\"Event\": \"Event event involving multiple people. Do not include qualifiers or verbs like gives, leaves, works etc.\"},\n",
    "        \"Place\",\n",
    "        \"Document\",\n",
    "        \"Organisation\",\n",
    "        \"Action\",\n",
    "        {\"Miscellaneous\": \"Any important concept can not be categorised with any other given entity type\"},\n",
    "    ],\n",
    "    relationships=[\n",
    "        \"Relation between any pair of Entities\"\n",
    "        ],\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Select a Model\n",
    "\n",
    "Groq support the following models at present. \n",
    "\n",
    "*LLaMA3 8b*\n",
    "Model ID: llama3-8b-8192\n",
    "\n",
    "*LLaMA3 70b*\n",
    "Model ID: llama3-70b-8192\n",
    "\n",
    "*Mixtral 8x7b*\n",
    "Model ID: mixtral-8x7b-32768\n",
    "\n",
    "*Gemma 7b*\n",
    "Model ID: gemma-7b-it\n",
    "\n",
    "\n",
    "Selecting a model for this example ->\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "## Groq models\n",
    "model = \"mixtral-8x7b-32768\"\n",
    "# model =\"llama3-8b-8192\"\n",
    "# model = \"llama3-70b-8192\"\n",
    "# model=\"gemma-7b-it\"\n",
    "\n",
    "## Open AI models\n",
    "oai_model=\"gpt-3.5-turbo\"\n",
    "\n",
    "## Use Groq\n",
    "# llm = GroqClient(model=model, temperature=0.1, top_p=0.5)\n",
    "## OR Use OpenAI\n",
    "llm = OpenAIClient(model=oai_model, temperature=0.1, top_p=0.5)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create documents out of text chumks. \n",
    "Documents is a pydantic model with the following schema \n",
    "\n",
    "```python\n",
    "class Document(BaseModel):\n",
    "    text: str\n",
    "    metadata: dict\n",
    "```\n",
    "\n",
    "The metadata we add to the document here is copied to every relation that is extracted out of the document. More often than not, the node pairs have multiple relation with each other. The metadata helps add more context to these relations\n",
    "\n",
    "In this example I am generating a summary of the text chunk, and the timestamp of the run, to be used as metadata. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime\n",
    "current_time = str(datetime.datetime.now())\n",
    "\n",
    "\n",
    "graph_maker = GraphMaker(ontology=ontology, llm_client=llm, verbose=False)\n",
    "\n",
    "def generate_summary(text):\n",
    "    SYS_PROMPT = (\n",
    "        \"Succintly summarise the text provided by the user. \"\n",
    "        \"Respond only with the summary and no other comments\"\n",
    "    )\n",
    "    try:\n",
    "        summary = llm.generate(user_message=text, system_message=SYS_PROMPT)\n",
    "    except:\n",
    "        summary = \"\"\n",
    "    finally:\n",
    "        return summary\n",
    "\n",
    "\n",
    "docs = map(\n",
    "    lambda t: Document(text=t, metadata={\"summary\": generate_summary(t), 'generated_at': current_time}),\n",
    "    example_text_list\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Graph\n",
    "Finally run the Graph Maker to generate graph. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "graph = graph_maker.from_documents(\n",
    "    list(docs), \n",
    "    delay_s_between=0 ## delay_s_between because otherwise groq api maxes out pretty fast. \n",
    "    ) \n",
    "print(\"Total number of Edges\", len(graph))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for edge in graph:\n",
    "    print(edge.model_dump(exclude=['metadata']), \"\\n\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Save the Graph to Neo4j "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n",
      "aenter called\n",
      "aexit called\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "149"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from knowledge_graph_maker import Neo4jGraphModel\n",
    "\n",
    "create_indices = False\n",
    "neo4j_graph = Neo4jGraphModel(edges=graph, create_indices=create_indices)\n",
    "\n",
    "neo4j_graph.save()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}