{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# <div style=\"text-align: center;\"><span align=\"center\" style=\"color: blue\">Constituent Treelib (CTL) Demo</span></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Necessary imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "from nltk import Tree"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Handle GPU issue"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In case you have a weak GPU (e.g., 2gb), it is highly recomended to use the CPU instead, in order to prevent an out-of-memory exception caused by torch that is used within benepar. To force the CPU usage, we can override the \"cuda.is_available\" function. However, if you have a more powerful GPU, leave the following lines commented out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import torch \n",
    "# torch.cuda.is_available = lambda : False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import the CTL library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from constituent_treelib import ConstituentTree, BracketedTree, Language, Structure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the NLP pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To instantiate a ConstituentTree object, a spaCy-based NLP pipeline that incorporates a benepar component is mandatory. CTL offers two possibilities to create this pipeline...\n",
    "- ...dynamically by instantiating the ConstituentTree object using the parameter create_pipeline\n",
    "- ...statically by creating the pipeline outside the ConstituentTree constructor via the create_pipeline() method and pass it to the constructor using the nlp parameter\n",
    "\n",
    "Note that the first option is mainly recommended for demo purposes or if you only want to process a single sentence.\n",
    "If, on the other hand, you want to process more than a single sentence and thus instantiate multiple ConstituentTree objects, it is strongly recommended to create the pipeline X outside and invoke it when instantiating the ConstituentTree object via ConstituentTree(nlp=X). \n",
    "\n",
    "In the following both possibilities are shown."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Static NLP pipelined creation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The spaCy model: 'en_core_web_md' was not found. Download is initiated...\n",
      "✔ Download and installation successful\n",
      "You can now load the package via spacy.load('en_core_web_md')\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[nltk_data] Downloading package benepar_en3 to\n",
      "[nltk_data]     D:\\install\\Python39\\share\\nltk_data...\n",
      "[nltk_data]   Unzipping models\\benepar_en3.zip.\n"
     ]
    }
   ],
   "source": [
    "# Define the language for the sentence as well as for the spaCy and benepar models\n",
    "language = Language.English\n",
    "\n",
    "# Define which specific SpaCy model should be used (default is Medium)\n",
    "spacy_model_size = ConstituentTree.SpacyModelSize.Medium\n",
    "\n",
    "# Create the pipeline (note, the required models will be downloaded and installed automatically)\n",
    "nlp = ConstituentTree.create_pipeline(language, spacy_model_size)\n",
    "\n",
    "# In case you want to hide pip installation outputs...\n",
    "# nlp = ConstituentTree.create_pipeline(language, spacy_model_size, quiet=True)\n",
    "\n",
    "# If you already downloaded and installed the models, you can save validation overhead by informing CTL\n",
    "# nlp = ConstituentTree.create_pipeline(language, spacy_model_size, download_models = False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Dynamic NLP pipelined creation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[nltk_data] Downloading package benepar_en3 to\n",
      "[nltk_data]     D:\\install\\Python39\\share\\nltk_data...\n",
      "[nltk_data]   Package benepar_en3 is already up-to-date!\n",
      "You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n"
     ]
    },
    {
     "data": {
      "image/svg+xml": [
       "<svg baseProfile=\"full\" height=\"504px\" preserveAspectRatio=\"xMidYMid meet\" style=\"font-family: times, serif; font-weight:normal; font-style: normal; font-size: 16px;\" version=\"1.1\" viewBox=\"0,0,976.0,504.0\" width=\"976px\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:ev=\"http://www.w3.org/2001/xml-events\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">S</text></svg><svg width=\"18.8525%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"60.8696%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">Constituency</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"30.4348%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"39.1304%\" x=\"60.8696%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">parsing</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"80.4348%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"9.42623%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"78.6885%\" x=\"18.8525%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"9.375%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VBZ</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">focuses</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"4.6875%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"90.625%\" x=\"9.375%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PP</text></svg><svg width=\"4.5977%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">IN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">on</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"2.29885%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"95.4023%\" x=\"4.5977%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">S</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"12.0482%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VBG</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">building</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"6.0241%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"20.4819%\" x=\"12.0482%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"23.5294%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">DT</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">a</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"11.7647%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"41.1765%\" x=\"23.5294%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">parse</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"44.1176%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"35.2941%\" x=\"64.7059%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">tree</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"82.3529%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"22.2892%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"67.4699%\" x=\"32.5301%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">S</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"12.5%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VBG</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">using</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"6.25%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"87.5%\" x=\"12.5%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"8.16327%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">DT</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">a</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"4.08163%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"30.6122%\" x=\"8.16327%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">JJ</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">probabilistic</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"23.4694%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"42.8571%\" x=\"38.7755%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">ADJP</text></svg><svg width=\"42.8571%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">context</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"21.4286%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"28.5714%\" x=\"42.8571%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">HYPH</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">-</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"57.1429%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"28.5714%\" x=\"71.4286%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">JJ</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">free</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"85.7143%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"60.2041%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"18.3673%\" x=\"81.6327%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">grammar</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"90.8163%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"56.25%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"66.2651%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"52.2989%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"54.6875%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"58.1967%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"2.45902%\" x=\"97.541%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">.</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">.</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"98.7705%\" y1=\"1.2em\" y2=\"3em\" /></svg>"
      ],
      "text/plain": [
       "<constituent_treelib.core.ConstituentTree at 0x15b0b7564c0>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sentence_dyn = \"Constituency parsing focuses on building a parse tree using a probabilistic context-free grammar.\"\n",
    "tree_dyn = ConstituentTree(sentence_dyn, create_pipeline=True) \n",
    "\n",
    "# Now the NLP pipeline is integrated within the ConstituentTree object and is ready to use...\n",
    "tree_dyn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Instead of creating the NLP pipeline automatically via create_pipeline(), you can also create it manually"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n"
     ]
    },
    {
     "data": {
      "image/svg+xml": [
       "<svg baseProfile=\"full\" height=\"264px\" preserveAspectRatio=\"xMidYMid meet\" style=\"font-family: times, serif; font-weight:normal; font-style: normal; font-size: 16px;\" version=\"1.1\" viewBox=\"0,0,360.0,264.0\" width=\"360px\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:ev=\"http://www.w3.org/2001/xml-events\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">S</text></svg><svg width=\"11.1111%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PRP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">You</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"5.55556%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"82.2222%\" x=\"11.1111%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"16.2162%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">MD</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">must</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"8.10811%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"83.7838%\" x=\"16.2162%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"35.4839%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VB</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">construct</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"17.7419%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"64.5161%\" x=\"35.4839%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"60%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">JJ</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">additional</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"30%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"40%\" x=\"60%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNS</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">pylons</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"80%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"67.7419%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"58.1081%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"52.2222%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"6.66667%\" x=\"93.3333%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">.</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">!</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"96.6667%\" y1=\"1.2em\" y2=\"3em\" /></svg>"
      ],
      "text/plain": [
       "<constituent_treelib.core.ConstituentTree at 0x15b07660c70>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import spacy\n",
    "import benepar\n",
    "\n",
    "# We use English models by default..\n",
    "spacy_model = \"en_core_web_md\"\n",
    "benepar_model = \"benepar_en3\"  \n",
    "\n",
    "nlp_pipeline = spacy.load(spacy_model, disable=[\"tok2vec\", \"tagger\", \"parser\", \"ner\", \"attribute_ruler\", \"lemmatizer\"])\n",
    "nlp_pipeline.add_pipe(\"sentencizer\")\n",
    "nlp_pipeline.add_pipe(\"benepar\", config={\"model\": benepar_model})\n",
    "\n",
    "sentence = \"You must construct additional pylons!\"\n",
    "tree_using_own_nlp_pipeline  = ConstituentTree(sentence, nlp_pipeline) \n",
    "tree_using_own_nlp_pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define some test sentences"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### English"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "sentences = [\n",
    "    \"You must construct additional pylons!\",\n",
    "    \"I'm on my way to New York while they're flying to Tel Aviv.\",\n",
    "    \"Isaac Asimov was an American writer and professor of biochemistry at Boston University.\",\n",
    "    \"Kubrick's films typically involve expressions of an inner struggle, examined from different perspectives.\",\n",
    "    \"Full Metal Jacket is a war drama film directed and produced by Stanley Kubrick.\",\n",
    "    \"Stanley Getz was an American jazz saxophonist.\",\n",
    "    \"The bridge was unfinished when it collapsed.\",\n",
    "    \"The 2022 season is underway and there are a limited number of Single Game Tickets on sale now!\",\n",
    "    \"And with no Wild Card possibilities for either team, the game is essentially a winner-take-all endeavor.\"\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### German"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sentences = [\n",
    "#     \"In einer Gaspipeline in Litauen hat es eine Explosion gegeben.\",\n",
    "#     \"Für Fragen zu Freiwilligendiensten, nutzen sie bitte unser Forum!\",\n",
    "#     \"Der Künstler verlegt seit 30 Jahren Stolpersteine, die er zur Erinnerung an die Opfer des Nationalsozialismus Häusern platziert.\"\n",
    "#     \"Damit erlangen schützenswürdige Kundendaten in den Geschäfts- und Serviceprozessen der Wertschöpfungskette im Bereich Automotive eine immer größere Bedeutung.\",\n",
    "#     \"Die USA haben mit ihrem Investitionsprogramm für Klimaschutz reichlich Unmut der EU auf sich gezogen.\",\n",
    "#     \"Ebenso empfehlenswert ist das Lesen einer Tageszeitung des Landes.\", \n",
    "#     \"Wie viel wird pro Jahr ungefähr weltweit benötigt?\"\n",
    "# ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### French"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sentences = [\n",
    "#     \"Nous irons plus tard au théâtre.\",\n",
    "#     \"Pablo Ruiz Picasso était un peintre, dessinateur, sculpteur et graphiste espagnol.\", \n",
    "#     \"Découvrez une belle sélection d’évènements pour fêter la nouvelle année en partenariat avec Party.\"\n",
    "# ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Swedish"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sentences = [\n",
    "#     \"Vilken vacker skog!\",\n",
    "#     \"Det var mycket åska och blixtar i går!\",\n",
    "#     \"För den närmaste veckan finns ingen uppenbar risk för fjärrtransport.\"\n",
    "# ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Polish"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sentences = [\n",
    "#     \"W dodatku szczerze wierzy, że w tej wojnie stawką jest istnienie Rosji.\",\n",
    "#     \"Poproszę pięć kilo ziemniaków.\",\n",
    "#     \"Przepraszam, ale nie rozumiem.\"\n",
    "# ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Hungarian"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sentences = [\n",
    "#     \"A pizza tényleg kiváló volt!\",\n",
    "#     \"Vannak kisebb és kiszámíthatatlan kivételek a szabály alól.\",\n",
    "#     \"Ezt azért tesszük, hogy javítsuk és finanszírozzuk szolgáltatásainkat.\" \n",
    "# ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Chinese "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sentences = [\n",
    "#     \"你好吗？\",\n",
    "#     \"很高兴见到你。\",\n",
    "#     \"不好意思， 我没听懂。\",\n",
    "#     \"请再说一遍。\"   \n",
    "# ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Korean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sentences = [\n",
    "#     \"말을 냇가에 끌고 갈 수는 있어도 억지로 물을 먹일 수는 없다\",\n",
    "#     \"반갑습니다\", \n",
    "#     \"잘 지내세요?\",\n",
    "#     \"그 집은 한국에서 지어졌어요\"\n",
    "# ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Instantiate a ConstituentTree object"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ... from a raw sentence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n"
     ]
    },
    {
     "data": {
      "image/svg+xml": [
       "<svg baseProfile=\"full\" height=\"312px\" preserveAspectRatio=\"xMidYMid meet\" style=\"font-family: times, serif; font-weight:normal; font-style: normal; font-size: 16px;\" version=\"1.1\" viewBox=\"0,0,824.0,312.0\" width=\"824px\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:ev=\"http://www.w3.org/2001/xml-events\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">S</text></svg><svg width=\"14.5631%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"46.6667%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">Isaac</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"23.3333%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"53.3333%\" x=\"46.6667%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">Asimov</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"73.3333%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"7.28155%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"82.5243%\" x=\"14.5631%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"5.88235%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VBD</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">was</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"2.94118%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"94.1176%\" x=\"5.88235%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"47.5%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"10.5263%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">DT</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">an</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"5.26316%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"26.3158%\" x=\"10.5263%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">JJ</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">American</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"23.6842%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"21.0526%\" x=\"36.8421%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">writer</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"47.3684%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"13.1579%\" x=\"57.8947%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">CC</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">and</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"64.4737%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"28.9474%\" x=\"71.0526%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">professor</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"85.5263%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"23.75%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"22.5%\" x=\"47.5%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PP</text></svg><svg width=\"22.2222%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">IN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">of</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"11.1111%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"77.7778%\" x=\"22.2222%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">biochemistry</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"61.1111%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"58.75%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"30%\" x=\"70%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PP</text></svg><svg width=\"16.6667%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">IN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">at</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"8.33333%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"83.3333%\" x=\"16.6667%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"40%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">Boston</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"20%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"60%\" x=\"40%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">University</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"70%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"58.3333%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"85%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"52.9412%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"55.8252%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"2.91262%\" x=\"97.0874%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">.</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">.</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"98.5437%\" y1=\"1.2em\" y2=\"3em\" /></svg>"
      ],
      "text/plain": [
       "<constituent_treelib.core.ConstituentTree at 0x15b47875b20>"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tree = ConstituentTree(sentence=sentences[2], nlp=nlp) #, structure=Structure.WithoutTokenLeaves) \n",
    "tree"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ... from a bracketed tree string (wrapped as a BracketedTree object)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg baseProfile=\"full\" height=\"264px\" preserveAspectRatio=\"xMidYMid meet\" style=\"font-family: times, serif; font-weight:normal; font-style: normal; font-size: 16px;\" version=\"1.1\" viewBox=\"0,0,360.0,264.0\" width=\"360px\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:ev=\"http://www.w3.org/2001/xml-events\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">S</text></svg><svg width=\"11.1111%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PRP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">You</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"5.55556%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"82.2222%\" x=\"11.1111%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"16.2162%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">MD</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">must</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"8.10811%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"83.7838%\" x=\"16.2162%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"35.4839%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VB</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">construct</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"17.7419%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"64.5161%\" x=\"35.4839%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"60%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">JJ</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">additional</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"30%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"40%\" x=\"60%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNS</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">pylons</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"80%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"67.7419%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"58.1081%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"52.2222%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"6.66667%\" x=\"93.3333%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">.</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">!</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"96.6667%\" y1=\"1.2em\" y2=\"3em\" /></svg>"
      ],
      "text/plain": [
       "<constituent_treelib.core.ConstituentTree at 0x15b1cd5be50>"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bracketed_tree_string = \"(S (NP (PRP You)) (VP (MD must) (VP (VB construct) (NP (JJ additional) (NNS pylons)))) (. !))\"\n",
    "bracketed_tree = BracketedTree(bracketed_tree_string)\n",
    "tree_from_bracketed = ConstituentTree(sentence=bracketed_tree, nlp=nlp)\n",
    "tree_from_bracketed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ... from an nltk.Tree object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg baseProfile=\"full\" height=\"264px\" preserveAspectRatio=\"xMidYMid meet\" style=\"font-family: times, serif; font-weight:normal; font-style: normal; font-size: 16px;\" version=\"1.1\" viewBox=\"0,0,360.0,264.0\" width=\"360px\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:ev=\"http://www.w3.org/2001/xml-events\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">S</text></svg><svg width=\"11.1111%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PRP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">You</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"5.55556%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"82.2222%\" x=\"11.1111%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"16.2162%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">MD</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">must</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"8.10811%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"83.7838%\" x=\"16.2162%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"35.4839%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VB</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">construct</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"17.7419%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"64.5161%\" x=\"35.4839%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"60%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">JJ</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">additional</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"30%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"40%\" x=\"60%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNS</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">pylons</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"80%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"67.7419%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"58.1081%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"52.2222%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"6.66667%\" x=\"93.3333%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">.</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">!</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"96.6667%\" y1=\"1.2em\" y2=\"3em\" /></svg>"
      ],
      "text/plain": [
       "<constituent_treelib.core.ConstituentTree at 0x15b478636d0>"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nltk_tree = Tree('S', [Tree('NP', [Tree('PRP', ['You'])]), Tree('VP', [Tree('MD', ['must']), Tree('VP', [Tree('VB', ['construct']), Tree('NP', [Tree('JJ', ['additional']), Tree('NNS', ['pylons'])])])]), Tree('.', ['!'])])\n",
    "tree_from_nltk = ConstituentTree(sentence=nltk_tree, nlp=nlp) \n",
    "tree_from_nltk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Expand contractions (e.g., *he's* $\\rightarrow$ *he is*, *they're* $\\rightarrow$ *they are*, etc.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg baseProfile=\"full\" height=\"456px\" preserveAspectRatio=\"xMidYMid meet\" style=\"font-family: times, serif; font-weight:normal; font-style: normal; font-size: 16px;\" version=\"1.1\" viewBox=\"0,0,672.0,456.0\" width=\"672px\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:ev=\"http://www.w3.org/2001/xml-events\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">S</text></svg><svg width=\"5.95238%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PRP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">I</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"2.97619%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"90.4762%\" x=\"5.95238%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"6.57895%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VBP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">am</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"3.28947%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"39.4737%\" x=\"6.57895%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PP</text></svg><svg width=\"13.3333%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">IN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">on</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"6.66667%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"86.6667%\" x=\"13.3333%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"42.3077%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"54.5455%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PRP$</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">my</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"27.2727%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"45.4545%\" x=\"54.5455%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">way</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"77.2727%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"21.1538%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"57.6923%\" x=\"42.3077%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PP</text></svg><svg width=\"26.6667%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">IN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">to</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"13.3333%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"73.3333%\" x=\"26.6667%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"45.4545%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">New</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"22.7273%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"54.5455%\" x=\"45.4545%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">York</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"72.7273%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"63.3333%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"71.1538%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"56.6667%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"26.3158%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"53.9474%\" x=\"46.0526%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">SBAR</text></svg><svg width=\"17.0732%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">IN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">while</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"8.53659%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"82.9268%\" x=\"17.0732%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">S</text></svg><svg width=\"17.6471%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PRP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">they</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"8.82353%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"82.3529%\" x=\"17.6471%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"17.8571%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VBP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">are</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"8.92857%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"82.1429%\" x=\"17.8571%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"34.7826%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VBG</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">flying</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"17.3913%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"65.2174%\" x=\"34.7826%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PP</text></svg><svg width=\"26.6667%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">IN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">to</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"13.3333%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"73.3333%\" x=\"26.6667%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"45.4545%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">Tel</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"22.7273%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"54.5455%\" x=\"45.4545%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">Aviv</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"72.7273%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"63.3333%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"67.3913%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"58.9286%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"58.8235%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"58.5366%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"73.0263%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"51.1905%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"3.57143%\" x=\"96.4286%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">.</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">.</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"98.2143%\" y1=\"1.2em\" y2=\"3em\" /></svg>"
      ],
      "text/plain": [
       "<constituent_treelib.core.ConstituentTree at 0x15a9263e580>"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tree_expanded_contractions = ConstituentTree(sentence=sentences[1], nlp=nlp, expand_contractions=True)\n",
    "tree_expanded_contractions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Select desired tree structure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Without token leaves (tree contains now phrasal categories as inner nodes and postags as leaves)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg baseProfile=\"full\" height=\"216px\" preserveAspectRatio=\"xMidYMid meet\" style=\"font-family: times, serif; font-weight:normal; font-style: normal; font-size: 16px;\" version=\"1.1\" viewBox=\"0,0,200.0,216.0\" width=\"200px\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:ev=\"http://www.w3.org/2001/xml-events\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">S</text></svg><svg width=\"20%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PRP</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"10%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"68%\" x=\"20%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"23.5294%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">MD</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"11.7647%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"76.4706%\" x=\"23.5294%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"30.7692%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VB</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"15.3846%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"69.2308%\" x=\"30.7692%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"44.4444%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">JJ</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"22.2222%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"55.5556%\" x=\"44.4444%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNS</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"72.2222%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"65.3846%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"61.7647%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"54%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"12%\" x=\"88%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">.</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"94%\" y1=\"1.2em\" y2=\"3em\" /></svg>"
      ],
      "text/plain": [
       "<constituent_treelib.core.ConstituentTree at 0x15b1cd5beb0>"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tree_without_token_leaves = ConstituentTree(sentences[0], nlp, Structure.WithoutTokenLeaves)\n",
    "tree_without_token_leaves"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Without postag nodes (tree contains now phrasal categories as inner nodes and tokens as leaves)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg baseProfile=\"full\" height=\"216px\" preserveAspectRatio=\"xMidYMid meet\" style=\"font-family: times, serif; font-weight:normal; font-style: normal; font-size: 16px;\" version=\"1.1\" viewBox=\"0,0,360.0,216.0\" width=\"360px\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:ev=\"http://www.w3.org/2001/xml-events\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">S</text></svg><svg width=\"11.1111%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">You</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"5.55556%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"82.2222%\" x=\"11.1111%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"16.2162%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">must</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"8.10811%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"83.7838%\" x=\"16.2162%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"35.4839%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">construct</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"17.7419%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"64.5161%\" x=\"35.4839%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"60%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">additional</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"30%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"40%\" x=\"60%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">pylons</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"80%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"67.7419%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"58.1081%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"52.2222%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"6.66667%\" x=\"93.3333%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">!</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"96.6667%\" y1=\"1.2em\" y2=\"3em\" /></svg>"
      ],
      "text/plain": [
       "<constituent_treelib.core.ConstituentTree at 0x15b47889640>"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tree_without_postag_nodes = ConstituentTree(sentences[0], nlp, Structure.WithoutPostagNodes)\n",
    "tree_without_postag_nodes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tree representations "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot SVG representation of the internal NLTK tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<svg baseProfile=\"full\" height=\"312px\" preserveAspectRatio=\"xMidYMid meet\" style=\"font-family: times, serif; font-weight:normal; font-style: normal; font-size: 16px;\" version=\"1.1\" viewBox=\"0,0,824.0,312.0\" width=\"824px\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:ev=\"http://www.w3.org/2001/xml-events\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">S</text></svg><svg width=\"14.5631%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"46.6667%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">Isaac</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"23.3333%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"53.3333%\" x=\"46.6667%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">Asimov</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"73.3333%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"7.28155%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"82.5243%\" x=\"14.5631%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VP</text></svg><svg width=\"5.88235%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">VBD</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">was</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"2.94118%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"94.1176%\" x=\"5.88235%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"47.5%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"10.5263%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">DT</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">an</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"5.26316%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"26.3158%\" x=\"10.5263%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">JJ</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">American</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"23.6842%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"21.0526%\" x=\"36.8421%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">writer</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"47.3684%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"13.1579%\" x=\"57.8947%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">CC</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">and</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"64.4737%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"28.9474%\" x=\"71.0526%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">professor</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"85.5263%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"23.75%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"22.5%\" x=\"47.5%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PP</text></svg><svg width=\"22.2222%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">IN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">of</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"11.1111%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"77.7778%\" x=\"22.2222%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">biochemistry</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"61.1111%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"58.75%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"30%\" x=\"70%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">PP</text></svg><svg width=\"16.6667%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">IN</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">at</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"8.33333%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"83.3333%\" x=\"16.6667%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NP</text></svg><svg width=\"40%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">Boston</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"20%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"60%\" x=\"40%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">NNP</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">University</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"70%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"58.3333%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"85%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"52.9412%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"55.8252%\" y1=\"1.2em\" y2=\"3em\" /><svg width=\"2.91262%\" x=\"97.0874%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">.</text></svg><svg width=\"100%\" x=\"0%\" y=\"3em\"><defs /><svg width=\"100%\" x=\"0\" y=\"0em\"><defs /><text text-anchor=\"middle\" x=\"50%\" y=\"1em\">.</text></svg></svg><line stroke=\"black\" x1=\"50%\" x2=\"50%\" y1=\"1.2em\" y2=\"3em\" /></svg><line stroke=\"black\" x1=\"50%\" x2=\"98.5437%\" y1=\"1.2em\" y2=\"3em\" /></svg>"
      ],
      "text/plain": [
       "<constituent_treelib.core.ConstituentTree at 0x15b47875b20>"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tree"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Pretty-print bracketed tree string representation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(S\n",
      "  (NP (NNP Isaac) (NNP Asimov))\n",
      "  (VP\n",
      "    (VBD was)\n",
      "    (NP\n",
      "      (NP (DT an) (JJ American) (NN writer) (CC and) (NN professor))\n",
      "      (PP (IN of) (NP (NN biochemistry)))\n",
      "      (PP (IN at) (NP (NNP Boston) (NNP University)))))\n",
      "  (. .))\n"
     ]
    }
   ],
   "source": [
    "print(tree) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ASCII art"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                                          S                                                              \n",
      "        __________________________________|____________________________________________________________   \n",
      "       |                    VP                                                                         | \n",
      "       |           _________|____________________                                                      |  \n",
      "       |          |                              NP                                                    | \n",
      "       |          |                  ____________|________________________________                     |  \n",
      "       |          |                 |                       PP                    PP                   | \n",
      "       |          |                 |                    ___|_______          ____|_____               |  \n",
      "       NP         |                 NP                  |           NP       |          NP             | \n",
      "   ____|____      |    _____________|____________       |           |        |     _____|______        |  \n",
      " NNP       NNP   VBD  DT    JJ      NN    CC     NN     IN          NN       IN  NNP          NNP      . \n",
      "  |         |     |   |     |       |     |      |      |           |        |    |            |       |  \n",
      "Isaac     Asimov was  an American writer and professor  of     biochemistry  at Boston     University  . \n",
      "\n"
     ]
    }
   ],
   "source": [
    "tree.nltk_tree.pretty_print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### LATEX code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\\\Tree [.S\\n        [.NP [.NNP Isaac ] [.NNP Asimov ] ]\\n        [.VP\\n          [.VBD was ]\\n          [.NP\\n            [.NP\\n              [.DT an ]\\n              [.JJ American ]\\n              [.NN writer ]\\n              [.CC and ]\\n              [.NN professor ] ]\\n            [.PP [.IN of ] [.NP [.NN biochemistry ] ] ]\\n            [.PP [.IN at ] [.NP [.NNP Boston ] [.NNP University ] ] ] ] ]\\n        [.. . ] ]'"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tree.nltk_tree.pformat_latex_qtree()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export visualization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "CTL relies on two two open-source tools to export the generated constituent tree into the following file formats:\n",
    "\n",
    "1.) **PDF**: For this the command line tool *wkhtmltopdf* is required: https://wkhtmltopdf.org/downloads.html <br>\n",
    "Once downloaded and installed, the path to the wkhtmltopdf binary must be passed to the export function. In case of a Windows OS, an attempt is made to locate the path of the wkhtmltopdf binary by looking up the default installation directory (\"Program Files/wkhtmltopdf\"). \n",
    "\n",
    "2.) **JPG, PNG, GIF, BMP, EPS, PSD, TIFF and YAML:** For these the software suite *ImageMagick* is required: https://imagemagick.org/script/download.php <br> Here, an attempt is made to locate the path of the software suite by looking up the environment variables. Hence, an explicit path to the binary is not required."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PDF-file successfully saved to: my_tree.pdf\n"
     ]
    }
   ],
   "source": [
    "tree.export_tree(destination_filepath='my_tree.pdf', verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extract content fron tree leaves"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Text tokens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Isaac Asimov was an American writer and professor of biochemistry at Boston University .'"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tree.leaves(tree.nltk_tree, ConstituentTree.NodeContent.Text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### POS tags"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'NNP NNP VBD DT JJ NN CC NN IN NN IN NNP NNP .'"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tree.leaves(tree.nltk_tree, ConstituentTree.NodeContent.Pos) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Combination of both"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Isaac_NNP Asimov_NNP was_VBD an_DT American_JJ writer_NN and_CC professor_NN of_IN biochemistry_NN at_IN Boston_NNP University_NNP ._.'"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tree.leaves(tree.nltk_tree, ConstituentTree.NodeContent.Combined)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extract phrases"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Phrasal categories "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'NP', 'PP', 'S', 'VP'}"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tree.extract_all_phrasal_categories()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### All phrases (including nested)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NP ['Isaac Asimov', 'an American writer and professor of biochemistry at Boston University', 'an American writer and professor', 'Boston University']\n",
      "VP ['was an American writer and professor of biochemistry at Boston University']\n",
      "S ['Isaac Asimov was an American writer and professor of biochemistry at Boston University .']\n",
      "PP ['of biochemistry', 'at Boston University']\n"
     ]
    }
   ],
   "source": [
    "all_phrases = tree.extract_all_phrases(avoid_nested_phrases=False, min_words_in_phrases=1)\n",
    "for phrasal_category, phrases in all_phrases.items():\n",
    "    print(phrasal_category, phrases)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### All phrases (avoiding nested)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NP ['an American writer and professor of biochemistry at Boston University', 'Isaac Asimov']\n",
      "VP ['was an American writer and professor of biochemistry at Boston University']\n",
      "S ['Isaac Asimov was an American writer and professor of biochemistry at Boston University .']\n",
      "PP ['at Boston University', 'of biochemistry']\n"
     ]
    }
   ],
   "source": [
    "all_phrases = tree.extract_all_phrases(avoid_nested_phrases=True, min_words_in_phrases=1)\n",
    "for phrasal_category, phrases in all_phrases.items():\n",
    "    print(phrasal_category, phrases)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Only noun phrases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['an American writer and professor of biochemistry at Boston University', 'Isaac Asimov']\n"
     ]
    }
   ],
   "source": [
    "phrases = tree.extract_all_phrases(avoid_nested_phrases=True)\n",
    "noun_phrases = phrases['NP']\n",
    "\n",
    "print(noun_phrases)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### All phrases of the tree without token leaves"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NP ['JJ NNS']\n",
      "VP ['MD VB JJ NNS', 'VB JJ NNS']\n",
      "S ['PRP MD VB JJ NNS .']\n"
     ]
    }
   ],
   "source": [
    "all_phrases = tree_without_token_leaves.extract_all_phrases(avoid_nested_phrases=False, min_words_in_phrases=1)\n",
    "for phrasal_category, phrases in all_phrases.items():\n",
    "    print(phrasal_category, phrases)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
