{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "cfd4a8e5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "OWT2:\n",
      " # Vocab: 50257\n",
      " # Documents: 9127994\n",
      " - train - 8762874\n",
      " - test - 273840\n",
      " - validation - 91280\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "from tensorflow import keras\n",
    "\n",
    "from lmu_transformer_model import create_model\n",
    "from LSTM_model import create_model_lstm\n",
    "from microsoft_nlp import OWT2, paths\n",
    "\n",
    "# Dataset parameters\n",
    "vocab_size = 50257\n",
    "seq_len = 1024  # Maximum sequence length, in tokens\n",
    "batch_size = 16\n",
    "\n",
    "# Load the dataset and build the train/validation/test splits\n",
    "owt2 = OWT2()\n",
    "print(owt2)\n",
    "splits = owt2.get_splits(seq_len=seq_len, batch_size=batch_size)\n",
    "assert vocab_size == len(owt2.vocab)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "7a6c7c9d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model: \"lstm-80-80-2\"\n",
      "_________________________________________________________________\n",
      "Layer (type)                 Output Shape              Param #   \n",
      "=================================================================\n",
      "input_1 (InputLayer)         [(None, 1024)]            0         \n",
      "_________________________________________________________________\n",
      "embedding (Embedding)        (None, 1024, 80)          4020560   \n",
      "_________________________________________________________________\n",
      "lstm (LSTM)                  (None, 1024, 80)          51520     \n",
      "_________________________________________________________________\n",
      "lstm_1 (LSTM)                (None, 1024, 80)          51520     \n",
      "_________________________________________________________________\n",
      "dense_transpose (DenseTransp (None, 1024, 50257)       4020560   \n",
      "=================================================================\n",
      "Total params: 4,123,600\n",
      "Trainable params: 4,123,600\n",
      "Non-trainable params: 0\n",
      "_________________________________________________________________\n"
     ]
    }
   ],
   "source": [
    "# Model hyperparameters\n",
    "embed_dim = hidden_dim = 80\n",
    "n_layers = 2\n",
    "weight_tying = True  # Share the embedding matrix with the output projection\n",
    "\n",
    "model = create_model_lstm(\n",
    "    embed_dim,\n",
    "    hidden_dim,\n",
    "    n_layers,\n",
    "    vocab_size,\n",
    "    seq_len,\n",
    "    batch_size,\n",
    "    weight_tying,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "919599cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "model.load_weights(paths.weights / \"lstm-80-80-2.hdf5\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "9e4f7317",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pick one document (a sequence of token ids) to inspect\n",
    "sample = owt2[200000]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c687f4fb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "By Jonathan Amos\n",
      "\n",
      "Science reporter, BBC News\n",
      "\n",
      "Nasa's Icesat spacecraft tracks changes in ice sheet height Greenland and parts of Antarctica are losing large volumes of ice to the oceans as their glaciers get thinner, a Nasa satellite has revealed. Many glaciers have increased their flow rates in recent years, and the Icesat mission now allows scientists to measure their thickness in detail. A UK team studying the data told the journal Nature that the findings had implications for future sea-level rise. A full melt of the Greenland ice would push sea level up by about 7m (20ft). The extent of \"dynamic thinning\", observed by the satellite, has been a major source of uncertainty in projections of sea-level rise. \"All of the glaciers that are changing rapidly are ones that flow into the sea,\" said Hamish Pritchard from the British Antarctic Survey (BAS). \"The fact that they end in the sea means a buoyancy effect is working on them,\" he told BBC News. \"Normally, they're heavy things and they rest on the sea-bed and friction slows them down. But as you start to thin glaciers, they start to float off the sea-bed more and more; there's less friction and the glaciers can speed up.\" The swiftness with which some of the glaciers now move towards the sea far outstrips the rate at which ice can be restored to the land through precipitation. As a consequence, these glaciers are shown in the Icesat data to be falling in height - some dramatically so. Fore example, the giant Pine Island and Thwaites Glaciers in the West Antarctic are thinning by up to nine metres per year. The Pine Island Glacier is thinning by up to several metres per year The US space agency satellite uses a laser altimeter to measure the elevation of the ice sheets. It can acquire high-resolution data, especially in the steeply sloping coastal areas, where the radar altimeters flown on many other Earth observation satellites struggle to see important details. 
Dr Pritchard and colleagues looked through six years of data to assess the behaviour of the ice along the entire margins of the two continents. They monitored fast-moving and nearby \"stagnant\" ice streams at the same altitude to try to distinguish between the various possible causes of thinning. The findings re-affirm what many suspect - that the reduced elevation of these glaciers is not the result of changes in precipitation or melt, but the increased speed at which they now move. \"If these were changes solely related to atmospheric conditions, either changes in temperature or snowfall, we would expect the glaciers and the nearby ice to be showing the same signal; and they don't,\" explained Professor David Vaughan, also at BAS. \"The glaciers are thinning because they are speeding up. Because they're flowing more rapidly to the ocean, they are discharging more ice into the ocean than is being replaced by new snowfall. The balance is loss.\" The researchers tell Nature that this dynamic thinning, as they call it, now covers all latitudes in Greenland. Of 111 fast-moving Greenland glaciers studied, 81 were shown to be thinning at twice the rate as the slow-moving ice beside them. In Antarctica, the picture is more complicated, with Icesat reporting significant ice growth in places. Nonetheless, the researchers say, thinning has intensified at key locations where ice streams enter the ocean. Some of the most graphic examples come from the Antarctic Peninsula, a region that is known to be warming faster than the rest of the continent. Little thinning was seen on the massive, cold East Antarctic ice sheet, while the West Antarctic ice showed a mixed picture. In many places in both Antarctica and Greenland, glaciers are being confronted by warmer waters which are eroding their fronts. The break-up of floating ice shelves that would normally constrict glacier flow has also contributed to the observed acceleration. 
And in some regions, increased air temperatures are having an effect. The team says better tools are needed to understand how changes in glacier behaviour will impact sea level. \"One of the big issues in glaciology is trying to get the right models to predict the ice sheets and to explain these observations,\" said Dr Pritchard. \"This is a set of measurements the modellers can use to improve their models and to predict future sea-level rise.\" Jonathan.Amos-INTERNET@bbc.co.uk\n",
      "\n",
      "\n",
      "\n",
      "Bookmark with: Delicious\n",
      "\n",
      "Digg\n",
      "\n",
      "reddit\n",
      "\n",
      "Facebook\n",
      "\n",
      "StumbleUpon What are these? E-mail this to a friend Printable version<|endoftext|>\n"
     ]
    }
   ],
   "source": [
    "print(owt2.tokenizer.decode(sample))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "2c85cf79",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Zero-pad (or truncate) the sample into a single batch of length seq_len\n",
    "x = np.zeros((1, seq_len), dtype=np.uint64)\n",
    "n_tokens = min(seq_len, len(sample))\n",
    "x[:, :n_tokens] = sample[:n_tokens]\n",
    "y = model.predict(x).squeeze(axis=0)\n",
    "\n",
    "assert y.shape == (seq_len, vocab_size)\n",
    "# Highest-scoring token at each position, i.e. the predicted next token\n",
    "y_top = np.argmax(y[:n_tokens], axis=1).tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0cb05a2d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-�,\n",
      "The is, who News,\n",
      "The., \"-,'s is the in the,,., the of the. the the quantities of the- the ocean. well planet are the than and global warming, been that\n",
      " of are been by own and of the years. and the mostOTA of­ is is us to develop the own. the.\n",
      " new- is the Antarctic is the researchers Nature, the researchers are been for the research-level temperatures.\n",
      " few-- the ocean ice sheet be the ice of to the a, and1%)\n",
      " first of theTheune energyitudes- \" in the researchers, and been a major factor of climate. the of the-like temperatures.\n",
      "The of the most are are the the, the that are in the world ice says thei,.a, the Antarctic Geological Institute ofC.)\n",
      "The first that the are the the world, that lotant of of a to the. says says the's. \"We, we are going, that we areructure the planet ofhigh temperatures theless down into. It we we are to see,, it are to be, the surface icelike temperatures than more energy it are a than than the atmosphere are be up the\n",
      " researchers- of the is of the most are have into the world of. ofpping. ocean of the the- be used. the world of the.\n",
      " the result of the are are not in the worldOTA,,. the a in the. and of, far\n",
      "sts is the most,-, theiru is,iers of the world Antarctica ice the,, the to the years. day.\n",
      " first's is is a,, the to the times of day. World coast is,, a new--, the the surface of the ice..\n",
      "'s be a-speed images, and in the world surface surfaceimmer, surface of and the ocean is- is to the other continents... to develop the temperatures.\n",
      "..a, the are at the- of the and the the surface of the ocean- the surface planet of the world-.\n",
      " were the-moving vessels solar areasTheakeine\" of sheet. the end time of the to develop between the ocean parts species of the ice,\n",
      " researchers of-releasedished the the of was the was first temperature of the species were the the most of the. the. thewater and the most winds of the the are have on\n",
      "The you species the, on to the temperatures, the, the the, the-. the have have to most to the most temperature- the the up surface.. the the are't have he the of,, who said theICcomThe first are the,, of are the up to It of are not to than, the world, it are notarmed the than than the world, the the a by the oil.. It most of the of\n",
      " researchers said the's the is is- is the well are the, is, the ofitudes. the,\n",
      " course,,cross temperatures ice,, the% the in be a, in the. first of well first-water temperatures- the.\n",
      " the, the ice of the than than and the-,,. temperatures-. the that\n",
      ", the first are that \" temperatures the been, the temperatures, the- are the ice.\n",
      " of the most important models of from the world ice, and global of is the as be the up than the most of the world.\n",
      " is,, a in the surface surface low ice- ice ice, and the most- ice ice the lot- of The the ways, the the, the, the are the used with the temperatures. are theoding. oceans.\n",
      " ice isup is the glaciers, are are be beraining ice. been been to the global temperature of\n",
      " the the ways, the temperatures pollution, more to impact on\n",
      " first is that than are more to be the much are the is. be the temperature.\n",
      "The of the most things of theially, the to achieve the most to to be the growth-, the the the problems. he the..a,\n",
      "We is the lot of the, world­ies are be the do the lives. the be the of temperatureslevel temperatures.\n",
      "'s \"m,likeNET,latm.com.uk.\n",
      "\n",
      "Themark with the Delicious\n",
      "\n",
      "Digg\n",
      "\n",
      "reddit\n",
      "\n",
      "Facebook\n",
      "\n",
      "FacebookumbleUpon What are you things\n",
      "-mail your article the few ofable! ofThe\n"
     ]
    }
   ],
   "source": [
    "print(owt2.tokenizer.decode(y_top))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "1e243c8e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "14715/14715 [==============================] - 1629s 108ms/step - loss: 5.0381\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "5.037589073181152"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_loss = model.evaluate(splits[\"test\"])\n",
    "test_loss"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Microsoft NLP",
   "language": "python",
   "name": "microsoft-nlp"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
