Kokoyi allows you to program a model as if you were writing the math underlying it: precise and compact. The current release includes many models: MLP/CNN classifiers, seq2seq models (including the popular Transformer), variants of reinforcement learning (policy gradient and Deep Q-Learning), variational auto-encoders (VAE), and GANs, with more to come. Be sure to check them out.
Before diving into model implementations, let us first get familiar with writing math equations in LaTeX style, which you can easily adapt to write a Kokoyi model. To that end, this notebook is organized as a series of mini-exercises.
Assuming you have installed the Kokoyi plug-in, you can start programming with LaTeX syntax right away. If you are new to LaTeX, you may want to check out LaTeX math and equations for a quick intro. LaTeX supports a wide range of mathematical symbols, letters, fonts, accents, etc., all of which are available in Kokoyi.
Suppose we wish to have three variables $\phi \gets 1$, $x_{first} \gets \frac{1}{2}$, $\hat{x} \gets x_{first} + 2$. Defining them in Kokoyi is a snap. First, double-click here and copy everything between the math delimiters \$. Second, paste it into the box below and end each definition with a newline symbol \\. In Jupyter Notebook, a code cell whose first line is %kokoyi is treated as a Kokoyi cell, while all other code cells are normal Python cells.
%kokoyi
% Please enter your answer in this box.
\phi \gets 1 \\
x_{first} \gets \frac{1}{2} \\
\hat{x} \gets x_{first} + 2 \\
In Kokoyi, \gets defines the variable on the left-hand side with the expression on the right-hand side; each Kokoyi statement must end with a newline symbol \\, just like a semicolon ; in C/C++.
That's it! You will notice that what you have typed is automatically displayed as math equations. The correct answer should look like this: $ \phi \gets 1 \\ x_{first} \gets \frac{1}{2} \\ \hat{x} \gets x_{first} + 2 \\ $
Let's do something fancier; note how you can insert comments for readability:
%kokoyi
\theta \gets (0, (1, 2)) \\
\Comment{define multiple variables together} \\
(x, (y, z)) \gets \theta \\
(\lambda, \mu) \gets (0, 1) \\
Kokoyi uses a dictionary kokoyi.symbol as a symbol table. Once you have defined a variable x, you can access it as kokoyi.symbol['x']:
print(kokoyi.symbol[r'\phi'])
print(kokoyi.symbol[r'x_{first}'])
print(kokoyi.symbol[r'\hat{x}'])
print(kokoyi.symbol[r'\theta'])
tensor(1) tensor(0.5000) tensor(2.5000) (tensor(0), (tensor(1), tensor(2)))
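Kokoyi's tuple definitions read much like Python's nested unpacking. The following is only a rough plain-Python analogy of the cell above, not what Kokoyi generates; `symbol` is a hypothetical stand-in for `kokoyi.symbol` and holds plain ints instead of tensors.

```python
# Plain-Python analogy of the Kokoyi tuple definitions (not generated Kokoyi code).
theta = (0, (1, 2))   # like \theta \gets (0, (1, 2))
x, (y, z) = theta     # like (x, (y, z)) \gets \theta: nested unpacking
lam, mu = (0, 1)      # like (\lambda, \mu) \gets (0, 1)

# A dict keyed by the LaTeX spelling of each name plays the role of a symbol table.
symbol = {r"\theta": theta, "x": x, "y": y, "z": z, r"\lambda": lam, r"\mu": mu}
print(symbol[r"\theta"])  # (0, (1, 2))
print(symbol["z"])        # 2
```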
An array (or multi-dimensional tensor) describes a collection of elements of the same type. This is probably one of the most useful data abstractions; for instance, we use it to model sequence data and to build modules with stacked submodules.
We adopt a common convention to express a collection of elements mathematically with the \{ element \} ^ { shape } syntax, displayed as $\{ element \} ^ {shape}$ (the backslashes before the braces are necessary because braces are special symbols in LaTeX). Try to use this syntax to define a constant array $x$ of value 2 with shape $3 \times 5$ in the box below (use \times to add dimensions):
%kokoyi
% Please enter your answer in this box.
x \gets \{2\}^{3\times 5} \\
\GetShape returns the shape of a tensor or array, whereas a pair of vertical bars | | is a shortcut that returns the size of dimension 0, i.e., the length of an array.
%kokoyi
S \gets \GetShape(x) \Comment{Get the size of an array} \\
L \gets |x| \Comment{Get the length of the first dimension} \\
print(kokoyi.symbol['x'])
print(kokoyi.symbol['S'])
print(kokoyi.symbol['L'])
tensor([[2, 2, 2, 2, 2], [2, 2, 2, 2, 2], [2, 2, 2, 2, 2]]) (tensor(3), tensor(5)) tensor(3)
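To make the shape semantics concrete, here is a small plain-Python sketch using nested lists. The helper names `constant_array` and `get_shape` are hypothetical and only mimic what `\{2\}^{3\times 5}`, `\GetShape` and `|x|` return in the cell above.

```python
def constant_array(value, shape):
    r"""Nested-list analogue of \{value\}^{shape}: every entry equals `value`."""
    if len(shape) == 1:
        return [value] * shape[0]
    # Build each row separately so rows are independent lists, not aliases.
    return [constant_array(value, shape[1:]) for _ in range(shape[0])]


def get_shape(arr):
    r"""Analogue of \GetShape for a rectangular nested list."""
    shape = []
    while isinstance(arr, list):
        shape.append(len(arr))
        arr = arr[0] if arr else None
    return tuple(shape)


x = constant_array(2, (3, 5))
print(get_shape(x))  # (3, 5)
print(len(x))        # 3, the length of dimension 0, like |x|
```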
Array elements can of course be non-constant; for instance, the value may depend on the index. This calls for the syntax \{ element-expr \}_{index-lower-bound}^{index-upper-bound}. For example, $y \gets \{i\}_{i=0}^{4}$ defines an array of integers ranging from 0 to 4 (both inclusive). Try to define a new array $y^{even}$ that contains the even integers from 0 to 10 (inclusive) in the box below:
%kokoyi
% Please enter your answer in this box.
y^{even} \gets \{2 * i\}_{i=0}^{5} \\
print(kokoyi.symbol['y^{even}'])
tensor([ 0, 2, 4, 6, 8, 10])
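The one detail that trips people up when moving between the math and Python is that both bounds of $\{ expr \}_{i=lo}^{hi}$ are inclusive, whereas Python's `range` excludes its stop value. A minimal sketch, with `array_from_index` as a hypothetical helper name:

```python
def array_from_index(expr, lower, upper):
    """Mimic {expr(i)} for i from lower to upper, with BOTH bounds inclusive."""
    return [expr(i) for i in range(lower, upper + 1)]  # +1 because range() excludes its stop


y = array_from_index(lambda i: i, 0, 4)
y_even = array_from_index(lambda i: 2 * i, 0, 5)
print(y)       # [0, 1, 2, 3, 4]
print(y_even)  # [0, 2, 4, 6, 8, 10]
```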
Array expressions can be nested to compose a higher-dimensional array (a tensor). Try in the box below to define a $5\times5$ Hilbert matrix $H$, where each element is $\frac{1}{i + j - 1}$ and $i, j$ are the row and column indices, both starting from 1.
%kokoyi
% Please enter your answer in this box.
H \gets \{ \{ \frac{1}{i+j-1} \}_{j=1}^{5}\}_{i=1}^{5} \\
print(kokoyi.symbol['H'])
tensor([[1.0000, 0.5000, 0.3333, 0.2500, 0.2000], [0.5000, 0.3333, 0.2500, 0.2000, 0.1667], [0.3333, 0.2500, 0.2000, 0.1667, 0.1429], [0.2500, 0.2000, 0.1667, 0.1429, 0.1250], [0.2000, 0.1667, 0.1429, 0.1250, 0.1111]])
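For comparison, the same Hilbert matrix can be built with a nested list comprehension, using Python's `fractions.Fraction` to keep the entries exact. This is an illustrative sketch only; note that $i$ and $j$ run from 1 in the formula, so `H[0][0]` holds the entry for $i = j = 1$.

```python
from fractions import Fraction

n = 5
# Outer comprehension over rows i, inner over columns j, both from 1 to n inclusive,
# mirroring \{ \{ \frac{1}{i+j-1} \}_{j=1}^{5} \}_{i=1}^{5}.
H = [[Fraction(1, i + j - 1) for j in range(1, n + 1)] for i in range(1, n + 1)]
print(H[0][:3])        # [Fraction(1, 1), Fraction(1, 2), Fraction(1, 3)]
print(float(H[4][4]))  # 0.1111111111111111
```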
You can concatenate two arrays $a$ and $b$ with a||b in Kokoyi; concatenation happens along the first dimension (dimension 0) of the arrays (and tensors too). Let's use this to concatenate two copies of $H$ into $H_2$ and check the shape.
%kokoyi
% Please enter your answer in this box.
H_2 \gets H || H \\
size_{H_2} \gets \GetShape(H_2) \\
print(kokoyi.symbol['H_2'])
print(kokoyi.symbol['size_{H_2}'])
tensor([[1.0000, 0.5000, 0.3333, 0.2500, 0.2000], [0.5000, 0.3333, 0.2500, 0.2000, 0.1667], [0.3333, 0.2500, 0.2000, 0.1667, 0.1429], [0.2500, 0.2000, 0.1667, 0.1429, 0.1250], [0.2000, 0.1667, 0.1429, 0.1250, 0.1111], [1.0000, 0.5000, 0.3333, 0.2500, 0.2000], [0.5000, 0.3333, 0.2500, 0.2000, 0.1667], [0.3333, 0.2500, 0.2000, 0.1667, 0.1429], [0.2500, 0.2000, 0.1667, 0.1429, 0.1250], [0.2000, 0.1667, 0.1429, 0.1250, 0.1111]]) (tensor(10), tensor(5))
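Concatenation along dimension 0 simply stacks the rows of the second array after those of the first. In plain Python with nested lists this is list addition; the sketch below (with a hypothetical helper `concat_dim0`) reproduces the $(10, 5)$ shape printed above.

```python
def concat_dim0(a, b):
    """Concatenate two nested lists along dimension 0, like a || b."""
    return a + b  # list + list appends the rows of b after the rows of a


H = [[1 / (i + j - 1) for j in range(1, 6)] for i in range(1, 6)]
H_2 = concat_dim0(H, H)
print(len(H_2), len(H_2[0]))  # 10 5
```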
Sequence data is most interesting when its elements depend on one another. For example, a language model computes the probability $p(s)$ of a sentence $s = \{x_1, x_2, ..., x_T\}$ by factorizing it as a product of conditional probabilities: $p(s) = \prod_{t=1}^T p(x_t|x_{<t})$.
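To make the factorization concrete, here is a tiny worked example in plain Python. The three conditional probabilities are hypothetical toy numbers, not outputs of any real model; the point is only that $p(s)$ is the product of the per-token conditionals, and that summing their logarithms gives the same quantity in a form that avoids underflow on long sequences.

```python
import math

# Hypothetical values of p(x_t | x_{<t}) for a toy 3-token sentence (made up for illustration).
conditionals = [0.5, 0.4, 0.25]

p_s = math.prod(conditionals)                      # p(s) = prod_t p(x_t | x_{<t})
log_p_s = sum(math.log(p) for p in conditionals)   # log p(s) = sum_t log p(x_t | x_{<t})
print(p_s)  # 0.05
```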
Indexing: Formulas such as the factorization above refer to elements like $x_t$ and prefixes like $x_{<t}$, which raises the question of how to express indexing (slicing). There is no standard way to do indexing in LaTeX. Kokoyi draws inspiration from programming languages such as Python and uses a succinct bracket syntax (e.g., array[index]): you write A[i] to get the $i^{th}$ element, and it will be displayed as $A_{[i]}$. Note that, as in Python, array elements are indexed from zero in Kokoyi.
Try in the box below to define an array $\hat{y}^{even}$ by transforming the array $y \gets \{i\}_{i=0}^{4}$.
%kokoyi
% Please enter your answer in this box.
y \gets \{i\}_{i=0}^{4} \\
\hat{y}^{even} \gets \{2 * y[i]\}_{i=0}^{4} \\
Slicing: Use A[i:j] to slice elements A[i], A[i+1], ..., A[j-1]:
%kokoyi
y_{slice} \gets y[0:2] \\
print(kokoyi.symbol['y_{slice}'])
tensor([0, 1])
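The half-open convention matches Python's own slicing, so `A[i:j]` always yields `j - i` elements when the indices are in range. A plain-Python sketch, including a two-dimensional slice in the spirit of `A[0:2, 0:3]` (the array `A` below is made-up sample data):

```python
y = list(range(0, 4 + 1))  # y = \{i\}_{i=0}^{4}; range() excludes its stop, hence 4 + 1
y_slice = y[0:2]           # keeps y[0] and y[1]; the stop index 2 is excluded
print(y_slice)             # [0, 1]

# A 2-D slice like A[0:2, 0:3] becomes a slice of rows, then a slice of each row.
A = [[r * 10 + c for c in range(5)] for r in range(5)]
u = [row[0:3] for row in A[0:2]]
print(u)                   # [[0, 1, 2], [10, 11, 12]]
```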
Recursion: Some arrays are defined recursively. For example, we can rewrite $y \gets \{i\}_{i=0}^{4}$ using the recursive array syntax in Kokoyi:
y^{rec}[0 \leq i \leq 4] \gets
\begin{cases}
0 & i = 0 \\
y^{rec}[i-1] + 1 & otherwise \\
\end{cases} \\
which will be displayed as:
$ y^{rec}_{[0 \leq i \leq 4]} \gets \begin{cases} 0 & i = 0 \\ y^{rec}_{[i-1]} + 1 & otherwise \\ \end{cases} \\ $
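Compared with a regular array definition, a recursive array has two additional requirements: the iteration range is written as a subscript on the left-hand side (here $0 \leq i \leq 4$), and the right-hand side must handle the boundary case (here $i = 0$) explicitly, since every other element is computed from earlier ones. It's more straightforward than you might think: you write out the transition on the right-hand side, then specify the iteration condition on the left-hand side.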
Let us give it a try. In the box below, define an array $F$ containing the famous Fibonacci numbers, $F[i] = F[i-1] + F[i-2]$.
%kokoyi
% Please enter your answer in this box.
F[0 \leq i \leq 10] \gets
\begin{cases}
0 & i = 0 \\
1 & i = 1 \\
F[i-1] + F[i-2] & otherwise \\
\end{cases} \\
print(kokoyi.symbol['F'])
[tensor(0), tensor(1), tensor(1), tensor(2), tensor(3), tensor(5), tensor(8), tensor(13), tensor(21), tensor(34), tensor(55)]
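One way to think about a recursive array is as a loop that fills in elements in index order, so that each element can read the ones computed before it. The sketch below is a plain-Python analogy under that assumption, not Kokoyi's actual code generation; `recursive_array` and `fib_step` are hypothetical names.

```python
def recursive_array(upper, step):
    """Evaluate a[i] for 0 <= i <= upper in increasing index order.

    `step(i, a)` plays the role of the cases body: it may read any element a[k]
    with k < i, because those elements have already been computed.
    """
    a = []
    for i in range(upper + 1):
        a.append(step(i, a))
    return a


def fib_step(i, F):
    if i == 0:
        return 0                 # boundary case i = 0
    if i == 1:
        return 1                 # boundary case i = 1
    return F[i - 1] + F[i - 2]   # the "otherwise" transition


F = recursive_array(10, fib_step)
print(F)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

# The recursive rewrite of y from earlier follows the same pattern.
y_rec = recursive_array(4, lambda i, y: 0 if i == 0 else y[i - 1] + 1)
print(y_rec)  # [0, 1, 2, 3, 4]
```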
Some more (and fancier) examples follow. For multiple arrays with potential mutual dependencies, you will need to wrap them in \begin{group} and \end{group} so that the Kokoyi compiler can resolve the dependencies between them; we will see such an application in LSTM.
Note that the iteration range is specified in the subscript, i.e., $a_{[0 \leq i \leq 5]}$ and $b_{[0 \leq i \leq 5]}$. You may wonder what happens when $i$ is zero: won't accessing $b_{[-1]}$ be out of bounds? It won't, because $a_{[0]} \gets 0$, written after the comma, is a shortcut that specifies the boundary condition (and likewise $b_{[0]} \gets 0$).
%kokoyi
\Comment{Define a constant array of shape (5, 5) filled with exp(1)} \\
A \gets \{\exp(1)\}^{5 \times 5} \\
u \gets A[0:2, 0:3] \Comment{Slice a 2x3 top-left corner} \\
\Comment{Define two (or more) mutually dependent arrays, and the order of equations doesn't matter within the group syntax}\\
\begin{group}
a[0 \leq i \leq 5] \gets b[i-1] * 2, a[0] \gets 0 \\
b[0 \leq i \leq 5] \gets a[i-1] + 1, b[0] \gets 0\\
\end{group}
print(kokoyi.symbol['A'])
print(kokoyi.symbol['u'])
print(kokoyi.symbol['a'])
print(kokoyi.symbol['b'])
tensor([[2.7183, 2.7183, 2.7183, 2.7183, 2.7183], [2.7183, 2.7183, 2.7183, 2.7183, 2.7183], [2.7183, 2.7183, 2.7183, 2.7183, 2.7183], [2.7183, 2.7183, 2.7183, 2.7183, 2.7183], [2.7183, 2.7183, 2.7183, 2.7183, 2.7183]]) tensor([[2.7183, 2.7183, 2.7183], [2.7183, 2.7183, 2.7183]]) [tensor(0), tensor(0), tensor(2), tensor(2), tensor(6), tensor(6)] [tensor(0), tensor(1), tensor(1), tensor(3), tensor(3), tensor(7)]
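The group above can be read as a single loop that advances both arrays together, with the `a[0] \gets 0` and `b[0] \gets 0` shortcuts supplying the starting values. A plain-Python sketch of that reading (an analogy, not Kokoyi's generated code) reproduces the printed values:

```python
a = [0] * 6  # a[0] = 0 is the boundary condition; other slots are filled below
b = [0] * 6  # b[0] = 0 likewise
for i in range(1, 6):
    # Each step reads only index i - 1 of the other array, so computing a[i] and b[i]
    # in the same iteration resolves the mutual dependency.
    a[i] = b[i - 1] * 2
    b[i] = a[i - 1] + 1
print(a)  # [0, 0, 2, 2, 6, 6]
print(b)  # [0, 1, 1, 3, 3, 7]
```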
A masked array is an array that may contain some invalid elements; it combines a standard tensor as data with a boolean tensor as mask. Each element of the mask indicates whether the corresponding element of the data tensor is valid: True means the element is valid (unmasked), and False means it is invalid (masked).
This is handy when the member arrays have different lengths and you want to compute some property per member array (e.g., a distribution or a masked softmax).
%kokoyi
\Comment{Define a Masked Array, where the member arrays can be of different lengths} \\
C \gets \{ \{i + j\}_{j=1}^i\}_{i=1}^5 \\
data, mask = kokoyi.symbol['C']
print(data)
print(mask)
# You can get the valid values simply by indexing
print(data[mask])
tensor([[ 2, 3, 4, 5, 6], [ 3, 4, 5, 6, 7], [ 4, 5, 6, 7, 8], [ 5, 6, 7, 8, 9], [ 6, 7, 8, 9, 10]]) tensor([[ True, False, False, False, False], [ True, True, False, False, False], [ True, True, True, False, False], [ True, True, True, True, False], [ True, True, True, True, True]]) tensor([ 2, 3, 4, 4, 5, 6, 5, 6, 7, 8, 6, 7, 8, 9, 10])
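The same padded-data-plus-mask layout can be sketched in plain Python with nested lists. This is an illustration of the idea only: it rebuilds `C` from its definition and then shows a per-row reduction (the mean of each member array) that ignores the padded entries.

```python
n = 5
# Padded data: row i holds i + j for every j in 1..n; the mask keeps only j <= i,
# matching C = {{i + j}_{j=1}^{i}}_{i=1}^{5}.
data = [[i + j for j in range(1, n + 1)] for i in range(1, n + 1)]
mask = [[j <= i for j in range(1, n + 1)] for i in range(1, n + 1)]

# Keep only the valid entries, row by row (the analogue of data[mask]).
valid = [v for rd, rm in zip(data, mask) for v, m in zip(rd, rm) if m]
print(valid)  # [2, 3, 4, 4, 5, 6, 5, 6, 7, 8, 6, 7, 8, 9, 10]

# Per-row mean over valid entries only; sum(rm) counts the True values in the row.
row_means = [sum(v for v, m in zip(rd, rm) if m) / sum(rm) for rd, rm in zip(data, mask)]
print(row_means)  # [2.0, 3.5, 5.0, 6.5, 8.0]
```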
Reduction operators like $\sum$ and $\prod$ are supported in Kokoyi. For example, we can sum the Fibonacci array $F$ with \sum_{i=0}^{10} {F[i]}. The only difference from standard LaTeX syntax is that the reduced expression is wrapped in braces.
Try to use this syntax to sum the array $A$ in the box below:
%kokoyi
% Please enter your answer in this box.
S \gets \sum_{i=0}^{4} {\sum_{j=0}^{4} {A[i,j]}} \\
print(kokoyi.symbol['S'])
tensor(67.9570)
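The nested reduction corresponds to two nested sums with inclusive bounds. Since every entry of $A$ is $e$, the result should be $25e \approx 67.957$, which a quick plain-Python check confirms (a sketch using only the standard `math` module):

```python
import math

A = [[math.exp(1)] * 5 for _ in range(5)]  # A = \{\exp(1)\}^{5 \times 5}
# \sum_{i=0}^{4} {\sum_{j=0}^{4} {A[i,j]}}: inclusive bounds, hence range(0, 5).
S = sum(sum(A[i][j] for j in range(0, 5)) for i in range(0, 5))
print(round(S, 4))  # 67.957
```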
Let us now define a function commonly used as an activation function in neural networks: the Sigmoid function. Mathematically, it is
$$ Sigmoid(x) \gets \frac{1}{1 + e^{-x}} $$
where $e$ is Euler's number. Kokoyi uses very similar syntax. First, double-click here and copy-paste the math equation into the code cell below. Second, append a newline symbol \\ and replace the exponential with \exp(-x). Here, \exp is a built-in Kokoyi function that raises Euler's number to the given power.
The result should look like this: $ \newcommand{\Op}[1]{{\color{blue}{\mathrm{#1}}}} \def\exp{\Op{exp}} $ $ Sigmoid(x) \gets \frac{1}{1 + \exp(-x)} \\ $
%kokoyi
% Please enter your answer in this box.
Sigmoid(x) \gets \frac{1}{1 + \exp(-x)} \\
We can execute a function foo by calling kokoyi.symbol['foo'] with whatever arguments it requires. Let's compare our $Sigmoid$ function with torch.sigmoid; they should produce identical results because the Kokoyi compiler links to PyTorch modules and functions.
import torch
x = torch.tensor([1, 2, 3])
kokoyi_var = kokoyi.symbol['Sigmoid'](x)
torch_var = torch.sigmoid(x)
print(kokoyi_var)
print(torch_var)
tensor([0.7311, 0.8808, 0.9526]) tensor([0.7311, 0.8808, 0.9526])
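The same formula is easy to check without PyTorch using Python's `math` module. The scalar helpers `sigmoid` and `stable_sigmoid` below are hypothetical illustrations, not part of Kokoyi. The naive form raises `OverflowError` for very negative inputs because `math.exp(-x)` becomes too large, so the sketch also shows the common rearrangement $e^{x} / (1 + e^{x})$ for $x < 0$.

```python
import math


def sigmoid(x):
    """Sigmoid(x) = 1 / (1 + exp(-x)), written exactly as in the Kokoyi cell."""
    return 1 / (1 + math.exp(-x))


def stable_sigmoid(x):
    """Same function, rearranged for x < 0 so exp() never receives a large positive argument."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)


print([round(sigmoid(v), 4) for v in (1, 2, 3)])  # [0.7311, 0.8808, 0.9526]
print(stable_sigmoid(-1000))                       # 0.0 (sigmoid(-1000) would overflow)
```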
Let's apply this to the Kokoyi array H:
print(kokoyi.symbol['H'])
print(kokoyi.symbol['Sigmoid'](kokoyi.symbol['H']))
tensor([[1.0000, 0.5000, 0.3333, 0.2500, 0.2000], [0.5000, 0.3333, 0.2500, 0.2000, 0.1667], [0.3333, 0.2500, 0.2000, 0.1667, 0.1429], [0.2500, 0.2000, 0.1667, 0.1429, 0.1250], [0.2000, 0.1667, 0.1429, 0.1250, 0.1111]]) tensor([[0.7311, 0.6225, 0.5826, 0.5622, 0.5498], [0.6225, 0.5826, 0.5622, 0.5498, 0.5416], [0.5826, 0.5622, 0.5498, 0.5416, 0.5357], [0.5622, 0.5498, 0.5416, 0.5357, 0.5312], [0.5498, 0.5416, 0.5357, 0.5312, 0.5277]])
The math world has no if-else statement. Instead, people list cases next to a big left brace. In fact, you have already used it when defining recursive arrays. In LaTeX, it is written as:
x \gets
\begin{cases}
value1 & condition1 \\
value2 & condition2 \\
... \\
valueN & otherwise \\
\end{cases}
, which is displayed as
$ x \gets \begin{cases} value1 & condition1 \\ value2 & condition2 \\ ... \\ valueN & otherwise \\ \end{cases} $
This is also how to write branches in Kokoyi. Let us try to define the famous ReLU activation function in the box below. The correct output should look like this:
$ ReLU(x) \gets \begin{cases} x & x > 0 \\ 0 & otherwise \\ \end{cases} $
%kokoyi
% Please enter your answer in this box.
ReLU(x) \gets
\begin{cases}
x & x > 0 \\
0 & otherwise \\
\end{cases} \\
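Each row of a cases block pairs a value with a condition, and "otherwise" acts as the final else. A scalar plain-Python sketch of that reading, with `cases` as a hypothetical helper (unlike the Kokoyi version, it does not work elementwise on tensors, and it evaluates every value eagerly):

```python
def cases(branches, otherwise):
    """Return the value of the first (value, condition) pair whose condition holds."""
    for value, condition in branches:
        if condition:
            return value
    return otherwise  # the "otherwise" row


def relu(x):
    # ReLU(x) = x if x > 0, 0 otherwise
    return cases([(x, x > 0)], otherwise=0)


print([relu(v) for v in (-2, 0, 3)])  # [0, 0, 3]
```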
Perhaps the most useful abstraction in the deep learning world is the module, which maps to the familiar pattern $f(x; \theta)$, where $x$ is the input and $\theta$ are the parameters to be learned. We extend the syntax so that you can include a learnable submodule M with $f(x; M)$, i.e., the parameters $\theta$ live within the submodule. We will have plenty of time to learn how to write modules in other notebooks, so we will settle for a very simple intro here.
Let's first write a linear transformation module $Linear(x; W, b) \gets W \cdot x + b$, where $x$ is the input data, while $W$ and $b$ are learnable parameters, which is why they are separated from the input by a semicolon. Realizing it in Kokoyi takes three steps:
1. Use the \Module{name}{inputs; params} module-body \EndModule syntax to define the module.
2. Write the transformation with \gets ... inside the module body and assign it to a new variable $y$.
3. Use the \Return keyword to mark the return value.

You can give it a try below. In the code, the matrix product is written as W @ x, which is displayed as $W \cdot x$. The correct answer should look like this: $ \newcommand{\Module}[2]{\rule[0pt]{160mm}{1.0mm}\\ \textbf{Module}\quad\mathrm{#1}(#2)\\ \rule[0pt]{160mm}{1.0mm}\\} \def\EndModule{\rule[0pt]{160mm}{1.0mm} \\} \def\Return{{\bf Return} \quad} $ $ \Module{Linear}{x; W, b} y \gets W \cdot x + b \\ \Return y \\ \EndModule $
%kokoyi
% Please enter your answer in this box.
\Module{Linear}{x; W, b}
y \gets W @ x + b \\
\Return y \\
\EndModule
We have now written our module in Kokoyi; the Kokoyi code defines the module's forward function. All we need to do now is complete the initialization part in PyTorch. Let's try it on the $Linear$ module.
from torch import nn

class Linear(torch.nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Parameter(torch.rand(out_dim, in_dim))
        self.b = nn.Parameter(torch.zeros(out_dim))

    def get_parameters(self):
        # The order of returned parameters should be the same as the order of params in Kokoyi code
        return self.W, self.b

    forward = kokoyi.symbol['Linear']
Alternatively, you can let Kokoyi generate a template and just fill in the blanks. To do so, with the cursor on a Kokoyi module cell, hit the corresponding button in the top menu.
class Linear(torch.nn.Module):
    def __init__(self):
        """ Add your code for parameter initialization here (not necessarily the same names)."""
        super().__init__()
        self.W = None
        self.b = None

    def get_parameters(self):
        """ Change the following code to return the parameters as a tuple in the order of (W, b)."""
        return None

    forward = kokoyi.symbol["Linear"]
Now you can check that this transformation works:
linear = Linear(10, 5)
x = torch.randn(10)
y = linear(x)
print(y.shape)
print(y)
torch.Size([5]) tensor([1.5503, 1.9328, 2.9815, 3.0024, 2.8796], grad_fn=<AddBackward0>)
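What the module's forward pass computes can be spelled out without PyTorch. The plain-Python sketch below (a hypothetical `linear` helper with made-up `W`, `b` and `x`) evaluates $W \cdot x + b$ row by row; as in the PyTorch class above, `W` has shape `(out_dim, in_dim)`, so the output length equals the number of rows.

```python
def linear(x, W, b):
    """y = W @ x + b for a matrix W (list of rows), a vector x, and a bias vector b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


W = [[1, 0, 2],
     [0, 1, -1]]   # shape (out_dim=2, in_dim=3)
b = [0.5, -0.5]
x = [1, 2, 3]
print(linear(x, W, b))  # [7.5, -1.5]
```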
If you want to dump the generated code to run Kokoyi outside the notebook, just hit the corresponding button in the top menu.
def Linear(self, x):  # <source 2:0 - 5:0>
    W, b = kokoyi.import_wrap(self.get_parameters())  # <source 2:0 - 5:0>
    _5 = kokoyi.matmul(W, x)  # <source 3:12 - 3:16>
    _6 = kokoyi.add(_5, b)  # <source 3:12 - 3:20>
    y = _6  # <source 3:4 - 3:4>
    return y  # <source 2:0 - 5:0>

kokoyi.symbol[r"Linear"] = kokoyi.export_module(Linear)
Congratulations on completing all the exercises! You are welcome to go through the Kokoyi Cheat Sheet for more advanced usage. We suggest starting the MLP_CNN notebook next.