# Thunnini

Thunnini is an experimentation library to study and understand fundamental
aspects of fine-tuners for neural sequence predictors. Currently, 9 different
tuners, such as soft prompting, embedding tuning, or low-rank adaptation (LoRA),
are implemented on two neural architectures (LSTMs and transformers). Thunnini
is also the name of the zoological tribe of the tunas (17 species).

Thunnini provides functionality to:

1.  **Pretrain** neural sequential predictors via log loss minimization over
    samples from a data generator.
2.  **Fine-tune** neural models to a target data distribution. Tune either
    weights (original weights, or additional weights, like in LoRA), or tune a
    prompt prefix (soft prompting).
3.  **Evaluate** tuned models' prediction performance on one or more evaluation
    data distributions.
4.  **Compare** different architectures and fine-tuning methods, and compare
    against baselines: the optimal oracle predictor, several exact Bayesian
    predictors, and the untuned model.
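
As a rough illustration of how such a comparison can be scored, the sketch below
computes the cumulative log loss of two predictors on a short coin-flip sequence
and the regret of one relative to the oracle. The function name
`cumulative_log_loss` and the toy probabilities are assumptions made for this
example; they are not part of Thunnini's API.

```python
import math


def cumulative_log_loss(probs, sequence):
    """Sum of -log p(x_t) over the sequence, in nats.

    probs[t][k] is the predicted probability of symbol k at step t.
    """
    return -sum(math.log(p[x]) for p, x in zip(probs, sequence))


# Toy coin with P(1) = 0.8; the oracle predicts the true bias at every step.
sequence = [1, 1, 0, 1, 1, 1, 0, 1]
oracle = [[0.2, 0.8]] * len(sequence)
# A hypothetical untuned model that always predicts a fair coin.
untuned = [[0.5, 0.5]] * len(sequence)

oracle_loss = cumulative_log_loss(oracle, sequence)
untuned_loss = cumulative_log_loss(untuned, sequence)
regret = untuned_loss - oracle_loss  # excess loss over the oracle baseline
print(f"oracle={oracle_loss:.3f} untuned={untuned_loss:.3f} regret={regret:.3f}")
```

In expectation the oracle attains the lowest log loss, but on a single short
sequence the regret of another predictor can be negative, so comparisons are
usually averaged over many sampled sequences.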

*Simple experimentation:* Standard Thunnini experiments are very simple to
specify via a set of configurations (plain-text dicts or dataclasses) for data
generation, model architecture, training, and fine-tuning procedures, as well as
evaluation settings. Full reproducibility, given the same configuration, is
ensured.
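
To make the idea of configuration-driven, reproducible experiments concrete,
here is a minimal sketch using dataclasses. All class names, field names, and
the `sample_batch` helper are hypothetical and chosen for illustration only;
Thunnini's actual configuration keys may differ.

```python
import dataclasses
import random


@dataclasses.dataclass(frozen=True)
class DataConfig:
    # Hypothetical fields; the real configuration keys may differ.
    vocab_size: int = 2
    sequence_length: int = 50
    seed: int = 0


@dataclasses.dataclass(frozen=True)
class TrainConfig:
    # Hypothetical fields; the real configuration keys may differ.
    learning_rate: float = 1e-3
    num_steps: int = 1000
    batch_size: int = 64


def sample_batch(config: DataConfig, batch_size: int) -> list[list[int]]:
    """Draws uniform-random token sequences from a generator seeded by the config."""
    rng = random.Random(config.seed)
    return [
        [rng.randrange(config.vocab_size) for _ in range(config.sequence_length)]
        for _ in range(batch_size)
    ]


data_config = DataConfig(vocab_size=6, sequence_length=10, seed=42)
train_config = TrainConfig(batch_size=4)

# The same configuration yields the same data on every call.
batch_a = sample_batch(data_config, train_config.batch_size)
batch_b = sample_batch(data_config, train_config.batch_size)
assert batch_a == batch_b

# Dataclass configurations convert to plain dicts, e.g. for logging.
print(dataclasses.asdict(data_config))
```

Freezing the dataclasses prevents accidental mutation after an experiment has
started, which is one simple way to keep a run tied to the configuration that
produced it.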

*Batteries included:* Thunnini comes with easily configurable LSTMs and
transformers (decoder-only) and some data generators (coin-flip or dice-roll
sources, for which the exact Bayesian predictor is tractable). Thunnini also
comes with a notebook to run a full experimentation pipeline (pretraining,
fine-tuning, evaluation and comparison) by setting a few lines of configuration.

*Easily extendable:* Thunnini can be extended with more predictors and data
generators by implementing the respective interfaces and passing all tests.

The design philosophy is to provide a lightweight and nimble experimentation
library to study conceptual aspects of fine-tuning methods, and prompting
untrained networks, by having full understanding and control over the data
distributions. Connections to the theory of meta-learning and Bayesian
sequential prediction from SGD-based log loss minimization can easily be
verified empirically. Thunnini aims at models and data that train or fine-tune
on a single GPU within minutes. LLM-scale experiments are (far) beyond the scope
of Thunnini.

## Usage

From the directory where you found this file (the one that contains `/colabs/`
and `/src/`), run the following commands.

This repository requires Python 3.11. `pip install -r requirements.txt` will
install all required dependencies. This is best done inside a virtual Python
environment. To that end, install [virtualenv](https://virtualenv.pypa.io/):

```bash
sudo apt-get install virtualenv python3-venv
```

Then, create and activate the virtualenv:

```bash
virtualenv thunnini_env
source thunnini_env/bin/activate
```

Alternatively, you can also use [conda](https://www.anaconda.com/) to manage
virtual environments. See the conda documentation for instructions.

Inside your virtual environment, use `pip` to install all required dependencies:

```bash
pip install requests
pip install -r requirements.txt
```

**Running notebooks locally:** To get started with Thunnini locally, start a
local Jupyter notebook server.

> If you followed the installation instructions above, make sure to first
> activate your virtual environment and set the `PYTHONPATH`. Change dir to the
> local clone of the `thunnini` folder and run:
>
> ```bash
> source thunnini_env/bin/activate
> export PYTHONPATH=$(pwd)/..
> ```

Start a local Jupyter notebook with:
```bash
jupyter notebook
```

This will give you a (local) URL with an authentication token which you need to
copy and paste into your browser. From there, navigate to `thunnini/colabs` and
open one of the two notebooks. Alternatively, the local notebook server can be
set as a local runtime for Colab. See
[local Colab runtimes](https://research.google.com/colaboratory/local-runtimes.html)
for instructions, including how to access Colab Docker runtime images with GPU
support.

To run easily configurable Thunnini experiments (from pretraining through
fine-tuning to evaluation and comparison of methods), use
`colabs/ThunniniExperiment.ipynb`. The notebook lets you easily change standard
configuration settings and make small extensions. It also demonstrates
Thunnini's user-level functions. For more sophisticated modifications, diving
deeper into Thunnini is necessary.

If you want to get a very quick overview of how to train, fine-tune, and
evaluate a single model, use `colabs/ThunniniDemo.ipynb`; it avoids the
boilerplate code that `ThunniniExperiment.ipynb` needs to surface all
configuration options and to collect and compare results across many
fine-tuning methods.

After you have finished running experiments, leave your virtual environment with:

```bash
deactivate
```

## Fine-tuning methods

The following fine-tuners are implemented:

*   **Prefix tuning (4 methods):** Fine-tune a (fixed length) prefix of tokens,
    or token-embeddings, by minimizing log loss on samples from the fine-tuning
    data generator. Model weights remain frozen.
    *   *Simplex prefix:* The prompt prefix is constrained to lie in the
        probability simplex whose vertices are the D one-hot tokens, i.e., the
        simplex prefix is a sequence of D-dimensional real vectors with
        non-negative components that sum to one.
    *   *Real prefix:* Same as the simplex prefix but without the simplex
        constraint, i.e., a sequence of arbitrary D-dimensional real vectors.
    *   *Soft prefix:* Direct tuning of the embeddings that would result from a
        prompt prefix of a certain length ([Soft Prompting](https://arxiv.org/abs/2104.08691)).
    *   *Hard prefix:* Use exhaustive search to find the best prefix sequence of
        hard (one-hot) tokens. Intractable for long prefixes and large token
        alphabets.
*   **Weight-tuning (4 methods):** Tune all or some of the model's
    weights.
    *   *Full weights:* Fine-tune all weights.
    *   *Embedding:* Fine-tune only the weights of the initial embedding layer.
    *   *Unembedding:* Fine-tune only the weights of the final unembedding
        layer.
    *   *Un+Embedding:* Fine-tune both the initial and the final layer.
*   **Additional-weights tuning (1 method):** Introduce additional tunable model
    parameters, like task specific heads or adapters, and keep original model
    weights frozen.
    *   *LoRA:* [Low-rank Adaptation](https://arxiv.org/abs/2106.09685)
        introduces low-rank additive weight matrices to linear layers in
        transformer blocks (therefore it is only supported for transformers).
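
The hard-prefix search described above can be sketched as follows. To keep the
example self-contained, the frozen "pretrained model" is replaced by Laplace's
rule of succession, which is an assumption made purely for illustration; the
function names are hypothetical and not Thunnini's API. The search enumerates
all `vocab_size ** prefix_length` candidate prefixes, which is why it becomes
intractable for long prefixes and large alphabets.

```python
import itertools
import math


def laplace_predict(context, vocab_size):
    """Stand-in for a frozen pretrained model: Laplace's rule of succession."""
    counts = [1.0] * vocab_size  # one uniform pseudo-count per symbol
    for token in context:
        counts[token] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def prefixed_log_loss(prefix, sequences, vocab_size):
    """Average (over sequences) of the summed log loss, conditioned on the prefix."""
    loss = 0.0
    for seq in sequences:
        context = list(prefix)
        for token in seq:
            loss -= math.log(laplace_predict(context, vocab_size)[token])
            context.append(token)
    return loss / len(sequences)


def hard_prefix_search(sequences, vocab_size, prefix_length):
    """Tries all vocab_size ** prefix_length prefixes and returns the best one."""
    candidates = itertools.product(range(vocab_size), repeat=prefix_length)
    return min(candidates, key=lambda p: prefixed_log_loss(p, sequences, vocab_size))


# Fine-tuning data from a coin that mostly lands on 1.
tuning_data = [[1, 1, 1, 0, 1], [1, 1, 1, 1, 1], [1, 0, 1, 1, 1]]
best = hard_prefix_search(tuning_data, vocab_size=2, prefix_length=3)
print(best, prefixed_log_loss(best, tuning_data, 2))
```

For this toy data the search picks the all-ones prefix, because it shifts the
stand-in predictor's counts towards the symbol that dominates the fine-tuning
data. The model itself is never modified; only its conditioning context changes.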

To ensure compatibility with all fine-tuning methods and other functionality
provided by Thunnini (such as embedding-prefixed forward passes), the
`Predictor` class wraps a `PredictorTorso` between an embedding and unembedding
layer (among other things). Currently an `LSTMPredictorTorso` and a
`TransformerPredictorTorso` are implemented, and it is highly recommended to add
new neural architectures as torsos to make them instantly compatible with all of
Thunnini's functionality.
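
The wrapping idea can be illustrated with a toy sketch in plain Python. The
classes `MeanTorso` and `ToyPredictor`, the `forward` signature, and the
`prefix_embeddings` argument are all hypothetical and only mimic the structure
described above (embedding, torso, unembedding); they do not reproduce the
actual `Predictor` or `PredictorTorso` interfaces. The torso here is a simple
causal running mean rather than an LSTM or transformer.

```python
import math


class MeanTorso:
    """Toy causal torso: the output at step t is the mean of inputs up to t."""

    def __call__(self, embeddings):
        outputs, running = [], [0.0] * len(embeddings[0])
        for t, vec in enumerate(embeddings, start=1):
            running = [r + v for r, v in zip(running, vec)]
            outputs.append([r / t for r in running])
        return outputs


class ToyPredictor:
    """Hypothetical wrapper: embedding -> torso -> unembedding -> softmax."""

    def __init__(self, embedding, unembedding, torso):
        self.embedding = embedding      # vocab_size rows of size embed_dim
        self.unembedding = unembedding  # vocab_size rows of size embed_dim
        self.torso = torso

    def forward(self, tokens, prefix_embeddings=()):
        # Prepend the prefix in embedding space, run the torso over everything,
        # and keep only the outputs that correspond to the actual tokens.
        embedded = list(prefix_embeddings) + [self.embedding[t] for t in tokens]
        hidden = self.torso(embedded)[len(prefix_embeddings):]
        return [self._softmax(self._logits(h)) for h in hidden]

    def _logits(self, h):
        return [sum(w * x for w, x in zip(row, h)) for row in self.unembedding]

    @staticmethod
    def _softmax(logits):
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


predictor = ToyPredictor(
    embedding=[[1.0, 0.0], [0.0, 1.0]],
    unembedding=[[2.0, 0.0], [0.0, 2.0]],
    torso=MeanTorso(),
)
plain = predictor.forward([0, 1])
prefixed = predictor.forward([0, 1], prefix_embeddings=[[0.0, 3.0]])
print(plain[-1], prefixed[-1])
```

Because the prefix enters after the embedding layer, any torso that maps a
sequence of embeddings to a sequence of hidden states works with the same
prefix-tuning code, which is the motivation for adding new architectures as
torsos.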

## Data generators

Thunnini provides a number of standard data generators. Data generators can
draw samples and return the corresponding "ground-truth" generating
probabilities (for the oracle predictor baseline). Generators also compute their
respective Bayesian predictors (for the Bayes optimal baseline). Currently
implemented data generators are:

*   *Categorical:* Fixed categorical distribution. For 2 dimensions this is a
    Bernoulli variable, a.k.a. a coin with fixed bias.
*   *Mixture of Categoricals:* Mixture of several Categorical distributions with
    particular mixing proportions. For 2 dimensions this is a mixture over coins
    with different biases.
*   *Dirichlet-Categorical:* Distribution over Categoricals with a Dirichlet
    prior. For 2 dimensions this is Beta-Binomial, i.e., an infinite mixture
    over coins with different biases, where the probability of each bias is
    given by the Beta distribution.
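
For these generators the exact Bayesian predictor has a simple closed form. The
sketch below shows the standard posterior-predictive formulas for the
Dirichlet-Categorical case and for a finite mixture of Categoricals. The
function names and signatures are assumptions made for this example, not
Thunnini's generator interface.

```python
import math


def dirichlet_categorical_predict(context, alpha):
    """Posterior predictive: (alpha_k + n_k) / (sum(alpha) + n)."""
    counts = list(alpha)
    for token in context:
        counts[token] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def mixture_predict(context, components, weights):
    """Bayes predictor for a finite mixture of Categoricals.

    Reweights each component by the likelihood it assigns to the context,
    then averages the components' next-token probabilities. Assumes strictly
    positive weights and component probabilities.
    """
    log_post = [
        math.log(w) + sum(math.log(comp[token]) for token in context)
        for comp, w in zip(components, weights)
    ]
    m = max(log_post)  # subtract the max for numerical stability
    post = [math.exp(lp - m) for lp in log_post]
    norm = sum(post)
    post = [p / norm for p in post]
    vocab_size = len(components[0])
    return [
        sum(p * comp[k] for p, comp in zip(post, components))
        for k in range(vocab_size)
    ]


context = [1, 1, 0, 1]
# Beta(1, 1) prior over the coin bias: Laplace's rule of succession.
dirichlet_pred = dirichlet_categorical_predict(context, alpha=[1.0, 1.0])
# Two coins with P(1) = 0.2 and P(1) = 0.8, equally likely a priori.
mixture_pred = mixture_predict(
    context, components=[[0.8, 0.2], [0.2, 0.8]], weights=[0.5, 0.5]
)
print(dirichlet_pred, mixture_pred)
```

Both predictors depend on the context only through the symbol counts, which
reflects the fact that samples from these generators are exchangeable.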
