# GeST

We develop GeST, a deep generative transformer model that uses information from neighboring cells to iteratively generate cellular profiles in spatial contexts.

Learning the spatial context of cells through pre-training may enable us to systematically decipher tissue organization and cellular interactions in multicellular organisms. Yet, existing models often focus on individual cells, neglecting the intricate spatial dynamics between them. We develop GeST, a deep generative transformer model that uses information from neighboring cells to iteratively generate cellular profiles in spatial contexts. In GeST, we propose a novel serialization strategy to convert spatial data into sequences, a robust cell quantization method to tokenize continuous gene expression profiles, and a specialized attention mechanism in the transformer to enable efficient training. We pre-trained GeST on a large-scale spatial transcriptomics dataset from the mouse brain and demonstrated its performance in unseen cell generation. Our results also show that the pre-trained model can extract spatial niche embeddings in a zero-shot way and can be further fine-tuned for spatial annotation tasks. Furthermore, GeST can simulate gene expression changes in response to spatial perturbations, closely matching experimental results. Overall, GeST offers a powerful framework for generative pre-training on spatial transcriptomics.

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [License](#license)

## Installation

We use Anaconda to manage the environment. To create it, run the following command from the repository root:
```bash
conda env create -f environment.yml
```

## Usage
### Generate Metacell
The first step is to generate metacells for the dataset. To do so, follow the instructions in the [`metacell.ipynb`](metacell.ipynb) notebook. After running the notebook, you will see a [`metacell`](metacell) folder in the root directory.

### Pretrain GeST
The next step is to pretrain GeST. To do so, run the [`run.sh`](run.sh) script:
```bash
bash run.sh
```
After pretraining finishes, you will see a [`demo`](demo) folder in the root directory. This folder contains the pretrained model, the config file, and the training logs.

The pre-training configuration is in the [`Merfishbrain-train.yaml`](Merfishbrain-train.yaml) file. The code supports both single-GPU and multi-GPU training, and the data loader can read either a single file or multiple files for pre-training. We use gradient accumulation for multi-GPU training.
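For readers unfamiliar with gradient accumulation, the following is a minimal, framework-free sketch of the general idea. It is not taken from the GeST training code: the toy model `y = w * x` and the function names `grad_mse` and `accumulated_grad` are hypothetical and exist only for illustration. The sketch shows that summing the gradients of several small micro-batches, each weighted by its share of the full batch, reproduces the gradient of the full batch, which is why accumulation can emulate a larger batch size than fits in memory at once.

```python
# Minimal sketch of gradient accumulation on a toy 1-D linear model y = w * x.
# Illustrative only: none of these names come from the GeST code base.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error (1/n) * sum((w*x - y)^2) w.r.t. w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients, weighting each by its share of the batch."""
    total = len(xs)
    grad = 0.0
    for start in range(0, total, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        # Scale so that the accumulated sum equals the full-batch mean gradient.
        grad += grad_mse(w, mx, my) * len(mx) / total
    return grad


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
w = 0.5

full = grad_mse(w, xs, ys)                              # one large batch
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)  # three micro-batches

# The two gradients agree up to floating-point rounding.
print(abs(full - accum) < 1e-9)
```

In a real training loop the optimizer step would be applied only once per accumulated gradient. The actual accumulation settings used by GeST are defined in the repository's configuration and scripts, not in this sketch.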

### Inference
To generate unseen cells, use [`inference.py`](inference.py). In this file, specify the path to the pretrained model and the path to the unseen cells. The output is the generated cells, in the same format as the input cells.

### Downstream Task
See the Jupyter notebooks in the [`downstreamtask`](downstreamtask) folder. These notebooks show how the model performs on several downstream tasks, including how to obtain niche embeddings in a zero-shot way and how to generate results from the comparison methods. They also show the results of the niche annotation tasks.

## License

This project is licensed under the GPL License. We are happy to share all pre-trained model weights, as well as the pre-training data, after the paper is published.
