# Molecular regression

## Dataset

Training is performed on the AQSOL, PCQM4Mv2, and ZINC datasets, which are collections of molecular structures used for training and evaluating molecular regression models.

## Models

The following graph neural network (GNN) architectures are available for training:

- **GCN**: Graph Convolutional Network
- **GAT**: Graph Attention Network
- **GatedGCN**: Gated Graph Convolutional Network
- **Transformer**: Transformer-based model

## Positional Encodings

The following positional encoding (PE) functions can be used:

- **laplacian**: Laplacian positional encoding
- **laplacian_abs**: Absolute Laplacian positional encoding
- **rw**: Random walk positional encoding
- **gape**: Graph Alignment positional encoding
- **nope**: No positional encoding
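
For orientation, the sketch below shows how Laplacian and random walk positional encodings are commonly computed from an adjacency matrix. It is a minimal NumPy illustration, not this repository's implementation: the function names `laplacian_pe` and `random_walk_pe`, the choice of the symmetric normalized Laplacian, and the parameter `k` are all assumptions made for the example. The `laplacian_abs` and `gape` variants are not covered here.

```python
import numpy as np


def laplacian_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: k eigenvectors of the normalized Laplacian.

    Assumes `adj` is a symmetric adjacency matrix of an undirected graph.
    """
    deg = adj.sum(axis=1)
    # Inverse square root of the degrees; isolated nodes get 0.
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, eigvecs = np.linalg.eigh(lap)
    # Skip the first eigenvector (eigenvalue 0), keep the next k.
    return eigvecs[:, 1:k + 1]


def random_walk_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: return probabilities of 1..k step random walks."""
    deg = adj.sum(axis=1)
    inv_deg = np.where(deg > 0, 1.0 / np.maximum(deg, 1e-12), 0.0)
    trans = inv_deg[:, None] * adj  # row-normalized transition matrix D^{-1} A
    power = np.eye(len(adj))
    feats = []
    for _ in range(k):
        power = power @ trans
        # Diagonal entry i: probability that a walk from node i is back at i.
        feats.append(np.diag(power))
    return np.stack(feats, axis=1)


# Example: a 4-node cycle graph.
cycle = np.array(
    [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]],
    dtype=float,
)
print(laplacian_pe(cycle, k=2).shape)  # (4, 2)
print(random_walk_pe(cycle, k=3)[0])   # return probabilities for node 0
```

Note that Laplacian eigenvectors are only defined up to sign (and up to rotation within repeated eigenvalues), so two runs or two libraries may produce different but equally valid encodings.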

## Command Line Interface

To train a molecular regression model from the command line, specify the model name with `-m` and the positional encoding function with `-p`. The available model names are `gcn`, `gat`, `gatedgcn`, and `transformer`. The available positional encoding functions are `laplacian`, `laplacian_abs`, `rw`, `gape`, and `nope`.

### Usage

```sh
uv run molecular_regression -m <model_name> -p <pe_name>
```

### Example

To train a GCN model with Laplacian positional encoding, run:

```sh
uv run molecular_regression -m gcn -p laplacian
```

This command starts training with the specified model and positional encoding function.
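
To compare several configurations, the same command can be generated for every model and positional encoding pair. The script below is a minimal sketch under stated assumptions: `build_command` and `sweep_commands` are hypothetical helpers written for this example (they are not part of the project), and the option names are copied from the lists in this README. It only prints the commands; nothing is launched.

```python
import itertools
import shlex

# Option names copied from this README; adjust if the CLI changes.
MODELS = ["gcn", "gat", "gatedgcn", "transformer"]
PES = ["laplacian", "laplacian_abs", "rw", "gape", "nope"]


def build_command(model, pe):
    """Hypothetical helper: validate the names and build the argument list."""
    if model not in MODELS:
        raise ValueError(f"unknown model name: {model!r}")
    if pe not in PES:
        raise ValueError(f"unknown positional encoding: {pe!r}")
    return ["uv", "run", "molecular_regression", "-m", model, "-p", pe]


def sweep_commands():
    """Hypothetical helper: one command per (model, positional encoding) pair."""
    return [build_command(m, p) for m, p in itertools.product(MODELS, PES)]


for cmd in sweep_commands():
    # Print each command as a shell-safe string.
    # To launch training instead, pass `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting concerns.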
