# SymForce: Large Language Models as Symbolic Physics Engines for Molecular Conformation

## Why SymForce?

SymForce treats molecular conformation generation as a **dynamic physical reasoning process** rather than a static prediction task. A large language model acts as a symbolic physics engine: it describes the forces acting on a molecule, and SymForce uses those forces to refine the 3D structure through iterative, force-guided optimization.

### Architecture Components

1. **Geometric Encoder**: PaiNN (Polarizable Atom Interaction Neural Network) for SE(3)-equivariant molecular representations
2. **LLM Symbolic Force Generator**: Llama-3.1-8B-Instruct fine-tuned to produce structured force descriptions
3. **Symbolic-to-Numerical Translator**: Hybrid parsing and learned mapping to convert text forces into numerical vectors
4. **Adaptive Optimization**: Physics-constrained iterative refinement with adaptive step sizes
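
The interplay between components 2 to 4 can be illustrated with a minimal, dependency-free sketch. It is not the SymForce implementation: the `F[i] = (fx, fy, fz)` text format, the function names, the toy harmonic-bond energy, and the accept/grow, reject/shrink step rule are all assumptions chosen for illustration, and `describe_forces` merely stands in for the fine-tuned LLM.

```python
import math
import re

# Hypothetical text format for one force line, e.g. "F[0] = (0.10, -0.25, 0.00)".
NUM = r"([-+.\deE]+)"
FORCE_LINE = re.compile(rf"F\[(\d+)\]\s*=\s*\(\s*{NUM}\s*,\s*{NUM}\s*,\s*{NUM}\s*\)")


def parse_forces(text, n_atoms):
    """Translate a textual force description into one [fx, fy, fz] list per atom."""
    forces = [[0.0, 0.0, 0.0] for _ in range(n_atoms)]
    for idx, fx, fy, fz in FORCE_LINE.findall(text):
        i = int(idx)
        if i < n_atoms:  # ignore lines that refer to atoms outside the molecule
            forces[i] = [float(fx), float(fy), float(fz)]
    return forces


def bond_energy(coords, r0=1.0, k=1.0):
    """Toy energy: a single harmonic bond between atoms 0 and 1."""
    return 0.5 * k * (math.dist(coords[0], coords[1]) - r0) ** 2


def describe_forces(coords, r0=1.0, k=1.0):
    """Stand-in for the LLM: writes the analytic bond forces as text."""
    d = math.dist(coords[0], coords[1])
    unit = [(b - a) / d for a, b in zip(coords[0], coords[1])]
    f0 = [k * (d - r0) * u for u in unit]  # pulls atom 0 toward atom 1 when stretched
    f1 = [-c for c in f0]                  # equal and opposite force on atom 1

    def fmt(i, f):
        return f"F[{i}] = ({f[0]:.6f}, {f[1]:.6f}, {f[2]:.6f})"

    return "\n".join([fmt(0, f0), fmt(1, f1)])


def refine(coords, n_steps=50, step=0.3, grow=1.2, shrink=0.5):
    """Force-guided refinement: grow the step after an accepted move, shrink it otherwise."""
    coords = [list(c) for c in coords]
    energy = bond_energy(coords)
    for _ in range(n_steps):
        forces = parse_forces(describe_forces(coords), len(coords))
        trial = [[x + step * f for x, f in zip(c, fc)] for c, fc in zip(coords, forces)]
        trial_energy = bond_energy(trial)
        if trial_energy < energy:  # accept only moves that lower the energy
            coords, energy = trial, trial_energy
            step *= grow
        else:
            step *= shrink
    return coords, energy


if __name__ == "__main__":
    final_coords, final_energy = refine([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
    print(final_coords, final_energy)
```

Starting from a bond stretched to twice its rest length, the loop pulls the two atoms back toward the rest distance. In the real pipeline the energy and force text would come from the trained components rather than from a closed-form toy potential.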

## Dependencies

Create a conda environment with Python 3.10 and PyTorch 2.4.1:

```bash
conda create -n symforce python=3.10
conda activate symforce
conda install pytorch==2.4.1 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
```

Then install the required packages:

```bash
pip install -r requirements.txt

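# Install PyTorch Geometric
# (PyG publishes one wheel index per PyTorch minor release, so the torch-2.4.0 index also serves 2.4.1)
pip install torch-geometric
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.4.0+cu121.html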

# Install chemistry libraries
conda install -c conda-forge rdkit openbabel -y

# Install Flash Attention (optional, for faster training)
pip install flash-attn --no-build-isolation

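# Install additional dependencies
# (Llama 3.1 checkpoints need transformers>=4.43.0 to parse their rope_scaling config;
#  raise the transformers pin below if loading the backbone fails)
pip install transformers==4.39.0 peft==0.10.0 bitsandbytes==0.43.0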
pip install pytorch-lightning==2.2.0 deepspeed==0.14.0
```
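
After installation, a quick way to confirm that the environment matches the pins above is to list the installed versions. The sketch below uses only the standard library; the helper name `installed_versions` and the choice of packages to check are illustrative assumptions, not part of the repository.

```python
from importlib import metadata

# Distribution names taken from the install commands above.
PACKAGES = [
    "torch",
    "torch-geometric",
    "transformers",
    "peft",
    "bitsandbytes",
    "pytorch-lightning",
    "deepspeed",
]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:20s} {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED`, or at a version other than the one pinned above, is worth fixing before starting training.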

## Model Preparation

### Pre-trained Components

SymForce uses the following pre-trained models, which are downloaded automatically from the Hugging Face Hub:

1. **LLM Backbone**: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
   - This is a gated repository: request access on the model page, then authenticate with a Hugging Face token
   - Set your token: `huggingface-cli login`

2. **SciBERT** (optional, for molecular text encoding): [allenai/scibert_scivocab_uncased](https://huggingface.co/allenai/scibert_scivocab_uncased)
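
If you prefer to fetch the weights ahead of time (for example, before moving to a node without internet access), the checkpoints can be pre-downloaded into the local Hugging Face cache. This is an optional sketch, not a step the repository requires: the `prefetch` helper is hypothetical, and it assumes the `huggingface_hub` package is available (it is installed alongside `transformers`).

```python
import os

# Model IDs listed above.
MODEL_IDS = [
    "meta-llama/Llama-3.1-8B-Instruct",
    "allenai/scibert_scivocab_uncased",
]


def prefetch(model_ids=MODEL_IDS, token=None):
    """Download each repository into the local cache and return its local path."""
    # Imported lazily so this file can be inspected without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # With token=None, huggingface_hub falls back to the token saved by `huggingface-cli login`.
    token = token or os.environ.get("HF_TOKEN")
    return {model_id: snapshot_download(repo_id=model_id, token=token) for model_id in model_ids}


if __name__ == "__main__" and os.environ.get("SYMFORCE_PREFETCH") == "1":
    # Hypothetical opt-in switch so that running this file does not start a large download by accident.
    for model_id, path in prefetch().items():
        print(model_id, "->", path)
```

The Llama download will fail with an authorization error until access to the gated repository has been granted.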

## Training

### Stage 1: Symbolic Force Generator Fine-tuning

Fine-tune the LLM to generate structured force field descriptions.

### Stage 2: End-to-End Conformation Optimization

Train the full SymForce pipeline with physics-informed losses.

## Inference

### Interactive

Run the playground script with custom inputs:

```bash
python inference.py
```
