# Expert-level Leaf Cell Layout Generation via Preference-Optimized LLM

In integrated circuit design, leaf cells (e.g., standard cells) are the fundamental building blocks that are reused across many VLSI designs and form the basis of more complex circuits. The quality of leaf cell layouts therefore has a significant impact on the PPA (Power, Performance, and Area) of the final VLSI design. To automatically generate leaf cell layouts that approach expert quality, we propose GenLeaf. GenLeaf first uses a supervised, performance-aware embedding model to represent layouts and automatically compute similarity scores between them. Because expert-designed layouts exist without the scripts that produced them, we use Bayesian optimization to build a layout-script dataset for LLM training. Through supervised fine-tuning followed by preference optimization, GenLeaf generates layout scripts whose resulting layouts perform close to those designed by human engineers. Experimental results show that GenLeaf outperforms expert-designed golden layouts on key performance metrics.

## Overview

GenLeaf automates placement and routing for leaf cell circuits through a multi-stage pipeline:

1. **Representation Learning**: Learn circuit layout embeddings using graph neural networks
2. **Supervised Fine-Tuning (SFT)**: Train LLM to generate placement code from circuit specifications
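3. **GRPO Preference Optimization**: Further optimize the SFT model with reinforcement learning, using embedding-based rewards
4. **Bayesian Optimization**: Search for the best placement parameters (cell ordering and rotation); this is also used to build the layout-script dataset for LLM training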
5. **Routing API**: Automatic routing with wire length optimization

## Project Structure

```
GenLeaf/
├── main.py                    # Main entry point for placement and routing
├── bayes_opt.py               # Bayesian optimization for placement parameters
├── README.md                  # This file
│
├── Representation/            # Circuit embedding model
│   ├── main.py               # Training entry point
│   ├── model.py              # Graph neural network model
│   ├── data_loader.py        # Data loading utilities
│   ├── data_generator.py     # Generate training data
│   ├── trainer.py            # Training loop
│   ├── tester.py             # Model evaluation
│   ├── inference.py          # Inference script
│   └── config.json           # Model configuration
│
├── SFT/                       # Supervised Fine-Tuning
│   ├── train.py              # SFT training script
│   ├── inference.py          # Model inference
│   └── output/               # Trained model checkpoints
│
├── GRPO/                      # Group Relative Policy Optimization
│   ├── generate_samples.py   # Generate training samples
│   ├── compute_rewards.py    # Compute embedding-based rewards
│   ├── train.py              # GRPO training script
│   ├── inference.py          # Model inference
│   └── output/               # Trained model checkpoints
│
├── Placement/                 # Custom placement module
│   └── custom.py             # Placement algorithms
│
├── Routing/                   # Routing module
│   ├── main.py               # Routing entry point
│   ├── router.py             # Core routing algorithm
│   ├── data_loader.py        # Data structures
│   ├── file_utils.py         # File I/O utilities
│   └── visualizer.py         # Visualization tools
│
└── Prediction/                # Wirelength prediction
    └── MLP.py                # MLP-based predictor
```

## Installation

### Requirements

- Python 3.8+
- PyTorch 2.0+
- CUDA (recommended for GPU acceleration)

### Install Dependencies

```bash
pip install torch torchvision transformers datasets peft accelerate bitsandbytes
pip install scikit-learn scikit-optimize numpy matplotlib
```

## Usage

### 1. Representation Learning

Train the circuit embedding model:

```bash
cd Representation

# Generate training data
python data_generator.py

# Train the model
python main.py --mode train

# Test the model
python main.py --mode test

# Run inference
python inference.py --model_path ./checkpoints/best_model.pt
```
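Once the embedding model is trained, the similarity between two layouts can be scored by comparing their embeddings. The sketch below assumes embeddings are plain float vectors and uses cosine similarity; the metric actually used by `inference.py` is not specified here, and `cosine_similarity` is a hypothetical helper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two layout embeddings (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: a generated layout vs. an expert (golden) layout.
generated = [0.2, 0.8, 0.1]
golden = [0.25, 0.75, 0.05]
print(f"similarity = {cosine_similarity(generated, golden):.3f}")
```

A score of 1.0 means the two embeddings point in the same direction; values near 0 indicate unrelated layouts.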

### 2. Supervised Fine-Tuning (SFT)

Fine-tune Qwen2.5-14B for circuit code generation:

```bash
cd SFT

# Train the model
python train.py

# Run inference on a circuit
python inference.py /path/to/circuit_data.json
```

**Configuration:**
- Quantization: 4-bit for memory efficiency
- LoRA: r=16, alpha=32
- Training: 3 epochs, batch_size=1, gradient_accumulation=8
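The values above imply two derived quantities that matter when adapting the setup: the effective batch size per optimizer step and the LoRA scaling factor (standard LoRA scales the low-rank update by `alpha / r`; in Hugging Face PEFT these are the `r` and `lora_alpha` arguments of `LoraConfig`). The sketch records the settings in a plain dataclass; the field names are hypothetical and need not match `SFT/train.py`, and a single GPU is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SFTConfig:
    """Hyperparameters listed above; field names are hypothetical."""
    load_in_4bit: bool = True
    lora_r: int = 16
    lora_alpha: int = 32
    num_epochs: int = 3
    per_device_batch_size: int = 1
    gradient_accumulation_steps: int = 8

    @property
    def effective_batch_size(self) -> int:
        # Gradients from 8 micro-batches of size 1 are accumulated before each
        # optimizer step (single device assumed).
        return self.per_device_batch_size * self.gradient_accumulation_steps

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.lora_alpha / self.lora_r


cfg = SFTConfig()
print(cfg.effective_batch_size, cfg.lora_scaling)  # 8 2.0
```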

### 3. GRPO Training

Optimize the SFT model using reinforcement learning:

```bash
cd GRPO

# Step 1: Generate samples for each circuit
python generate_samples.py

# Step 2: Compute embedding-based rewards
python compute_rewards.py

# Step 3: Train with GRPO
python train.py

# Step 4: Run inference
python inference.py
python inference.py --circuit circuit_data.json  # For a specific circuit
```

**GRPO Configuration:**
- Learning rate: 1e-6
- Epochs: 3
- KL coefficient: 0.1
- Clip range: 0.2 (PPO-style)
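The clip range and KL coefficient above enter the GRPO objective roughly as sketched here. This is a simplified, sequence-level version of the usual GRPO formulation (rewards normalized within a group of samples for the same circuit, a PPO-style clipped surrogate, and a KL penalty toward a frozen reference model using the `k3` estimator). It is not the code in `GRPO/train.py`, which may work per token, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of samples drawn for the same circuit."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def grpo_loss(
    logp_new: Sequence[float],  # log pi_theta(o_i | q), current policy
    logp_old: Sequence[float],  # log pi_theta_old(o_i | q), policy that sampled o_i
    logp_ref: Sequence[float],  # log pi_ref(o_i | q), frozen reference model
    rewards: Sequence[float],
    clip_range: float = 0.2,
    kl_coef: float = 0.1,
) -> float:
    """Sequence-level simplification of the GRPO objective (to be minimized)."""
    advantages = group_advantages(rewards)
    terms = []
    for lp_new, lp_old, lp_ref, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        surrogate = min(ratio * adv, clipped * adv)  # PPO-style pessimistic bound
        # k3 KL estimator: pi_ref/pi - log(pi_ref/pi) - 1, always >= 0
        log_ratio_ref = lp_ref - lp_new
        kl = math.exp(log_ratio_ref) - log_ratio_ref - 1.0
        terms.append(surrogate - kl_coef * kl)
    return -sum(terms) / len(terms)
```

Because advantages are relative to the group mean, a sample is only pushed up if it scores better than its siblings for the same circuit; the clip keeps a single update from moving any sample's probability ratio far outside `[0.8, 1.2]`.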

### 4. Bayesian Optimization

Search for optimal placement parameters:

```bash
python bayes_opt.py
```

This uses Gaussian-process-based Bayesian optimization to search for the best cell ordering and rotation configuration.
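A Gaussian-process optimizer works over a numeric search space, so a cell ordering and per-cell rotations need a numeric encoding. One common option, shown here as an assumption rather than what `bayes_opt.py` actually does, is a random-key encoding: each cell gets a continuous key whose sort order defines the cell ordering, plus an integer index selecting its rotation. `decode_placement` and the rotation set are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical rotation choices: "R0" appears in the input format,
# "MY" (mirror about the Y axis) is an assumption.
ROTATIONS = ("R0", "MY")


def decode_placement(
    keys: Sequence[float], rotation_idx: Sequence[int]
) -> Tuple[List[int], List[str]]:
    """Random-key decoding: cells are ordered by ascending key value."""
    if len(keys) != len(rotation_idx):
        raise ValueError("need one key and one rotation index per cell")
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    rotations = [ROTATIONS[idx] for idx in rotation_idx]  # indexed by cell, not position
    return order, rotations


# One candidate point proposed by the optimizer for a 3-cell circuit.
order, rotations = decode_placement([0.7, 0.1, 0.4], [0, 1, 0])
print(order, rotations)  # [1, 2, 0] ['R0', 'MY', 'R0']
```

With scikit-optimize (listed in the dependencies), the keys could be declared as `Real` dimensions and the rotation indices as `Integer` or `Categorical` dimensions passed to `gp_minimize`, with the objective evaluating the decoded placement.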

### 5. Placement and Routing

Run the complete pipeline:

```bash
python main.py
```

This will:
1. Load circuit data
2. Apply custom placement with the specified cell order and rotations
3. Run single-row routing
4. Output the routed circuit and its visualization
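Step 2 can be pictured as abutting cells left to right in the chosen order. The sketch below uses the `cells` fields from the input format (`width`, `location`, `rotation`, `pins[].offset`); the assumptions that offsets are measured from the cell's lower-left corner and that `MY` mirrors pin x-offsets within the cell width, the `abs` output key, and the helper name are all hypothetical and not taken from `Placement/custom.py`.

```python
from typing import Dict, List, Sequence


def place_single_row(
    cells: List[Dict], order: Sequence[int], rotations: Sequence[str]
) -> List[Dict]:
    """Abut cells left to right at y = 0 in the given order (hypothetical sketch)."""
    placed = []
    x = 0
    for idx in order:
        cell = dict(cells[idx])  # shallow copy so the input is left untouched
        cell["location"] = [x, 0]
        cell["rotation"] = rotations[idx]  # rotation chosen per cell index
        pins = []
        for pin in cell.get("pins", []):
            px, py = pin["offset"]
            if cell["rotation"] == "MY":  # assumed: mirror about the cell's vertical axis
                px = cell["width"] - px
            pins.append({**pin, "abs": [x + px, py]})  # absolute pin position
        cell["pins"] = pins
        placed.append(cell)
        x += cell["width"]
    return placed


cells = [
    {"id": "cell_0", "width": 1000, "height": 2000,
     "pins": [{"name": "A", "net": "net1", "offset": [100, 500]}]},
    {"id": "cell_1", "width": 600, "height": 2000,
     "pins": [{"name": "B", "net": "net1", "offset": [200, 300]}]},
]
for c in place_single_row(cells, order=[1, 0], rotations=["R0", "MY"]):
    print(c["id"], c["location"], [p["abs"] for p in c["pins"]])
# cell_1 [0, 0] [[400, 300]]
# cell_0 [600, 0] [[700, 500]]
```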

## Key Features

### Representation Module
- Graph Neural Network (GNN) based circuit embedding
- Learns to capture layout similarity
- Provides embedding distance as reward signal for GRPO

### SFT Module
- Uses Qwen2.5-14B as base model
- 4-bit quantization with LoRA for efficient training
- Generates Python code for placement configuration

### GRPO Module
- Group Relative Policy Optimization for reinforcement learning
- Uses embedding distance as reward (closer to ground truth = higher reward)
- KL-divergence regularization toward the reference model to prevent mode collapse
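The rule "closer to ground truth = higher reward" only requires a reward that decreases with embedding distance. One simple choice, assumed here rather than taken from `compute_rewards.py`, is Euclidean distance mapped through `1 / (1 + d)`, which keeps rewards in (0, 1] and gives exactly 1.0 for a perfect match. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def embedding_reward(sample: Sequence[float], golden: Sequence[float]) -> float:
    """Map distance to the golden layout's embedding into a reward in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(sample, golden))


golden = [0.0, 0.0]
group = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]  # hypothetical embeddings of sampled layouts
rewards: List[float] = [embedding_reward(s, golden) for s in group]
print(rewards)  # [1.0, 0.16666666666666666, 0.5]
```

In GRPO, rewards for the samples generated for one circuit are then normalized within that group, so only the relative ordering and spread of these values matter.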

### Routing Module
- Single-row routing algorithm
- Multi-layer metal support
- Wire length and via count optimization
- Visualization output

### Prediction Module
- MLP-based wirelength prediction
- Uses circuit embeddings as input
- Fast inference for design space exploration
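A minimal forward pass for such a predictor, assuming one hidden ReLU layer and a scalar output, shows how an embedding becomes a wirelength estimate. The layer sizes and weights are made-up placeholders, not the trained model in `Prediction/MLP.py`, and the helper names are hypothetical.

```python
from typing import List, Sequence


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """y_j = sum_i W[j][i] * x[i] + b[j]"""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def predict_wirelength(embedding: Sequence[float], params: dict) -> float:
    """Embedding -> hidden ReLU layer -> scalar wirelength estimate."""
    hidden = relu(linear(embedding, params["W1"], params["b1"]))
    return linear(hidden, params["W2"], params["b2"])[0]


# Placeholder weights for a 3-d embedding -> 2 hidden units -> 1 output.
params = {
    "W1": [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]],
    "b1": [0.0, 0.0],
    "W2": [[2.0, 4.0]],
    "b2": [1.0],
}
print(predict_wirelength([1.0, 2.0, 3.0], params))  # 13.0
```

Because a forward pass costs only a few small matrix products, candidates can be ranked far faster than by running the router on each one.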

## Input Format

Circuit data is specified in JSON format:

```json
{
  "cells": [
    {
      "id": "cell_0",
      "width": 1000,
      "height": 2000,
      "location": [0, 0],
      "rotation": "R0",
      "pins": [
        {"name": "A", "net": "net1", "offset": [100, 500]}
      ]
    }
  ],
  "nets": [
    {"name": "net1", "wires": [], "vias": []}
  ]
}
```
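Before placement, it can help to check that a circuit file is internally consistent, for example that every pin references a net declared under `nets` and that pin offsets fall inside their cell. The sketch relies only on the fields shown above and assumes offsets are measured from the cell's lower-left corner; `check_circuit` is a hypothetical helper, not part of the repository.

```python
import json
from typing import List


def check_circuit(text: str) -> List[str]:
    """Return a list of consistency problems found in a circuit JSON string."""
    data = json.loads(text)
    problems = []
    net_names = {net["name"] for net in data.get("nets", [])}
    seen_ids = set()
    for cell in data.get("cells", []):
        cid = cell["id"]
        if cid in seen_ids:
            problems.append(f"duplicate cell id {cid}")
        seen_ids.add(cid)
        for pin in cell.get("pins", []):
            if pin["net"] not in net_names:
                problems.append(f"{cid}/{pin['name']}: unknown net {pin['net']}")
            x, y = pin["offset"]  # assumed relative to the cell's lower-left corner
            if not (0 <= x <= cell["width"] and 0 <= y <= cell["height"]):
                problems.append(f"{cid}/{pin['name']}: offset outside cell")
    return problems


example = """{
  "cells": [{"id": "cell_0", "width": 1000, "height": 2000, "location": [0, 0],
             "rotation": "R0", "pins": [{"name": "A", "net": "net1", "offset": [100, 500]}]}],
  "nets": [{"name": "net1", "wires": [], "vias": []}]
}"""
print(check_circuit(example))  # []
```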

## Output

- **Routed JSON**: Complete circuit with routing information
- **Visualization**: PNG image showing placement and routing
- **Statistics**: Wire length, via count, and other metrics
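The wire length and via count statistics can be computed directly from the routed JSON. The input format above leaves the contents of `wires` and `vias` unspecified, so the sketch assumes each wire is an axis-aligned segment `{"layer": ..., "start": [x, y], "end": [x, y]}` and each via is one list entry; this schema and the `routing_stats` helper are hypothetical.

```python
from typing import Dict


def routing_stats(circuit: Dict) -> Dict[str, int]:
    """Total wire length and via count over all nets (hypothetical wire schema)."""
    total_length = 0
    total_vias = 0
    for net in circuit.get("nets", []):
        for wire in net.get("wires", []):
            (x1, y1), (x2, y2) = wire["start"], wire["end"]
            total_length += abs(x2 - x1) + abs(y2 - y1)  # Manhattan segment length
        total_vias += len(net.get("vias", []))
    return {"wire_length": total_length, "via_count": total_vias}


routed = {
    "nets": [
        {
            "name": "net1",
            "wires": [
                {"layer": "M1", "start": [100, 500], "end": [100, 1500]},
                {"layer": "M2", "start": [100, 1500], "end": [700, 1500]},
            ],
            "vias": [{"location": [100, 1500]}],
        }
    ]
}
print(routing_stats(routed))  # {'wire_length': 1600, 'via_count': 1}
```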

## License

This project is for research purposes.

## Citation

If you use GenLeaf in your research, please cite:

```bibtex
@misc{genleaf2025,
  title={GenLeaf: LLM-based LeafCell Place-and-Route Generation},
  author={Han, Yuhan},
  year={2025}
}
```
