<div align="left">


# Virtual Logical Depth (VLD) Scaling

![VLD Scaling](https://i.imgur.com/N05OCji.png)


---

This repository contains the official implementation of **Virtual Logical Depth (VLD)**, a novel fourth dimension for scaling large language models that enhances reasoning capabilities without increasing parameter count through strategic parameter reuse.

## 📖 Overview

Virtual Logical Depth (VLD) increases the effective algorithmic depth of a neural network by reusing parameters within the model. Unlike traditional scaling approaches that increase depth, width, or total parameters, VLD keeps the parameter count constant while significantly improving reasoning performance.

### VLD Pattern Visualization
The diagram below illustrates the supported VLD reuse patterns:

![VLD Patterns](https://i.imgur.com/8wdar3f.png)


### Key Findings

- **Knowledge vs. Parameters**: Knowledge capacity scales with parameter count across models; at a fixed parameter count, VLD leaves it nearly unchanged
- **Reasoning vs. Reuse**: VLD substantially improves reasoning ability without increasing parameters, decoupling reasoning from model size  
- **Robustness**: Improvements persist across architectures and configurations, indicating VLD captures general scaling behavior

## 🏗️ Repository Structure

This codebase contains implementations for two main experimental tracks:

### 1. GSM (Mathematical Reasoning) Experiments
- Synthetic mathematical reasoning tasks using the iGSM dataset
- Evaluates VLD impact on step-by-step mathematical problem solving
- Tests different VLD patterns (Sequence, Cycle, Inverse Cycle)

### 2. Random Number (Knowledge Capacity) Experiments  
- Information entropy-based measurement of knowledge capacity
- Uses random token sequences to isolate memorization from reasoning
- Quantifies how VLD affects knowledge storage vs. reasoning capabilities
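
The entropy-based measurement above can be sketched as follows. This is a minimal illustration, not the repository's evaluation code: it assumes tokens are drawn uniformly at random from the vocabulary and that stored knowledge is estimated as the total entropy of the data minus the model's residual cross-entropy. The function names and the example numbers are hypothetical.

```python
import math


def sequence_entropy_bits(num_tokens: int, vocab_size: int) -> float:
    """Entropy, in bits, of `num_tokens` tokens drawn uniformly at random."""
    return num_tokens * math.log2(vocab_size)


def stored_bits(num_tokens: int, vocab_size: int, loss_nats_per_token: float) -> float:
    """Estimate memorized information (assumed estimator, see lead-in).

    The residual cross-entropy loss, converted from nats to bits, is treated
    as the part of the data the model has NOT stored.
    """
    residual_bits = num_tokens * loss_nats_per_token / math.log(2)
    return max(0.0, sequence_entropy_bits(num_tokens, vocab_size) - residual_bits)


def bits_per_parameter(stored: float, num_parameters: int) -> float:
    """Normalize stored information by model size."""
    return stored / num_parameters


# Hypothetical numbers, for illustration only.
total = sequence_entropy_bits(num_tokens=1_000_000, vocab_size=1024)
kept = stored_bits(num_tokens=1_000_000, vocab_size=1024, loss_nats_per_token=2.0)
print(f"total entropy: {total:.0f} bits, estimated stored: {kept:.0f} bits")
print(f"bits per parameter: {bits_per_parameter(kept, 10_000_000):.3f}")
```

Because random tokens carry no structure to reason about, any reduction in loss under this setup reflects memorization, which is what lets the measurement separate knowledge storage from reasoning.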

## 🚀 Training

### Setup
Training scripts are located in `examples/gpt3/`.

Run training in a Docker container:

```bash
PYTORCH_IMAGE=nvcr.io/nvidia/pytorch:24.01-py3
CHECKPOINT_PATH="" # <Specify path>
TENSORBOARD_LOGS_PATH="" # <Specify path>
VOCAB_FILE="" # <Specify path to file>/gpt2-vocab.json
MERGE_FILE="" # <Specify path to file>/gpt2-merges.txt
DATA_PATH="" # <Specify path and file prefix>_text_document

# Launch the training script inside the PyTorch container.
docker run \
  --gpus=all \
  --ipc=host \
  --workdir /workspace/megatron-lm \
  -v /path/to/data:/path/to/data \
  -v /path/to/megatron-lm:/workspace/megatron-lm \
  "$PYTORCH_IMAGE" \
  bash examples/gpt3/train_gpt3_175b_distributed.sh "$CHECKPOINT_PATH" "$TENSORBOARD_LOGS_PATH" "$VOCAB_FILE" "$MERGE_FILE" "$DATA_PATH"
```

### Model Configurations

**345M Model:**
```bash
--num-layers 12 \
--hidden-size 512 \
--num-attention-heads 8 \
--seq-length 1024 \
--tensor-model-parallel-size 1 \
--pipeline-model-parallel-size 1 \
```

**857M Model:**
```bash
--num-layers 24 \
--hidden-size 1024 \
--num-attention-heads 16 \
--seq-length 2048 \
--tensor-model-parallel-size 1 \
--pipeline-model-parallel-size 1 \
```

## 🔮 Inference

Inference scripts are available in `examples/inference/`.

## VLD Patterns

The implementation supports three parameter reuse patterns:

- **Sequence**: Sequential repetition of neighboring layers
- **Cycle**: Cyclic repetition of layer blocks  
- **Inverse Cycle**: Reverse-order repetition optimized for gradient flow
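
To make the three patterns concrete, the sketch below maps each virtual depth position to the physical layer whose parameters it reuses. It is an interpretation of the descriptions above, not the repository's implementation: the function name `vld_layer_order`, the pattern strings, and the exact orderings (in particular, alternating forward and reversed passes for Inverse Cycle) are assumptions.

```python
from typing import List


def vld_layer_order(num_physical: int, reuse: int, pattern: str) -> List[int]:
    """Return the physical-layer index applied at each virtual depth position.

    num_physical: number of distinct (parameter-holding) layers.
    reuse: how many times each layer's parameters are applied.
    pattern: "sequence", "cycle", or "inverse_cycle" (hypothetical names).
    """
    if num_physical < 1 or reuse < 1:
        raise ValueError("num_physical and reuse must be positive")
    base = list(range(num_physical))
    if pattern == "sequence":
        # Each layer is applied `reuse` times in a row before moving on.
        return [i for i in base for _ in range(reuse)]
    if pattern == "cycle":
        # The whole stack is replayed in the same order on every pass.
        return base * reuse
    if pattern == "inverse_cycle":
        # Assumed: every second pass runs the stack in reverse order.
        order: List[int] = []
        for pass_idx in range(reuse):
            order.extend(base if pass_idx % 2 == 0 else base[::-1])
        return order
    raise ValueError(f"unknown pattern: {pattern!r}")


for name in ("sequence", "cycle", "inverse_cycle"):
    print(name, vld_layer_order(num_physical=3, reuse=2, pattern=name))
# sequence [0, 0, 1, 1, 2, 2]
# cycle [0, 1, 2, 0, 1, 2]
# inverse_cycle [0, 1, 2, 2, 1, 0]
```

In every case the virtual depth is `num_physical * reuse` while the number of distinct layers, and therefore the parameter count, stays at `num_physical`.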

## Citation

```bibtex
@article{vld2024,
  title={Beyond Parameters: Exploring Virtual Logic Depth for Scaling Laws},
  author={Anonymous},
  journal={Under review at ICLR 2026},
  year={2024}
}
```

## License

This project is open-sourced to facilitate future research in efficient model scaling strategies.
