# STAR: Adaptive Structure-Aware MoE Router

**STAR: DATA STRUCTURE-AWARE ROUTING VIA INCREMENTAL SUBSPACE LEARNING FOR MOE**

This repository contains the implementation of Adaptive Structure-Aware MoE Gating (STAR), a novel data-driven gating mechanism for Mixture-of-Experts (MoE) models. STAR dynamically interpolates between a standard learnable gating matrix and an evolving principal subspace learned via the Generalized Hebbian Algorithm (GHA).

## Key Features

- **Adaptive Structure-Aware Gating**: Dynamically combines learnable gating with data-driven principal subspace learning
- **Generalized Hebbian Algorithm (GHA)**: Incremental subspace learning for capturing input structure
- **Language Task Support**: Includes experiment scripts for GLUE language understanding tasks

## Repository Structure

```
STAR/
├── Language/                 # Language domain experiments
│   ├── dataset/              # Language datasets
│   └── scripts_star/         # Language experiment scripts
├── tutel/                    # Tutel MoE framework
│   └── tutel/gates/star.py   # STAR gate implementation
└── environment.yml           # Conda environment specification
```

## Installation

### Prerequisites

- Python 3.10
- CUDA 11.6+
- PyTorch 1.13.1+

### Setup

1. Clone the repository:
```bash
git clone <repository-url>
cd STAR
```

2. Create and activate the conda environment:
```bash
conda env create -f environment.yml
conda activate STAR
```

3. Install Tutel (MoE framework):
```bash
cd tutel
python setup.py clean --all
rm -rf build dist *.egg-info
USE_CUDA=1 pip install -e .
cd ..
```

## Usage

### Language Tasks

The project supports the following language understanding benchmarks:

- **GLUE Benchmark**: including CoLA, MRPC, MNLI, QNLI, and RTE

#### Running Language Experiments

For example, to run the CoLA and MRPC tasks:

```bash
bash Language/scripts_star/cola.sh    # CoLA task
bash Language/scripts_star/mrpc.sh    # MRPC task
```

## STAR Implementation

### Core Components

1. **GHA (Generalized Hebbian Algorithm)**:
   - Incremental principal component analysis
   - Online subspace learning
   - Adaptive basis updates

2. **STARGate**:
   - Combines learnable gating with GHA-based routing
   - Dynamic interpolation between standard and structure-aware routing
   - Adaptive mixing coefficients
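
The two components above can be illustrated with a minimal NumPy sketch. It is not the code in `tutel/gates/star.py`: the function names `gha_update` and `mixed_gate_probs`, the scalar `alpha`, and the choice of one principal direction per expert are all assumptions made for illustration. The GHA step uses the standard Sanger's rule, and the gate is assumed to take a convex combination of the learnable gating matrix and the GHA basis before the softmax.

```python
import numpy as np


def gha_update(W, x, lr=0.002):
    """One step of Sanger's rule (GHA).

    W:  (k, d) array whose rows estimate the top-k principal directions.
    x:  (d,) input vector, assumed zero-mean.
    """
    y = W @ x  # projections onto the current basis, shape (k,)
    # Hebbian term y x^T minus a lower-triangular decorrelation term, which
    # removes from each row the part explained by itself and the rows above it.
    return W + lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mixed_gate_probs(x, W_gate, W_gha, alpha):
    """Routing probabilities from a convex mix of the two projections.

    alpha is a hypothetical scalar in [0, 1]: 0 uses only the learnable
    gate, 1 uses only the GHA basis.
    """
    W_mix = (1.0 - alpha) * W_gate + alpha * W_gha
    return softmax(W_mix @ x)


# Toy usage: 4-dimensional inputs, 2 experts (illustrative values only).
rng = np.random.default_rng(0)
scales = np.array([3.0, 2.0, 1.0, 0.5])  # per-axis standard deviations
W_gha = 0.1 * rng.standard_normal((2, 4))
for _ in range(20000):
    W_gha = gha_update(W_gha, scales * rng.standard_normal(4))

W_gate = rng.standard_normal((2, 4))  # stand-in for a trained gating matrix
x = scales * rng.standard_normal(4)
probs = mixed_gate_probs(x, W_gate, W_gha, alpha=0.5)
print(np.round(W_gha, 2))  # rows should roughly align with the two highest-variance axes
print(probs)  # two routing probabilities that sum to 1
```

In this sketch `alpha` is passed in as a fixed number. How STAR computes its adaptive mixing coefficients is not shown here; see the gate implementation in the repository for that.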

## Supported Datasets

### Language
- GLUE Benchmark (8 tasks)

## Acknowledgments

- Built on top of the GMoE framework
- Uses Tutel for efficient MoE implementations
- Integrates with Hugging Face datasets for language tasks