# Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression (Refactored Prototype)

This project is a refactored research prototype of DEGUC (Dynamic Expert Grouping with Unified Compression). It combines expert group clustering, group-shared bases, low-rank residuals, hierarchical quantization, offloading of inactive experts, and lightweight two-level routing. Compared with the original version, this implementation fixes the following issues:
- Duplicate class names and conflicts between multiple versions
- Quantization not integrated into the forward pass
- Offloading and reloading not forming a closed loop
- Missing training loop
- Inefficient batched expert computation
- Missing load-balancing loss
- Clustering results inconsistent with parameter binding
- No comprehensive metrics system

## Key Features
1. Two-level routing (group-level selection, then intra-group top-k) with load-balancing regularization
2. Dynamic online clustering (activation-rate filtering and similarity-based combination) plus cluster stability tracking
3. Group-shared bases plus per-expert low-rank residuals (A, B), computed in batches
4. Quantization: optional post-training static quantization of the group bases and of the weights reconstructed from the low-rank residuals; the quantized weights replace the originals in the forward pass
5. Inactive expert offloading: threshold-based pruning -> weight saving -> automatic reloading when the router selects the expert again
6. Metrics: activation rates, load-balancing terms, compression-ratio estimates, clustering adjustment counts, and offload/reload counts
7. Trainer: a demonstration training loop on a dummy task, plus a scheduler that periodically runs clustering, quantization, and offloading
8. Extensible distributed support: communication currently goes through placeholder interfaces, which keeps the abstractions in place for a real backend
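A minimal sketch of the two-level routing in item 1, written in plain Python (no PyTorch) so it runs standalone. It assumes one hard group choice per token (top-1 group) followed by a softmax over the top-k experts inside that group. `two_level_route` and its arguments are hypothetical names, and the prototype's router may score or normalize differently.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def two_level_route(group_logits, expert_logits, groups, top_k=2):
    """Hypothetical router: best group first, then top-k experts inside it.

    group_logits:  one score per group.
    expert_logits: one score per expert (indexed by global expert id).
    groups:        list of expert-id lists, one per group.
    Returns (group_id, [(expert_id, weight), ...]); weights sum to 1.
    """
    group_probs = softmax(group_logits)
    # Level 1: assumed hard top-1 group selection.
    g = max(range(len(group_probs)), key=group_probs.__getitem__)
    # Level 2: rank only the experts that belong to the chosen group.
    ranked = sorted(groups[g], key=lambda e: expert_logits[e], reverse=True)[:top_k]
    weights = softmax([expert_logits[e] for e in ranked])
    return g, list(zip(ranked, weights))
```

With `groups = [[0, 1, 2], [3, 4, 5]]`, group logits `[0.1, 2.0]`, and expert logits `[5.0, 0.0, 0.0, 1.0, 3.0, 2.0]`, the token goes to group 1 and experts 4 and 5. Expert 0 is never considered despite having the highest logit, because its group lost at the first level; this restriction is what keeps the second-level cost proportional to the group size rather than the total number of experts.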
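The load-balancing regularization in item 1 can be illustrated with the auxiliary loss popularized by the Switch Transformer, `N * sum_i f_i * P_i`. This is an assumption: the README does not state the prototype's exact formula. `load_balance_loss` is a hypothetical name, and this plain-Python version only computes the value; in training, `P_i` must remain a differentiable tensor so the gradient reaches the router.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def load_balance_loss(router_logits):
    """Assumed Switch-Transformer-style auxiliary loss: N * sum_i f_i * P_i.

    router_logits: per-token logit lists, shape [tokens][experts].
    f_i: fraction of tokens whose top-1 expert is i (a non-differentiable count).
    P_i: mean router probability assigned to expert i (carries the gradient
         when computed on tensors).
    Equals 1.0 when both f and P are uniform; values above 1.0 typically
    indicate routing concentrated on fewer experts.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    counts = [0] * num_experts
    prob_sums = [0.0] * num_experts
    for logits in router_logits:
        probs = softmax(logits)
        top = max(range(num_experts), key=probs.__getitem__)
        counts[top] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    return num_experts * sum(
        (c / num_tokens) * (s / num_tokens) for c, s in zip(counts, prob_sums)
    )
```

Two tokens that prefer different experts (`[[2, 0], [0, 2]]`) give 1.0, while two tokens that both prefer expert 0 (`[[2, 0], [2, 0]]`) give about 1.76.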
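One way to picture the dynamic clustering in item 2: skip experts whose activation rate falls below a threshold, then greedily assign each remaining expert to the most similar existing cluster (cosine similarity against a running-mean centroid) or open a new cluster. The greedy scheme, the per-expert feature vectors, both thresholds, and the function names are hypothetical, chosen for illustration rather than taken from the prototype. Stability is measured with a label-invariant count of expert pairs whose same-cluster status flipped between rounds, because raw cluster ids can be renumbered from one round to the next.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_experts(features, activation_rates, min_rate=0.01, sim_threshold=0.8):
    """Hypothetical greedy online clustering of experts.

    features: per-expert feature vectors (e.g. flattened weights or mean outputs).
    activation_rates: fraction of recent tokens routed to each expert.
    Experts below min_rate are skipped (candidates for offloading instead).
    Returns a dict expert_id -> cluster_id for the active experts.
    """
    centroids, sizes, assignment = [], [], {}
    for e, (feat, rate) in enumerate(zip(features, activation_rates)):
        if rate < min_rate:
            continue
        sims = [cosine(feat, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else None
        if best is not None and sims[best] >= sim_threshold:
            # Running-mean update of the matched centroid.
            n = sizes[best]
            centroids[best] = [(c * n + f) / (n + 1) for c, f in zip(centroids[best], feat)]
            sizes[best] += 1
            assignment[e] = best
        else:
            centroids.append(list(feat))
            sizes.append(1)
            assignment[e] = len(centroids) - 1
    return assignment


def co_membership_changes(old, new):
    """Label-invariant stability metric: expert pairs whose same-cluster status flipped."""
    shared = sorted(old.keys() & new.keys())
    return sum(
        1 for a, b in combinations(shared, 2)
        if (old[a] == old[b]) != (new[a] == new[b])
    )
```

Comparing co-membership rather than raw ids means that a pure relabeling of clusters counts as zero adjustments.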
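Item 3 can be read as representing each expert weight as `W_e = W_g + B_e A_e`, where `W_g` is shared by the group and `B_e A_e` is a rank-`r` residual; this decomposition is an interpretation of "shared bases + low-rank residuals", not a confirmed detail of the code. The plain-Python sketch below shows the forward pass and a compression-ratio estimate. The function names are hypothetical, and the estimate counts only expert weight matrices (no router, biases, or quantization savings).

```python
def matvec(m, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def expert_forward(x, shared_basis, a_e, b_e):
    """y = W_g x + B_e (A_e x): group basis plus the expert's rank-r residual.

    shared_basis: d_out x d_in matrix shared by every expert in the group.
    a_e: r x d_in and b_e: d_out x r, the expert's low-rank factors.
    The residual is applied as two thin products, so B_e @ A_e is never formed.
    """
    base = matvec(shared_basis, x)
    residual = matvec(b_e, matvec(a_e, x))
    return [u + v for u, v in zip(base, residual)]


def compression_ratio(num_experts, num_groups, d_in, d_out, rank):
    """Dense expert parameters divided by shared-basis plus low-rank parameters."""
    dense = num_experts * d_out * d_in
    compressed = num_groups * d_out * d_in + num_experts * rank * (d_in + d_out)
    return dense / compressed
```

For example, 64 experts in 8 groups with `d_in = 1024`, `d_out = 4096`, and rank 16 (illustrative sizes, not the prototype's defaults) give `compression_ratio(64, 8, 1024, 4096, 16) = 256 / 37`, about 6.9x.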
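A sketch of the post-training static quantization in item 4, assuming symmetric per-tensor integer quantization with a single scale computed once from the weights. "Hierarchical" is illustrated here only by giving different tensor kinds different bit widths (8 bits for group bases, 4 bits for reconstructed residual weights); that split, the `BITS_BY_KIND` table, and the function names are hypothetical. The dequantized values are what would replace the float weights in the forward pass.

```python
def quantize_symmetric(weights, bits=8):
    """Per-tensor symmetric quantization of a flat list of floats.

    Returns (codes, scale) with integer codes in [-qmax, qmax],
    where qmax = 2**(bits - 1) - 1. The scale is computed once (static).
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Float weights that would stand in for the originals in the forward pass."""
    return [c * scale for c in codes]


# Hypothetical hierarchy: wider codes for shared bases, narrower for residuals.
BITS_BY_KIND = {"group_basis": 8, "residual": 4}
```

The rounding error per weight is at most half a quantization step (`scale / 2`), so the 4-bit residuals trade roughly 18x coarser steps for half the storage of the 8-bit bases.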
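The offload/reload cycle in item 5 can be sketched as a small pool that serializes experts whose activation rate drops below a threshold and deserializes them on the next routing hit, counting both events for the metrics in item 6. The class name, the threshold, and the use of in-memory `pickle` bytes as a stand-in for disk or host-memory storage are all assumptions.

```python
import pickle


class OffloadingExpertPool:
    """Hypothetical pool: hot experts stay resident, cold ones are serialized.

    A dict of pickled bytes stands in for disk or host-memory storage.
    """

    def __init__(self, weights, min_rate=0.01):
        self.resident = dict(weights)  # expert_id -> weights
        self.offloaded = {}            # expert_id -> serialized bytes
        self.min_rate = min_rate
        self.offload_count = 0
        self.reload_count = 0

    def offload_inactive(self, activation_rates):
        """Offload every resident expert whose activation rate is below min_rate."""
        for e, rate in activation_rates.items():
            if rate < self.min_rate and e in self.resident:
                self.offloaded[e] = pickle.dumps(self.resident.pop(e))
                self.offload_count += 1

    def get(self, e):
        """Return an expert's weights, reloading them on a routing hit if needed."""
        if e not in self.resident:
            self.resident[e] = pickle.loads(self.offloaded.pop(e))
            self.reload_count += 1
        return self.resident[e]
```

Because `get` is the only access path, the router never needs to know whether an expert is resident, which is what closes the offload/reload loop.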

## Quick Start
```bash
pip install torch
python scripts/run_training.py
```

## Future Extension Suggestions
- Integration with real tasks (NLP or vision)
- Token dispatch across devices with `torch.distributed.all_to_all`
- Per-channel quantization and quantization-aware training (QAT)
- Triton or `torch.compile` optimization of the batched expert kernels
- Distillation or re-adaptation to migrate experts after re-clustering

## Directory Structure
See code comments in each submodule for details.
