
# Results Analysis

## Executive Summary
Our entropy-based adaptive pruning preserves accuracy while cutting a large fraction of token compute.
At a keep rate of ~50%, accuracy is 0.551 versus 0.519 for the full baseline, and the method also outperforms attention-sum pruning (0.522). AUC improves during training and stabilizes around 0.556.
The FLOPs proxy drops by ~75%, and the latency proxy drops from 83.92 ms to 22.48 ms (~73%).
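
The reported savings can be sanity-checked with a few lines of arithmetic. The sketch below assumes the FLOPs proxy scales with the square of the number of tokens kept (a purely quadratic attention cost); that cost model and the helper names are illustrative assumptions, not taken from the simulation code.

```python
def quadratic_flops_saving(keep_rate: float) -> float:
    """Fractional FLOPs saving, ASSUMING cost is proportional to (tokens kept)^2."""
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    return 1.0 - keep_rate ** 2


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


# Keep rate and latency figures are the ones quoted in the summary.
flops_saving = quadratic_flops_saving(0.5)
latency_saving = relative_reduction(83.92, 22.48)

print(f"FLOPs proxy saving:   {flops_saving:.1%}")
print(f"Latency proxy saving: {latency_saving:.1%}")
```

Under this assumption a 50% keep rate yields a 75% FLOPs saving, in line with the reported proxy. The latency saving comes out slightly lower, which would be expected if part of the latency proxy does not scale quadratically with token count.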

## Detailed Findings
- **Performance vs Baselines:** The proposed method does not lose accuracy relative to the full model: it scores ~3.2 pp higher (0.551 vs 0.519) while using ~50% of the tokens.
- **Ablation Insights:** Entropy top-k outperforms thresholded entropy and attention-sum pruning in our setting.
- **Robustness:** As the keep rate varies from 30% to 70%, accuracy changes monotonically; IDF scaling helps under noise.
- **Scaling:** Efficiency gains grow as sequence length increases, given quadratic attention cost.
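
To make the ablation comparison concrete, here is a minimal NumPy sketch of the two selection rules. It assumes the entropy score is the Shannon entropy of each token's outgoing attention distribution, that the attention-sum score is the total attention a token receives, and that higher scores are kept in both cases. The actual scoring used in the experiments may differ, and the helper names (`row_entropy`, `topk_keep`) are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def row_entropy(attn: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy of each row of a row-stochastic attention matrix."""
    return -(attn * np.log(attn + eps)).sum(axis=-1)


def topk_keep(scores: np.ndarray, keep_rate: float) -> np.ndarray:
    """Indices of the top-k scoring tokens, returned in original token order."""
    k = max(1, int(round(keep_rate * scores.shape[0])))
    top = np.argpartition(-scores, k - 1)[:k]
    return np.sort(top)


# Random single-head attention over 16 tokens (synthetic, for illustration only).
rng = np.random.default_rng(0)
n_tokens, d_head = 16, 8
queries = rng.normal(size=(n_tokens, d_head))
keys = rng.normal(size=(n_tokens, d_head))
attn = softmax(queries @ keys.T / np.sqrt(d_head))

# Assumption: keep the highest-entropy tokens; the real rule may keep the lowest.
entropy_keep = topk_keep(row_entropy(attn), keep_rate=0.5)
# Attention-sum baseline: keep the tokens that receive the most attention.
attn_sum_keep = topk_keep(attn.sum(axis=0), keep_rate=0.5)

print("entropy top-k keeps:  ", entropy_keep)
print("attention-sum keeps:  ", attn_sum_keep)
```

Both rules keep the same number of tokens, so any accuracy difference between them comes from which tokens are selected rather than how many.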

## Limitations
- Synthetic data may not capture the full complexity of natural language.
- The NumPy-based simulation omits end-to-end gradient learning; results are indicative, not definitive.

## Future Work
- Integrate differentiable gating (Concrete / Gumbel) with real training.
- Extend to long-context LM tasks and multimodal transformers.
- Combine with sparsity in heads and MLPs for compound savings.
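
As a pointer for the differentiable-gating item, the following is a forward-only NumPy sketch of a binary Concrete (relaxed Bernoulli) gate. It is not part of the current simulation: the per-token logits, the temperature value, and the function name are illustrative assumptions, and a real integration would need an autodiff framework to propagate gradients through the gate.

```python
import numpy as np


def binary_concrete_gate(
    logits: np.ndarray, temperature: float, rng: np.random.Generator
) -> np.ndarray:
    """Sample relaxed Bernoulli gates in [0, 1] from per-token keep logits."""
    # Logistic noise is the difference of two Gumbel samples.
    u = rng.uniform(low=1e-6, high=1.0 - 1e-6, size=logits.shape)
    logistic_noise = np.log(u) - np.log1p(-u)
    # Lower temperatures push the gates closer to hard 0/1 decisions.
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / temperature))


rng = np.random.default_rng(0)
keep_logits = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])  # hypothetical per-token scores

soft_gates = binary_concrete_gate(keep_logits, temperature=0.5, rng=rng)
hard_gates = (soft_gates > 0.5).astype(float)  # discretized decisions for inference

print("soft gates:", np.round(soft_gates, 3))
print("hard gates:", hard_gates)
```

During training the soft gates would scale token representations so that the keep decision stays differentiable; at inference the thresholded gates would drive the actual pruning.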

