
Summary:
The paper proposes an entropy-based adaptive token pruning mechanism for transformer-style encoders. It presents a clear motivation,
a reproducible simulation framework, and ablations indicating improved efficiency with minimal accuracy loss.

Scores:
- Technical Quality (1-10): 7
- Clarity and Presentation (1-10): 8
- Significance and Impact (1-10): 7
- Experimental Evaluation (1-10): 6

Strengths:
- Well-motivated efficiency problem with clear framing.
- Simple, principled token-importance measure (entropy proxy).
- Reproducible NumPy code and deterministic results.
- Comprehensive plots (loss, AUC, ablation, comparison).
- Practical metrics (FLOPs and latency proxies).
- Clear master’s-project scope and feasibility.

Weaknesses:
- Results come from simulations rather than end-to-end gradient-based training.
- Limited datasets; no real NLP benchmarks.
- No statistical significance testing across random seeds.
- Theoretical bounds are stated informally; no formal proof.
- No multi-head attention or modern transformer components.
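
To make the seed-level significance point concrete, one option is a paired bootstrap over per-seed accuracy differences. The sketch below is only an illustration of what I have in mind: the function name `paired_bootstrap_ci` and its inputs (one accuracy per seed for the unpruned baseline and for the pruned model) are my own assumptions and are not taken from the submission's code.

```python
import numpy as np

def paired_bootstrap_ci(baseline, pruned, n_boot=10_000, alpha=0.05, seed=0):
    """Bootstrap confidence interval for the mean per-seed difference (pruned - baseline).

    Hypothetical helper: `baseline` and `pruned` hold one metric value per seed,
    with matching seed order.
    """
    baseline = np.asarray(baseline, dtype=float)
    pruned = np.asarray(pruned, dtype=float)
    if baseline.ndim != 1 or baseline.shape != pruned.shape:
        raise ValueError("expected two 1-D arrays of equal length (one entry per seed)")
    diffs = pruned - baseline
    rng = np.random.default_rng(seed)
    # Resample seeds with replacement; each row is one bootstrap replicate.
    idx = rng.integers(0, diffs.size, size=(n_boot, diffs.size))
    boot_means = diffs[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diffs.mean(), lo, hi
```

If the resulting interval excludes zero, the accuracy change attributed to pruning is unlikely to be seed noise. With only a handful of seeds the interval will be coarse, so reporting the per-seed values alongside it would still be useful.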

Detailed Comments:
The work is a solid master’s-level step toward principled token pruning. However, demonstrating results on standard text datasets
and integrating a differentiable gating mechanism (e.g., a Concrete or Gumbel-Softmax relaxation) would considerably strengthen the contribution.
The ablation is helpful but should be more granular (e.g., threshold choices, sensitivity to noise).
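
As an illustration of the gating suggestion, the following is a minimal sketch of a binary Concrete (relaxed Bernoulli) keep-gate. It shows the forward sampling step only: NumPy has no automatic differentiation, so an actual implementation would need a framework that can backpropagate through the gate. The function name `relaxed_keep_gate`, the per-token `logits` input, and the default temperature are my assumptions, not the authors' design.

```python
import numpy as np

def relaxed_keep_gate(logits, temperature=0.5, rng=None):
    """Sample soft keep-gates in [0, 1] from a binary Concrete relaxation.

    Hypothetical helper: `logits` are per-token keep scores (log-odds).
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    # Logistic noise, i.e. the difference of two independent Gumbel samples.
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=logits.shape)
    noise = np.log(u) - np.log1p(-u)
    z = (logits + noise) / temperature
    # Numerically stable sigmoid written via tanh.
    return 0.5 * (1.0 + np.tanh(0.5 * z))
```

Lower temperatures push the gates toward hard 0/1 decisions, while higher temperatures keep them soft. A temperature sweep would fit naturally into the finer-grained ablation requested above.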

Questions for Authors:
1) Can the entropy proxy be combined with attention entropy or gradient-based saliency?
2) How sensitive is performance to the keep rate and to task difficulty?
3) Could you report wall-clock latency on CPU/GPU for small models?
4) Is there a path to extending the method to ViTs or multimodal transformers?
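
To clarify what question 1 is asking, here is a rough sketch of attention entropy per query token and one possible way to mix it with the paper's proxy. The names `attention_entropy` and `combined_score`, the convex weighting, and the assumption that the proxy is already scaled to a range comparable to [0, 1] are all mine. Whether high entropy should favor keeping or pruning a token is left to the authors.

```python
import numpy as np

def attention_entropy(attn, eps=1e-12):
    """Shannon entropy (in nats) of each row of a row-stochastic attention matrix."""
    attn = np.asarray(attn, dtype=float)
    return -(attn * np.log(attn + eps)).sum(axis=-1)

def combined_score(entropy_proxy, attn, weight=0.5):
    """Hypothetical convex mix of the entropy proxy and normalized attention entropy.

    Assumes `attn` has more than one key per row and that `entropy_proxy`
    holds one value per query token.
    """
    n_keys = np.shape(attn)[-1]
    # Divide by log(n_keys), the maximum possible entropy, to scale into [0, 1].
    h = attention_entropy(attn) / np.log(n_keys)
    return weight * np.asarray(entropy_proxy, dtype=float) + (1.0 - weight) * h
```

An ablation over `weight` (0 for attention entropy only, 1 for the proxy only) would show whether the two signals are complementary or redundant.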

Recommendation: Weak Accept
Confidence Level: 3/5
