Exploiting Student Parallelism for Low-latency GPU Inference of BERT-like Models in Online Services

Published: 2025, Last Modified: 25 Jan 2026, Venue: KDD (2) 2025, License: CC BY-SA 4.0