Abstract: Highlights
• We explore why conventional KD underperforms when applied to CTC models.
• We propose Factorized KL-divergence for KD of CTC-based models.
• We propose a progressive KD framework to gradually build up the student’s knowledge.