Abstract: Highlights
• A decoupled structure retains the adaptation advantage of DNN-HMM systems in end-to-end models.
• The structure is applied to attention-based encoder–decoder and neural transducer models.
• Domain adaptation is flexible: the internal language model can be replaced directly.
• Cross-domain speech recognition accuracy improves while intra-domain word error rates remain competitive.
• The approach is consistently effective across diverse tasks, including end-to-end speech translation.