Highlights

• We adapt CKA to KD for a more informative transfer of structures in BERT (see the sketch below).
• We categorize intra-feature, local inter-feature, and global inter-feature structures.
• We propose a memory augmentation method for distilling global structures.
• We conduct quantitative and qualitative empirical analyses.
• We validate practical usefulness over a wide range of language understanding tasks.
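For the first highlight, the following is a minimal sketch of linear CKA used as a layer-wise distillation objective, under stated assumptions: it is not the paper's full method (which distinguishes intra-feature, local inter-feature, and global inter-feature structures and adds memory augmentation), and the function names, the PyTorch setting, and the epsilon term are illustrative choices.

```python
import torch


def linear_cka(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Linear CKA between feature matrices x (n, d1) and y (n, d2) from the same batch."""
    # Center each feature dimension across the batch.
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = torch.linalg.matrix_norm(y.t() @ x, ord="fro") ** 2
    self_x = torch.linalg.matrix_norm(x.t() @ x, ord="fro")
    self_y = torch.linalg.matrix_norm(y.t() @ y, ord="fro")
    return cross / (self_x * self_y + eps)


def cka_distillation_loss(student_feats: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
    """Push the student layer toward representational similarity with the teacher layer."""
    return 1.0 - linear_cka(student_feats, teacher_feats)


# Illustrative usage: hidden states of one layer, flattened over batch and sequence axes.
student = torch.randn(128, 312)   # hypothetical student hidden size
teacher = torch.randn(128, 768)   # BERT-base hidden size
loss = cka_distillation_loss(student, teacher)
```

Because CKA compares similarity structure rather than raw activations, the student and teacher hidden sizes need not match, which is one reason it is attractive for distilling into smaller BERT variants.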