Abstract: Highlights
• We show theoretically that minimizing the difference between the pairwise similarity discrepancies of the student and teacher networks helps keep their ranking results consistent.
• We propose a nonlinear pairwise difference relational knowledge loss function that transfers pairwise difference knowledge from a large teacher network to a lightweight student network.
• We conduct extensive experiments and analyses showing that our method achieves state-of-the-art performance.