Keywords: Knowledge Distillation, Temperature Scaling, Multi-label Learning, Computer Vision
Abstract: This paper presents a meticulous scrutiny of pure logit-based distillation under multi-label learning through the lens of the activation function. We begin by empirically clarifying that a recently advanced perspective, namely that the vanilla sigmoid per se is more suitable than the tempered softmax in multi-label distillation, is not entirely correct. We then reveal that both the sigmoid and the tempered softmax have intrinsic limitations. In particular, we conclude that ignoring the decisive temperature factor $\tau$ in the sigmoid is the essential reason for its unsatisfactory results. In this regard, we propose to unleash the potential of temperature scaling in multi-label distillation and present Tempered Logit Distillation (TLD), an embarrassingly simple yet astonishingly performant approach. Specifically, we modify the sigmoid with the temperature scaling mechanism, deriving a new activation function dubbed the tempered sigmoid. Through theoretical and visual analysis, intriguingly, we identify that the tempered sigmoid with $\tau$ smaller than 1 provides a hard-mining effect by governing the magnitude of penalties according to sample difficulty, which we show to be the key property behind its success. Our work is accompanied by comprehensive experiments on COCO, PASCAL-VOC, and NUS-WIDE over several architectures across three multi-label learning scenarios: image classification, object detection, and instance segmentation. Distillation results show that TLD consistently delivers remarkable performance and surpasses prior counterparts, demonstrating its superiority and versatility.
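To make the tempered-sigmoid idea from the abstract concrete, below is a minimal PyTorch-style sketch: a sigmoid with temperature scaling, sigma(z / tau), plugged into a simple per-label distillation loss. The loss form (binary cross-entropy between student and teacher tempered-sigmoid outputs), the function names, and the default tau are illustrative assumptions, not the paper's exact TLD objective.

```python
import torch
import torch.nn.functional as F

def tempered_sigmoid(logits: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    # Sigmoid with temperature scaling: sigma(z / tau).
    # tau < 1 sharpens the activation, which the abstract links to a hard-mining effect.
    return torch.sigmoid(logits / tau)

def tld_style_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   tau: float = 0.5) -> torch.Tensor:
    # Hypothetical logit-distillation loss: binary cross-entropy between
    # tempered-sigmoid outputs of teacher (target) and student, averaged over labels.
    # Illustrative sketch only; the paper defines the actual TLD formulation.
    with torch.no_grad():
        teacher_prob = tempered_sigmoid(teacher_logits, tau)
    student_prob = tempered_sigmoid(student_logits, tau)
    return F.binary_cross_entropy(student_prob, teacher_prob)

if __name__ == "__main__":
    # Dummy multi-label logits: batch of 4 images, 80 COCO-style classes.
    student = torch.randn(4, 80)
    teacher = torch.randn(4, 80)
    print(tld_style_loss(student, teacher, tau=0.5).item())
```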
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 849