Knowledge Distillation Through Time For Future Event Prediction

Published: 19 Mar 2024, Last Modified: 04 Apr 2024 · Tiny Papers @ ICLR 2024 · CC BY 4.0
Keywords: Deep Learning, Artificial Intelligence, Machine Learning, Knowledge Distillation, Time Series, Seizure Prediction
TL;DR: Our paper introduces knowledge distillation through time (KDTT), a technique in which a teacher model distills knowledge of future events to a student model positioned earlier in the sequence.
Abstract: Is it possible to learn from the future? Here, we introduce knowledge distillation through time (KDTT). In traditional knowledge distillation (KD), a reliable teacher model is used to train an error-prone student model. The difference between the teacher and student is typically model capacity; the teacher has the larger architecture. In KDTT, the teacher and student models instead differ in their assigned tasks. The teacher model is tasked with detecting events in sequential data, a simple task compared to that of the student model, which is challenged with forecasting those events in the future. Through KDTT, the student can use the 'future' logits from a teacher model to extract temporal uncertainty. We show the efficacy of KDTT on seizure prediction, where the student forecaster achieves a 20.0% average increase in the area under the receiver operating characteristic curve (AUC-ROC).
Submission Number: 187
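
The setup described in the abstract can be illustrated with a minimal PyTorch sketch, assuming a standard temperature-softened KD loss: a detector (teacher) scores the window in which the event actually occurs, and a forecaster (student) is trained on an earlier window using both the ground-truth label and the teacher's 'future' logits as soft targets. The module names (`Detector`, `Forecaster`), the `kdtt_loss` helper, and the temperature/mixing hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of KDTT: distill a detector's logits on a future window
# into a forecaster that only sees an earlier window. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Detector(nn.Module):
    """Teacher: classifies whether an event occurs in the window it observes."""
    def __init__(self, in_dim=64, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_classes))
    def forward(self, x):
        return self.net(x)

class Forecaster(nn.Module):
    """Student: predicts the same event from an earlier window in the sequence."""
    def __init__(self, in_dim=64, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_classes))
    def forward(self, x):
        return self.net(x)

def kdtt_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Mix hard-label cross-entropy with KL to the teacher's 'future' logits."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1 - alpha) * soft

# Usage: x_past is the window the student forecasts from; x_future is the later
# window (at the forecast horizon) in which the teacher detects the event.
teacher, student = Detector(), Forecaster()
x_past, x_future = torch.randn(8, 64), torch.randn(8, 64)
labels = torch.randint(0, 2, (8,))
with torch.no_grad():
    future_logits = teacher(x_future)   # 'future' logits carrying temporal uncertainty
loss = kdtt_loss(student(x_past), future_logits, labels)
loss.backward()
```

In this sketch the temperature-softened teacher distribution is what conveys the teacher's confidence about the upcoming event to the student; the mixing weight `alpha` between hard and soft targets is an assumed hyperparameter.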