Keywords: Dropout regularization, Adaptive dropout, Trust-aware learning, Channel reliability, Knowledge-based dropout, Convolutional neural networks (CNNs), Gradient variance reduction, Efficient training
TL;DR: KTAD adaptively drops CNN channels based on per-channel trust scores, outperforming DropBlock with higher accuracy, lower gradient variance, and better training efficiency across benchmarks.
Abstract: Dropout is a widely used tool for preventing overfitting in convolutional neural networks (CNNs), but standard implementations apply the same rate to every channel, overlooking large differences in their reliability. We introduce Knowledge Trust-Aware Adaptive Dropout (KTAD), a simple drop-in replacement that assigns real-time trust scores to each channel and adapts dropout rates accordingly. This approach preserves informative features while more aggressively regularizing weaker ones. Across SVHN, CIFAR-100, CIFAR-100-C, and ImageNet-32, KTAD consistently outperforms DropBlock, the current standard, achieving up to 2.2 percentage points higher accuracy on CIFAR-100-C and 3.2% better accuracy per training GFLOP on ImageNet-32. Our theoretical analysis shows that adaptive dropout leads to lower gradient variance, faster convergence, and tighter generalization bounds. In over 200 randomized trials, KTAD variants win 60–73% of head-to-head comparisons. Together, these results indicate that reliability-aware dropout not only improves accuracy but also reduces training cost, making KTAD an efficient replacement for uniform dropout in large-scale CNN training.
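The abstract describes the core mechanism only at a high level: each channel receives a trust score, and its dropout rate is adapted so that more reliable channels are dropped less often. The sketch below illustrates that idea in PyTorch. The trust heuristic (normalized mean absolute activation per channel), the module name, and the `p_min`/`p_max` parameters are assumptions for illustration; the paper's actual KTAD scoring rule and scheduling may differ.

```python
# Minimal sketch of reliability-aware channel dropout, assuming a simple
# activation-magnitude trust proxy. Not the paper's exact KTAD formulation.
import torch
import torch.nn as nn


class TrustAwareChannelDropout(nn.Module):
    def __init__(self, p_min=0.05, p_max=0.4):
        super().__init__()
        self.p_min = p_min  # dropout rate for the most trusted channels
        self.p_max = p_max  # dropout rate for the least trusted channels

    def forward(self, x):  # x: (N, C, H, W)
        if not self.training:
            return x
        # Assumed trust proxy: channels with larger mean |activation| are
        # treated as more reliable; scores are normalized to [0, 1] per sample.
        score = x.abs().mean(dim=(2, 3))  # (N, C)
        lo = score.min(dim=1, keepdim=True).values
        hi = score.max(dim=1, keepdim=True).values
        score = (score - lo) / (hi - lo + 1e-8)
        # High trust -> low dropout probability, low trust -> high probability.
        p = self.p_max - (self.p_max - self.p_min) * score  # (N, C)
        keep = torch.bernoulli(1.0 - p).to(x.dtype)[:, :, None, None]
        # Rescale kept channels so the expected activation is unchanged.
        return x * keep / (1.0 - p[:, :, None, None]).clamp_min(1e-8)


# Example placement after a conv block:
# block = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
#                       TrustAwareChannelDropout())
```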
Primary Area: optimization
Submission Number: 21089