DotMatch: Simplified Semi-Supervised Learning with the Log Dot Product Loss

Jonathan Wilton; Nan Ye

20 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: semi-supervised learning

Abstract: Semi-supervised learning (SSL) algorithms typically generate supervisory signals for unlabeled data using the model being trained. Because such signals are generally imperfect, various techniques have been proposed to balance the signal-to-noise ratio, such as confidence-based pseudo-labeling, consistency regularization and entropy regularization. However, these methods often require careful tuning of hyperparameters, such as the confidence threshold in pseudo-labeling and the regularization strength in regularization methods. This tuning is challenging, particularly when little labeled data is available for validation. In this paper, we introduce DotMatch, an SSL algorithm that balances the signal-to-noise ratio without any algorithm-specific hyperparameters. Specifically, we replace the cross-entropy loss on unlabeled data with a novel consistency loss, the log dot product (LDP) loss, which is simply the negative log of the dot product between the predicted label distributions on the weakly and strongly augmented views of an input. Compared to the cross-entropy loss with soft targets, the LDP loss enjoys several benefits in the context of SSL: non-confident examples have little impact on model updates, as in confidence-based pseudo-labeling methods such as SoftMatch; predictions are encouraged to have low entropy, as in entropy-regularized methods; and, interestingly, its gradient is appropriately scaled relative to the gradient of the supervised loss, so no regularization constant is required. We additionally combine the LDP loss with distribution alignment to ensure that the distribution of predictions on unlabeled data matches that of the labeled data. We provide a theoretical analysis that explains the efficacy of DotMatch from the perspective of loss gradients. Extensive experiments show that DotMatch is competitive with state-of-the-art baselines without tuning any algorithm-specific hyperparameters for different datasets.
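As a rough illustration of the LDP loss, the sketch below writes it in NumPy for a batch. It assumes something the abstract does not state: the weak-view distribution q is a fixed target with no gradient (as in FixMatch-style training), and gradients flow only through the strong-view logits z. Under that assumption, differentiating -log(q · softmax(z)) with respect to z gives p - r, where p = softmax(z) and r = (q ⊙ p) / (q · p). Soft-target cross-entropy, by contrast, gives p - q. Two of the properties described above can be read off this form. If q is uniform then r = p and the gradient vanishes, so it shrinks as the weak-view prediction becomes less confident. The effective target r also concentrates mass on classes where both views agree. The `distribution_alignment` helper uses the ReMixMatch-style rescaling (multiply by the labeled class prior, divide by a running mean of unlabeled predictions, renormalize). Whether DotMatch uses exactly this form is an assumption, and all function names here are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ldp_loss(q_weak, z_strong):
    """Per-example log dot product loss: -log(q . softmax(z)).

    q_weak:   (B, K) predicted label distribution on the weak view.
              ASSUMPTION: treated as a constant target (stop-gradient).
    z_strong: (B, K) logits on the strong view.
    Returns a (B,) array; average it for a batch loss.
    """
    p = softmax(z_strong)
    # q . p > 0 whenever q has any mass, because softmax outputs are positive.
    return -np.log(np.sum(q_weak * p, axis=-1))


def ldp_grad(q_weak, z_strong):
    """Analytic gradient of ldp_loss w.r.t. z_strong: p - r.

    r = (q * p) / (q . p) is the renormalised product of the two
    distributions, so LDP acts like soft cross-entropy toward r.
    """
    p = softmax(z_strong)
    r = q_weak * p
    r = r / r.sum(axis=-1, keepdims=True)
    return p - r


def soft_ce_grad(q_weak, z_strong):
    """Gradient of soft-target cross-entropy -sum(q * log p) w.r.t. z: p - q."""
    return softmax(z_strong) - q_weak


def distribution_alignment(q_weak, labeled_prior, unlabeled_mean):
    """ReMixMatch-style distribution alignment.

    ASSUMPTION: DotMatch's exact variant may differ. Scales each weak-view
    prediction by (labeled class prior / running mean of unlabeled
    predictions), then renormalises each row to sum to 1.
    """
    q = q_weak * labeled_prior / unlabeled_mean
    return q / q.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(1, 4))
    uniform = np.full((1, 4), 0.25)
    confident = np.array([[0.85, 0.05, 0.05, 0.05]])
    # Uniform weak prediction: LDP gradient is zero up to round-off.
    print(np.abs(ldp_grad(uniform, z)).max())
    # Soft cross-entropy still pushes p toward the uniform target.
    print(np.abs(soft_ce_grad(uniform, z)).max())
    # Confident weak prediction: LDP gradient is non-zero.
    print(np.abs(ldp_grad(confident, z)).max())
```

The demo contrasts the two losses on the same strong-view logits. A uniform weak-view prediction leaves the LDP gradient at round-off level, while soft cross-entropy still pulls the strong view toward the uniform target. This mirrors, in a simplified setting, the claim that non-confident examples have little impact on model updates.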

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 24517
