Semi-Supervised Dataset Condensation with Dual Consistency Trajectory Matching

ICLR 2026 Conference Submission 25316 Authors

20 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Dataset condensation, semi-supervised learning, knowledge distillation
Abstract: Dataset condensation synthesizes a small dataset that preserves the performance of training on the original, large-scale data. However, existing methods rely on fully labeled data, which limits their applicability in real-world scenarios where unlabeled data is abundant. To bridge this gap, we introduce a new task called $\textbf{Semi-Supervised Dataset Condensation}$, which condenses both labeled and unlabeled samples into a small yet informative synthetic labeled dataset, thereby enabling efficient supervised learning. We propose $\textbf{Semi-Supervised Dual Consistency Trajectory Matching (SSD)}$, a method that leverages semi-supervised knowledge distillation. The core of SSD is a two-stage trajectory matching framework that effectively incorporates unlabeled data. First, a teacher model is trained on the original data to generate accurate pseudo-labels using semi-supervised learning. Then, a student model is trained on the entire dataset with a novel \textit{dual consistency regularization} loss. This loss enforces both $\textbf{inter-model}$ consistency (between the student and teacher predictions) and $\textbf{intra-model}$ consistency (for the student model under different input perturbations), ensuring robust performance. By aligning the training trajectories of the student model on the complete dataset and the synthetic dataset, SSD optimizes and obtains a high-quality synthetic dataset. Experiments on image classification benchmarks demonstrate that SSD consistently outperforms previous methods, achieving superior performance and efficiency in dataset condensation.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 25316