Prioritize Alignment in Dataset Distillation

Zekai Li; Ziyao Guo; Wangbo Zhao; Tianle Zhang; Zhi-Qi Cheng; Samir Khaki; Kaipeng Zhang; Ahmad Sajedi; Kai Wang; Konstantinos N Plataniotis; Yang You

08 May 2024 (modified: 06 Nov 2024). Submitted to NeurIPS 2024. License: CC BY 4.0

Keywords: dataset distillation, efficient learning

Abstract: Dataset Distillation aims to compress a large dataset into a significantly more compact synthetic one without compromising the performance of models trained on it. To achieve this, existing methods use an agent model to extract information from the target dataset and embed it into the distilled dataset. Consequently, the quality of the extracted and embedded information determines the quality of the distilled dataset. In this work, we find that existing methods introduce misaligned information in both the information extraction and embedding stages. To alleviate this, we propose Prioritize Alignment in Dataset Distillation (PAD), which aligns information from two perspectives: 1) we prune the target dataset according to the compression ratio, filtering the information that the agent model can extract; 2) we use only the deep layers of the agent model to perform the distillation, avoiding the excessive introduction of low-level information. This simple strategy effectively filters out misaligned information and brings non-trivial improvements to mainstream matching-based distillation algorithms. Furthermore, built on trajectory matching, PAD achieves remarkable improvements and state-of-the-art performance on various benchmarks. The code and distilled datasets will be made public.
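The abstract names two filtering steps but does not say how either is computed. The sketch below illustrates them under stated assumptions and is not the authors' implementation. For extraction filtering, it assumes an EL2N-style difficulty score (L2 distance between predicted probabilities and one-hot labels), a swapped-in choice that this abstract does not confirm, and keeps a fraction of the target dataset tied to the compression ratio. For embedding filtering, it uses the normalized trajectory-matching loss of MTT (distance between the student's end parameters and the expert's end parameters, divided by the expert's own displacement) but sums only over layers at or beyond a cutoff. The names `el2n_scores`, `prune_by_difficulty`, `deep_layer_matching_loss`, `keep_fraction`, `drop_hard`, and `first_deep_layer` are all hypothetical.

```python
import numpy as np


def el2n_scores(probs, labels):
    """Assumed difficulty score (EL2N-style): ||softmax output - one-hot label||_2 per sample."""
    onehot = np.eye(probs.shape[1])[labels]
    return np.linalg.norm(probs - onehot, axis=1)


def prune_by_difficulty(scores, keep_fraction, drop_hard=True):
    """Return sorted indices of the samples kept after pruning.

    Hypothetical rule: keep_fraction would be derived from the compression ratio.
    drop_hard=True removes the hardest samples (assumed for small synthetic budgets);
    drop_hard=False removes the easiest ones instead.
    """
    n_keep = int(round(len(scores) * keep_fraction))
    order = np.argsort(scores)  # ascending difficulty: easiest samples first
    kept = order[:n_keep] if drop_hard else order[len(scores) - n_keep:]
    return np.sort(kept)


def deep_layer_matching_loss(student_end, expert_start, expert_end, first_deep_layer):
    """Normalized trajectory-matching loss restricted to layers >= first_deep_layer.

    Each parameter argument is a list of per-layer arrays ordered from shallow to deep.
    Shallow layers (index < first_deep_layer) are skipped entirely, so they contribute
    nothing to the loss (assumed way of "using only deep layers").
    """
    num = 0.0
    den = 0.0
    for layer in range(first_deep_layer, len(student_end)):
        num += np.sum((student_end[layer] - expert_end[layer]) ** 2)
        den += np.sum((expert_start[layer] - expert_end[layer]) ** 2)
    return num / den
```

For example, with two-class probabilities `[[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.6, 0.4]]` and labels `[0, 0, 1, 1]`, keeping half the data with `drop_hard=True` retains indices 0 and 2, the two samples the model fits best. Restricting the loss to deep layers makes the shallow-layer mismatch irrelevant to the optimization, which is one plausible reading of "avoid introducing excessive low-level information."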

Primary Area: Machine vision

Submission Number: 2744
