Information Compensation: A Fix for Any-scale Dataset Distillation

ICLR 2024 Workshop DMLR Submission 94 Authors

Published: 04 Mar 2024, Last Modified: 02 May 2024 · DMLR @ ICLR 2024 · CC BY 4.0
Keywords: dataset distillation
Abstract: Dataset distillation, a recent machine learning paradigm, aims to compress large datasets into smaller yet effective versions. In this paper, we introduce a near-Lossless Information Compression (LIC) approach that directly compresses the key information of original datasets into distilled forms with minimal information loss. LIC markedly surpasses existing solutions in both efficiency and effectiveness, demonstrating superior performance across a range of dataset sizes, from CIFAR-10 to ImageNet-1K. For instance, using a ResNet-18 backbone with IPC = 10, LIC distills the entire ImageNet-1K dataset in just 80 minutes and achieves a top-1 validation accuracy of 48%, significantly outperforming the SOTA method SRe2L, which attains only 25% accuracy and requires five times as long. We will make our code publicly available.
Primary Subject Area: Other
Paper Type: Research paper: up to 8 pages
DMLR For Good Track: Participate in DMLR for Good Track
Participation Mode: Virtual
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Submission Number: 94