Optimal Dataset Design for Nurture-then-Nature Teaching

ICLR 2026 Conference Submission 22142 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Machine Teaching, Dataset Optimization, Linear Datamodels
TL;DR: We study a novel budget-constrained teaching setting called Nurture-then-Nature teaching and provide optimal and practical algorithms for it in different settings.
Abstract: Designing an optimal dataset to teach a target concept to a learner is a well-studied problem in machine learning. Prior work has mostly focused on unconstrained single-phase teaching, where the learner learns solely under the guidance of a helpful teacher who can provide any number of examples. In this work, we introduce a more realistic two-phase framework called "Nurture-then-Nature", in which the learner first learns under the guidance of a teacher in the "Nurture" phase, followed by an i.i.d. learning phase from "Nature". Importantly, the teacher may provide a dataset of size at most $B$ and must choose it to minimize the learner's final error. We study this problem in the "instance-agnostic" and "instance-aware" settings and provide efficient teaching algorithms for each. We provide theoretical guarantees and experimental results to support our findings.
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 22142