QMill: Quantum Data Generation for Effective and Efficient Quantum Machine Learning

ICLR 2026 Conference Submission 14710 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Quantum Computing, Dataset Generation, Quantum Machine Learning
TL;DR: This paper introduces QMILL, a framework that generates synthetic quantum data with realistic, variable distributions of entanglement to address the data scarcity bottleneck hindering progress in quantum machine learning.
Abstract: Quantum machine learning (QML) has the potential to transform various fields, especially those that use quantum datasets, since QML tasks on quantum datasets have provable speedups. Yet QML's progress is limited by a lack of suitable quantum datasets for training and evaluation. Existing methods for generating synthetic quantum datasets fail to accurately capture the entanglement properties that effective QML datasets require. This lack of diverse, entanglement-rich data hampers the development and benchmarking of QML models. To address this, we present QMILL, a versatile quantum data generation framework that emulates diverse classical and quantum data distributions with low circuit depth, producing entangled, high-quality dataset samples to support QML advancement.
Primary Area: datasets and benchmarks
Submission Number: 14710