On the Ability of Developers' Training Data Preservation of Learnware

Hao-Yi Lei; Zhi-Hao Tan; Zhi-Hua Zhou

Published: 25 Sept 2024, Last Modified: 16 Jan 2025. Venue: NeurIPS 2024 poster. License: CC BY 4.0

Keywords: Learnware, Model Specification, Reduced Kernel Mean Embedding, Data Preservation, Synthetic Data, Learnware Dock System

TL;DR: We conducted a theoretical analysis of the data protection capabilities of the Reduced Kernel Mean Embedding (RKME) specification in learnware.

Abstract: The learnware paradigm aims to enable users to leverage numerous existing well-trained models instead of building machine learning models from scratch. In this paradigm, developers worldwide can submit their well-trained models spontaneously into a learnware dock system, and the system helps developers generate a specification for each model to form a learnware. As the key component, a specification should characterize the capabilities of the model, enabling it to be adequately identified and reused, while preserving the developer's original data. Recently, the RKME (Reduced Kernel Mean Embedding) specification was proposed and has become the most commonly used. This paper provides a theoretical analysis of the RKME specification's ability to preserve the developer's training data. By modeling it as a geometric problem on manifolds and utilizing tools from geometric analysis, we prove that the RKME specification is able to disclose none of the developer's original data and possesses robust defense against common inference attacks, while preserving sufficient information for effective learnware identification.

Primary Area: Learning theory

Submission Number: 10005
