Beimingwu: A Learnware Dock System

Published: 01 Jan 2024, Last Modified: 15 May 2025, Venue: KDD 2024, License: CC BY-SA 4.0
Abstract: The learnware paradigm proposed by Zhou (2016) aims to enable users to leverage numerous existing high-performing models instead of building machine learning models from scratch. The paradigm envisions that any developer worldwide can spontaneously submit their well-trained models to a learnware dock system (formerly known as a learnware market). The system uniformly generates a specification for each model to form a learnware and accommodates it. As the key component, a specification should represent the capabilities of the model without exposing the developer's original data. Based on the specifications, the learnware dock system can identify and assemble existing learnwares for users to solve new machine learning tasks. Recently, based on the reduced kernel mean embedding (RKME) specification, a series of studies have shown the effectiveness of the learnware paradigm theoretically and empirically. However, the realization of a learnware dock system is still missing and remains a big challenge. This paper proposes Beimingwu, the first open-source learnware dock system, providing foundational support for future research. The system provides implementations and extensibility for the entire process of the learnware paradigm, including the submission, usability testing, organization, identification, deployment, and reuse of learnwares. With Beimingwu, model development for new user tasks can be significantly streamlined, thanks to its integrated architecture and engine design, its unified learnware structure and scalable APIs, and its integration of various algorithms for learnware identification and reuse. Notably, this is possible even for users with limited data and minimal expertise in machine learning, without compromising the security of their raw data. The system facilitates future research on learnware-related algorithms and systems, and lays the groundwork for hosting a vast array of learnwares and establishing a learnware ecosystem.
The system is fully open-source and we expect the research community to benefit from the system. The system and research toolkit have been released on GitLink and GitHub.
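
The idea of identifying learnwares from specifications rather than raw data can be illustrated with a toy sketch. This is not Beimingwu's API and not the RKME construction from the literature: the function names below are hypothetical, and the "specification" here is simply a uniformly weighted random subset of a developer's data, whereas an actual RKME specification would optimize both the reduced points and their weights. The sketch only shows the matching step, assuming a Gaussian kernel: the user's data is compared with each specification by squared maximum mean discrepancy (MMD), and the closest specification is selected.

```python
import math
import random


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(points_a, weights_a, points_b, weights_b, gamma=0.5):
    """Squared MMD between two weighted point sets in the kernel's RKHS."""
    def weighted_sum(ps, ws, qs, vs):
        return sum(
            w * v * gaussian_kernel(p, q, gamma)
            for p, w in zip(ps, ws)
            for q, v in zip(qs, vs)
        )

    return (
        weighted_sum(points_a, weights_a, points_a, weights_a)
        - 2.0 * weighted_sum(points_a, weights_a, points_b, weights_b)
        + weighted_sum(points_b, weights_b, points_b, weights_b)
    )


def uniform_weights(n):
    return [1.0 / n] * n


def make_toy_specification(data, size, rng):
    """Hypothetical stand-in for a specification: a small random subset
    with uniform weights (a real RKME would optimize points and weights)."""
    return rng.sample(data, size), uniform_weights(size)


def identify(user_data, specifications, gamma=0.5):
    """Return the name of the specification closest to the user's data."""
    user_weights = uniform_weights(len(user_data))
    scores = {
        name: mmd_squared(user_data, user_weights, pts, wts, gamma)
        for name, (pts, wts) in specifications.items()
    }
    return min(scores, key=scores.get), scores


def sample_cloud(center, n, rng):
    """Draw n two-dimensional points with unit Gaussian noise around center."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]


rng = random.Random(0)
# Two developers whose training data come from different distributions.
specifications = {
    "dev_a": make_toy_specification(sample_cloud((0.0, 0.0), 200, rng), 10, rng),
    "dev_b": make_toy_specification(sample_cloud((5.0, 5.0), 200, rng), 10, rng),
}
# The user's task resembles dev_b's data; only the small specifications are compared.
user_data = sample_cloud((5.0, 5.0), 50, rng)
best, scores = identify(user_data, specifications)
print(best, scores)
```

In this toy setting the system side never touches the developers' full datasets at identification time, only the ten-point summaries, which mirrors the role the abstract assigns to specifications.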