HSC: An Artificial Intelligence Service Composition Dataset from Hugging Face

Xiao Wang, Dunlei Rong, Hanchuan Xu, Xiangdong He, Zhongjie Wang

Published: 01 Jan 2024, Last Modified: 04 Oct 2025, ICSOC (2) 2024, CC BY-SA 4.0

Abstract: Service composition, a fundamental concept in service computing, combines multiple independent services to meet complex user requirements. Research in this field depends on public datasets, but existing datasets typically lack the necessary user requirements and the corresponding optimal solutions (the compositions that achieve the best optimization objective values), which makes it difficult to evaluate different methods comprehensively. These datasets are also often outdated, and many of their services are no longer accessible. To address these limitations, we introduce the HSC dataset, the largest known service composition dataset to date, consisting of 17,536 artificial intelligence (AI) services from Hugging Face. The dataset provides service descriptions and quality of service (QoS) information, and it uses large language models (LLMs) to generate 15,000 unique machine learning service composition cases. These cases include different optimization objectives, constraints, workflows, and optimal solutions; 10,000 of them also contain natural language requirements. To support further research in service composition, we developed a benchmark based on the HSC dataset and evaluated recent methods by comparing their results against the optimal solutions. The HSC dataset also has potential applications in other tasks, such as service recommendation and classification. The dataset is available at https://github.com/wangxiaohit/HSC.
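Each composition case is described as combining a workflow, constraints, an optimization objective, and an optimal solution. To make "optimal solution" concrete, here is a minimal sketch of QoS-aware composition. It assumes a purely sequential workflow, two hypothetical QoS attributes (response time, which adds up along the workflow, and reliability, which multiplies), made-up service names and values, and exhaustive search. None of these names, attributes, or the scoring rule in `evaluate` come from the HSC dataset or its benchmark; they are illustrative only.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Service:
    name: str
    response_time: float  # hypothetical QoS attribute (seconds); lower is better
    reliability: float    # hypothetical QoS attribute in [0, 1]; higher is better


def aggregate(plan):
    """Aggregate QoS for a sequential workflow: times add, reliabilities multiply."""
    total_time = sum(s.response_time for s in plan)
    total_reliability = math.prod(s.reliability for s in plan)
    return total_time, total_reliability


def optimal_composition(candidates, min_reliability):
    """Try every choice of one service per task; return the fastest feasible plan."""
    best_plan, best_time = None, float("inf")
    for plan in product(*candidates):
        total_time, total_reliability = aggregate(plan)
        if total_reliability >= min_reliability and total_time < best_time:
            best_plan, best_time = plan, total_time
    return best_plan, best_time


def greedy_composition(candidates):
    """Naive baseline: pick the fastest service for each task, ignoring the constraint."""
    return tuple(min(task, key=lambda s: s.response_time) for task in candidates)


def evaluate(plan, min_reliability, best_time):
    """Hypothetical score: 0.0 if infeasible, else best_time / plan_time (1.0 = optimal)."""
    total_time, total_reliability = aggregate(plan)
    feasible = total_reliability >= min_reliability
    score = best_time / total_time if feasible else 0.0
    return feasible, score


# Hypothetical three-task workflow with two candidate services per task.
CANDIDATES = [
    [Service("asr-a", 1.2, 0.99), Service("asr-b", 0.8, 0.90)],
    [Service("mt-a", 0.5, 0.98), Service("mt-b", 0.3, 0.92)],
    [Service("sum-a", 1.0, 0.97), Service("sum-b", 0.7, 0.95)],
]
MIN_RELIABILITY = 0.9

best_plan, best_time = optimal_composition(CANDIDATES, MIN_RELIABILITY)
print("optimal:", [s.name for s in best_plan], round(best_time, 2))
print("greedy:", evaluate(greedy_composition(CANDIDATES), MIN_RELIABILITY, best_time))
```

The search space is the product of the per-task candidate counts, so exhaustive search only scales to small instances. A faster method can then be scored against the exact optimum, as `evaluate` does here; in this toy instance the greedy plan is the fastest but violates the reliability constraint.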