From Roots to Fruits: Exploring Lineage for Dataset Recommendations

Published: 01 Jan 2023, Last Modified: 13 Nov 2024, Venue: DEC@SIGMOD 2023, License: CC BY-SA 4.0
Abstract: We present a recommender system for datasets, models, and processing steps that uses metadata characteristics, content, and usage history to infer the intent of artifacts in a data lineage. The system combines the available metadata characteristics with the corpus of recorded history to uncover associations in the characteristics space and generate recommendations, even when the usage history is incomplete and the metadata characteristics are noisy and poorly named. Our results, obtained from both self-created testbeds and public benchmark datasets such as OpenML, demonstrate that the proposed model is effective in assisting data discovery: it draws on the available data content and the analytical lifecycle to make automated suggestions that reflect the expertise of the entire data community.