Variable-based Learning Considering Topic Specificity in Heterogeneous Data Clustering Tasks

Published: 01 Jan 2023, Last Modified: 08 Jan 2025, Venue: IEEE Big Data 2023, License: CC BY-SA 4.0
Abstract: Data mining through interdisciplinary co-creation has recently attracted considerable social attention, and a wide variety of data have been published, either for free or for a fee. Data publication encourages institutions to exchange and combine data, which supports interdisciplinary data collaboration. However, how to design high-quality data that supports interdisciplinary data discovery remains an open issue in data search. Variables form the framework of a dataset and reflect both its topics and the intent behind its design. In this study, the relationships between data topics and variables in a large dataset were quantitatively investigated to provide suggestions for data design and exploration. The probability of occurrence of variables and of variable pairs was determined for each topic to elucidate the relationship between topics and variables; clustering was then applied based on these relationships.
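The per-topic occurrence probabilities mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy records, the topic and variable names, and the function `occurrence_probabilities` are all hypothetical, and the probability is assumed here to be the fraction of datasets within a topic that contain a given variable or variable pair.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records: (topic, set of variable names) for each dataset.
datasets = [
    ("weather", {"date", "temperature", "humidity"}),
    ("weather", {"date", "temperature", "location"}),
    ("retail", {"date", "price", "store"}),
    ("retail", {"price", "store", "item"}),
]


def occurrence_probabilities(records):
    """Return per-topic occurrence probabilities of variables and variable pairs.

    Assumption: the probability is the share of datasets in a topic
    that contain the variable (or both variables of the pair).
    """
    n_per_topic = Counter()
    var_counts = defaultdict(Counter)
    pair_counts = defaultdict(Counter)
    for topic, variables in records:
        n_per_topic[topic] += 1
        var_counts[topic].update(variables)
        # Sort so that each unordered pair has one canonical tuple key.
        pair_counts[topic].update(combinations(sorted(variables), 2))
    p_var = {
        t: {v: c / n_per_topic[t] for v, c in var_counts[t].items()}
        for t in n_per_topic
    }
    p_pair = {
        t: {p: c / n_per_topic[t] for p, c in pair_counts[t].items()}
        for t in n_per_topic
    }
    return p_var, p_pair


p_var, p_pair = occurrence_probabilities(datasets)
print(p_var["weather"]["temperature"])        # 1.0
print(p_var["weather"]["humidity"])           # 0.5
print(p_pair["retail"][("price", "store")])   # 1.0
```

Under these assumptions, each topic is summarized by a vector of variable and pair probabilities, which could then serve as input features for whichever clustering algorithm is chosen.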