Exploring the Fundamental Units of Semantic Representation of Data Using Heterogeneous Variable Network in Data Ecosystems
Abstract: Value creation through the exchange, distribution, and collaborative use of data among different organizations has garnered significant attention as a new source of innovation. Treating the meaning of data mathematically helps measure its “quality” and formulate evaluation criteria for data exchange between stakeholders with distinct background knowledge in data ecosystems. This study examines the structure of data morphemes, the fundamental units of semantic representation of data, by conducting network and association analyses of variables present in metadata from diverse fields. The network analysis shows that variable co-occurrence networks are globally sparse and locally dense, and it highlights essential relationships and core variables. Key findings include the identification of “depth,” “sediment/rock,” and “sample code/label” as both universal variables and crucial nodes linking the datasets used in the experiment. The association analysis reveals vital variable pairs, such as “age” and “ring width” or “latitude” and “longitude.” This research may provide an understanding of the structure and meaningful representation of data, facilitating smooth data exchange and utilization among stakeholders with different domains, purposes of data use, and background knowledge in data ecosystems.
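As a rough illustration of the two analyses named in the abstract, the sketch below builds a variable co-occurrence network from dataset metadata and scores one variable pair. It is not the authors' implementation: the toy `datasets` dictionary is hypothetical, the function names are invented for this example, and the use of support, confidence, and lift as association measures is an assumption, since the abstract does not state which measures were used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy metadata: dataset name -> set of variable labels.
# The variable names echo the abstract; the datasets themselves are invented.
datasets = {
    "tree_rings": {"age", "ring width", "latitude", "longitude"},
    "ocean_core": {"depth", "sediment/rock", "sample code/label",
                   "latitude", "longitude"},
    "borehole": {"depth", "sediment/rock", "sample code/label"},
    "dendro_site": {"age", "ring width", "sample code/label"},
}


def cooccurrence_edges(datasets):
    """Count, for every variable pair, how many datasets contain both."""
    edges = Counter()
    for variables in datasets.values():
        # Sorting gives each undirected edge a single canonical key.
        for u, v in combinations(sorted(variables), 2):
            edges[(u, v)] += 1
    return edges


def degree(edges):
    """Number of distinct neighbours of each variable in the network."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def association(datasets, a, b):
    """Support, confidence, and lift of the rule a -> b (assumed measures)."""
    n = len(datasets)
    n_a = sum(a in vs for vs in datasets.values())
    n_b = sum(b in vs for vs in datasets.values())
    n_ab = sum(a in vs and b in vs for vs in datasets.values())
    if n_a == 0 or n_b == 0:
        raise ValueError("both variables must occur in at least one dataset")
    support = n_ab / n
    confidence = n_ab / n_a
    lift = confidence / (n_b / n)
    return support, confidence, lift


edges = cooccurrence_edges(datasets)
deg = degree(edges)
support, confidence, lift = association(datasets, "age", "ring width")
```

In this toy network, a variable with a high degree that appears across otherwise unrelated datasets plays the bridging role the abstract attributes to core variables, while a lift well above 1 flags a pair that co-occurs more often than chance would suggest.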