NSDF-Catalog: Lightweight Indexing Service for Democratizing Data Delivery

Published: 01 Jan 2022, Last Modified: 04 Oct 2024UCC 2022EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Across domains massive amounts of scientific data are generated. Because of the large volume of information, data discoverability is a challenge, especially for scientists who have not generated the data or are from other domains. As part of the NSF-funded National Science Data Fabric (NSDF) initiative, we developed a testbed to demonstrate that these boundaries to data discoverability can be overcome. In support of this effort, we identify the need for indexing large-amounts of scientific data across scientific domains. We propose NSDF-Catalog, a lightweight indexing service with minimal metadata that complements existing domain-specific and rich-metadata col-lections. NSDF-Catalog is designed to facilitate multiple related objectives within a flexible microservice to: (i) coordinate data movements and replication of data from origin repositories within the NSDF federation; (ii) build an inventory of existing scientific data to inform the design of next-generation cyberinfrastructure; and (iii) provide a suite of tools for discovery of datasets for cross-disciplinary research. Our service indexes scientific data at a fine-granularity at the file or object level to inform data distribution strategies and to improve the experience for users from the consumer perspective, with the goal of allowing end-to-end dataflow optimizations.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview