In-memory Blockchain: Toward Efficient and Trustworthy Data Provenance for HPC Systems

Published: 01 Jan 2018, Last Modified: 07 Aug 2024, IEEE BigData 2018, CC BY-SA 4.0
Abstract: State-of-the-art approaches for tracking data provenance on high-performance computing (HPC) systems rely on either file systems or relational databases. Both raise the same concerns about the fidelity of the provenance data and the associated I/O overhead. This paper envisions tracking HPC data provenance with a distributed in-memory ledger, the core technique behind blockchains, which many large-scale applications have shown to be highly trustworthy. We pinpoint two system challenges in adopting blockchains for HPC, namely storage architecture and consensus protocol, and make the following contributions: (i) we design a new in-memory blockchain architecture for HPC systems that exploits the high-performance InfiniBand network infrastructure and greatly reduces the I/O overhead; and (ii) we develop a new consensus protocol, proof-of-reproducibility (PoR), crafted for the new architecture, which takes into account both proof-of-work (PoW) and proof-of-stake (PoS) mechanisms. The correctness of PoR is both theoretically proven and experimentally verified. A prototype system is implemented and evaluated with more than one million transactions, showing a 32× speedup over the filesystem-based provenance service and a speedup of four orders of magnitude over the database-based provenance service.
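To make the notion of an in-memory ledger concrete, the following is a minimal single-process sketch of a hash-chained ledger held entirely in memory. It is an illustration only, not the architecture or the PoR protocol described in the paper: the names `Block`, `InMemoryLedger`, `append`, and `verify`, as well as the record fields in the usage note, are hypothetical, and distribution across nodes, InfiniBand communication, and consensus are all omitted.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    """One ledger entry; all names here are hypothetical, not from the paper."""
    index: int
    prev_hash: str
    records: list  # provenance records; assumed to be JSON-serializable
    hash: str = ""

    def compute_hash(self) -> str:
        # Hash the block contents together with the previous block's hash,
        # so that changing any earlier block invalidates every later one.
        payload = json.dumps(
            {"index": self.index, "prev_hash": self.prev_hash, "records": self.records},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class InMemoryLedger:
    """Hash-chained ledger kept in a Python list (no disk I/O, no consensus)."""

    def __init__(self):
        genesis = Block(index=0, prev_hash="0" * 64, records=[])
        genesis.hash = genesis.compute_hash()
        self.blocks = [genesis]

    def append(self, records):
        # Link the new block to the current tail of the chain.
        prev = self.blocks[-1]
        block = Block(index=prev.index + 1, prev_hash=prev.hash, records=list(records))
        block.hash = block.compute_hash()
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        # Recompute every hash and check every link to the previous block.
        for i, block in enumerate(self.blocks):
            if block.hash != block.compute_hash():
                return False
            if i > 0 and block.prev_hash != self.blocks[i - 1].hash:
                return False
        return True
```

In this sketch, appending a record such as `{"file": "out.dat", "op": "write"}` and later modifying it in place causes `verify()` to return `False`, which is the tamper-evidence property a provenance ledger relies on. A real deployment would additionally need replication and a consensus protocol to decide which node may append.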