BG3: A Cost Effective and I/O Efficient Graph Database in Bytedance

Published: 01 Jan 2024, Last Modified: 07 Aug 2024. Venue: SIGMOD Conference Companion 2024. License: CC BY-SA 4.0
Abstract: ByteDance's products, including TikTok, Douyin, and Toutiao, generate massive amounts of graph data every day. Previously, we developed ByteGraph, a distributed graph database that manages large-scale graph data with varying performance requirements. ByteGraph is deployed on a computation and storage decoupled architecture, which allows for high-performance in-memory execution and independent scaling of the computation and storage layers. ByteGraph has demonstrated robust performance throughout its years of service in global-scale applications. However, as the business scale expands and applications evolve, the complexity and volume of graph analysis and processing have also increased. We observe that conventional database design incurs high operational costs when dealing with large-scale graph workloads in social network management. To address this issue, we develop BG3 (ByteGraph 3.0), a cost-effective and high-performance distributed graph database that provides three critical components. First, a cost-effective yet query-efficient graph storage engine based on BW-tree-based memory indices and affordable cloud storage. Second, a workload-aware space reclamation mechanism, which enhances storage utilization and reduces write amplification. Third, a lightweight leader-follower synchronization mechanism that ensures strong consistency when scaling out real-time graph analysis. Experimental results demonstrate that BG3 addresses the limitations of ByteGraph, offering a cost-effective, efficient, and scalable solution for processing ByteDance's large-scale graphs.