Centralization in Decentralized Web: Challenges and Opportunities in IPFS’s Data Management

Published: 29 Jan 2025, Last Modified: 29 Jan 2025
Venue: WWW 2025 Poster
License: CC BY-SA 4.0
Track: Web mining and content analysis
Keywords: decentralized web, replication, centralization, deduplication, web data management
Abstract: The InterPlanetary File System (IPFS) is a pioneering effort for Web 3.0, well known for its decentralized infrastructure. However, recent studies have shown that IPFS exhibits a high degree of centralization and has integrated centralized components for better performance. While this shift contradicts the core decentralized ethos of IPFS and risks reducing the data replication level, and thus availability, it also opens opportunities for better data management and cost savings through deduplication. To explore these challenges and opportunities, we collect an extensive dataset of IPFS internal traffic spanning the last three years, comprising over 20 billion messages. Analyzing this long-term trace gives us a more complete and accurate view of how centralization evolves over an extended period. In particular: (1) IPFS shows a low replication level overall, with only about 2.71% of data files replicated more than 5 times; while increasing replication enhances lookup performance and data availability, it hurts downloading throughput due to the overhead of managing peer connections. (2) Centralization within IPFS has grown clearly over the last 3 years: just 5% of peers now host over 80% of the content, down from 21.38% of peers 3 years ago, a trend largely driven by the increase of cloud nodes. (3) The IPFS default deduplication strategy, Fixed-Size Chunking (FSC), is largely inefficient, especially with the current 256KB chunk size, which achieves nearly zero deduplication. Although Content-Defined Chunking (CDC) with smaller chunks could save significant storage (about 1.8 PB) and cost, it could negatively impact user performance. We therefore design and evaluate a new metadata format that optimizes deduplication without compromising performance.
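To make the FSC-versus-CDC contrast in point (3) concrete, the following is a minimal, self-contained Python sketch, not taken from the paper or from IPFS itself. It compares fixed-size chunking at the IPFS default of 256 KB with a FastCDC-style gear-hash chunker; the gear table, boundary mask, and minimum/maximum chunk sizes are illustrative assumptions chosen only to show why content-defined boundaries survive a small byte-offset shift while fixed offsets do not.

```python
import hashlib
import random

FSC_CHUNK_SIZE = 256 * 1024                 # IPFS default fixed chunk size (256 KB)
random.seed(0)
GEAR = [random.getrandbits(32) for _ in range(256)]   # illustrative gear table for CDC


def fsc_chunks(data: bytes, size: int = FSC_CHUNK_SIZE):
    """Fixed-Size Chunking: cut the stream every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes, mask: int = (1 << 13) - 1,
               min_size: int = 2048, max_size: int = 64 * 1024):
    """Content-Defined Chunking with a gear-style rolling hash: cut where the hash
    of the recent bytes matches a boundary condition, so boundaries follow content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i + 1 - start
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def dedup_ratio(chunk_lists):
    """Fraction of bytes saved by storing each distinct chunk only once."""
    total = sum(len(c) for chunks in chunk_lists for c in chunks)
    unique = {}
    for chunks in chunk_lists:
        for c in chunks:
            unique.setdefault(hashlib.sha256(c).digest(), len(c))
    return 1 - sum(unique.values()) / total if total else 0.0


if __name__ == "__main__":
    base = bytes(random.getrandbits(8) for _ in range(1 << 20))  # ~1 MB of random data
    shifted = b"\x00" * 7 + base                                 # same content, offset by 7 bytes
    print("FSC dedup:", dedup_ratio([fsc_chunks(base), fsc_chunks(shifted)]))
    print("CDC dedup:", dedup_ratio([cdc_chunks(base), cdc_chunks(shifted)]))
```

In this toy run the fixed 256 KB boundaries no longer line up once the content is shifted by a few bytes, so FSC finds essentially no duplicate chunks, whereas the content-defined boundaries resynchronize and most chunks are shared. This is only an illustration of the general mechanism; the storage and cost figures in the abstract come from the paper's measurements, not from this sketch.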
Submission Number: 1546