An effective merge strategy based hierarchy for improving small file problem on HDFS

Published: 01 Jan 2016, Last Modified: 17 Apr 2025, CCIS 2016, CC BY-SA 4.0
Abstract: The Hadoop Distributed File System (HDFS) is designed to provide reliable, low-cost storage and management of very large files. Because the HDFS architecture relies on a single master (the NameNode) to manage metadata for multiple slaves (DataNodes), and the NameNode keeps the metadata of every file in memory, the NameNode often becomes a bottleneck, especially when handling a large number of small files. A common solution to this problem is to merge many small files into one large file. To address the problem of massive numbers of small files and to improve the efficiency of accessing them, this paper defines the Logic File Name (LFN) and proposes the Small file Merge Strategy Based on LFN (SMSBL). SMSBL approaches the problem from a new, hierarchy-based perspective: by exploiting the file system hierarchy, it effectively increases the correlation among small files stored in the same HDFS block, so HDFS equipped with SMSBL and a prefetching mechanism performs very well when handling large numbers of small files. The paper establishes a system efficiency analysis model, and experimental results demonstrate that SMSBL can solve the small file problem in HDFS and achieves a high hit rate for prefetched files.
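The abstract does not say how an LFN is defined or how merged files are indexed, so the following is only a minimal sketch of the general idea under stated assumptions. It treats a file's parent directory as a stand-in for its logic file name (a hypothetical choice, not the paper's definition), lays out files from the same directory contiguously in block-sized merged files, records an index of (merged file, offset, length) for each small file, and, on a cache miss, prefetches every file that shares the requested file's merged file. The names `logic_group`, `merge_by_hierarchy`, `IndexEntry`, and `PrefetchingReader` are hypothetical, and no HDFS API is used; reads are only simulated to count prefetch hits.

```python
import posixpath
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Assumed block size: the default dfs.blocksize in Hadoop 2.x and later (128 MB).
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass(frozen=True)
class IndexEntry:
    """Location of one small file inside a merged file (hypothetical index format)."""
    merged_id: int  # which merged file (at most one block in size)
    offset: int     # byte offset inside the merged file
    length: int     # size of the small file in bytes


def logic_group(path: str) -> str:
    # Hypothetical stand-in for the paper's Logic File Name: the parent directory.
    return posixpath.dirname(path)


def merge_by_hierarchy(
    files: Dict[str, int], block_size: int = BLOCK_SIZE
) -> Tuple[List[List[str]], Dict[str, IndexEntry]]:
    """Pack small files (path -> size) into merged files no larger than one block.

    Files sharing a logic group are laid out contiguously, so related files
    tend to land in the same merged file.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(files):
        groups[logic_group(path)].append(path)

    merged: List[List[str]] = []
    index: Dict[str, IndexEntry] = {}
    current: List[str] = []
    used = 0
    for group in sorted(groups):
        for path in groups[group]:
            size = files[path]
            if size > block_size:
                raise ValueError(f"{path} is larger than one block")
            # Close the current merged file if this file would overflow the block.
            if current and used + size > block_size:
                merged.append(current)
                current, used = [], 0
            index[path] = IndexEntry(len(merged), used, size)
            current.append(path)
            used += size
    if current:
        merged.append(current)
    return merged, index


class PrefetchingReader:
    """Simulated reader: a miss fetches the whole merged file and caches all its members."""

    def __init__(self, merged: List[List[str]], index: Dict[str, IndexEntry]):
        self.merged = merged
        self.index = index
        self.cache: Set[str] = set()  # unbounded, for simplicity
        self.hits = 0
        self.requests = 0

    def read(self, path: str) -> IndexEntry:
        entry = self.index[path]
        self.requests += 1
        if path in self.cache:
            self.hits += 1
        else:
            # Prefetch every small file stored in the same merged file.
            self.cache.update(self.merged[entry.merged_id])
        return entry

    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0


if __name__ == "__main__":
    sizes = {
        "/img/x.png": 50, "/img/y.png": 50,
        "/logs/a.txt": 30, "/logs/b.txt": 30, "/logs/c.txt": 30,
    }
    merged, index = merge_by_hierarchy(sizes, block_size=100)
    print(merged)  # [['/img/x.png', '/img/y.png'], ['/logs/a.txt', '/logs/b.txt', '/logs/c.txt']]
    reader = PrefetchingReader(merged, index)
    for p in ["/logs/a.txt", "/logs/b.txt", "/logs/c.txt", "/img/x.png"]:
        reader.read(p)
    print(reader.hit_rate())  # 0.5
```

In this toy run the first read from each directory misses and later reads from the same directory hit, because grouping by hierarchy placed those files in one merged file. Using a different grouping key would change which files share a block; the paper's actual LFN definition would replace `logic_group` here.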