BlendHouse: A Cloud-Native Vector Database System in ByteHouse

Published: 2025, Last Modified: 06 Jan 2026, ICDE 2025, CC BY-SA 4.0
Abstract: The rise of unstructured data retrieval in the AI era has created an urgent need for vector databases that manage high-dimensional vector embeddings and provide efficient vector search for AI applications. Performance, elasticity, and isolation are the key factors for vector databases to serve modern AI applications effectively. Disaggregating storage and compute is widely recognized in both academia and industry as the most effective approach to meeting these requirements. Existing work either redesigns specialized vector databases around the disaggregated architecture or integrates vector search into general-purpose databases that already use it. However, challenges remain in building elastic and efficient vector search systems on the disaggregated architecture, such as higher data fetching latency and the highly stateful nature of vector indexes, which hinder a system's ability to achieve high performance, high elasticity, and resource isolation simultaneously. Additionally, although integrating vector search into general-purpose databases is a recent trend, the extensibility and generality of the integration methodologies have not been systematically studied. In this paper, we present BlendHouse, a cloud-native, general-purpose vector database system built on a disaggregated storage and compute architecture. BlendHouse achieves high performance, high elasticity, and resource isolation simultaneously through a suite of optimizations that tailor both the disaggregated architecture and the relational database to the vector search workload. Experimental results demonstrate that BlendHouse outperforms Milvus and pgvector in both read and write performance. The integration methodology illustrated in this paper is extensible and general, paving the way for more powerful data management systems in the AI era.