TL;DR: A Scalable Memory-augmented Neural Network for Sketching Data Streams
Abstract: Sketches, probabilistic structures for estimating item frequencies in infinite data streams with limited space, are widely used across various domains. Recent studies have shifted the focus from handcrafted sketches to neural sketches, leveraging memory-augmented neural networks (MANNs) to enhance the streaming compression capabilities and achieve better space-accuracy trade-offs.
However, existing neural sketches struggle to scale across different data domains and space budgets due to inflexible MANN configurations. In this paper, we introduce a scalable MANN architecture that brings to life the Lego sketch, a novel sketch with superior scalability and accuracy.
Much like assembling creations with modular Lego bricks, the Lego sketch dynamically coordinates multiple memory bricks to adapt to various space budgets and diverse data domains.
Theoretical analysis and empirical studies demonstrate its scalability and superior space-accuracy trade-offs, outperforming existing handcrafted and neural sketches.
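Illustrative note: for readers unfamiliar with the term, the following is a minimal sketch of a classic handcrafted sketch, the Count-Min sketch, which estimates item frequencies in fixed space. It is not the Lego sketch and is not taken from the paper's code; the class name, the `width` and `depth` parameters, and the choice of salted BLAKE2b hashing are assumptions made for illustration only.

```python
import hashlib


class CountMinSketch:
    """Classic handcrafted Count-Min sketch (illustrative; NOT the Lego sketch)."""

    def __init__(self, width: int, depth: int) -> None:
        # depth rows of width counters: the fixed space budget.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash function per row, derived by salting BLAKE2b with the row id
        # (an assumption; any family of independent hash functions would do).
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Streaming update: increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Hash collisions can only inflate a counter, so the minimum over rows
        # is the tightest estimate and never undercounts.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )
```

With non-negative updates, `estimate` never returns less than the true frequency; widening the table reduces the overestimation, which is the space-accuracy trade-off the abstract refers to.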
Lay Summary: Traditional methods typically employ hand-crafted algorithms to compress large-scale, high-speed data streams and support subsequent queries. Recent approaches using end-to-end learned memory-augmented neural networks have improved compression accuracy; however, their limited scalability hinders real-world applications. We introduce a scalable, end-to-end trained memory-augmented neural network architecture for data stream compression that markedly enhances scalability while further improving accuracy. This work overcomes critical barriers to the practical deployment of such end-to-end learning-based compression techniques, paving the way for more accurate and efficient processing of large-scale, high-speed data streams.
Link To Code: https://github.com/FFY0/LegoSketch_ICML
Primary Area: Deep Learning->Everything Else
Keywords: Data Streams; Memory Augmented Neural Networks; Data Compression
Flagged For Ethics Review: true
Submission Number: 2102