RDAS: A Low Latency and High Throughput Raw Data Engine for Machine Learning Systems

28 Sept 2024 (modified: 16 Oct 2024) · ICLR 2025 Conference Desk Rejected Submission · CC BY 4.0
Keywords: machine learning system, data engine, low latency, high throughput
Abstract: In the era of large pretrained models, a key challenge in deep learning is the underutilization of fine-grained raw data, which is often replaced by information-lossy normalized data. To bridge this gap, we introduce the Raw Data Aggregation System for Machine Learning (RDAS). RDAS offers a seamless data interface, enabling machine learning systems to directly access unstructured, high-resolution raw event data with minimal latency. At the heart of RDAS lies the Message Book Model, an innovative data representation framework that underpins the system’s ability to handle event data at nanosecond precision. RDAS is structured around three conceptual layers: (i) the Message Layer, featuring dual message aggregators for sequential and random access, which compile raw messages into timestamp-specific message book snapshots; (ii) the Feature Layer, which derives user-specified data features from the message book for any given moment; and (iii) the Verification Layer, tasked with real-time error monitoring and integrity assurance of the message book. A C++ implementation of these layers ensures RDAS’s exceptional performance. To validate its effectiveness, we applied RDAS in an Internet of Things (IoT) scenario, demonstrating significant performance improvements over existing methods in both data throughput and latency. Our results underscore RDAS’s potential to transform data processing in machine learning, offering a pathway to leverage the full granularity and richness of raw data.
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13176