SuBiTO: Synopsis-based Training Optimization for Continuous Real-Time Neural Learning over Big Streaming Data

Published: 01 Jan 2025, Last Modified: 15 Oct 2025, Venue: AAAI 2025, License: CC BY-SA 4.0
Abstract: In machine learning applications over Big streaming Data, Neural Networks (NNs) are continuously and rapidly trained over voluminous data arriving at high speeds. As soon as a new version of the NN becomes available, it is deployed for prediction purposes (e.g., classification). The real-time character of such applications depends greatly on the volume and velocity of the data streams, as well as on the NN complexity. Training on large volumes of ingested streams or using complex NNs potentially increases accuracy, but may compromise the real-time character of those applications. In this work, we present SuBiTO, a framework that automatically and continuously learns the training time vs. accuracy trade-offs as new data stream in, and fine-tunes: (i) the number, size and type of NN layers; (ii) the size of the ingested data, via stream-synopsis-specific parameters; and (iii) the number of training epochs. Finally, SuBiTO suggests optimal sets of such parameters and detects concept drifts, enabling the human operator to adapt these parameters on-the-fly, at runtime.
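
To make the notion of "optimal sets of parameters" under a training time vs. accuracy trade-off concrete, the following is a minimal sketch of one common reading: keeping only the Pareto-optimal configurations, i.e. those for which no other configuration is both at least as fast to train and at least as accurate. The abstract does not state that SuBiTO uses a Pareto front, so this is an assumption for illustration only. The `Candidate` type, the `pareto_front` function, the configuration labels and all timing and accuracy numbers are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical configuration with its measured cost and quality."""
    name: str             # label for (NN layers, synopsis size, epochs)
    train_seconds: float  # measured training time (lower is better)
    accuracy: float       # measured accuracy (higher is better)


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    A candidate `o` dominates `c` if it is no worse on both criteria
    and strictly better on at least one of them.
    """
    front = []
    for c in candidates:
        dominated = any(
            o.train_seconds <= c.train_seconds
            and o.accuracy >= c.accuracy
            and (o.train_seconds < c.train_seconds or o.accuracy > c.accuracy)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    # Sort by training time so an operator can scan from fastest to slowest.
    return sorted(front, key=lambda c: c.train_seconds)


# Made-up measurements, purely illustrative.
candidates = [
    Candidate("small NN, 10% synopsis, 1 epoch", 2.0, 0.81),
    Candidate("small NN, 50% synopsis, 3 epochs", 9.0, 0.88),
    Candidate("large NN, 10% synopsis, 3 epochs", 12.0, 0.86),
    Candidate("large NN, full stream, 5 epochs", 60.0, 0.93),
]

front = pareto_front(candidates)
for c in front:
    print(f"{c.name}: {c.train_seconds:.1f} s, accuracy {c.accuracy:.2f}")
```

In this toy data the third configuration is dropped because the second one trains faster and is more accurate; the remaining three represent distinct trade-offs from which a human operator could choose according to the latency budget.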