Online Cascade Learning for Efficient Inference over Streams

Published: 02 May 2024, Last Modified: 25 Jun 2024ICML 2024 PosterEveryoneRevisionsBibTeXCC BY 4.0
Abstract: Large Language Models (LLMs) have a natural role in answering complex queries about data streams, but the high computational cost of LLM inference makes them infeasible in many such tasks. We propose *online cascade learning*, the first approach to address this challenge. The objective here is to learn a ``cascade'' of models, starting with lower-capacity models (such as logistic regression) and ending with a powerful LLM, along with a *deferral policy* that determines the model to be used on a given input. We formulate the task of learning cascades online as an imitation-learning problem, where smaller models are updated over time imitating the collected LLM demonstrations, and give a no-regret algorithm for the problem. Experimental results across four benchmarks show that our method parallels LLMs in accuracy while cutting down inference costs by as much as 90% with strong robustness against input distribution shifts, underscoring its efficacy and adaptability in stream processing. Our source code is available at https://github.com/flitternie/online_cascade_learning.
Submission Number: 7641
Loading