Keywords: Optimal filtering, LLMs, mixture-of-experts, time-series forecasting, financial market movement
TL;DR: We propose a dynamic filtering algorithm for combining N expert LLM agents on online forecasting tasks, with optimality guarantees that the ensemble performs at least as well as the strongest expert.
Abstract: We propose MoE-F — a formalized mechanism for combining N pre-trained expert Large Language Models (LLMs) on online time-series prediction tasks by adaptively forecasting the best weighting of LLM predictions at every time step. Our mechanism leverages the conditional information in each expert's running performance to forecast the best combination of LLMs for predicting the next step of the time series. Diverging from static (learned) Mixture of Experts (MoE) methods, our approach employs time-adaptive stochastic filtering techniques to combine experts. By framing the expert selection problem as a finite state-space, continuous-time Hidden Markov model (HMM), we can leverage the Wonham-Shiryaev filter. Our approach first constructs N parallel filters, one for each of the N individual LLMs. Each filter proposes its best combination of LLMs, given the information it has access to. Subsequently, the N filter outputs are optimally aggregated to maximize their robust predictive power, and this update is computed efficiently via a closed-form expression, thus generating our ensemble predictor.
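For reference, the standard Wonham filter the abstract invokes propagates, for a hidden chain $X_t$ on states $\{e_1,\dots,e_K\}$ with generator $Q=(q_{ij})$ observed through $dY_t = h(X_t)\,dt + \sigma\,dW_t$, the posterior $\pi_t^i = \mathbb{P}(X_t = e_i \mid \mathcal{F}_t^Y)$ via (notation here is ours, not taken from the submission):

$$
d\pi_t^i = \sum_{j} q_{ji}\,\pi_t^j\,dt + \frac{\pi_t^i\left(h_i - \bar h_t\right)}{\sigma^2}\left(dY_t - \bar h_t\,dt\right), \qquad \bar h_t = \sum_{j} h_j\,\pi_t^j .
$$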
Our contributions are:
- **(I)** the MoE-F algorithm — deployable as a plug-and-play filtering harness (a minimal illustrative sketch follows this list),
- **(II)** theoretical optimality guarantees for the proposed filtering-based gating algorithm (derived from guarantees for its parallel Bayesian filtering and robust aggregation steps), and
- **(III)** empirical evaluation and ablative results using state-of-the-art foundation and MoE LLMs on a real-world _Financial Market Movement_ task, where MoE-F attains a remarkable 17% absolute (48.5% relative) F1-score improvement over the next-best-performing individual LLM expert in predicting short-horizon market movement from streaming news. Further, we provide empirical evidence of substantial performance gains from applying MoE-F over specialized models in the _long-horizon time-series forecasting_ domain. Code is available on GitHub: https://github.com/raeidsaqur/moe-f
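To make the harness concrete, below is a minimal, illustrative sketch (our own, not the released implementation; `wonham_step`, `moe_f_weights`, and the two-state reliable/unreliable model per expert are assumptions for exposition) of an Euler-discretized Wonham filter update per expert, followed by a simple aggregation of the N posteriors into mixture weights:

```python
import numpy as np

def wonham_step(pi, Q, h, y, dt, sigma=1.0):
    """One Euler-discretized Wonham filter update for a K-state hidden chain.

    pi: posterior over hidden states (K,); Q: generator matrix (K, K);
    h: expected instantaneous loss per state (K,); y: observed loss increment.
    """
    h_bar = float(pi @ h)                                   # filtered drift estimate
    pi_new = (pi + (pi @ Q) * dt
              + pi * (h - h_bar) * (y - h_bar * dt) / sigma**2)
    pi_new = np.clip(pi_new, 1e-12, None)                   # discretization guard
    return pi_new / pi_new.sum()                            # renormalize

def moe_f_weights(posteriors, good_state=0):
    """Hypothetical aggregation: weight each expert by P(reliable state)."""
    scores = np.array([p[good_state] for p in posteriors])
    return scores / scores.sum()

# Toy usage: two experts, each modeled by a 2-state (reliable/unreliable) chain.
Q = np.array([[-0.1, 0.1], [0.2, -0.2]])   # hidden-chain generator
h = np.array([0.0, 1.0])                   # loss drift: low if reliable, high if not
posteriors = [np.array([0.5, 0.5]) for _ in range(2)]
for y_step in ([0.02, 0.30], [0.01, 0.25]):                # streamed per-expert losses
    posteriors = [wonham_step(p, Q, h, y, dt=0.1)
                  for p, y in zip(posteriors, y_step)]
print(moe_f_weights(posteriors))           # mixture weights over the two experts
```

In this toy rule, an expert whose recent losses run high sees its posterior mass shift to the unreliable state, and its mixture weight shrinks; the paper's actual aggregation step is the closed-form robust combination described in the abstract.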
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7448