Abstract: We consider a distributed time series forecasting problem in which multiple distributed nodes, each observing a local time series (potentially of a different modality), collaborate to make both local and global forecasts. This problem is particularly challenging because each node observes only the time series generated from a subset of sources, which makes it difficult to exploit correlations among different streams for accurate forecasting, and because the data streams observed at different nodes may represent different modalities, leading to heterogeneous computational requirements across nodes. To tackle these challenges, we propose a hierarchical learning framework consisting of multiple local models and a global model, and provide a suite of efficient training algorithms to achieve high local and global forecasting accuracy. We theoretically establish the convergence of the proposed framework and demonstrate its effectiveness on several time series forecasting tasks, with the (somewhat surprising) observation that the proposed distributed models can match, or even outperform, centralized ones.
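To make the hierarchical structure described above concrete, the following is a minimal sketch of one plausible instantiation: each node runs a local model over its own stream and produces both a local forecast and an embedding, and a global model aggregates the embeddings into a global forecast. All module names, dimensions, and the aggregation scheme are illustrative assumptions and are not taken from the submission's actual DIVIDE implementation.

```python
# Hypothetical sketch of a hierarchical local/global forecaster (not the paper's code).
import torch
import torch.nn as nn

class LocalModel(nn.Module):
    """Encodes one node's local time series; returns a local forecast and an embedding."""
    def __init__(self, input_dim, hidden_dim, horizon):
        super().__init__()
        self.encoder = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.local_head = nn.Linear(hidden_dim, horizon)

    def forward(self, x):                      # x: (batch, time, input_dim)
        _, h = self.encoder(x)                 # h: (1, batch, hidden_dim)
        emb = h.squeeze(0)                     # embedding shared with the global model
        return self.local_head(emb), emb

class GlobalModel(nn.Module):
    """Aggregates node embeddings into a global forecast (simple concatenation here)."""
    def __init__(self, hidden_dim, num_nodes, horizon):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(hidden_dim * num_nodes, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, horizon),
        )

    def forward(self, embeddings):             # list of (batch, hidden_dim) tensors
        return self.head(torch.cat(embeddings, dim=-1))

# Toy usage: 3 nodes whose streams have different dimensionalities (modalities).
local_models = [LocalModel(d, 16, horizon=4) for d in (1, 2, 3)]
global_model = GlobalModel(16, num_nodes=3, horizon=4)
xs = [torch.randn(8, 24, d) for d in (1, 2, 3)]
outs = [m(x) for m, x in zip(local_models, xs)]
local_forecasts = [o[0] for o in outs]          # one forecast per node
global_forecast = global_model([o[1] for o in outs])
```

In this sketch only the low-dimensional embeddings (and, during training, their gradients) would need to be exchanged with the global model, rather than the raw local streams; the actual information flow and training algorithms of the proposed framework are described in the paper itself.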
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have revised the manuscript in response to the reviewers’ and editor’s comments. The key changes are summarized as follows:
1. **Enhanced experimental analysis.** We have provided a more detailed discussion of the numerical experiment results—particularly the performance comparison between DIVIDE and centralized baselines—in Section 5.2 to better illustrate the behavior and advantages of DIVIDE.
2. **Clarified distinctions from related designs.** We have clarified how our method differs from related architectures such as fully connected GNNs and latent master-node designs, with expanded explanations added to Section 3.2.
3. **Refined discussion on communication efficiency.** We have provided additional analysis and discussion comparing our framework with conventional federated learning approaches from the perspectives of communication and information flow in Section 3.3.
4. **Refined discussion on privacy.** We have expanded the discussion of privacy protection, clarified potential risks associated with sharing embeddings/gradients, explained how existing techniques (e.g., differential privacy) may be incorporated, and softened claims to avoid overstatement.
5. **Improved clarity and presentation.** We have also revised the manuscript to improve clarity. This includes refining the statement regarding the Markov property claim, expanding the DIVIDE acronym, and adding details on local model choices for different modalities.
Assigned Action Editor: ~Han-Jia_Ye1
Submission Number: 5277