\section{Related Work}\label{sec:related}

This work is closely related to the literature on domain generalization (DG), non-stationary DG, continuous and gradual domain adaptation (DA), and continual learning. We briefly review each topic and discuss how it differs from our work.

\textbf{Domain generalization.} DG aims to learn, from multiple source domains, a model that generalizes to out-of-distribution samples from an unseen target domain. Depending on the learning strategy, existing DG methods can be roughly grouped into three categories: (i) methods based on \textit{domain-invariant representation learning}~\citep{phung2021learning, nguyen2021domain,pham2023fairness}; (ii) methods based on \textit{data manipulation}~\citep{qiao2020learning, zhou2020learning}; and (iii) methods that cast DG within general machine learning paradigms, using approaches such as \textit{meta-learning} \citep{li2018learning, balaji2018metareg}, \textit{gradient operation} \citep{rame2021fishr, tian2022neuron}, \textit{self-supervised learning} \citep{jeon2021feature, li2021domain}, and \textit{distributional robustness} \citep{koh2021wilds, wang2021class}. However, these works assume that both source and target domains are sampled from a \textit{stationary} environment and therefore ignore non-stationary patterns across domains, which differs from our setting.

\textbf{Non-stationary domain generalization.} To the best of our knowledge, only a few concurrent works study domain generalization in non-stationary environments \citep{bai2022temporal, qin2022generalizing, zeng2023foresee, xie2024enhancing, zeng2024generalizing, zeng2023latent}. However, the settings they consider are rather restrictive. For example, \citet{qin2022generalizing} only considers environments that evolve according to a \textit{consistent} and \textit{stationary} transition function; the approaches in \citet{bai2022temporal, zeng2023foresee,zeng2024generalizing} can only generalize the model to a \textit{single subsequent} target domain; and \citet{qin2022generalizing, xie2024enhancing, zeng2023latent} assume that data are aligned across the domain sequence. In contrast, this paper considers a more general setting in which data may evolve according to non-stationary dynamics, and the proposed algorithm, trained on a sequence of unaligned source domains, can generate models for multiple unseen target domains.

\textbf{Continuous domain adaptation.} Unlike conventional DA/DG methods, which only consider categorical domain labels, continuous DA admits continuous domain labels such as spatial location or time \citep{ortiz2019cdot, wang2020continuously}, and thus targets scenarios in which the data distribution changes gradually. As in conventional DA, samples from the target domain are required to guide model adaptation. This contrasts with our study, in which target-domain samples are inaccessible during training.

\textbf{Gradual domain adaptation.} Like continuous DA, gradual DA also considers continuous domain labels, and samples from the target domain are accessible during training \citep{kumar2020understanding, chen2020self, chen2021gradual}. The main difference is that continuous DA focuses on adaptation from a single source domain to a target domain, whereas gradual DA involves multiple source domains.

\textbf{Continual learning.} Continual learning aims to learn a model continually from a sequence of tasks. Its main focus is overcoming catastrophic forgetting, i.e., preventing the model from forgetting old knowledge as it is trained on new tasks \citep{chaudhry2018efficient, kirkpatrick2017overcoming, mallya2018packnet}. This differs from temporal-shift DG (a special case of our setting), which aims to train a model that generalizes to future domains.
