\section{Related Work}


% \paragraph{Causality-based few-shot learning.} Previous works have faced difficulties in few-shot learning, mainly due to the limited amount of available data and the high computational cost of training large neural networks. Additionally, the high-dimensional nature of the data makes it difficult to learn meaningful correlations and relationships. By this motivation, causal representation is introduced in few-shot learning. It can be first traced back to~\cite{yue2020interventional}, in which the authors detailed the root cause of the instability in few-shot learning from the perspective of confounding factors in causal inference. They then applied the "backdoor criterion" and proposed a layered sampling approach based on intervention to improve few-shot learning. Subsequently, Lin et al.~\cite{lin2022revisiting} further declared that the previous few-shot learning issues (especially the N-way K-shot scheme problem) can be unified into the "front-door criterion" problem. However, these methods cannot be easily applied to the novel view synthesis since it is challenging to model confounders, such as environmental variables.

% which will be illustrated in detail in the following section. \zhiheng{emphasize the challenge when implementing the NeRF.}


\paragraph{Mutual Information}
Mutual information (MI) is a fundamental concept in information theory with many applications in machine learning. \cite{oord2018representation} initiated a line of research on unsupervised representation learning that trains feature extractors by maximizing an estimate of the MI between different views of the data. This work has been extended in several directions, including analysis of the underlying principle~\citep{tschannen2019mutual}, improved empirical results on additional datasets~\citep{henaff2020data}, and the application of contrastive learning to the multiview setting~\citep{tian2020contrastive}. While these works primarily focus on unsupervised learning tasks, we focus on supervised learning with sparse samples. Nevertheless, our approach also adopts the idea of leveraging information from unlabeled data.
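For reference, the quantity these methods estimate can be written in its standard form; the symbols $X$, $Y$, $f$, and $N$ below are illustrative notation introduced here and are not necessarily those of the cited works. For random variables $X$ and $Y$ with joint density $p(x,y)$ and marginals $p(x)$ and $p(y)$,
\begin{equation}
I(X;Y) = \mathrm{E}_{p(x,y)}\left[\log \frac{p(x,y)}{p(x)\,p(y)}\right].
\end{equation}
A commonly used contrastive estimator, sketched here under the assumption of one positive sample $y_1$ and $N-1$ negative samples $y_2,\dots,y_N$ scored by a learned positive critic $f$, gives the lower bound
\begin{equation}
I(X;Y) \geq \log N + \mathrm{E}\left[\log \frac{f(x,y_1)}{\sum_{j=1}^{N} f(x,y_j)}\right],
\end{equation}
so that maximizing the expectation on the right-hand side tightens an estimate of the MI between the two views.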

\paragraph{Active Learning} Active learning~\citep{settles2009active} allows a learning algorithm to actively query a user or other information source to label new data points. It has been widely applied in computer vision tasks~\citep{yi2016scalable,sener2017active,fu2018scalable,zolfaghari2019temporal}. ActiveNeRF~\citep{pan2022activenerf} was the first to integrate active learning into NeRF optimization. We adopt this idea for sparse view sampling. Unlike ActiveNeRF, which focuses on uncertainty reduction, our approach exploits mutual information from both macro and micro perspectives.
% This strategy proves to be computationally efficient and yields superior performance.

\paragraph{Few-shot Novel View Synthesis} 
% \zhiheng{add. Following the three categories as in the introduction...} 
NeRF~\citep{mildenhall2020nerf} has become one of the most important methods for novel view synthesis of 3D scenes~\citep{xiangli2021citynerf,fridovich2022plenoxels,takikawa2021neural,yu2021plenoctrees,tancik2022block,hedman2021baking}. A growing number of recent works study few-shot novel view synthesis with NeRF~\citep{wang2021nerf,martin2021nerf,meng2021gnerf,kim2022infonerf,deng2022nerdi,wang2023sparsenerf}. First, diffusion-model-based methods use generative inference as supplementary information.
% NeRDi \cite{deng2022nerdi} proposes a single-view NeRF synthesis framework with general image priors from 2D diffusion models. 
SparseFusion~\citep{zhou2022sparsefusion} distills a 3D-consistent scene representation from a view-conditioned latent diffusion model.
% These methods are useful but require additional training of large-scale generative models.
Second, some methods additionally extrapolate the scene's geometry and appearance to new viewpoints. DietNeRF~\citep{jain2021putting} introduces a semantic consistency loss between observed and unseen views.
%based on pre-trained CLIP models.
Third, some methods use regularization to mitigate overfitting and incorporate prior knowledge. RegNeRF~\citep{niemeyer2022regnerf} regularizes geometry and appearance from unobserved viewpoints, while FreeNeRF~\citep{yang2023freenerf} constrains the input frequency range. However, these approaches lack a unified theoretical foundation, which hinders their comprehensive explanation and optimization. We aim to address this gap by proposing a generic framework with interpretable metrics.

% In our work, we aim to unify these specific previous works into our more general framework.