
% motivation of use DRSL instead of adversial robust-MLE and distributionally robust MLE
% 1. for distributionally robust MLE - the innner minization problem cannot be solved exactly, it can only be approximated. this conclution is also true for adversial robust-MLE on models operates in continuos domain. In other words, only some tractable models that operates in discrete domain (such as cutset networks, arithmetic circuits, PSDDs) can admit exact solution for inner optimization. 
% While in our case, the inner optimization problem can be solved exactly in linearithmic (n log n) time, which is very efficient.

% 2. for adversial robust-MLE, it is optimizing the worst case of its neighbors within certain distance and treat all of them equally important and this may not be so reasonable. for distributionally robust MLE, it boils down to similar form as adversial robust-MLE, but the way they choosing the neightbors are different, so it may not be equally important, but still suffer from point 1.


% logic flow

% tpms, properties, exact likelihoods etc.
% model learns using mle, assume no cooruption and noise.
% so far, robust learning tpm is not well-studied, only a few works [cite peddi work] focuse on the binary or discrete domain. hanmming distance
% in addition they use adversarial mle optimizing on neighbors.
% two motivations above. 
% what we do - we expolore distribution - what we do here. k-l divergence.


Tractable Probabilistic Models (TPMs), including Probabilistic Sentential Decision Diagrams (PSDDs)~\citep{kisa2014probabilistic}, Arithmetic Circuits (ACs)~\citep{darwiche2003differential}, Sum-Product Networks (SPNs)~\citep{poon2011sum}, and Cutset Networks (CNs)~\citep{rahman_cutset_2014}, have attracted considerable research interest in recent years.
These models fall under a unified framework known as probabilistic circuits~\citep{choi2020probabilistic} and offer a promising approach to modelling uncertainty.
A key attraction of these models is that they answer certain inference queries in polynomial time, such as exact likelihood computation and, in some cases, finding the most probable assignment to unobserved variables given evidence~\citep{rahman2019cutset,dong2023new,molina2018mixed}.

TPMs are mainly learned within the Maximum Likelihood Estimation (MLE) framework, which assumes that the training data are free of corruption and noise and faithfully represent the underlying data distribution.
However, this assumption can fail in practice due to factors such as measurement errors, label noise, and sample selection bias.
Existing research on robust learning of TPMs has mostly been limited to binary or discrete domains~\citep{peddi2022robust}. In this work, we address the challenges of learning robust TPMs in continuous domains.


The robust MLE framework~\citep{bertsimas2019robust} provides a foundation for the robust learning of TPMs and can be divided into Adversarial Robust MLE (ARM) and Distributionally Robust MLE (DRM). ARM optimizes against the worst case among the neighboring observations within a certain distance of each data point and treats all such neighbors as equally important, which may not be realistic in practice. DRM, on the other hand, faces an inherent challenge: its inner minimization problem is often intractable. Importantly, this challenge extends to ARM when it is applied to probabilistic models operating in continuous domains.\footnote{Certain tractable models with a static variable ordering that operate in discrete domains, such as arithmetic circuits and PSDDs, admit exact solutions in polynomial time~\citep{peddi2022robust}.} These limitations can harm both the effectiveness and the efficiency of the robust MLE framework.


In this work, we leverage the Distributionally Robust Supervised Learning (DRSL) framework~\citep{namkoong2016stochastic, hu2018does} for learning robust TPMs in continuous domains. 
Specifically, we first show that the DRSL framework can be utilized to learn distributionally robust probabilistic models by designing a special loss function based on the negative log density of data points.
We further demonstrate the efficiency of our approach by showing that the inner optimization problem can be solved exactly in linearithmic time, while the outer optimization problem is equivalent to a standard MLE problem on weighted data.
In essence, our approach offers an efficient way to equip probabilistic models with distributional robustness, requiring only that the underlying model admit tractable log-likelihood computation and efficient learning on weighted data.
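
To make this two-level structure concrete, the objective can be sketched as follows. The notation used here ($N$ training points $x_i$, model density $p_\theta$, adversarial weights $w_i$, and radius $\delta$) is introduced only for illustration and is an assumption, not necessarily the notation adopted in the remainder of the paper.
\[
\min_{\theta}\; \sup_{w \in \mathcal{U}_\delta}\; \frac{1}{N}\sum_{i=1}^{N} w_i \bigl(-\log p_\theta(x_i)\bigr),
\qquad
\mathcal{U}_\delta = \Bigl\{ w : w_i \ge 0,\; \frac{1}{N}\sum_{i=1}^{N} w_i = 1,\; \frac{1}{N}\sum_{i=1}^{N} w_i \log w_i \le \delta \Bigr\}.
\]
The last constraint bounds the KL divergence between the reweighted distribution and the empirical distribution by $\delta$. For fixed weights $w$, minimizing over $\theta$ amounts to maximizing $\sum_{i} w_i \log p_\theta(x_i)$, which is MLE on weighted data.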

This paper makes the following contributions:
\begin{enumerate}
    \item  We introduce a novel application of the DRSL framework for learning distributionally robust probabilistic models. This presents an important alternative for the development of robust probabilistic models.

    \item We develop an efficient algorithm that finds the exact solution to the inner (adversarial) optimization problem of the DRSL framework when the KL divergence is used to measure the discrepancy between distributions. Moreover, we show that the outer optimization problem is equivalent to standard MLE on weighted data.

    \item We empirically evaluate the proposed approach on nine real-world datasets by learning robust continuous TPMs and comparing their log-likelihoods on both the original (uncorrupted) and adversarial test sets against those of counterparts learned with standard MLE.

\end{enumerate}




% 1. TPMs models are typically learn through the maximum likelihood estimation  (MLE) framework. 
% 2. the MLE framework assume that the input training data is free of corruption and noise, which is usually not true in practice. 
% 3. There are little existing research focuses on the robust learning of TPMs (i.e. this topic is not well-studied in current research community), and they are mainly focusing on binary or discrete domains. 
% 4. In this work, we foucs on the robust learning of continuous TPMs.

% 1. the existing research on robust learning of TPMs is typically based on Robust MLE learning, which can be further breaks down into adversarial robust MLE or distributionally robust MLE. 
% 2. For adversarial robust MLE,  it is optimizing the worst case of its neighbors within certain distance and treat all of these neighboring points equally important and this may not be so reasonable in practice. 
% 3. For distributionally robust MLE - the inner minimization problem cannot be solved exactly, it can only be approximated, and this conclusion is also true for adversarial robust-MLE on models operates in continuous domain. Only certain tractable models with static ordering of variables that operates in discrete domain (such as networks, arithmetic circuits, PSDDs) can admit exact solution for inner optimization in polynomial time.

% 1. In this paper, we utilize the distributionally robust supervised learning (DRSL) framework to learn distributionally robust TPMs in continuous domains.
% 2. We show that the DRSL framework can be applied to learn distributionally robust probabilistic models by designing a special loss function ( the negative log density of input data points) . 
% 3. In addition, we show that the inner optimization problem can be solved exactly in linearithmic time, and the outer optimization problem  can be treated as the standard MLE problem on weighted data. 
% 4.In other words, the distributionally robust probabilistic model can be learned efficiently as long as the probabilistic model allows exact loglikelihood computation and efficient learning on weighted data.


% our contribution. 
% 1. we show how drsl can be use for an alternative way to learn robust probabilistic models
% 2. we propose efficiet exact algorithms for the adversal minimization problem when kl divergence is used . and the out is equiv to learn on weighted data. 
% 3. we empirically evaluated the proposed aproach ...
