% \documentclass{uai2024} % for initial submission
\documentclass[accepted]{uai2024} % after acceptance, for a revised version; 
% also before submission to see how the non-anonymous paper would look like 
                        
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2024} % ptmx math instead of Computer
                                         % Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2024} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

\hypersetup{
    colorlinks,
    linkcolor={black},
    citecolor={blue!50!black},
    urlcolor={black}
}


%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example


% The standard author block has changed for UAI 2024 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
% \author[1]{\href{mailto:<jj@example.edu>?Subject=Your UAI 2024 paper}{Jane~J.~von~O'L\'opez}{}}
% \author[1]{Harry~Q.~Bovik}
% \author[1,2]{Further~Coauthor}
% \author[3]{Further~Coauthor}
% \author[1]{Further~Coauthor}
% \author[3]{Further~Coauthor}
% \author[3,1]{Further~Coauthor}
% % Add affiliations after the authors
% \affil[1]{%
%     Computer Science Dept.\\
%     Cranberry University\\
%     Pittsburgh, Pennsylvania, USA
% }
% \affil[2]{%
%     Second Affiliation\\
%     Address\\
%     …
% }
% \affil[3]{%
%     Another Affiliation\\
%     Address\\
%     …
%   }
\author{\href{mailto:<pezeshkb@uci.edu>?Subject=Abstraction Sampling - UAI 2024}{Bobak Pezeshki}{}}
\author{\href{mailto:<kkask@uci.edu>?Subject=Abstraction Sampling - UAI 2024}{Kalev Kask}{}}
\author{\href{mailto:<ihler@ics.uci.edu>?Subject=Abstraction Sampling - UAI 2024}{Alexander Ihler}{}}
\author{\href{mailto:<dechter@ics.uci.edu>?Subject=Abstraction Sampling - UAI 2024}{Rina Dechter}{}}
% Add affiliations after the authors
\affil[1]{%
    University of California, Irvine
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% CUSTOM PACKAGES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage{caption}
\usepackage{subcaption}
\usepackage{float}
\usepackage{xspace} % package being used for \newcommand to remove extra space
                    %     when a command is invoked without an argument list
\usepackage{textcase}
\usepackage[toc, nopostdot]{glossaries}
% \usepackage{amsmath}
\usepackage{amsthm, amssymb}
\usepackage{mathtools}
\usepackage{enumitem}
\usepackage{refcount}
\usepackage[leftmargin=6pt, vskip=3pt-\parskip]{quoting}
\usepackage[titlenumbered,ruled, linesnumbered]{algorithm2e}
\usepackage{mathrsfs} %for \mathscr
\usepackage[font=smaller,labelfont=bf]{caption}
% \usepackage[font=small,labelfont=bf]{subcaption}
% \usepackage[labelfont=bf]{caption}
% \usepackage[labelfont=bf]{subcaption}
\usepackage{xcolor}
\input{_colors}
\usepackage{newfloat}
\usepackage{chngcntr}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% CUSTOM COMMANDS %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%create new float environment called plotfigure with its own counter
\DeclareFloatingEnvironment[name=Plot]{plotfigure} 

%create new float environment called tablefigure with its own counter
\DeclareFloatingEnvironment[name=Table]{tablefigure} 

%set the floats table and tablefigure to use the same counters
\makeatletter\let\c@tablefigure\c@table\makeatother 

%consider the floats table and tablefigure as the same set of floats (so location in document will be in order in which they appear)
\makeatletter\let\ftype@tablefigure\ftype@table\makeatother 

\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator*{\argmax}{argmax}
\DeclareMathOperator*{\proj}{proj}
\mathchardef\mhyphen="2D % Define a "math hyphen"

% algorithm2e
% \newcommand\commentstyle[1]{\textcolor{cadmiumgreen}{#1}}
\SetCommentSty{commentstyle}
\SetKwInOut{Input}{input}
\SetKwInOut{Output}{output}

\newtheoremstyle{break}
  {\topsep}{\topsep}%
  {\itshape}{}%
  {\bfseries}{}%
  {\newline}{}%
\theoremstyle{break}
% \newtheorem{theorem}{Theorem}[subsubsection]
\newtheorem{theorem}{Theorem}[section]
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
% \newtheorem{definition}{Definition}[subsubsection]
\newtheorem{definition}{Definition}[section]

\input{_cmds}
\renewcommand*{\glstextformat}{\textbf}

\renewcommand{\quote}{\list{}{\rightmargin=\leftmargin\topsep=0pt}\item\relax}







%%% for supplemental

\usepackage{enumitem}
    \setlistdepth{9}
    \setlist[itemize,1]{label=$\bullet$}
    \setlist[itemize,2]{label=$\cdot$}
    \setlist[itemize,3]{label=$\cdot$}
    \setlist[itemize,4]{label=$\cdot$}
    \setlist[itemize,5]{label=$\cdot$}
    \setlist[itemize,6]{label=$\cdot$}
    \setlist[itemize,7]{label=$\cdot$}
    \setlist[itemize,8]{label=$\cdot$}
    \setlist[itemize,9]{label=$\cdot$}
    \renewlist{itemize}{itemize}{9}
    



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\setcounter{secnumdepth}{3} %May be changed to 1 or 2 if section numbers are desired.
\setcounter{tocdepth}{3}

\title{Value-Based Abstraction Functions for Abstraction Sampling
%Abstraction Sampling with Heuristic-Based, HR-Based,\\ and Proposal-Based Abstraction Functions
}


\input{_gls}


\begin{document}
    % \onecolumn
    \setlength{\abovedisplayskip}{3pt}
    \setlength{\belowdisplayskip}{3pt}

    \maketitle
    
    \begin{abstract}
        \vspace{-12pt}
        Monte Carlo methods are powerful tools for solving problems involving complex probability distributions. Despite their versatility, these methods often suffer from inefficiencies, especially when dealing with rare events. Importance sampling has emerged as a prominent technique for alleviating these challenges. Recently, a new scheme called Abstraction Sampling was developed that incorporates stratification into importance sampling over graphical models. However, existing work has explored only a limited set of abstraction functions to guide stratification. This study introduces three new classes of abstraction functions combined with seven distinct partitioning schemes, resulting in twenty-one new abstraction functions, each motivated by theory and intuition from both search and sampling domains. An extensive empirical analysis on over 400 problems compares these new schemes, highlighting several well-performing candidates. 
    \end{abstract}

    % \vfill\eject
    % \tableofcontents
    
    % \clearpage
    \vspace{-4pt}
    \section{Introduction} \label{sec:introduction}
    \vspace{-4pt}
        The partition function ($Z$) is an important quantity in probabilistic graphical model inference and is often estimated using Monte Carlo methods such as Importance Sampling (IS) \citep{Rubinstein_2016,liu2015probabilistic,DBLP:journals/ai/GogateD11}. Inspired by the works of \citet{knuth75} and \citet{Chen92}, a framework called Abstraction Sampling (AS) \citep{DBLP:conf/uai/BrokaDIK18} was introduced, extending IS by enabling each sample to represent multiple configurations.  
        AS uses concepts from Stratified Sampling \citep{Rubinstein_2016,rizzo_2007} and Compact Search  \citep{DBLP:journals/ai/DechterM07,DBLP:journals/ai/MarinescuD09a} to build a sampled subtree called a \textit{probe} which is then used to compute an estimate.  Probes are built 
        level-by-level 
        according to a variable ordering where, at each level, an \textit{abstraction function} groups nodes into \textit{abstract states} from which representative nodes are selected
        % and reweighted 
        %(according to a proposal distribution) 
        to extend paths in the probe.
        
        Using what are referred to as context-based abstraction functions, \citet{DBLP:conf/uai/BrokaDIK18} showed competitive performance of AS against IS, Weighted Mini-Bucket Importance Sampling (wMBIS) \citep{liu2015probabilistic,DBLP:conf/uai/IhlerFDO12}, and IJGP-SampleSearch (IJGP-ss) \citep{DBLP:journals/ai/GogateD11}. \citet{kask20-scaling-up-as} improved Abstraction Sampling scalability with the AOAS algorithm that more efficiently applied AS to AND/OR search spaces.  AOAS showed improved performance,
        % its superior performance using the same context-based abstraction functions against previous versions of Abstraction Sampling
        % (and thus implicitly also against IS, wMBIS, and IJGP-ss) 
        % and 
        additionally comparing against the state-of-the-art Dynamic Importance Sampling (DIS) scheme \citep{lou2019interleave}.
        
        However, AS development has lacked exploration of diverse and potentially more effective abstraction functions.  While \citet{hsiao23-gnn-dynamic-as} proposed using graph neural networks to learn abstraction functions, such methods 
        %have the drawback of requiring 
        require learning on a corpus of similar problems before use.

        
       %\rina{ 
       \vspace{-4pt}
       \paragraph{Contributions.} This work provides a detailed study of new abstraction schemes for AS. We present a new class of abstractions defined by real-valued functions aimed at capturing relevant similarity features between nodes.  Three classes of this new framework are introduced and augmented by seven partitioning strategies.  
       A purely randomized scheme is also introduced.  
       An extensive empirical evaluation is performed on over 400 problems, comparing our novel schemes against each other, against the previous relCB and randCB abstraction functions \citep{DBLP:conf/uai/BrokaDIK18,kask20-scaling-up-as}, and implicitly against IS, wMBIS, IJGP-ss, and DIS.
       %}


    Our experiments identify three schemes in particular
    %: \textit{equalDistQB3}, \textit{equalDistQB4}, and \textit{simpleRand} 
    that perform significantly better than any previous scheme. 
    %The former two use the unnormalized AOAS proposal as a measure of node similarity for greedily grouping nodes into abstract states of roughly equal mass under the proposal, while the latter uses completely randomized node groupings into roughly equal cardinality abstract states. 
    % and that they tend to perform best when allotted a greater number of abstract states. 
    These results improve on one of the most competitive sampling schemes for estimating the partition function, a particularly challenging task in probabilistic inference.

% text feels a little overblown: significant improvement/most competitive/most challenging
    % \vspace{-4pt}
    \section{General Background} \label{sec:background}
    \vspace{-4pt}

%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%

        \paragraph{Graphical Models.}
            
            % \begin{figure}[]
            %     \centering
            % 	\includegraphics[scale=0.25]{images/AncestorBranchingMass.pdf}
            % 	\vspace{-6pt}\caption{Ancestor branching mass of an AND node.}
            % 	\label{fig-ancestor-branching-mass}
            % \end{figure}
    
            % % \begin{comment}
            % \begin{figure}[]
            %     \centering
            % 	\includegraphics[scale=0.25]{images/ProperAbstractionGroups.pdf}
            % 	\vspace{-6pt}\caption{Scope of proper abstractions.}
            % 	\label{fig-proper-abstraction-groups}
            % \end{figure}
            % % \end{comment}
            
            A \textit{graphical model}, such as a Bayesian or Markov network \citep{pearl88,darwiche-book,DBLP:series/synthesis/2013Dechter}, can be defined by  a 3-tuple
            $\mathcal{M} \! = \! (\mathbf{X,D,F})$, where
            $\mathbf{X}$
            is a set of variables,
            $\mathbf{D}$
            is the set of variable domains, and $\mathbf{F}$ is a set of functions such that each function $f_{\bs{\alpha}} \in \mathbf{F}$ is defined over a subset of variables $\bs{\alpha} \subseteq \mathbf{X}$
            (called the function's scope) capturing local interactions. 
            $\mathcal{M}$ defines a global function, often a factorized probability distribution on $\mathbf{X}$,
            $P(\mathbf{X}) = \frac{1}{Z} \prod_{\bs{\alpha}} f_{\bs{\alpha}}(\mathbf{X}_{\bs{\alpha}})$, where 
            $Z = \sum_{\mathbf{X}} \prod_{\bs{\alpha}} f_{\bs{\alpha}}(\mathbf{X}_{\bs{\alpha}})$,
            % \begin{align} \label{eq:partition-function-def}
            %     Z = \sum_X \prod_{\alpha} f_\alpha(X_\alpha),
            % \end{align}
            known as the partition function, is a normalization factor.  A \textit{primal graph} $\mathcal{G} \! = \! (\mathbf{V,E})$ of $\mathcal{M}$ associates each variable with a node ($\mathbf{V} \! = \! \mathbf{X}$) with edges $e \! \in \! \mathbf{E}$ connecting nodes whose variables interact locally, appearing in the scope of the same functions.  
            %Intuitively, the primal graph connects pairs of variables that interact locally.

        \vspace{-6pt}
        \paragraph{Search Spaces of Graphical Models.} 
            % A graphical model can be transformed into a weighted state space graph.
            % In an OR search space, which is constructed layer-by-layer relative to a variable ordering, paths from the root to the leaves represent \textbf{full configurations} - or assignments to all variables - where each successive level corresponds to an assignment of the next variable in the ordering.
            
            A graphical model can
            % also 
            be transformed  into a compact AND/OR search space to leverage conditional independence and facilitate efficient search algorithms \citep{DBLP:journals/ai/DechterM07}. 
            
            Given a primal graph $\mathcal{G}$, an AND/OR search space is defined relative to a \textit{pseudo tree} $\mathcal{T} \! = \! (\mathbf{V,E'})$, a directed rooted tree that
            %\bobak{spans $\mathcal{G}$ (but may not include arcs connecting nodes.  Ex, X-Y-Z; chain pseudo tree Y-X-Z).} 
            captures conditional independence encoded in the model. A pseudo tree \PT is constructed according to a variable ordering such that every arc of $\mathcal{G}$ not in $\mathbf{E'}$ is a back-arc in \PT. This construction ensures conditional independence of any variable and its descendants from variables found in the other branches of \PT given assignments to their common ancestors. 
            The pseudo tree in \figlink{fig:primal-graph-and-pseudo-tree} was constructed 
            %from the corresponding primal graph 
            using a variable ordering $o = [A, B, C, D]$. The dashed line shows an edge in the primal graph that is missing from \PT, but that would be a back-arc if it were present.
            From its structure we see that variables $C$ and $D$ are independent of $B$ given an assignment to $A$. Here $A$ is referred to as a \emph{branching variable} since it branches to multiple children.   
            
            
            
            
            \begin{figure}[!htb]
            \vspace{-2pt}
            	\centering
            	\begin{subfigure}{0.9\linewidth}
            	\centering
            	       \includegraphics[width=0.8\linewidth]{UAI-24/_attachments/images/pseudotree.png}
                        \vspace{-14pt}\caption{}
                        \label{fig:primal-graph-and-pseudo-tree}
            	\end{subfigure}
                    \begin{subfigure}{0.9\linewidth}
                    \vspace{+8pt}
            	\centering
                        \includegraphics[width=0.95\linewidth]{UAI-24/_attachments/images/AncestorBranchingMass_withArcCosts.pdf}
                        \vspace{-2pt}\caption{}
                        \label{fig:ancestor-branching-mass}
                    \end{subfigure}
                \captionsetup{width=.95\linewidth}
            	\vspace{-10pt}\caption{A full AND/OR tree representing 16 possible full configurations of binary variables $A,B,C,$ and $D$ guided by the pseudo tree shown in  subfigure (a) above.  The path cost for the highlighted node $n_{A=0,C=1}$ at the end of the path $\rightarrow \!\! (A \!\! = \!\! 0) \!\! \rightarrow  \!\! (C \!\! = \!\! 1)$ is $g(n_{A=0,C=1})= 10 \mul 5$.  The value of the subtree under $n_{A=0,C=1}$ is $Z(n_{A=0,C=1}) = 2 \mul 1 + 3 \mul 1$. Boxed in green is the ancestor branching subtree for $n_{A=0,C=1}$ and it has the value $R(n_{A=0,C=1}) = 1 \mul 1 + 4 \mul 1$.  Thus, $Q(n_{A=0,C=1}) = (10 \mul 5) \mul{} ( 1 \mul 1 + 4 \mul 1) \mul{} ( 2 \mul 1 + 3 \mul 1)$. \vspace{-14pt}}
                        \label{fig:pseudo-tree-with-ancestor-branching-mass}
            \end{figure}
            
            Guided by a
            pseudo tree \PT, an \emph{AND/OR search tree}
            $T$ has alternating levels of OR nodes
            corresponding to variables and AND nodes corresponding to
            possible assignments to those variables.  \figlink{fig:pseudo-tree-with-ancestor-branching-mass} shows an AND/OR search tree and its guiding pseudo tree.  Note that in the pseudo tree, variables $B$ and $C$ extend to different branches from $A$.  Similarly, in the AND/OR search tree, we see OR nodes $B$ and $C$ extending to different branches under each 
            %AND node of $A$.
            possible assignment of $A$.
            % to the variables, 
            %with edge costs extracted from
            %the original functions \citep{DBLP:journals/ai/DechterM07} such that %(By this logic, we can think of the nodes of an OR tree as AND nodes).  
            % Let $n$ be an AND node in $T_{\tau}$, also denoted $n_X$ if $X$ is the last variable of its partial configuration.
            An arc into an AND node $n_{X}$ of variable $X$ %(or the arc from its OR parent to the AND node)
            has a cost $c(n_{X})$ equal to the product of functions $f_{\bs{\alpha}} \in \F$ such that the path to $n_{X}$ fully instantiates all $X' \in \bs{\alpha}$ and such that 
            %\bobak{$\set{X} \subseteq \bs{\alpha}$ or $X \in \bs{\alpha}$?} 
            $X \in \bs{\alpha}$ \citep{DBLP:journals/ai/DechterM07}.
            % \textcolor{red}{Moved to section "Value of A Node": (see \figlink{fig-simple}(c)).}  

        \vspace{-6pt}
        \paragraph{Notation.}
            Capital letters ($X$) represent variables and small letters ($x$) their values.  Boldfaced letters represent collections: boldfaced capital letters ({\bf X}) denote a collection of variables,
            $|{\bf X}|$ its cardinality, 
            $D_{\X}$ its joint domain (all possible configurations of \X), 
            and boldfaced lowercase $\xx$ a particular realization in that joint domain (a particular configuration of \X).
            % Abusing notation, operations $\bigoplus_{\X}$ (ex. $\sum_{\X}$) imply...
            % \begin{align}
            %     \begin{split}
            %         \bigoplus_{\X}
            %                         &\iff \bigoplus_{\xx \in D_{\X}}\\
            %                         &\iff \bigoplus_{x_{1} \in D_{X_{1}}} \bigoplus_{x_{2} \in D_{X_{2}}} ... \bigoplus_{x_{|\X|} \in D_{X_{|\X|}}}
            %     \end{split}
            % \end{align}
            % For variable sets $\bs{\alpha} \subseteq \bs{\beta}$ with \textbf{b} being a particular configuration of $\bs{\beta}$ and $\proj_{\bs{\alpha}}(\textbf{b})$ the projection of \textbf{b} on $\bs{\alpha}$, we define $f_{\bs{\alpha}}(\textbf{b}) = f_{\bs{\alpha}}(\proj_{\bs{\alpha}}(\textbf{b}))$.

            
            In the context of search, $n$ is used generally to represent nodes in a search tree.  
            For AND/OR search trees, $n_{X}$ is used to specifically refer to an AND node associated with variable $X$, and $Y_{n_X}\!$ the OR node associated with variable $Y$ that is the child of $n_{X}$. 
            $ch(n)$ are the children of node $n$. 
            $path(n)$ is the configuration of the variables along the path from the root of a search tree $T$ to node $n$ according to assignments corresponding to that path.  
            For the highlighted node $n$ in 
            \figlink{fig:ancestor-branching-mass},
            $path(n) = \set{\teq{A}{0}, \teq{C}{1}}$. 
            $varpath(n)$ is the set of variables that $path(n)$ provides a configuration for.  In \figlink{fig:ancestor-branching-mass}, $varpath(n) = \set{A,C}$.  The cost of the arc to an AND node $n_{X}$ is
            \begin{align}
                c(n_{X}) = \hspace{-84pt}
                    \prod_{ \hspace{+72pt}
                        % f_{\bs{\alpha}} \mst \bs{\alpha} \subseteq varpath(n_{X}) \tn{ and } X \in \bs{\alpha}
                        f \in \setst{f_{\bs{\alpha}} \in \F}{ \bs{\alpha} \subseteq varpath(n_{X}), \; X \in \bs{\alpha}}
                    } 
                \hspace{-84pt}
                f(path(n_{X})),
            \end{align}
            or $1$ if no function satisfies these conditions (an empty product). Letting $anc(n)$ be the AND nodes on the path from the root to $n$ (including $n$ itself if it is an AND node), the cost of $path(n)$ is $g(n) = \prod_{n' \in anc(n)} c(n')$. 
            %and equals the product of the arc costs along the path to $n$. 
            In \figlink{fig:ancestor-branching-mass}, $g(n) = 10 \, \mul \, 5$.
            %, in our example corresponding to a single OR node $D_{n}$ corresponding to variable $D$.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%% NEW VERSION
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        % In order use AND/OR search spaces effectively for solving computational tasks, there are several quantities that become important to evaluate and understand.  We will describe these next.
        We now define some important quantities involved in evaluating AND/OR search spaces.

        \vspace{-6pt}
        \paragraph{$\bs{Z(n)}$.} \label{sec:partition-function-of-a-node}  
            The total cost of the subtree rooted at $n$.
            For an AND node $n_{X}$ with children OR nodes $Y_{n_{X}} \in ch(n_{X})$, $Z(n_{X})$ satisfies
            \begin{equation} \label{eq:and-or-z-prod}
                Z(n_{X}) = \prod_{Y_{n_{X}} \in ch(n_{X})} Z(Y_{n_X})
            \end{equation}
            such that for OR nodes $Y_{n_{X}}$
            \begin{equation} \label{eq:and-or-z-sum}
                Z(Y_{n_X}) = \sum_{n_Y \in ch(Y_{n_X})}  c(n_Y) \cdot Z(n_Y)
            \end{equation}
            with $Z(n_{X}) = 1$ in the case $n_{X}$ has no children.
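
            \textbf{Example.} As an illustrative evaluation, assuming the arc costs $2$ and $3$ shown in \figlink{fig:ancestor-branching-mass}, consider the highlighted node $n_{A=0,C=1}$, whose only child OR node is $D_{n_{A=0,C=1}}$. The two AND children of this OR node are leaves, each with $Z(n_{D}) = 1$, so Equations~\eqref{eq:and-or-z-prod} and~\eqref{eq:and-or-z-sum} give
            \begin{equation*}
                Z(n_{A=0,C=1}) = Z(D_{n_{A=0,C=1}}) = 2 \mul 1 + 3 \mul 1 = 5.
            \end{equation*}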

            % \begin{figure}[h]
            %     \vspace{-2pt}
            %     \centering
            %     \includegraphics[scale=0.4]{./_attachments/images/example-Z-of-n-and-or-search-space-threo.pdf}
            %     \captionsetup{width=.95\linewidth}
            %     \vspace{-6pt}\caption{The subtree contributing to $Z(n_{T=0})$ is highlighted above.  Using \eqlink{eq:and-or-z-prod} and \eqlink{eq:and-or-z-sum}, $Z(n_{T=0}) = 1.0$.\vspace{-4pt}}
            %     \label{fig:ex-z-of-n-in-and-or-threo}
            % \end{figure}
            
            Note that if $n_{\varnothing}$ is the dummy root node of the AND/OR tree $T$, then $Z(n_{\varnothing})$ equals the partition function $Z$ of the underlying model \M. We denote an estimate of $Z(n)$ by $\hat{Z}(n)$.  Heuristic estimates of $Z(n)$ are more specifically denoted as $h(n)$.

        \vspace{-6pt}
        \paragraph{$\bs{R(n)}$.} \label{sec:ancestor-branching-mass-of-a-node}
            On the path from the root of an AND/OR tree $T$ to some node $n_{X}$, there may be an intermediate node $n_{Y}$ associated with branching variable $Y$ in the guiding pseudo tree \PT. (In \figlink{fig:ancestor-branching-mass}, on the path to the highlighted node $n_{A=0,C=1}$, node $n_{A=0}$ is traversed where $A$ is a branching variable in \PT of \figlink{fig:primal-graph-and-pseudo-tree}).  When this happens, the remaining variables of the model are split between different branches.  
            %\po{(In the AND/OR tree in \figlink{fig:ancestor-branching-mass}, the left branch under the node $n_{A=0}$ contains variable $B$ but not $C$ or $D$, and that the right branch contains $C$ and $D$ but not $B$)}.  
            Thus, the $Z(n)$ of any node down one of the branches will necessarily omit the costs from the configurations of the variables included in the other branch(es). 
            $R(n_{X})$, or the \textit{ancestor branching mass}, captures 
            %the mass for all configurations of the variables that branched off of - and thus not on or below - the path to $n_{X}$. 
            these omitted costs.
            (In \figlink{fig:ancestor-branching-mass}, the green box shows the portion of $T$ corresponding to $R(n_{A=0,C=1})$).
            %=Z(B_{n_{A=0}})$.
            %\textcolor{cadmiumgreen}{\textbf{(Omiting mention that this captures $Z(B_{n_{A=0}})$, the significance being that $B$ is missing from $path(n_{A=0,C=1})$ and under $n_{A=0,C=1}$.  Thoughts?) [Seems ok]}}
            %which is missing from the computation of $Z(n_{A=0,C=1})$).
            %(That same boxed portion would also be the ancestor branching mass for the sibling node of the red node, and also for any of their children).

            % \po{
            % More formally, let $br(n_{X})$ be the set of ancestor nodes $n_{Y_{i}}$ on the path to $n_{X}$ such that each $Y_{i}$ is a branching variable ancestor of $X$ in \PT.  Let $W_{n_{Y_{i}}}$ be the child OR node of each $n_{Y_{i}}$ that is also on the path to $n_{X}$.  
            % (For example, in \figlink{fig:ancestor-branching-mass} $br(n_{A=0,C=1}) = \set{n_{A=0}}$, in this case $A$ being the only branching variable ancestor of $C$ in the guiding pseudo tree, and the child OR node on the path to $n_{A=0,C=1}$ being $C_{n_{A=0}}$).  
            % We then define $R(n_{X})$ simply as: 
            %  %$R(n_{X}) =   \prod_{n_{Y} \in br(n_{X})} \frac{Z(n_{Y})}{ Z(W_{n_{Y}})}$.
            %  \begin{align}
            %      \label{eq:Rn-defined-by-ratio-of-z}
            %      R(n_{X}) =   \prod_{n_{Y} \in br(n_{X})} \frac{Z(n_{Y})}{ Z(W_{n_{Y}})}
            %  \end{align}
            %  }
             
            More formally, let $br(n_{X})$ be the set of ancestor nodes $n_{Y_{i}}$ of $n_{X}$ such that each $Y_{i}$ is a branching variable ancestor of $X$ in \PT.  
            %Let $ch_{\neg path(n_{X})}(n_{Y_{i}})$ be the children OR node of each $n_{Y_{i}}$ that is \semph{not} on the path to $n_{X}$. 
            %AI: updated shorter
            We then define $R(n_{X})$ simply as: 
            \begin{align}
                \label{eq:Rn-defined-by-branchings-off-the-path}
%                R(n_{X}) = \hspace{-6pt}  \prod_{n_{Y} \in br(n_{X})} \hspace{-58pt}\prod_{\hspace{62pt}W_{n_{Y}} \in ch_{\neg path(n_{X})}(n_{Y})} \hspace{-56pt}Z(W_{n_{Y}})
                R(n_{X}) = \prod_{n_{Y} \in br(n_{X})} \prod_{\substack{W_{n_{Y}} \in ch(n_{Y})\\W_{n_Y}\not \in path(n_X)}} Z(W_{n_{Y}}).
            \end{align}
            % i.e., the product of all siblings of ancestor or-nodes of $n_X$. \todo{not correct; omits cost c(n)}
            (In \figlink{fig:ancestor-branching-mass}, $br(n_{A=0,C=1}) = \set{n_{A=0}}$, with $A$ being the only branching variable ancestor of $C$ in \PT and $B_{n_{A=0}}$ the only child OR node of $n_{A=0}$ \semph{not} on the path to $n_{A=0,C=1}$.  Thus, $R(n_{A=0,C=1})=Z(B_{n_{A=0}})$.)
            We denote approximations to $R(n)$ as $r(n)$.
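
            As a worked instance, assuming the arc costs $1$ and $4$ shown in \figlink{fig:ancestor-branching-mass}, the two AND children of $B_{n_{A=0}}$ are leaves (each with $Z = 1$), so Equation~\eqref{eq:and-or-z-sum} gives
            \begin{equation*}
                R(n_{A=0,C=1}) = Z(B_{n_{A=0}}) = 1 \mul 1 + 4 \mul 1 = 5.
            \end{equation*}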
             

        \vspace{-6pt}
        \paragraph{$\bs{Q(n)}$.} \label{sec:q-of-a-node}
            We can now concisely define a quantity $Q(n)$ as the contribution to $Z$ from all full configurations consistent with $path(n)$. In other words, $Q(n)$ is the unnormalized measure of the configuration $path(n)$, 
            %\po{based on the joint distribution defined by \M}, 
            with $P(path(n)) = \frac{Q(n)}{Z}$.  The quantity $Q(n)$ obeys: 
            %Q(n) = g(n)  \! \cdot \!  R(n)  \! \cdot \!  Z(n)$.
            \begin{align}
                Q(n) = g(n)  \! \cdot \!  R(n)  \! \cdot \!  Z(n).
            \end{align}
                
             \textbf{Example.} In \figlink{fig:ancestor-branching-mass}, consider the path from the root to the red node $n_{A= 0,C=1}$. Following $n_{A=0}$ to our node, we see OR node $B_{n_{A=0}}$ branches off of the path.
             So, 
             \begin{small}
                 %$Q(n_{A=0,C=1}) = g(n_{A=0,C=1}) \! \cdot \! R(n_{A=0,C=1}) \! \cdot \! Z(n_{A=0,C=1}) = g(n_{A=0,C=1}) \mul Z(n_{A=0,B}) \! \cdot \! Z(n_{A=0,C=1})$.
                 % \begin{alignat}{3}
                 % \begin{split}
                 %    Q(n_{A=0,C=1}) &= g(n_{A=0,C=1}) \! \cdot \! R(n_{A=0,C=1}) \! \cdot \! Z(n_{A=0,C=1}) \\
                 %    &= g(n_{A=0,C=1}) \mul \;\; Z(B_{n_{A=0}})\;\; \! \cdot \! Z(n_{A=0,C=1}) \\
                 %    &= \;\;\; (10 \mul 5) \;\;\; \mul \;\; ( 1 \mul 1 + 4 \mul 1)\;\; \! \mul \! ( 2 \mul 1 + 3 \mul 1) 
                 % \end{split}
                 % \end{alignat}
                 \begin{alignat*}{7}
                    Q(n_{A=0,C=1}) &= g(n_{A=0,C=1}) &\;\mul{}\;& R(n_{A=0,C=1}) &\;\mul{}\;& Z(n_{A=0,C=1}) \\
                    &= g(n_{A=0,C=1}) &\;\mul{}\;&  Z(B_{n_{A=0}}) &\;\mul{}\;& Z(n_{A=0,C=1}) \\
                    &= (10 \mul 5) &\;\mul{}\;& ( 1 \mul 1 + 4 \mul 1) &\;\mul{}\;& ( 2 \mul 1 + 3 \mul 1) 
                 \end{alignat*}
             \end{small}     
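
             Evaluating each factor numerically gives $g(n_{A=0,C=1}) = 10 \mul 5 = 50$, $R(n_{A=0,C=1}) = 1 + 4 = 5$, and $Z(n_{A=0,C=1}) = 2 + 3 = 5$, so that $Q(n_{A=0,C=1}) = 50 \mul 5 \mul 5 = 1250$. Equivalently, the partial configuration has probability $P(\teq{A}{0}, \teq{C}{1}) = 1250 / Z$.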
             
        \vspace{-6pt}
        \paragraph{Stratified Importance Sampling.} 
            Abstraction Sampling builds on Stratified Importance Sampling, which in turn builds on Importance Sampling and Stratified Sampling. \emph{Importance Sampling} is  a Monte Carlo scheme used for approximating likelihood queries \citep{Rubinstein_2016,liu2015probabilistic,DBLP:journals/ai/GogateD11}.
            %\citep{Rubinstein_2016,DBLP:journals/ai/GogateD11,liu2015probabilistic}.
            {\em Stratified Sampling} is a variance reduction technique for sampling a search space by first dividing it into disjoint strata \citep{Rubinstein_2016}. 
            % The two can be merged to further reduce variance.
            In \emph{Stratified Importance Sampling}, the sample space is first divided into $k$ strata,
            %of equal area under the distribution $p$, 
            then representatives are chosen from each stratum and reweighted to represent the omitted members of their respective strata. %, and uses these representatives to form an estimator over the entire model. 
            \citet{rizzo_2007} shows that to reduce overall variance given strata of equal mass under the proposal, the sum of the variances within the strata should be minimized.
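
            The reweighting step can be sketched as follows. This is a minimal illustration only, in notation not used elsewhere in this paper, and it assumes a finite sample space, a single representative per stratum, and a proposal $q$ with $q(x) > 0$ for all $x$. Suppose we wish to estimate $Z = \sum_{x} f(x)$ and the space is partitioned into strata $S_1, \dots, S_k$. Writing $q(S_j) = \sum_{x \in S_j} q(x)$, a representative $x_j \in S_j$ is drawn with probability $q(x_j) / q(S_j)$ and reweighted by the inverse of that probability:
            \begin{align*}
                \hat{Z} &= \sum_{j=1}^{k} \frac{q(S_j)}{q(x_j)} \, f(x_j), \\
                \mathbb{E}\!\left[ \frac{q(S_j)}{q(x_j)} \, f(x_j) \right]
                    &= \sum_{x \in S_j} \frac{q(x)}{q(S_j)} \cdot \frac{q(S_j)}{q(x)} \, f(x)
                    = \sum_{x \in S_j} f(x).
            \end{align*}
            Summing over the strata gives $\mathbb{E}[\hat{Z}] = Z$, so under these assumptions the estimate is unbiased for any choice of strata; the choice of strata affects only the variance.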
            
            
            \newcommand{\soltree}{\hat{x}_M}
            \newcommand{\parttree}{\bar x}
            
            
    

%%%%%%%%%%%%%%%%%%%%%%%%%5
%%%%%%%%%%%%%%%%%%%%%%%%%5
%%%%%%%%%%%%%%%%%%%%%%%%%5
%%%%%%%%%%%%%%%%%%%%%%%%%5
%%%%%%%%%%%%%%%%%%%%%%%%%5
%%%%%%%%%%%%%%%%%%%%%%%%%5
%%%%%%%%%%%%%%%%%%%%%%%%%5
%%%%%%%%%%%%%%%%%%%%%%%%%5


    \section{Abstraction Sampling}\label{sec:abstraction-sampling}
    \vspace{-4pt}
        {\em Abstraction Sampling} (AS) \citep{DBLP:conf/uai/BrokaDIK18} applies concepts of Stratified Importance Sampling to sampling over probabilistic graphical models. 
        %An abstraction event in Abstraction Sampling is analogous to sampling representatives from strata in stratified importance sampling and reweighing to account for the rest of the members that were not chosen.  
        AS is guided by an abstraction
        function $a(\cdot)$ that dictates how nodes are partitioned into \textit{abstract states} (abstract states being analogous to strata in stratified sampling). A search tree is iteratively expanded along a variable ordering. %variable by variable, 
       After each expansion, $a(\cdot)$ is used to group nodes into abstract states.  Then AS uses an importance-sampling-like process to select a  representative from each abstract state and reweights it using importance sampling weights to account for the unselected nodes it represents.  The selected nodes are then further expanded and the process iterates.
       This process yields
       %leading to the generation of 
       a weighted subtree of the full search tree $T$, referred to as a \textit{probe}.  Note that an AS probe can contain multiple full configurations, whereas each sample drawn by importance sampling is a single full configuration.

        \setlength{\textfloatsep}{6pt}
        \begin{algorithm}[t!]
                \caption{AOAS Overview}
                \label{alg:aoas-overview}
            \begin{footnotesize}
            \begin{enumerate}
                \vspace{2pt}
                \item \textbf{Initialization:}
                    Begin with a dummy root node $r$.
                \item \textbf{Probe Generation:}
                    Proceeding in a DFS manner according to a pseudo tree $\PT$...
                    \begin{enumerate}
                        \item \textbf{Expansion:} \label{alg:aoas-overview:expansion}
                            Generate child nodes $n$ corresponding to the next variable in the DFS ordering of $\PT$. Inherit $w(n)$ from the parent and assign appropriate $g(n), h(n), \tn{and } r(n)$ values.
                        \item \textbf{Abstraction:} \label{alg:aoas-overview:abstraction}
                            \begin{enumerate}
                                \item \textbf{Form Abstract States:}
                                    Using $a(\cdot)$, partition newly expanded nodes into abstract states.
                                \item \textbf{Select Representative:}
                                    Using proposal $p(n) \propto w(n)q(n)$, stochastically select a representative from each abstract state and reweight it such that $w(n) \leftarrow \frac{w(n)}{p(n)}$.
                            \end{enumerate}
                        \item \textbf{Backtrack:} \label{alg:aoas-overview:backtracking}
                            After reaching a leaf in $\PT$, recursively backtrack until reaching the node that extends to the next unexplored branch of $\PT$. While backtracking, update parent node $n'$'s $\hat{Z}(n')$ estimates based on its children's $w(n), g(n),$ and $\hat{Z}(n)$ values.
                        \item \textbf{Repeat:}
                            Repeat steps \ref{alg:aoas-overview:expansion}--\ref{alg:aoas-overview:backtracking} until backtracking to the root node.
                    \end{enumerate}
                \item \textbf{Return:}
                    $\hat{Z} = w(r)\,\hat{Z}(r)$ for the root node $r$.
            \end{enumerate}
            \end{footnotesize}
        \end{algorithm}


        \begin{figure}[!htb]
          \begin{subfigure}{0.245\textwidth}
            \includegraphics[width=0.98\linewidth]{./_attachments/images/AlgorithmTraces/AOAS-step1.pdf}
            \caption{}
            \label{fig:AOAS-step1}
          \end{subfigure}%
          \begin{subfigure}{0.245\textwidth}
            \includegraphics[width=0.98\linewidth]{./_attachments/images/AlgorithmTraces/AOAS-step2.pdf}
            \caption{}
            \label{fig:AOAS-step2}
          \end{subfigure}
          \begin{subfigure}{0.245\textwidth}
            \includegraphics[width=0.98\linewidth]{./_attachments/images/AlgorithmTraces/AOAS-step3.pdf}
            \caption{}
            \label{fig:AOAS-step3}
          \end{subfigure}%
          \begin{subfigure}{0.245\textwidth}
            \includegraphics[width=0.58\linewidth]{./_attachments/images/AlgorithmTraces/AOAS-step4.pdf}
            \caption{}
            \label{fig:AOAS-step4}
          \end{subfigure}
          \begin{subfigure}{0.245\textwidth}
            \includegraphics[width=0.58\linewidth]{./_attachments/images/AlgorithmTraces/AOAS-step5.pdf}
            \caption{}
            \label{fig:AOAS-step5}
          \end{subfigure}%
          \begin{subfigure}{0.245\textwidth}
            \includegraphics[width=0.58\linewidth]{./_attachments/images/AlgorithmTraces/AOAS-step6.pdf}
            \caption{}
            \label{fig:AOAS-step6}
          \end{subfigure}
          \captionsetup{width=.95\linewidth}
          \vspace{-18pt}
          \caption{From \protect\cite{kask20-scaling-up-as}, a sample trace of AOAS following the ordering $B\rightarrow A\rightarrow C\rightarrow D$. Transparent nodes indicate portions of the reachable search space yet to be explored.  Gray boxes indicate nodes considered for abstraction.  Nodes with the same domain values (also indicated by the same color) are abstracted into the same abstract state.  Only one node of each color is stochastically selected as a representative for its respective abstract state. Step (c) shows an optional pruning step.  Step (f) shows the final probe capturing four full configurations: $\teq{B}{0},\teq{A}{0},\teq{C}{0},\teq{D}{0}$, $\teq{B}{0},\teq{A}{1},\teq{C}{0},\teq{D}{0}$, $\teq{B}{0},\teq{A}{0},\teq{C}{1},\teq{D}{1}$, and $\teq{B}{0},\teq{A}{1},\teq{C}{1},\teq{D}{1}$. }
          \vspace{+6pt}
          \label{fig:aoas-trace}
        \end{figure}

        \vspace{-6pt}
        \paragraph{AOAS.}


        \begin{figure}[!htb]
            \vspace{-6pt}
            \centering
            \includegraphics[width=0.75\linewidth]{UAI-24/_attachments/images/proposal.pdf}
            \captionsetup{width=.95\linewidth}
            \vspace{-6pt}\caption{Visualization of the unnormalized proposal distribution $w(n)q(n)$, which accounts for previously abstracted nodes (via $w(n)$), the ancestor branching mass (via $r(n)$), the current path cost (via $g(n)$), and the subtree mass (via $h(n)$).
            \vspace{+12pt}
            }
                    \label{fig:proposal}
        \end{figure}
        
            Taking Abstraction Sampling further, \citet{kask20-scaling-up-as} introduced the AOAS algorithm,
            %(\textbf{A}nd/\textbf{O}R \textbf{A}bstraction \textbf{S}ampling) 
            which applies Abstraction Sampling to AND/OR search spaces more effectively and significantly improves its performance. AOAS uses a proposal function $p(n) \propto  w(n) \hyperref[sec:q-of-a-node]{q(n)} = w(n) g(n)  h(n)  r(n)$, where the weight $w(n)$ accounts for the nodes previously abstracted into the path to $n$, $g(n)$ is the cost of the path to $n$, $h(n)$ is a heuristic estimate of \hyperref[sec:partition-function-of-a-node]{$Z(n)$}, and $r(n)$ is an estimate of \hyperref[sec:ancestor-branching-mass-of-a-node]{$R(n)$} (see \figlink{fig:proposal}). 
            \alglink{alg:aoas-overview} provides a high-level description of the AOAS procedure.  \figlink{fig:aoas-trace} shows a sample trace of AOAS from \citet{kask20-scaling-up-as}.  A more detailed version of the algorithm and a detailed description of the sample trace can be found in the Supplemental Materials.
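
            As a small illustration of the reweighting step (with hypothetical values chosen only for this example), consider an abstract state containing two nodes with $w(n_1)=1$, $q(n_1)=2$ and $w(n_2)=2$, $q(n_2)=3$. The proposal $p(n)\propto w(n)q(n)$ then gives
            \begin{align*}
                p(n_1) &= \tfrac{2}{2+6} = \tfrac{1}{4}, & p(n_2) &= \tfrac{6}{2+6} = \tfrac{3}{4}.
            \end{align*}
            If $n_1$ is selected, its weight becomes $w(n_1)/p(n_1) = 4$; if $n_2$ is selected, its weight becomes $w(n_2)/p(n_2) = \tfrac{8}{3}$. In either case the reweighted quantity $w(n)q(n)$ of the representative equals $8 = w(n_1)q(n_1) + w(n_2)q(n_2)$, i.e., the representative carries the combined estimated mass of the nodes it represents.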




         
         % A key hyper-parameter used is $nAbs$ that bounds the number of abstract states at each level, and thus bounds the size of each probe. If each search node is placed in its own abstract state, the result will be a pure search algorithm, but for that $nabs$ would be exponential \cite{}.
        

        % \vspace{-4pt}        
        





    \section{Value-Based Abstractions} \label{sec:ordered-value-based-abstraction-functions}
    \vspace{-4pt}

        The choice of abstraction function is a crucial aspect of Abstraction Sampling but has received only limited attention so far. The main focus of this work is to identify new abstraction functions that significantly improve AS performance.
        
        \vspace{-6pt}
        \paragraph{Existing State-of-the-Art: Context-Based Abstraction Functions.} \label{sec:abstraction-sampling:existing-abstraction-functions}
           \citet{DBLP:conf/uai/BrokaDIK18} designed abstractions based on assignments to a 
           variable's context $C(X)$, a subset of its ancestors in $\mathcal{T}$ whose assignments uniquely determine the AND/OR subtree below it \citep{DBLP:journals/ai/DechterM07}. 
           % and thus its $Z(n)$.
           %Therefore abstracting nodes together that have the same context configuration ensures that they have the same $Z(n)$.
           However, the number of configurations of a context %$|\D_{C(X)}|$, 
           is exponential in the context's size.
           %In the {\relax context}  approach abstractions were employed.
           %was explored which is to select a subset of the context variables hoping that this will group nodes having similar $Z(n)$.
           %yielding too many abstract states, unless the induced-width is boundsd.  and is infeasible to use if the induced-width of the graph is high. Thus, 
          Thus, \citet{DBLP:conf/uai/BrokaDIK18} and \citet{kask20-scaling-up-as} 
          used \emph{relaxed} context-based (\textit{RelCB}) and \emph{randomized} context-based (\textit{RandCB}) abstractions to control the number of abstract states.  RelCB
          %is controlled by a \emph{level} parameter $j$  %parameterized by a level $j$, 
          %selecting the closest $j \! - \! 1$ variables from a variable's context (ie. its {\em relaxed context}) plus itself. 
          uses a parameter $nCtx$ 
          and groups nodes that share the same configuration over the current variable and its most recent $nCtx \! - \! 1$ context variables (the relaxed context) into the same abstract state. With a domain size of $k$, this yields at most $k^{nCtx}$ abstract states at each level.  
          RandCB considers the entire context but bounds the number of abstract states per level by a parameter $nAbs$, using a randomized hashing scheme to map each full context assignment to one of the $nAbs$ abstract states.
          %each of the $nAbs$ abstract states with a subset of possible full context assignments.
    
          % \rina{Bobak: you have to mention the notion of "granularity and to say that $nabs$ denoted the granularity.}
         % \rina{In the next section we introduced a class of abstraction functions which we call "value-based".}

        \vspace{-6pt}
        \paragraph{Value-Based Abstractions.}
            We now introduce a new way to form
            %framework for 
            abstractions that we call Value-Based Abstractions. They are defined by  
            %Value-based abstraction functions consist of two parts: 
            (1) a positive real-valued function $\mu: D_{\X} \rightarrow \mathbb{R}^{+}$,
            where $D_{\X}$ is a set of configurations for the variables \X,
            %that assigns a real positive number to each node $n$,
            and by (2) a partitioning scheme $\psi_{\mu}$ that assigns nodes to abstract states based on their $\mu$ values in an order-consistent manner, as defined next. 
            
            \begin{definition}[Value-Ordered Partitioning] 
            \label{def:value-ordered-partitioning}
                Given $nAbs$ and a function $\mu: D_{\X} \rightarrow \mathbb{R}^{+}$, a partitioning function 
                $\psi_{\mu}: D_{\X}  \rightarrow \{\bs{A_1},\bs{A_2},\ldots,\bs{A_{nAbs}} \}$
                %$\psi_{\mu}: D_{\X}  \rightarrow \mathbb{I^{+}}$,
                is order-consistent with $\mu$ relative to the $nAbs$ abstract states if, for any
                $n_1 \in  \bs{A_i}$ and $ n_2 \in \bs{A_j}$,  $i<j \Rightarrow \mu(n_1) \leq \mu(n_2)$.
                \vspace{-6pt}
            \end{definition}
    
            % We categorize value-based abstraction functions by their different combinations of $\mu$ and $\psi$ functions.

        % \rina{bobak, I thing algorithm2 should be removed and the whole discussion simplified. You can talk about how to generate the abstract cases for the specific cases. I comment it}
        %\alglink{alg:general-ordered-value-based-abstraction-function} provides a general value-based abstraction scheme that maintains an ordering of nodes according to $\mu(n)$. Assuming the value function $\mu(\cdot)$ is not dominating, the complexity is determined by the complexity of the partitioning function used.

        %\rina{removed.}
        %We next present three value-based abstraction classes, each based on a unique $\mu$.  Subsequently, we will  present seven ordered partitioning schemes that, in conjunction with a $\mu$, are used %with \alglink{alg:general-ordered-value-based-abstraction-function} 
        %to define a unique value-ordered abstraction function.
        

        \subsection{Value-Based Abstraction Classes} \label{sec:value-based-abstraction-classes}
        \vspace{-4pt}
        
            We introduce three Value-Based Abstraction classes, each characterized by a unique value function $\mu$ that signifies a notion of similarity between nodes.   We will subsequently provide partitioning schemes that, together with $\mu$, will yield a set of full abstraction functions.
            
            % In this work we present three value-based abstraction classes: Heuristic-Based (HB), HR-Based (HRB), and Q-Based (QB) abstraction value-classes.  Each is motivated by theory in search or sampling discussed in \seclink{sec:paradigms}, and each can be used with node partitioning schemes (\seclink{sec:ordered-partitioning-schemes}), which together form a value-ordered abstraction function.

            % The three types of guiding value functions we will use as a basis for abstraction function are 1) Heuristic-based, 
            % % 2) Heuristic and Ancestral Branching-based, 
            % 2) Heuristic and Ancestral Branching-based (or HR-based), 
            % and 3) Q-based value functions. 
            % %when Q is the proposal function. W

            \vspace{-6pt}
            \paragraph{1. Heuristic-Based Abstractions.} \label{sec:value-based-abstraction-classes:HB}

            
                % \begin{quote}
                %     $\mu(n) = h(n)$
                % \end{quote}
                
                %Using the motivation of abstracting nodes with similar subtree $Z(n)$ intuited from previous work and concepts of graph search,
                Heuristic-Based (HB) abstractions use $\mu(n) = h(n)$, where $h(n)$ is a heuristic estimate of $Z(n)$.  Unlike partial or hashed contexts as used by \citet{DBLP:conf/uai/BrokaDIK18}, heuristic estimates of $Z(n)$ can often provide
                \textit{quantitative} insight into potential similarities of $Z(n)$ values. In particular, this intuition holds when using %wMBE 
                heuristics that provide bounds on $Z(n)$ such as those via Weighted Mini-Bucket Elimination (wMBE) \citep{DBLP:journals/jacm/DechterR03,DBLP:conf/icml/LiuI11}. 
    
                % In conjunction with the node partitioning schemes that will be presented in \seclink{sec:ordered-partitioning-schemes}, the presented HB abstraction functions aim to form abstractions such that nodes with similar $Z(n)$ are grouped together.

    
            \vspace{-6pt}
            \paragraph{2. Heuristic and Ancestral Branching-Based Abstractions.} \label{sec:value-based-abstraction-classes:HRB}

    \shrink{
                % \begin{quote}
                %     $\mu(n) = h(n)  \! \cdot \!  r(n)$
                % \end{quote}

                Consider the following definition of ``exact'' abstraction functions:
                \begin{definition}[Exact Abstraction Function]
                     An abstraction function $a(\cdot)$ is exact for an Abstraction Sampling algorithm, AS, if use of $a(\cdot)$ with AS always leads to AS estimates having zero variance and $\hat{Z} = Z$ for every AS probe.
                \end{definition}

                }

                Recall that
                %$h(n)$ is a heuristic estimate of \hyperref[sec:partition-function-of-a-node]{$Z(n)$} and 
                $r(n)$ is an estimate of $n$'s \hyperref[sec:ancestor-branching-mass-of-a-node]{ancestor branching mass $R(n)$}. We can show that:
                % \vspace{-4pt}
                 \begin{theorem}[AOAS Exact Abstractions] \label{thm:aoas-proportionality-exact-proposal}
              %  \begin{theorem}[AOAS Exact Abstractions from $h(n)r(n)$ vs. $Z(n)R(n)$ %Proportionality] \label{thm:aoas-proportionality-exact-proposal}
                      If an abstraction function $a(\cdot)$ forms abstract states $\bs{A_{i}} \in \bs{A}$ such that 
                      $\exists c_i \in \mathbb{R}^{+}, \forall n \in \bs{A_{i}}, \frac{h(n)r(n)}{Z(n)R(n)} = c_i $
                      %for some $\propto_{i} \in %\!\!  \mathbb{R}_{>0}$ 
                      %$\forall n \in \bs{A_{i}}, 
                     % \frac{h(n)r(n)}{Z(n)R(n)} = \; \propto_{i}$ for some $\propto_{i} \in %\!\!  \mathbb{R}_{>0}$ 
                      whenever $Z(n)R(n) > 0$ \, (or $h(n)r(n) = 0$ otherwise), 
                     % $Z(n)R(n) \in \!\!  \mathbb{R}_{>0}$, or $h(n)r(n) = 0$ otherwise, then 
                     then AOAS is exact with its estimates having zero variance.
                     % (Proof in Supplemental Materials).
            \vspace{-6pt}
                \end{theorem}
    
                This observation suggests using $\mathfrak{hr}(n) = \frac{h(n)r(n)}{Z(n)R(n)}$ as a similarity measure: placing nodes with close $\mathfrak{hr}$ values in the same abstract state can reduce the variance of the resulting estimate.  However, without access to $Z(n)$ or $R(n)$, we cannot evaluate this ratio directly. Instead, we rely on the intuition that grouping nodes by $h(n)r(n)$ may yield sets of nodes that also have similar $Z(n)R(n)$, and thus similar $\mathfrak{hr}(n)$. We call schemes that use $\mu(n) = h(n)r(n)$ HR-Based (HRB) abstractions.
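
                To see informally why proportionality within an abstract state helps (a sketch only, not the proof, and assuming that a node's contribution to the estimate is captured by $w(n)Q(n)$ with $Q(n) = g(n)R(n)Z(n) > 0$), note that $q(n)/Q(n) = \frac{h(n)r(n)}{Z(n)R(n)}$, so the condition reads $q(n) = c_i\,Q(n)$ for all $n \in \bs{A_i}$. Selecting $n$ with probability $p(n) = w(n)q(n) / \sum_{m \in \bs{A_i}} w(m)q(m)$ and reweighting by $1/p(n)$ then gives
                \begin{align*}
                    \frac{w(n)}{p(n)}\,Q(n) 
                    &= \frac{Q(n)}{q(n)} \sum_{m \in \bs{A_i}} w(m)\,q(m) \\
                    &= \sum_{m \in \bs{A_i}} w(m)\,Q(m),
                \end{align*}
                which does not depend on which representative was selected.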
                %as a surrogate for similarity of this ratio and group nodes accordingly.
    

            \vspace{-6pt}
            \paragraph{3. Q-Based Abstractions.} \label{sec:value-based-abstraction-classes:QB}
    
                % \begin{quote}
                %     $\mu(n) = w(n) \! \cdot \! g(n) \! \cdot \! h(n) \! \cdot \! r(n)$
                % \end{quote}
    
                Another intuition for generating abstractions comes from statistical theory.  In his work on Stratified Importance Sampling, \citet{rizzo_2007} showed that overall variance can potentially be reduced by forming strata (abstract states) that have equal mass under the proposal distribution and that minimize the variance within each stratum.  Thus, since our proposal $p$ is proportional to $w(n)q(n)$, we use  $\mu(n) = w(n)q(n) \! = \! w(n)g(n)h(n)r(n) $ in what we call Q-Based (QB) abstractions.
    
                % \rina{remove (unclear) In addition to serving as an un-normalized proposal function, $q(n)$ also estimates $n$'s
                % contribution to the overall $Z$. Therefore, $q(n)$ estimates the impact of $n$ (and all previously abstracted nodes that $n$ represents) on the overall $Z$.}
                %which could be a valuable quantity to base our choice of nodes on as discussed in \seclink{sec:paradigms:combined}.
     


        \subsection{Ordered Partitioning Schemes} \label{sec:ordered-partitioning-schemes}
        \vspace{-4pt}
        
           Next we describe seven partitioning schemes $\psi$ to be used with $\mu$ to partition the nodes $\bs{n}$ into abstract states.
           Together, $\mu$ and $\psi$ define a value-based abstraction function.   
           % For brevity, we have omitted the algorithmic representation of the partitioning schemes, which can be found in the Supplemental Materials.
                
            
            %We now present seven distinct schemes of partitioning nodes into abstract states such that nodes are sorted according to a provided abstraction value function $\mu(\cdot)$. In addition to defining each scheme we also describe the motivation behind its creation and show the results on a running example we will use presented below.

            \vspace{-6pt}
            \paragraph{Running Example.} \label{sec:ordered-partitioning-schemes:running-example} 
            We will use a running example to illustrate the result of using various partitioning schemes.
            
               % As we motivate and describe the various partitioning schemes, we will also provide examples of the abstract states that would result from partitioning nodes with the following $\mu(n)$:
               Assume we have eight nodes with the following $\mu(n)$:
                \begin{align} \label{eq:running-partitioning-example}
                    % \set{
                        1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 10, 100
                    % }
                \end{align}
               and want to partition the nodes into $nAbs=4$ abstract states.  As we describe each partitioning scheme, we also demonstrate how the scheme would partition these nodes. 

                \begin{algorithm}[!t]
                    \caption{$\Part{simpleVB}$}
                    \label{alg:psi-simpleVB}
                    \begin{footnotesize}
                        \SetInd{0.25em}{0.55em}
                        \DontPrintSemicolon 
                        $baseCardinality \leftarrow \floor{\frac{|\bs{n}|}{nAbs}}$\\
                        $extras \leftarrow |\bs{n}| \bmod nAbs$\\
                        $\bs{n^{*}} \leftarrow SORT(\bs{n},\mu, \tn{low-to-high})$\\
                        $j_{begin} \leftarrow 1$\\
                        \ForEach{$i \leftarrow 1,...,nAbs$}{
                            \uIf{$extras > 0$}{
                                $j_{end} \leftarrow j_{begin} + baseCardinality$\\
                                $extras \leftarrow extras - 1$
                            }
                            \uElse{
                                $j_{end} \leftarrow j_{begin} + baseCardinality - 1$
                            }
                            $\bs{A_{i}} \leftarrow \set{n^{*}_{{j_{begin}}}, ..., n^{*}_{{j_{end}}}}$\\
                            $j_{begin} \leftarrow j_{end}+1$
                        }
                        $\bs{A} \leftarrow \bigcup_{i = 1}^{nAbs} \set{\bs{A_{i}}}$\\
                        \Return $\bs{A}$       
                    % }
                    \end{footnotesize}
                \end{algorithm}

            \vspace{-6pt}
            \paragraph{\NoCaseChange{1. SimpleVB}.} \label{sec:ordered-partitioning-schemes:simpleVB}
    
                The \textit{simpleVB} (simple value-based) scheme groups nodes having similar $\mu(n)$ into the same abstract state by a simple two-step process: 
                1) nodes are ordered by $\mu(n)$ (low to high), and 2) the ordered nodes are partitioned into abstract states of (approximately) equal cardinality.
                
                % 2) the ordered nodes are partitioned into [approximately] equal cardinality abstract states.
    
                % \textit{Time Complexity:}
                %     Partitioning is achieved via one pass through $|\bs{n}|$ leading to $\mathcal{O}(|\bs{n}|)$ time complexity.
                % \textit{Space Complexity:}
                %     No more than linear space is required.  $\mathcal{O}(|\bs{n}|)$.
                \textit{\hyperref[sec:ordered-partitioning-schemes:running-example]{Running Example}:}
                    \smallset{1.0, 1.1}, \smallset{1.2, 1.3}, \smallset{1.4, 1.5}, \smallset{10, 100}.
                    
               % Nodes are partitioned evenly, and through its simplicity 
                This method prioritizes speed 
                % allowing for abstractions to be formed quickly 
                 while still roughly grouping nodes with similar $\mu(n)$ together.
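
                As a worked instance of \alglink{alg:psi-simpleVB} on the running example, $|\bs{n}| = 8$ and $nAbs = 4$ give
                \begin{align*}
                    baseCardinality &= \floor{8/4} = 2, \\
                    extras &= 8 \bmod 4 = 0,
                \end{align*}
                so each abstract state receives exactly two consecutive nodes of the sorted list. For a hypothetical input of ten nodes (not part of the running example), $baseCardinality = 2$ and $extras = 2$, so the first two abstract states would receive three nodes each and the last two would receive two nodes each.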
                %\rina{complexity?}
                %\footnotetext{\label{ftn:ordered-schemes-maintain-sort-order}Such that nodes maintain sort order $o$ across all abstract states.}
    
    
            \vspace{-6pt}
            \paragraph{\NoCaseChange{2. minVarVB.}} \label{sec:ordered-partitioning-schemes:minVarVB}
    
                % The \textit{minVarVB} scheme 
                \textit{minVarVB} uses Ward's Minimum Variance Hierarchical Clustering, also known as Ward's Method \citep{ward1963} (\alglink{alg:wards-method}), to cluster nodes into $nAbs$ abstract states. Ward's Method aims to minimize the total within-cluster variance of $\mu(\cdot)$ across all abstract states.  
                %Ward's Method (\alglink{alg:wards-method}) is an agglomerative hierarchical clustering algorithm that creates a dendrogram by iteratively merging clusters. 
                Ward's Method can be combined with Lance-Williams linear distance updates \citep{LanceWillaims1967-distanceUpdates} to increase efficiency.
                More details on Ward's Method and Lance-Williams linear distance updates are found in the Supplemental Materials.

                \begin{algorithm}[!htb]
                    \caption{Ward's Method}
                    \label{alg:wards-method}
                    \begin{footnotesize}
                        \begin{enumerate}
                        \item \textbf{Initialization:} Treat each data point as an individual cluster. Assign each cluster a label.
                        
                        \item \textbf{Compute Pairwise Distances:} Calculate the pairwise distances between all clusters. Various distance metrics can be used, such as Euclidean distance.
                        
                        \item \textbf{Cluster Merging Iteration:} 
                          \begin{enumerate}
                            \item Identify the pair of clusters $\bs{C_{i}}$ and $\bs{C_{j}}$ that, when merged into a new cluster $\bs{C_{ij}}$, results in the smallest increase in the overall within-cluster variance. This is determined using the formula:\\
                            \vspace{4pt}
                            $\Delta Var = Var(\bs{C_{ij}}) - (Var(\bs{C_{i}}) + Var(\bs{C_{j}}))$\\
                            \vspace{4pt}
                              where \(Var(\bs{C_{ij}})\) is the variance of the merged cluster, and \(Var(\bs{C_{i}})\) and \(Var(\bs{C_{j}})\) are the variances of clusters $\bs{C_{i}}$ and $\bs{C_{j}}$, respectively.
                            \item Update distance measures between the newly merged cluster and all other clusters.
                          \end{enumerate}
                        
                        \item \textbf{Repeat:} Repeat steps 2-3 until the desired number of clusters is achieved.
                        \end{enumerate}
                    \end{footnotesize}
                \end{algorithm}

                % \textit{Time Complexity:\footnote{\label{ftn:time-complexity-assumes-constant-time-v}Assuming $\mu(n)$ is $\mathcal{O}(1)$ in both time and space.}}
                %     The choice of clusters to merge generally leads to having a $\mathcal{O}(|\bs{n^{*}}|^{3})$ time complexity due to the need to compare pair-wise distances between all clusters at each iteration.  However, in the case where nodes are distributed linearly in one dimension, only neighboring distances need to be considered at each iteration and can be made efficient by use of a priority queue, however since the Lance-Williams distance updates themselves take linear time, once per iteration, the reduced time complexity is still $\mathcal{O}(|\bs{n}|^{2})$.
                % \textit{Space Complexity:\super{\ref{ftn:time-complexity-assumes-constant-time-v}}}
                %     The space complexity is implementation dependent, with most time-efficient variants making use of a distance matrix leading to $\mathcal{O}(|\bs{n}|^{2})$ space complexity.
                \textit{\hyperref[sec:ordered-partitioning-schemes:running-example]{Running Example}:}
                    \smallset{1.0, 1.1, 1.2}, \smallset{1.3, 1.4, 1.5}, \smallset{10}, \smallset{100}.
                    
                In contrast to \textit{simpleVB}, \textit{minVarVB} devotes considerable computational resources to computing abstractions via Ward's Method.  Thus, \textit{minVarVB} leads to fewer probes being generated, but forms abstractions by greedily choosing, at each merge, the pair of clusters that least increases the total within-cluster variance of $\mu(n)$.
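
                To make the difference concrete on the running example, one can compare the two partitionings using the within-state sum of squared deviations from the state mean, $SS(\bs{A_i}) = \sum_{n \in \bs{A_i}} (\mu(n) - \bar{\mu}_i)^2$, where $\bar{\mu}_i$ denotes the mean of $\mu$ over $\bs{A_i}$ (this measure and notation are our choice for the illustration). For the \textit{minVarVB} partitioning,
                \begin{align*}
                    \sum_{i} SS(\bs{A_i}) &= 0.02 + 0.02 + 0 + 0 = 0.04,
                \end{align*}
                whereas for the \textit{simpleVB} partitioning,
                \begin{align*}
                    \sum_{i} SS(\bs{A_i}) &= 3 \cdot 0.005 + 2 \cdot 45^2 = 4050.015,
                \end{align*}
                with the difference dominated by the state $\{10, 100\}$, whose mean is $55$.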
    
    
            \vspace{-6pt}
            \paragraph{\NoCaseChange{3. equalDistVB}.} \label{sec:ordered-partitioning-schemes:equalDistVB}
    
               % \textit{equalDistVB} 
                % Building upon the ideas of \textit{minVarVB} and the simplicity of \textit{simpleVB}, 
                In an attempt to combine the intuition from \textit{minVarVB} and the speed of \textit{simpleVB}, 
                \textit{equalDistVB} greedily adds nodes in order of $\mu$ (low to high) into an abstract state $\bs{A_{i}}$ until
                \begin{align}
                    \mu(\bs{A_{1,...,i}}) \! = \!\! \sum_{j=1}^{i} \!\!\!\!\! \sum_{\;\;\;\; n \in \bs{A_{j}}} \!\!\!\! \mu(n) \geq \mathcal{Q}_{i} \! = \! \frac{i \cdot \sum_{n' \in \bs{n}} \mu(n')}{nAbs},
                \end{align}
                i.e., until the total sum of node values from $\bs{A_{1}},...,\bs{A_{i}}$ reaches or exceeds 
                $\frac{i}{nAbs}$ of the total across all of the nodes being partitioned.
                %the $\frac{i}{nAbs}$ quantile $\mathcal{Q}_{i}$.
                When paired with Q-based abstractions, 
                %the \textit{equalDistVB} schemes also attempts to
                \textit{equalDistVB} aims to partition nodes into equal mass states under the proposal, motivated by \citet{rizzo_2007}.  
                
                %This in corresponds to the condition in \citet{rizzo_2007}'s proposition for stratified importance sampling variance reduction.
    
                % \textit{Time Complexity:\super{\ref{ftn:time-complexity-assumes-constant-time-v}}}
                %     $Z(A_{1...i})$ can be updated progressively in constant time, and thus computation of $\mathcal{Q}_{i}$ at each iteration can also be done in constant time. Partitioning is achieved via one pass through $|\bs{n}|$ leading to $\mathcal{O}(|\bs{n}|)$ time complexity.
                % \textit{Space Complexity:\super{\ref{ftn:time-complexity-assumes-constant-time-v}}}
                %     No more than linear space is required.  $\mathcal{O}(|\bs{n}|)$.
                %\rina{For our running example we can get: }
                \textit{\hyperref[sec:ordered-partitioning-schemes:running-example]{Running Example}:}
                    \smallset{1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 10, 100}, \!\!\! \smallset{}, \!\!\! \smallset{}, \!\!\! \smallset{}.
                
                Although \textit{equalDistVB} hopes to strike a balance between efficiency and low variance of $\mu(n)$ within each abstract state,
                %intuitions previously explored while maintaining speed, 
                from the running example we can see it may yield undesirable partitionings for skewed distributions of $\mu(\cdot)$ values.  In the example, 
                all of the nodes must be placed into the first of four abstract states before the sum of their values reaches or exceeds $\frac{1}{4}$ of the total across all nodes being partitioned.  Thus, the remaining abstract states end up empty.
                %the first quantile is only reached after all the nodes have been added to the first abstract state, leaving no nodes remaining to be partitioned into subsequent abstract states. 
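
                To make the arithmetic concrete, the following is a worked sketch that assumes the running example's node values are exactly those listed above and that $nAbs = 4$:
                \begin{align*}
                    \textstyle\sum_{n' \in \bs{n}} \mu(n') &= 7.5 + 10 + 100 = 117.5, \\
                    \mathcal{Q}_{1} &= \tfrac{1}{4} \cdot 117.5 = 29.375.
                \end{align*}
                Under low-to-high ordering, the running sum after adding every node except the largest is only $7.5 + 10 = 17.5 < \mathcal{Q}_{1}$, so the node with value $100$ must also join $\bs{A_{1}}$. The cumulative mass is then $117.5 \geq \mathcal{Q}_{4}$, so the fill condition is already satisfied for every later abstract state.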
    
    
            \vspace{-6pt}
            \paragraph{\NoCaseChange{4. equalDistVB2}.} \label{sec:ordered-partitioning-schemes:equalDistVB2}
                
                A second version of the equalDist scheme, \textit{equalDistVB2}, follows the same general strategy as \textit{equalDistVB} but uses a reversed sort ordering in an attempt to mitigate overfilling of abstract states. Modifying the sort order from \textcolor{navyblue}{low-to-high} 
                to \textcolor{navyblue}{high-to-low} in Line \ref*{alg:psi-equalDistVB-combined:sort} of \alglink{alg:psi-equalDistVB-combined} converts \textit{equalDistVB} to \textit{equalDistVB2}.

                \textit{\hyperref[sec:ordered-partitioning-schemes:running-example]{Running Example}:}
                    \smallset{100}, \smallset{}, \smallset{}, \smallset{10, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0}.
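
                    As a worked check (a sketch, again assuming the node values listed for the running example and $nAbs = 4$, so that the total is $117.5$), the four cut-offs are
                    \begin{align*}
                        \mathcal{Q}_{1} &= 29.375, & \mathcal{Q}_{2} &= 58.75, \\
                        \mathcal{Q}_{3} &= 88.125, & \mathcal{Q}_{4} &= 117.5.
                    \end{align*}
                    Under high-to-low ordering, the first node alone gives a cumulative mass of $100 \geq \mathcal{Q}_{1}$. Because $100$ also exceeds $\mathcal{Q}_{2}$ and $\mathcal{Q}_{3}$, the second and third abstract states receive no nodes, and because $100 < \mathcal{Q}_{4}$, every remaining node is added to the fourth.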
                    
                    We see that \textit{equalDistVB2} can still over-pack abstract states.
                The next two variants aim to mitigate this issue further.


                    \begin{algorithm}[t]
                    \caption{$\Part{equalDistVB}$}
                    \label{alg:psi-equalDistVB-combined}
                    \begin{footnotesize}
                        \SetInd{0.25em}{0.55em}
                        \DontPrintSemicolon 
                        $\bs{n^{*}} \leftarrow SORT(\bs{n},\mu, \tn{\textcolor{navyblue}{low-to-high}})$ \label{alg:psi-equalDistVB-combined:sort}\\
                        $j \leftarrow 1$\\
                        \ForEach{$i \leftarrow 1,...,nAbs$}{
                            \textcolor{navyblue}{$\bs{A_{i}} \leftarrow \set{}$} \label{alg:psi-equalDistVB-combined:initialize-abstract-state}\\
                            \While{\textcolor{navyblue}{$\mu(\bs{A_{1,...,i}}) < \mathcal{Q}_{i}$}}{ \label{alg:psi-equalDistVB-combined:node-fill-cut-off}
                                $\bs{A_{i}} \leftarrow \bs{A_{i}} \cup \set{n^{*}_{{j}}}$\\
                                $j \leftarrow j + 1$
                            }
                        }
                        $\bs{A} \leftarrow \cup_{i = 1}^{nAbs} \bs{A_{i}}$\\
                        \Return $\bs{A}$       
                    % }
                    \end{footnotesize}
                \end{algorithm}

    
            \vspace{-6pt}
            \paragraph{\NoCaseChange{5. equalDistVB3}.} \label{sec:ordered-partitioning-schemes:equalDistVB3}
        
                In order to further lessen over-packing and ensure abstract states are not left empty, \textit{equalDistVB3} modifies \textit{equalDistVB2} so that, after processing each abstract state, the next state always has a node added to it by default before checking the abstract state fill condition. Modifying the sort order from \textcolor{navyblue}{low-to-high}  to \textcolor{navyblue}{high-to-low} in Line \ref*{alg:psi-equalDistVB-combined:sort} and
                \textcolor{navyblue}{$\bs{A_{i}} \leftarrow \set{}$} to \textcolor{navyblue}{$\bs{A_{i}} \leftarrow \set{n^{*}_{{j}}}; j \leftarrow j+1;$} in Line \ref*{alg:psi-equalDistVB-combined:initialize-abstract-state} of \alglink{alg:psi-equalDistVB-combined}
                converts \textit{equalDistVB} to \textit{equalDistVB3}.
                
                \textit{\hyperref[sec:ordered-partitioning-schemes:running-example]{Running Example}:}
                    \smallset{100}, \smallset{10}, \smallset{1.5}, \smallset{1.4, 1.3, 1.2, 1.1, 1.0}.
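
                The fill condition can be traced by hand (a sketch, assuming the node values listed for the running example and $nAbs = 4$). Immediately after each abstract state receives its default node, the cumulative mass is
                \begin{align*}
                    \mu(\bs{A_{1}}) &= 100 \geq 29.375 = \mathcal{Q}_{1}, \\
                    \mu(\bs{A_{1,...,2}}) &= 110 \geq 58.75 = \mathcal{Q}_{2}, \\
                    \mu(\bs{A_{1,...,3}}) &= 111.5 \geq 88.125 = \mathcal{Q}_{3}, \\
                    \mu(\bs{A_{1,...,4}}) &= 112.9 < 117.5 = \mathcal{Q}_{4}.
                \end{align*}
                The first three abstract states therefore stop at their default node, while the fourth continues to absorb the remaining nodes until the cumulative mass reaches $117.5$.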
                    
                While still very efficient, \textit{equalDistVB3} ensures that the provided $nAbs$ granularity is honored, allowing users better control of the search vs.~sampling interpolation possible with Abstraction Sampling.
    
    
            \vspace{-6pt}
            \paragraph{\NoCaseChange{6. equalDistVB4}.} \label{sec:ordered-partitioning-schemes:equalDistVB4}
    
                The final equalDist variant, \textit{equalDistVB4}, aims for more even partitioning. %than the previous variants 
                %by recomputing the fill condition that guides the filling of abstract states. 
                %for each abstract state with respect to the nodes remaining to be partitioned. 
                Before processing each abstract state $\bs{A_{i}}$, a new cut-off is determined based on the remaining nodes $\bs{n_{rm}^{*}}$ and remaining abstract states:
                % \begin{align} \label{eq:progressive-quantile-i}
                %     \widehat{\mathcal{Q}}_{i} = \frac{\sum_{n \in \bs{n^{*}}} Z(n) - Z(\bs{A_{1,...,i-1}})}{nAbs-i+1}
                % \end{align}
                \begin{align} \label{eq:progressive-quantile-i}
                    \widehat{\mathcal{Q}}_{i} = \frac{\sum_{n \in \bs{n_{rm}^{*}}} \mu(n)}{nAbs-i+1}.
                \end{align}
                Nodes are added to abstract state $\bs{A_{i}}$ while $\mu(\bs{A_{i}}) < \widehat{\mathcal{Q}}_{i}$. Modifying the sort order from 
                \textcolor{navyblue}{low-to-high} to \textcolor{navyblue}{high-to-low} in Line \ref*{alg:psi-equalDistVB-combined:sort} and  \textcolor{navyblue}{$\mu(\bs{A_{1,...,i}}) < \mathcal{Q}_{i}$} to \textcolor{navyblue}{$\mu(\bs{A_{i}}) < \widehat{\mathcal{Q}}_{i}$} in Line \ref*{alg:psi-equalDistVB-combined:node-fill-cut-off} of \alglink{alg:psi-equalDistVB-combined}
                converts \textit{equalDistVB} to \textit{equalDistVB4}.
                
                \textit{\hyperref[sec:ordered-partitioning-schemes:running-example]{Running Example}:}
                    \smallset{100}, \smallset{10}, \smallset{1.5, 1.4, 1.3}, \smallset{1.2, 1.1, 1.0}.
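
                The recomputed cut-offs can be checked by hand (a sketch, assuming the node values listed for the running example and $nAbs = 4$). The remaining mass before each abstract state is $117.5$, $17.5$, $7.5$, and $3.3$, respectively, giving
                \begin{align*}
                    \widehat{\mathcal{Q}}_{1} &= 117.5/4 = 29.375, & \mu(\bs{A_{1}}) &= 100, \\
                    \widehat{\mathcal{Q}}_{2} &= 17.5/3 \approx 5.83, & \mu(\bs{A_{2}}) &= 10, \\
                    \widehat{\mathcal{Q}}_{3} &= 7.5/2 = 3.75, & \mu(\bs{A_{3}}) &= 4.2, \\
                    \widehat{\mathcal{Q}}_{4} &= 3.3/1 = 3.3, & \mu(\bs{A_{4}}) &= 3.3.
                \end{align*}
                In the third abstract state, the mass after two nodes is $1.5 + 1.4 = 2.9 < 3.75$, so a third node is added before the condition is met.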
                
                Still computationally efficient, \textit{equalDistVB4} spreads nodes with small values more evenly across abstract states.

    
            \vspace{-6pt}
            \paragraph{\NoCaseChange{7. randVB}.} \label{sec:ordered-partitioning-schemes:randVB}
    
                It can be beneficial to rely on randomness to ensure a diverse sampling of abstractions.  \textit{randVB} does this by sampling $nAbs\!-\!1$ partition points uniformly at random and without replacement from between nodes sorted according to $\mu(\cdot)$, and then partitions the nodes accordingly. The resulting abstract states ensure that nodes are still grouped according to $\mu(\cdot)$, but the sizes of those groups vary.

                \begin{algorithm}[!htb]
                    \caption{$\Part{randVB}$}
                    \label{alg:psi-randVB}
                    \begin{footnotesize}
                        \SetInd{0.25em}{0.55em}
                        \DontPrintSemicolon 
                    % \Input{A set of nodes $\bs{n}$ to be partitioned into $nAbs$ abstract states; a value function $\mu(.)$}
                    % \Output{\hyperref[def:value-ordered-partitioning]{Value-ordered partitioning} of $\bs{n}$ into abstract states $\bs{A} = \setst{\bs{A_{i}}}{i \in \set{1,...,nAbs}}$ }
                    
                    % \Begin{
                        $\bs{s} \sim Unif(\setst{\bs{M} \subseteq \set{1,...,|\bs{n}|-1}}{|\bs{M}|=nAbs-1})$\\
                        $\bs{s^{*}_{}} \leftarrow SORT(\bs{s})$\\
                        $\bs{n^{*}} \leftarrow SORT(\bs{n},\mu, \tn{high-to-low})$\\
                        $j \leftarrow 1$\\
                        \ForEach{$i \leftarrow 1,...,nAbs\!-\!1$}{
                            $\bs{A_{i}} \leftarrow \set{n^{*}_{j},...,n^{*}_{s^{*}_{i}}}$\\
                            $j \leftarrow s^{*}_{i}+1$
                        }
                        $\bs{A_{nAbs}} \leftarrow \set{n^{*}_{j},...,n^{*}_{|\bs{n^{*}}|}}$\\
                        $\bs{A} \leftarrow \cup_{i = 1}^{nAbs} \bs{A_{i}}$\\
                        \Return $\bs{A}$       
                    % }
                    \end{footnotesize}
                \end{algorithm}
                
                % \textit{Time Complexity:\super{\ref{ftn:time-complexity-assumes-constant-time-v}}}
                %     $\mathcal{O}(|\bs{n}|)$ time complexity.
                % \textit{Space Complexity:\super{\ref{ftn:time-complexity-assumes-constant-time-v}}}
                %     No more than linear space is required.  $\mathcal{O}(|\bs{n}|)$.
                \vspace{-6pt}
                \textit{\hyperref[sec:ordered-partitioning-schemes:running-example]{Running Example}:}
                    ex1: \smallset{100, 10}, \smallset{1.5}, \smallset{1.4, 1.3, 1.2}, \smallset{1.1, 1.0};
                    ex2: \smallset{100}, \smallset{10, 1.5, 1.4, 1.3}, \smallset{1.2, 1.1}, \smallset{1.0};
                    etc.
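
                    Read against \alglink{alg:psi-randVB}, and assuming the eight node values listed for the running example with $nAbs = 4$, ex1 corresponds to the sorted partition points $\bs{s^{*}} = (2, 3, 6)$ and ex2 to $\bs{s^{*}} = (1, 5, 7)$. Under these assumptions, the number of distinct partitionings that can be drawn is
                    \begin{align*}
                        \binom{|\bs{n}|-1}{nAbs-1} = \binom{7}{3} = 35,
                    \end{align*}
                    each selected with probability $\frac{1}{35}$.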

            \vspace{-6pt}
            \paragraph{Complexity.} 
                Assuming $\mu(\cdot)$ is $\mathcal{O}(1)$, each of the proposed partitioning schemes has time complexity $\mathcal{O}(|\bs{n}| \, \log |\bs{n}|)$ and space complexity $\mathcal{O}(|\bs{n}|)$, with the exception of \textit{minVarVB}, which requires $\mathcal{O}(|\bs{n}|^{2})$ for both.  More details can be found in the Supplemental Materials.




    \section{Random-Only Abstractions} \label{sec:randomized-abstractions}
        \vspace{-4pt}

        Another previously unexplored approach is to use purely randomized abstraction schemes. At first glance, one may not expect such schemes to perform well, but randomization in concert with an informative heuristic and proposal can be beneficial.

            \vspace{-6pt}
            \paragraph{Intuition.} % for Good Behavior.}
                First, given an informative heuristic, the stochastic selection of a representative node \emph{within} each abstract state using a good proposal function will typically opt for nodes that represent greater mass, which is generally beneficial in importance sampling. Second, the randomness of node assignments to the abstract states 
                %allows for the stochastic selection of nodes to be in relation to different combinations of nodes.  This 
                enables nodes that may otherwise have little chance of being selected to occasionally have a greater chance of selection, leading to a more diverse distribution of probes.
        
                % Not included in results (does worse anyway)
                % Similarly, by randomizing the sizes of the abstract states, we can allow for different distributions for selection within abstract states, again allowing for occasional greater chance of selection of nodes that may otherwise have low chances of selection thus leading to more diverse probes.

        


            \vspace{-6pt}
            \paragraph{\NoCaseChange{The simpleRand Scheme}.} \label{sec:purely-randomized-abstractions:simpleRand}
    
                More concisely referred to as RAND, the simpleRand scheme partitions nodes via a two-step process: (1) the nodes are first shuffled to create a uniformly random permutation, and then (2) the nodes are partitioned into $nAbs$ abstract states of (approximately) equal cardinality.

                \begin{algorithm}[!htb]
                    \caption{$\Part{simpleRand}$}
                    \label{alg:RAND}
                    \begin{footnotesize}
                        \SetInd{0.25em}{0.55em}
                        \DontPrintSemicolon 
                    % \Input{A set of nodes $\bs{n}$ to be partitioned into $nAbs$ abstract states; a value function $\mu(.)$}
                    % \Output{\hyperref[def:value-ordered-partitioning]{Value-ordered partitioning} of $\bs{n}$ into abstract states $\bs{A} = \setst{\bs{A_{i}}}{i \in \set{1,...,nAbs}}$ such that $\forall \bs{A_{i}},\bs{A_{j}} \in \bs{A}, -1 \leq |\bs{A_{i}}|-|\bs{A_{j}}| \leq 1$}
                    
                    % \Begin{
                        $baseCardinality \leftarrow \floor{\frac{|\bs{n}|}{nAbs}}$\\
                        $extras \leftarrow |\bs{n}| \bmod nAbs$\\
                        $\bs{n^{*}} \leftarrow RANDOM\us SHUFFLE(\bs{n})$\\
                        $j_{begin} \leftarrow 1$\\
                        \ForEach{$i \leftarrow 1,...,nAbs$}{
                            \uIf{$extras > 0$}{
                                $j_{end} \leftarrow j_{begin} + baseCardinality$\\
                                $extras \leftarrow extras - 1$
                            }
                            \uElse{
                                $j_{end} \leftarrow j_{begin} + baseCardinality - 1$
                            }
                            $\bs{A_{i}} \leftarrow \set{n^{*}_{{j_{begin}}}, ..., n^{*}_{{j_{end}}}}$\\
                            $j_{begin} \leftarrow j_{end}+1$
                        }
                        $\bs{A} \leftarrow \cup_{i = 1}^{nAbs} \bs{A_{i}}$\\
                        \Return $\bs{A}$       
                    % }
                    \end{footnotesize}
                \end{algorithm} 

                
                \textit{\hyperref[sec:ordered-partitioning-schemes:running-example]{Running Example}:}
                    \smallset{1.4, 1.1}, \smallset{1.2, 10}, \smallset{1.0, 1.3}, \smallset{100, 1.5}.
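
                For the sizes in this example (a worked sketch, assuming eight nodes and $nAbs = 4$), \alglink{alg:RAND} computes
                \begin{align*}
                    baseCardinality &= \floor{\tfrac{8}{4}} = 2, & extras &= 8 \bmod 4 = 0,
                \end{align*}
                so every abstract state receives exactly two nodes. As a hypothetical contrast, ten nodes with $nAbs = 4$ would give $baseCardinality = 2$ and $extras = 2$, so the first two abstract states would receive three nodes each and the last two would receive two each.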

            \vspace{-6pt}
            \paragraph{Complexity.} 
                Both time and space are $\mathcal{O}(|\bs{n}|)$.

               

        


    \section{Empirical Evaluation} \label{sec:empirical-evaluation}
        \vspace{-4pt}

        %%%%%%%%%%%%%%%%%%% AS Algorithms Tested

        % \vspace{-6pt}
        \paragraph{Overview.}
            All combinations of \hyperref[sec:value-based-abstraction-classes]{Value-Based Abstraction Classes}: Heuristic-Based (\textbf{HB}), HR-Based (\textbf{HRB}), and Q-Based (\textbf{QB}); with each of the \hyperref[sec:ordered-partitioning-schemes]{Ordered Partitioning Schemes}: \semph{simpleVB}, \semph{minVarVB}, \semph{equalDistVB1-4}, and \semph{randVB}; were tested, resulting in twenty-one value-based abstraction functions.  
            These were compared against the previously evaluated \hyperref[sec:abstraction-sampling:existing-abstraction-functions]{context-based (\textbf{CTX}) abstraction functions}, randCB and relCB.  
            The \hyperref[sec:purely-randomized-abstractions:simpleRand]{purely random abstraction function, \textbf{RAND}}, was also included.  
            With the exception of relCB, each abstraction function uses a hyperparameter, $nAbs$, which bounds the number of abstract states at any level. relCB instead uses an $nCtx$ parameter that limits the number of context variables used in assigning abstract states.  To facilitate comparison, we report relCB's $nCtx$ parameter as an equivalent $nAbs$ parameter, assuming a domain size of $2$.  (For example, if relCB was run using $nCtx = 6$, we report it with $nAbs = 2^{6}$.) All abstraction functions were tested using the AOAS algorithm \citep{kask20-scaling-up-as}.  All algorithms were implemented in C++. All experiments were run on a 2.66 GHz processor and allotted 8 GB of memory.
        
        
        
        %%%%%%%%%%%%%%%%%%% Heuristic Description
        \vspace{-6pt}
        \paragraph{Heuristics.}
            To inform the sampling proposal, Weighted Mini-Bucket Elimination (wMBE) \citep{DBLP:journals/jacm/DechterR03,DBLP:conf/icml/LiuI11} -- which pairs well with AND/OR search \citep{Mateescu-and-or-search-and-variable-elimination} -- is used as a heuristic.  The i-bound (\textbf{iB}) parameter controls the strength of wMBE, where higher i-bounds generally lead to stronger heuristics, and thus better proposals, at the expense of more computation and memory. We standardize our experiments by using the same i-bound when comparing across algorithms. 
        
        

        
        %%%%%%%%%%%%%%%%%%% Benchmark Description
        \vspace{-6pt}
        \paragraph{Benchmarks.}
            
            In line with previous work on Abstraction Sampling, we perform experiments on the same set of over 400 problems used by \citet{kask20-scaling-up-as}, drawn from five benchmarks: DBN, Grids, Linkage-Type4, Pedigree, and Promedas. 


            
            
            \begin{centering}
            \begin{table}[!b]
                \centering
                \captionsetup{width=.95\linewidth}
                \caption{
                    \textbf{Exact Benchmark Statistics}. Average statistics for Exact problems. \textbf{N}: number of instances, \textbf{\tabs{X}}: average number of variables, \textbf{k}: average of problems' largest domain sizes, \textbf{w\super{*}}: average induced tree-width, \textbf{d}: average \PT depth. 
                    \label{tbl:small-benchmark-statistics}
                }
                \vspace{-6pt}
                \begin{footnotesize}                    
                \begin{tabular}{lrrrrr}
                  \toprule
                  Benchmark &   N &   $|\mathbf{X}|$ &     k &          w$^{*}$ &        d \\ 
                  \midrule
                        DBN &  66 &      67 &          2 &      29 &      30 \\ 
                      Grids &   8 &     250 &          2 &      22 &      49 \\ 
                   Pedigree &  25 &     690 &          5 &      25 &      89 \\ 
                   Promedas &  65 &     612 &          2 &      21 &      62 \\ 
                  \bottomrule
                \end{tabular}
                \end{footnotesize}
            \end{table}
            \vspace{-6pt}
            \end{centering}

           \begin{centering}
           \begin{table}[!b]
               \centering
               \captionsetup{width=.95\linewidth}
                \caption{
                    \textbf{LARGE Benchmark Statistics}. Average statistics for LARGE problems. \textbf{N}: number of instances, \textbf{\tabs{X}}: average number of variables, \textbf{k}: average of problems' largest domain sizes, \textbf{w\super{*}}: average induced tree-width, \textbf{d}: average \PT depth. 
                    \label{tbl:large-benchmark-statistics}
                }
                \vspace{-6pt}
                \begin{footnotesize}    
                \begin{tabular}{lrrrrr}
                  \toprule
                  Benchmark &   N &   $|\mathbf{X}|$ &        k &          w$^{*}$ &        d \\ 
                  \midrule
                            DBN &   48 &     216 &        2 &     78 &    78\\
                          Grids &   19 &    3432 &        2 &    117 &   220\\
                  Linkage-Type4 &   82 &    6550 &        5 &     45 &   761\\
                       Promedas &  173 &    1194 &        2 &     72 &   114\\
                  \bottomrule
                \end{tabular}
                \end{footnotesize}
           \end{table}
            \end{centering}
            
            We refer to problem instances with known $Z$ values as \textit{Exact}.  Larger problems without exact solutions are called \textit{LARGE}.  For LARGE problems, estimates from 10 hr of AOAS using RAND (chosen because it performs well) are used as the reference $Z$ value.  To increase difficulty on Exact problems, algorithms use a small i-bound of 5 (weakening the heuristic estimates) and are given a limited time of 300 sec.  For LARGE problems, an i-bound of 10 and a time limit of 1200 sec are used.

            %\bg{(Removed statement of focusing more on Exact problems)}
            % For both brevity and preciseness, we focus on results from the Exact problem instances. 
            % Results for LARGE problems can be found in the Supplemental Materials and their trends generally agree with those from the EXACT problems.

        
        
        %%%%%%%%%%%%%%%%%%% Performance Measure
        \vspace{-6pt}
        \paragraph{Performance Measure.}
            To evaluate performance, we define error as:    
            $Error = |\log_{10} \hat{Z} - \log_{10} Z^{*}|$,
            where $\hat{Z}$ is the estimate obtained from AS and $Z^{*}$ is the reference $Z$ value.  For Exact problems, $Z^{*}=Z$.
            %For LARGE problems whose true $Z$ is unknown, $Z^{*}$ was determined by a 10hr estimate produced by AOAS using the RAND abstraction scheme. \todo{repetetive?}
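
            As an illustration of the scale of this measure (with hypothetical values, not taken from our experiments), an estimate $\hat{Z} = 10^{-12.3}$ against a reference $Z^{*} = 10^{-12.0}$ gives
            \begin{align*}
                Error = |{-12.3} - ({-12.0})| = 0.3,
            \end{align*}
            i.e., the estimate differs from the reference by a multiplicative factor of $10^{0.3} \approx 2$.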

        \hfill
        \vspace{-24pt}
        \subsection{Results} \label{sec:empirical-evaluation:results}
        \vspace{-4pt}
        
            % \subsubsection{Aggregated Results Tables}
            
                % \vspace{-6pt}
                \paragraph{Summary Comparison.}
                    To examine the potential of the different methods, we tested each algorithm with a range of $nAbs \! \in \! \smallset{1, 4, 16, 64, 256, 512, 1024, 2048}$. For each $nAbs$ and benchmark, we calculated the average error across problems of the benchmark and identified the $nAbs$ that resulted in the lowest average error. \tablink{tbl:small-aggregations} focuses on Exact problems and shows this lowest average error and corresponding $nAbs$ for each algorithm.
                    %and benchmark, 
                    %highlighting schemes that performed well across all benchmarks.  
                    \tablink{tbl:large-qb-aggregations} shows the corresponding results for LARGE problems on the better performing QB and RAND classes, and the CTX class for comparison.  If an algorithm was unable to produce a positive Monte Carlo $Z$ estimate for a problem (denoted ``Fail''), the wMBE heuristic bound was used as its $Z$ estimate and the error was computed accordingly. We highlight the best performing schemes.
    
                    % Tables \ref{tbl:DBN_aggregation}-\ref{tbl:Promedas_aggregation} show aggregated performance of the various Value-Based Abstraction Classes with the various Partitioning Schemes on problems of DBN, Grids, Linkage-Type4, and Promedas benchmarks.

                    \begin{tablefigure*}[!htb]
                        \centering
                        \begin{subtablefigure}{0.95\linewidth}
                            \centering     %%% not \center
                            \includegraphics[width=0.98\linewidth]{UAI-24/_attachments/Results/ALL-SMALL-aggregations-i-5-t-300.pdf}
                            \vspace{-4pt}
                            \caption{}
                            \label{tbl:small-aggregations}
                        \end{subtablefigure}
                        \begin{subtablefigure}{0.945\linewidth}
                            \centering
                            \includegraphics[width=0.895\linewidth]{UAI-24/_attachments/Results/QB-CTX-RAND-LARGE-aggregations-i-10-t-1200_NEW-REF-Z.pdf}
                            \vspace{-4pt}
                            \caption{}
                            \label{tbl:large-qb-aggregations}
                        \end{subtablefigure}
                        \captionsetup{width=.95\linewidth}
                        \vspace{-10pt}\caption{\textbf{Summary Comparison}. Each table shows the Abstraction Class (\textit{Class}), Partitioning Scheme (\textit{Scheme}), bound on the number of abstract states per level (\textit{nAbs}), number of problems for which a positive solution could not be estimated (\textit{Fail}), and average $\log_{10}Z$ error (\textit{Avg. Error}) across Exact problems (subtable (a)) and LARGE problems (subtable (b)) in each benchmark.  Color bars visualize error magnitudes. We highlight the best performing algorithms: 
                        %Highlighted schemes are 
                        those for which (1) the difference in total average error (summed across the benchmarks) with respect to the best such total was less than 15\% of the best, and (2) within each individual benchmark, the difference in average error with respect to the best average error was less than 35\% of the best. (An exception to the latter criterion was granted to Exact DBN, on which the best average error from equalDistQB3 was unusually low.)
                        %Subtable (a) shows results on Exact problems. Subtable (b) shows results on LARGE problems.
                %\vspace{-4pt}
                        }
                        \label{tbl:summary-aggregations}
                    \end{tablefigure*}

    
                \vspace{-6pt}
                \paragraph{Comparison using 100 Samples.} \label{sec:empirical-evaluation:results:aggregation-tables:set-number-of-samples}

                    \begin{tablefigure}[!htb]
                        \vspace{-6pt}
                        \centering
                        \includegraphics[width=0.99\linewidth]{UAI-24/_attachments/Results/ALL-SMALL-iB-5-nAbs-256-nR-100-QB-CB-RAND.pdf}
                        \captionsetup{width=.95\linewidth}
                        \vspace{-6pt}\caption{\textbf{100-Sample Comparison}. For abstraction granularity of $nAbs=256$, aggregated statistics (as described in \tablink{tbl:summary-aggregations}) for Exact problems of each benchmark with each algorithm allotted 100 samples.
                        }
                        \label{tbl:results:ALL-SMALL-iB-5-nAbs-256-nR-100-QB-CB-RAND}
                        % \vspace{-4pt}
                    \end{tablefigure}
        
                    To assess the quality of abstraction functions in an implementation-agnostic manner and irrespective of resulting probe-sizes or speed,
                    %of processing abstractions,
                    %However, as detailed in \seclink{sec:ordered-partitioning-schemes}, some schemes may exhibit variations in execution time, and implementation differences can contribute to this variability. 
                    % And as discussed in \seclink{sec:empirical-evaluation:results:abstraction-speed-plot}, probe sizes can also vary. 
                    %Probe sizes can also vary between use of different abstraction functions.
                    %To circumvent these artifacts, 
                    we conducted experiments using a one-hundred sample limit (\textbf{m-100}). 
                    % rather than a time constraint. 
                    \tablink{tbl:results:ALL-SMALL-iB-5-nAbs-256-nR-100-QB-CB-RAND} shows these results on Exact problems for the better performing QB and RAND classes using $nAbs=256$.  We chose $nAbs=256$ because (1) it is an intermediate granularity and (2) all schemes produced 100 samples in a reasonable time. We highlight the best performing schemes.

            \vspace{-6pt}
            \paragraph{Varying \NoCaseChange{nAbs}.}

                \begin{tablefigure}[!htb]
                    \vspace{-6pt}
                    \centering
                    \includegraphics[width=0.99\linewidth]{UAI-24/_attachments/Results/varying-nAbs-SMALL-i-5-t-300-best-QB.pdf}
                    \captionsetup{width=.95\linewidth}
                    \vspace{-6pt}\caption{\textbf{Varying nAbs}. Average error when using $nAbs \in \set{4, 64, 1024}$ for minVarQB, equalDistQB3, equalDistQB4, the CTX-based algorithms, and RAND, each with iB-5 and a time limit of 300 sec.
                }
                % \vspace{-6pt}
                    \label{tbl:varying-nAbs-SMALL-i-5-t-300-best-QB}
                \end{tablefigure}


                
                %To see the effect of changing $nAbs$,
                \tablink{tbl:varying-nAbs-SMALL-i-5-t-300-best-QB} shows average error for 
                %\small{$nAbs \in \smallset{4, 64, 1024}$} 
                $nAbs \! \in \! \smallset{4, 64, 1024}$
                on Exact problems of each benchmark.  We focus on the better performing variants of QB: minVarQB, equalDistQB3, equalDistQB4; the purely randomized scheme RAND; and the context-based schemes (CTX) for comparison. In \figlink{plt:results:error-vs-nAbs-plot-minVarQB-iB-5} and \figlink{plt:results:error-vs-nAbs-plot-equalDistQB4-iB-5}, we also show average error across a wider array of $nAbs$ for minVarQB and equalDistQB4, respectively, the latter also acting as a representative for the profile of equalDistQB3 and RAND.

                \begin{figure}[htb!]
                \vspace{-6pt}
                    \centering
                    \includegraphics[width=0.99\linewidth]{UAI-24/_attachments/Results/error-vs-nAbs-plot-minVarQB-iB-5}
                    \captionsetup{width=.95\linewidth}
                    \vspace{-8pt}\caption{\textbf{Varying $\bs{nAbs}$ for minVarQB}. Average error on Exact problems using iB-5 and time limit 300 sec for each benchmark at various abstraction granularities (in $\log_2$).
                }
                    \label{plt:results:error-vs-nAbs-plot-minVarQB-iB-5}
                \end{figure}
                

                \begin{figure}[!htb]
                \vspace{-14pt}
                    \centering
                    \includegraphics[width=0.99\linewidth]{UAI-24/_attachments/Results/error-vs-nAbs-plot-equalDistQB4-iB-5}
                    \captionsetup{width=.95\linewidth}
                    \vspace{-8pt}\caption{\textbf{Varying $\bs{nAbs}$ for equalDistQB4}. Average error on Exact problems using iB-5 and time limit 300 sec for each benchmark at various abstraction granularities (in $\log_2$).
                % \vspace{-4pt}
                }
                    \label{plt:results:error-vs-nAbs-plot-equalDistQB4-iB-5}
                \end{figure}



            \vspace{-6pt}
            \paragraph{Time Series Plot.}

                \figlink{plt:results:grid20x20.f15-time-series} and \figlink{plt:results:or_chain_209.fg-time-series} show time-series results for the better performing QB algorithms, RAND, and CTX schemes on one representative problem each from Grids and Promedas.  Each algorithm was plotted with the $nAbs$ that resulted in the lowest average error for the respective benchmark.  
                %Each plot line is labeled with the scheme, $nAbs$ used, and the final $Error$.

                \begin{figure}[tb]
                    \centering
                    \includegraphics[width=0.99\linewidth]{UAI-24/_attachments/Results/grid20x20.f15-time-series.png}
                    \captionsetup{width=.95\linewidth}
                    \vspace{-8pt}\caption{$Z$ estimates from various algorithms versus time on Exact Grids problem grid20x20.f15 using $iB=5$. The dashed black line shows the true $Z$ value.
                \vspace{-12pt}
                }
                    \label{plt:results:grid20x20.f15-time-series}
                \end{figure}

                \begin{figure}[tb]
                    \centering
                    \includegraphics[width=0.95\linewidth]{UAI-24/_attachments/Results/or_chain_209.fg-time-series.png}
                    \captionsetup{width=.99\linewidth}
                    \vspace{-8pt}\caption{$Z$ estimates from various algorithms versus time on Exact Promedas problem or\us chain\us 209.fg using $iB=5$. The dashed black line shows the true $Z$ value.
                % \vspace{-4pt}
                }
                    \label{plt:results:or_chain_209.fg-time-series}
                \end{figure}

                 



        % \vspace{-14pt}
        \subsection{Analysis} \label{sec:empirical-evaluation:analysis}
        \vspace{-4pt}

            % \vspace{-6pt}
            \paragraph{Comparison with Context-Based Schemes.}

                \tablink{tbl:small-aggregations} shows that there is always a partitioning scheme for HB and HRB that can outperform the best CTX scheme on Exact problems.  For HB, the \textit{simple} and \textit{rand} partitioning schemes perform best, whereas for the HRB class the best scheme is more benchmark-dependent.  
                QB with \textit{minVar}, \textit{equalDist3}, or \textit{equalDist4} partitioning outperforms the CTX schemes across all benchmarks.  RAND also consistently outperforms the CTX schemes.  Results from \tablink{tbl:large-qb-aggregations} on LARGE problems agree, with the exception of QB with \textit{minVar} and RAND, which fall slightly shy of randCB's performance on Promedas.

            \vspace{-6pt}
            \paragraph{Comparison with Purely Randomized Abstractions.}
                \tablink{tbl:summary-aggregations} shows that RAND is a particularly well-performing scheme across all benchmarks.  However, the QB class using the \textit{equalDist3} and \textit{equalDist4} strategies is consistently comparable to or better than the purely randomized scheme. No other scheme does as well.

            \vspace{-6pt}
            \paragraph{Comparison with Non-Abstraction Sampling Schemes.}
                In prior work by \citet{DBLP:conf/uai/BrokaDIK18} and \citet{kask20-scaling-up-as}, Abstraction Sampling using CTX-based abstractions was shown to be competitive with several powerful schemes such as Importance Sampling (IS), Weighted Mini-Bucket Importance Sampling (wMBIS) \citep{liu2015probabilistic}, IJGP-SampleSearch (IJGP-ss) \citep{DBLP:journals/ai/GogateD11}, and Dynamic Importance Sampling \citep{lou2019interleave}.  Thus, superior performance against CTX schemes implicitly indicates competitiveness against these other methods.
                %non-Abstraction Sampling schemes.

            \vspace{-6pt}
            \paragraph{Abstraction Quality of the QB Schemes.}
                When drawing an equal number of samples with the same abstraction granularity of $nAbs=256$ (\tablink{tbl:results:ALL-SMALL-iB-5-nAbs-256-nR-100-QB-CB-RAND}), QB with \textit{equalDist3} and \textit{equalDist4} and RAND again perform well, as they did under a time limit (\tablink{tbl:summary-aggregations}).  
                A key difference is that QB with \textit{minVar}, which had shown slightly worse performance under a time limit, is now best.
                This in part explains the success of QB \textit{equalDist3} and \textit{equalDist4}, which try to emulate QB \textit{minVar} while using faster greedy strategies.

            \vspace{-6pt}
            \paragraph{Anytime Behavior.}
                \figlink{plt:results:grid20x20.f15-time-series} and \figlink{plt:results:or_chain_209.fg-time-series} show that Abstraction Sampling estimates continue to improve as time progresses.  We also notice that the estimates are often underestimates that increase over time, a common phenomenon in importance sampling caused by the tails of the proposal distribution.
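
                As a brief illustration of this effect, consider a generic importance sampling setup; the symbols $f$, $q$, $A$, $Z_A$, $\hat{Z}_N$, and $N$ below are illustrative assumptions and not part of the AOAS formulation. Let $Z=\sum_x f(x)$ with $f \geq 0$, and let $\hat{Z}_N = \frac{1}{N}\sum_{i=1}^{N} f(x_i)/q(x_i)$ with $x_i$ drawn independently from a proposal $q$ that is positive wherever $f$ is, so that $\mathrm{E}[\hat{Z}_N] = Z$. For a region $A$ with mass $Z_A=\sum_{x\in A} f(x)$ and proposal probability $q(A) < 1$,
                \begin{equation*}
                    \mathrm{E}\bigl[\hat{Z}_N \mid x_1,\dots,x_N \notin A\bigr] = \frac{Z - Z_A}{1 - q(A)}, \qquad \Pr\bigl[x_1,\dots,x_N \notin A\bigr] = \bigl(1-q(A)\bigr)^N .
                \end{equation*}
                The conditional mean is below $Z$ exactly when $Z_A/Z > q(A)$, that is, when the proposal under-samples $A$. When $q(A)$ is small, early estimates are therefore likely to be too low, and the probability of missing $A$ decays geometrically in $N$, which is consistent with estimates that rise toward $Z$ over time.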
                


            % \subsubsection{The Effect of \NoCaseChange{iB}

            \vspace{-6pt}
            \paragraph{Choice of Abstraction Granularity.}
                From \tablink{tbl:varying-nAbs-SMALL-i-5-t-300-best-QB} we see that for the well-performing QB \textit{equalDist3} and \textit{equalDist4} schemes and for the RAND scheme there is a trend that greater $nAbs$ 
                %(corresponding to a greater allotment of abstract states) 
                improves performance.
                %to a point and then has little effect.
                \figlink{plt:results:error-vs-nAbs-plot-equalDistQB4-iB-5} further supports this for QB with \textit{equalDist4}; plots for QB \textit{equalDist3} and RAND have similar profiles (omitted for brevity).  However, in \figlink{plt:results:error-vs-nAbs-plot-minVarQB-iB-5} and \tablink{tbl:varying-nAbs-SMALL-i-5-t-300-best-QB} we see that for \textit{minVar} the error begins to increase when $nAbs$ becomes too high.  This can be explained by the higher computational cost of forming \textit{minVar} abstractions, which leaves less time for probe generation.
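
                One way to make this trade-off concrete is a stylized cost model; the symbols $T$, $N$, $\hat{Z}$, $c(nAbs)$, and $\sigma^2(nAbs)$ are illustrative assumptions, not measured quantities. If each probe takes time $c(nAbs)$, including abstraction construction, and has variance $\sigma^2(nAbs)$, then within a time limit $T$ roughly $N \approx T / c(nAbs)$ probes are drawn, and the variance of their average is
                \begin{equation*}
                    \mathrm{Var}\bigl[\hat{Z}\bigr] \approx \frac{\sigma^2(nAbs)}{N} \approx \frac{\sigma^2(nAbs)\, c(nAbs)}{T}.
                \end{equation*}
                Under this model, increasing $nAbs$ helps only while $\sigma^2(nAbs)$ falls faster than $c(nAbs)$ grows, which would be consistent with the error eventually rising for the more expensive \textit{minVar} scheme.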

            \vspace{-6pt}
            \paragraph{Summary of Results.}
               Experiments show that the QB scheme with \textit{equalDist3} or \textit{equalDist4} and the RAND scheme perform best among the newly proposed abstraction functions, significantly outperforming the former state-of-the-art (\figlink{fig:results:performance-matrix}).  
               These schemes tend to improve  as the abstraction granularity $nAbs$ increases up to a point, past which we see little difference in performance.  Thus, our study suggests that 
               %given an i-bound, we suggest use of one of 
               these three abstraction schemes should be the first choice when using AOAS, and be used with the largest $nAbs$ feasible.

                \begin{figure}
                    \centering
                    \includegraphics[width=0.50\linewidth]{UAI-24/_attachments/Results/performance-matrix.pdf}
                    \captionsetup{width=.95\linewidth}
                    \vspace{-6pt}\caption{\textbf{Performance Matrix}. Relative average performance of value-based schemes vs.~existing %state-of-the-art
                    context-based abstractions.  Values $> 1.00$ indicate superior performance.
                % \vspace{-10pt}
                }
                    \label{fig:results:performance-matrix}
                \end{figure}




    \section{Conclusion} \label{sec:conlcusion}
    \vspace{-4pt}
            
        This exploration of abstraction functions for use with AND/OR Abstraction Sampling (AS) featured a new value-based abstraction framework, introducing three abstraction classes, HB, QB, and HRB, each defined by real-valued functions that aim to capture informative elements from search and sampling to guide abstractions and improve AS performance. Each class was tested with each of seven node partitioning schemes to form twenty-one new abstraction functions. Additionally, a new purely randomized abstraction scheme, RAND, was presented that places nodes into equal-cardinality abstract states completely at random.
        
        Results from an extensive empirical evaluation on over 400 benchmark problems show that two of the QB-based schemes (\textit{equalDist3} and \textit{equalDist4}) and the RAND scheme consistently achieve superior performance across all benchmarks. In particular, performance was significantly improved relative to former state-of-the-art context-based abstractions, and thus also implicitly against Importance Sampling, Weighted Mini-Bucket Importance Sampling, IJGP-SampleSearch, and Dynamic Importance Sampling.
        
        Based on this study and earlier findings, we believe that AOAS is one of the best schemes for estimating the partition function to date.
        Future work will explore adjusting the abstraction schemes to problem instances through learning and also the potential for applying adaptive sampling. 
        %We observed a trend that allotting these abstraction functions a high number of abstract states helps them perform best. 

        \vspace{-4pt}
        \subsubsection*{Acknowledgements} 
        \vspace{-8pt}
            Thank you to the reviewers for their valuable comments and suggestions. This work was supported in part by NSF grants IIS-2008516 and CNS-2321786.


        
\clearpage
    % \bibliographystyle{named}
    \bibliography{ref}





\end{document}