% \documentclass{uai2023} % for initial submission
\documentclass[accepted]{./uai_camera_ready/uai2023} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2023} % ptmx math instead of Computer
                                         % Modern (has noticable issues)
% \documentclass[mathfont=newtx]{uai2023} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
% \usepackage{natbib} % has a nice set of citation styles and commands
\usepackage{preamble}
\bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

%% Provided macros
\input{newMacros}
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example


% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \usepackage{xr-hyper} 
% \makeatletter
% \newcommand*{\addFileDependency}[1]{% argument=file name and extension
% \typeout{(#1)}% latexmk will find this if $recorder=0
% % however, in that case, it will ignore #1 if it is a .aux or 
% % .pdf file etc and it exists! If it doesn't exist, it will appear 
% % in the list of dependents regardless)
% %
% % Write the following if you want it to appear in \listfiles 
% % --- although not really necessary and latexmk doesn't use this
% %
% \@addtofilelist{#1}
% %
% % latexmk will find this message if #1 doesn't exist (yet)
% \IfFileExists{#1}{}{\typeout{No file #1.}}
% }\makeatother

% \newcommand*{\myexternaldocument}[1]{%
% \externaldocument{#1}%
% \addFileDependency{#1.tex}%
% \addFileDependency{#1.aux}%
% }
% \myexternaldocument{uai_camera_ready_supp}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% Only generate the first 11 pages in the pdf 
% \usepackage[1-11]{pagesel}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\title{Risk-limiting Financial Audits via Weighted Sampling without Replacement}
% The standard author block has changed for UAI 2023 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{\href{mailto:<shubhan2@andrew.cmu.edu>?Subject=Your UAI 2023 paper}{Shubhanshu Shekhar}{}}
\author[1]{Ziyu Xu}
\author[2, 3]{Zachary Lipton}
\author[3]{Pierre Liang}
\author[1, 2]{Aaditya Ramdas}
% Add affiliations after the authors
\affil[1]{%
    Department of Statistics and Data Science\\
    Carnegie Mellon University\\
    Pittsburgh, Pennsylvania, USA
}
\affil[2]{%
    Machine Learning Department\\
    Carnegie Mellon University\\
    Pittsburgh, Pennsylvania, USA
}
\affil[3]{%
    Tepper School of Business\\
    Carnegie Mellon University\\
    Pittsburgh, Pennsylvania, USA
}

\begin{document}
\maketitle

\begin{abstract}
We introduce the notion of risk-limiting financial audits~(RLFA): \blue{procedures that manually evaluate a subset of $N$ financial transactions to  check the validity of a claimed assertion $\mc{A}$ about the transactions. More specifically, RLFA satisfy two properties: 
(i) if $\mc{A}$ is false, they correctly disprove it with probability at least $1-\delta$, and (ii) they validate the correctness of $\mc{A}$ with probability $1$, if it is true.}
We propose a general RLFA strategy, by  constructing new confidence sequences~(CSs)  for the weighted average of $N$ unknown values,  based on samples drawn without replacement  from a (randomized) weighted sampling scheme.  Next, we develop methods to improve the quality of CSs by incorporating side information about  the unknown values. We show that when the side information is sufficiently accurate, it can directly drive the sampling. For the case where the accuracy is unknown \emph{a priori}, we introduce an alternative approach using control variates. Crucially, our construction adapts to the quality of side information by strongly leveraging the side information if it is highly predictive, and learning to ignore it if it is uninformative.  
Our methods also recover the state-of-the-art bounds for the special case of uniformly sampled observations with no side information, which has already found applications in election auditing. The harder weighted case with general side information solves the more challenging problem of AI-assisted financial auditing.

\end{abstract}

% v1


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
    \label{sec:introduction}


    Consider the following scenario: in a given year, 
    a company has $N$ recorded financial transactions 
    with reported monetary values $M(i) \in (0, \infty)$ 
    for each $i \in [N] \coloneqq \{1, \dots, N\}$.
    By law, an external auditor must 
    attest with ``reasonable assurance'' 
    to whether the financial records 
    as a whole are free from ``material misstatement.''
    %must check that (in most cases) these transactions are accounted for correctly,
   For example, the company has cash receipts for sales of products, 
   and it wants to ensure that the reported monetary value 
   matches the true amount that was made on the sales 
   according to prescribed accounting rules, since some receipts 
   may actually represent past sales or future deliveries. 
   This can be done, for instance, 
   by manually examining the entire sales process 
   and comparing the true sales amount 
   with the amount recorded by the company.
    %verifying that the correct amount of money was actually added to the company's bank account.
    Since the task of \emph{auditing} each transaction
    can be complex and 
    requires substantial human labor,
    it can be prohibitively expensive to perform
    a comprehensive audit of a company's records.

    Suppose that the auditor has built 
    an AI system for ``automated auditing'',
    i.e., this AI system can output predictions 
    about the accuracy of a transaction value, 
    based on receipts, OCR (optical character recognition), databases, etc. 
    Such systems are in a state of active development and deployment,
    and the high level of industry demand is unsurprising
    given the remarkable predictive capabilities 
    of modern machine learning algorithms.
    But there's a catch: because the system is trained and deployed
    on differently distributed data, 
    its accuracy on a new set of records 
    in a new time period
    is unknown \emph{a priori}.
    Even if, anecdotally, the AI system seems to perform reasonably well
    on data collected from a variety of companies, 
    we cannot make statistically certifiable conclusions
    based solely on the output of the AI system 
    on a new company and/or in a new time period.
    % 
    % However, the auditor cannot assume the system is reliable
    % due to its complexity and the nonstationarity of the audited transactions and evidence. 
    Thus we can think of AI systems in deployment
    as black boxes for which we have
    (reasonable) hopes of high accuracy
    but lack formal guarantees.


    The auditor's goal is to minimize the amount of manual auditing that must be done by a person, while accurately estimating the true monetary amount of those transactions that have not been manually audited. When the AI system is accurate, we want to reduce the amount of human auditing effort required. More importantly, we want a statistically rigorous conclusion regardless of the AI system's accuracy. Hence, our method should interpolate between using the predictions to reduce its uncertainty rapidly when the system is accurate, and falling back on the most efficient AI-free strategy when it is inaccurate.

    \paragraph{Problem setup and notation.} Denote the unknown misstated fraction of the $i$th transaction as $f(i) \in [0,1]$, for each $i \in [N]$. In other words, if $M^*(i)$ denotes the true value of the transaction $i$, and $M(i)$ is the reported value, then~\footnote{We are primarily concerned with estimating the downside that arises from misstatement, e.g., $M(i)$ represents the money that should have been received for a sale, and $M^*(i)$ represents the actual money received. In this scenario, we may lose at most $M(i)$ amount of money if $M^*(i) = 0$. Hence, we assume $f(i) \in [0, 1]$. 
    % We leave the unbounded case for future work.
    } $f(i) = |M^*(i)-M(i)|/M(i)$. We can normalize the reported transaction values by their sum to get a weight $\pi(i) \coloneqq M(i) / (\sum_{j = 1}^N M(j))$ for each $i \in [N]$, so that $\sum_{i=1}^N \pi(i) = 1$. The auditor wishes to obtain an estimate of  $m^* = \sum_{i=1}^N \pi(i)f(i)$, the fraction of the total monetary value that is misstated, up to an accuracy $\varepsilon \in [0,1]$. By $S(i)$, we denote the \emph{side information} that may be available to the auditor: a score for the $i$th transaction 
    that (ideally) predicts $f(i)$.
    In our setup, the side information can be generated through any method, e.g., through an AI system that automatically analyzes the documents 
    a human auditor would use.
    Each transaction can be evaluated 
    by the auditor to reveal $M^*(i)$ 
    (or equivalently, $f(i)$). 
    % Thus, \emph{given an $\varepsilon>0$, 
    % in what order should the transactions be audited 
    % to estimate $m^*$ within $\varepsilon$ additive accuracy, 
    % using the fewest number of calls to the auditor?}
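
    As a purely illustrative example (the numbers are hypothetical and not drawn from any dataset), suppose $N = 3$ with reported values $M = (100, 300, 600)$ and true values $M^* = (90, 300, 570)$. Then $f = (0.1, 0, 0.05)$ and $\pi = (0.1, 0.3, 0.6)$, so that
    \begin{align*}
        m^* = 0.1 \times 0.1 + 0.3 \times 0 + 0.6 \times 0.05 = 0.04,
    \end{align*}
    i.e., $4\%$ of the total reported value is misstated. Equivalently, $m^*$ is the total absolute misstatement divided by the total reported value, $(10 + 0 + 30)/1000$.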

    
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % \paragraph{Risk-limiting financial audit (RLFA).} Formally, a $(\varepsilon, \delta)$-\textit{risk-limiting financial audit (RFLA)} is a procedure that outputs an interval $\mc{C}$ where $|\mc{C}| \leq \varepsilon$ and $\mc{C}$ contains the true misstated fraction, $m^*$, with probability at least $1 - \delta$. This is a natural generalization of risk-limiting audits that are used to ensure statistically valid election auditing \citep{stark_conservative_statistical_2008a,stark_cast_canvass_2009,lindeman2012gentle} to the financial setting, where each transaction is weighted by its reported monetary value (as opposed to uniform weighting for all votes in the election setting). We also consider other possible definitions of an RLFA in \Cref{sec:alt-defs}. Our goal is to produce $\mc{C}$ that satisfies the conditions of an RLFA with as few audits, i.e, queries of $f$, as possible. To produce such an interval, we propose a framework for building RLFAs by constructing confidence sequences, which we introduce next.

    \paragraph{Risk-limiting financial audit (RLFA).} Motivated by the analogous concept of risk-limiting election audits~\citep{stark_conservative_statistical_2008a, stark_cast_canvass_2009}, we use risk-limiting financial audits~(RLFA) to refer to any procedure that checks the validity of an assertion $\mc{A}$ about the true misstated fraction $m^*$ by manually evaluating a subset of the $N$ transactions. Formally, given a risk limit $\delta \in (0,1)$, an RLFA method should satisfy these two properties: 
    \begin{itemize}
        \item If $\mc{A}$ is false, it correctly identifies this with probability at least $1-\delta$, while manually auditing as few transactions as possible. 
        \item If $\mc{A}$ is true, the procedure never refutes it. 
    \end{itemize}
    Following~\citet{stark2020sets, waudby2021rilacs}, we consider assertions about $m^*$ lying in a subset of $[0,1]$, and we overload the notation to use $\mc{A}$ to denote both the assertion, and the subset of $[0,1]$. Then, the auditing task can be stated as a sequential hypothesis testing problem
    \begin{align}
        H_0: m^* \not \in \mc{A}, \quad \text{vs.} \quad 
        H_1: m^* \in \mc{A}. 
    \end{align}
    The task then reduces to defining a stopping time $\tau \equiv \tau(\mc{A}, \delta)$ at which we stop and reject $H_0$, satisfying the properties: (i) $\mathbb{P}_{H_0}\lp \tau < N \rp \leq \delta$, and (ii) $\mathbb{P}_{H_1}\lp \tau < N \rp = 1$. 
    % 
    We refer to this formulation as \emph{regulatory}~(or external) RLFA, since it takes the perspective of an external auditor asked to verify the claim $\mc{A}$. This formulation takes a hypothesis testing perspective of auditing, similar to the existing approaches in prior works in this area. 
    
    An interesting variation of the above formulation is the \emph{friendly}~(or internal) RLFA, where an in-house auditor also corrects the reported value of each manually evaluated transaction. In other words, after correction, every manually evaluated transaction $i \in [N]$ has $f(i)=0$. As a result, we can define the residual misstated fraction after $t$ transactions~($I_1, \ldots, I_t$) have been audited, denoted by $m^*_t = m^* - (\sum_{j=1}^t M(I_j)f(I_j))/(\sum_{i=1}^N M(i))$. Using this term, we can now define RLFA from an estimation perspective. In particular, for any given assertion $\mc{A} \subset [0,1]$, we can consider the test: 
    \begin{align}
        &H_{0}: \cap_{n=1}^N \cup_{t=n}^N \{m^*_t \not \in \mc{A} \},   \\
        \text{versus} \quad &H_{1}: \cup_{n=1}^N \cap_{t=n}^N \{m^*_t \in \mc{A}\}.\quad \label{eq:friendly-rlfa}
    \end{align}
    The auditor's objective, as before, is to define a stopping time $\tau$ at which to reject $H_0$. To be able to reject $H_0$ at some $t<N$, this formulation requires the set $\mc{A}$ to be such that if $m^*_t \in \mc{A}$ for some $t$, then $m^*_{t'} \in \mc{A}$ for all $t' > t$. A sufficient condition for this is $\mc{A} = [0, \varepsilon]$ for some $\varepsilon \in (0,1)$. With this choice, the connection to estimation is explicit: the audit stops as soon as the residual misstated fraction $m^*_t$ falls below $\varepsilon$; or equivalently, it stops as soon as the misstated fraction $m^*$ is known within an accuracy of $\varepsilon$.
    For simplicity, we will focus on this specific instance of RLFA for the rest of this paper, and we formally record its definition next. 
    \begin{definition}[$\rlfa$]
    \label{def:eps-delta-rlfa}
        For $\varepsilon, \delta \in (0,1)$, consider the RLFA problem with assertion $\mc{A} = [0,\varepsilon]$, and $H_0$ and $H_1$ as defined in~\eqref{eq:friendly-rlfa}. Then, an $\rlfa$ procedure is any stopping time $\tau = \tau(\varepsilon, \delta)$ that satisfies $\mathbb{P}_{H_0} \lp \tau < N \rp \leq \delta$, and $\mathbb{P}_{H_1}(\tau<N)=1$. 
    \end{definition}
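
    To make the residual misstated fraction concrete, consider a small hypothetical instance (the numbers are chosen only for illustration): $N = 3$, reported values $M = (100, 300, 600)$, and misstated fractions $f = (0.1, 0, 0.05)$, so that $m^* = (10 + 0 + 30)/1000 = 0.04$. If the first audited transaction is $I_1 = 3$, then in the friendly setting its misstatement is corrected, and
    \begin{align*}
        m^*_1 = m^* - \frac{M(3) f(3)}{\sum_{i=1}^3 M(i)} = 0.04 - \frac{600 \times 0.05}{1000} = 0.01.
    \end{align*}
    With $\varepsilon = 0.02$, we have $m^*_1 \in [0, \varepsilon]$, and the residual stays in this set for all later $t$, since each additional audit subtracts a nonnegative term from $m^*_t$.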

    
    Our general strategy for developing $\rlfa$ procedures relies on constructing confidence sequences for $m^*$; that is, a sequence of sets $\{\mc{C}_t \subset [0,1]\}$ that satisfy $\mathbb{P}\lp \forall t \in [N]: m^* \in \mc{C}_t \rp \geq 1-\delta$.  The \emph{risk limit} $\delta \in (0,1)$ plays a vital role, as it allows for the possibility of certifying $\mc{A}$ by evaluating only a small subset of the $N$ transactions~(i.e., $\tau \ll N$). Indeed, if $\delta$ were $0$, 
    then the best strategy would simply be to audit the transactions 
    in decreasing order of their reported monetary value,
    and stop only when the remaining transactions 
    constitute less than an $\varepsilon$ fraction of the total. 
    However, as we show in this paper, 
    even for a small $\delta >0$~(e.g., $0.01$), 
    there exist strategies based on randomized sampling \wor 
    that allow us to stop much earlier. 
    In other words, for each $t \in [N]$, 
    we adaptively construct a sampling distribution $q_t$ 
    over the remaining $N-t+1$ unaudited transactions,
    and sample $I_t$, the index of the $t$th transaction to audit, 
    according to $q_t$. 
    We then obtain $f(I_t)$ through manual auditing, 
    and incorporate this new information 
    to update our estimate of $m^*$. 
    If our residual uncertainty is sufficiently small~(i.e., 
    smaller than $\varepsilon$), we stop sampling. 
    Otherwise, we continue the process 
    by drawing the next index, $I_{t+1}$, 
    according to an appropriately chosen distribution $q_{t+1}$.  
    
    Before presenting the technical details,
    we note that we use $(X_t)_{t \in \mathbb{I}}$ 
    to denote a sequence of objects indexed by a set $\mathbb{I}$, 
    and the $t$th object is $X_t$. 
    We drop the indexing subscript if it is clear from context. 
    For any $t \in [N]$, 
    we use $\filtration_t \coloneqq \sigma(\{I_i\}_{i \in [t]})$ 
    to denote the sigma-algebra over our query selections 
    for the first $t$ queries.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    \paragraph{Confidence sequences for sequential estimation.}  Let $T \in [N]$ be a random stopping time, that is,  a random variable for which the event $\{T = t\}$ belongs to $\filtration_t$ for each $t \in [N]$, and  let $\mc{T}$ denote the universe of all such stopping times. \emph{Confidence sequences}~\citep{lai1976confidence, howard2021time} (CSs), or time-uniform confidence intervals, are sequences of intervals, $(\mc{C}_t)_{t \in [N]}$, that satisfy
    \begin{align}
        \label{eq:conf-seq}
        \underset{T \in \mc{T}}{\sup}\ \prob{m^* \not\in \mc{C}_T} \leq \delta \Leftrightarrow
        \mathbb{P} \lp \exists t \in [N]:  m^* \not \in \mc{C}_t \rp \leq \delta,
    \end{align} where $\delta \in (0, 1)$ is a fixed error level. \citet{ramdas_admissible_anytime-valid_2020} showed the equivalence above, i.e., that any sequence of intervals $(\mc{C}_t)$ that satisfies one side of the equivalence will immediately satisfy the other as well. 
    % Hence, both sides of the implication are interchangeable definitions for a CS.
    
    Using this equivalence, we can define a simple $\rlfa$ procedure: construct a CS for $m^*$, denoted by $(\mc{C}_t)$, and produce $\mc{C}_\tau$ where $\tau$ is the following stopping time:
    \begin{gather}
    \tau = \tau(\varepsilon, \delta) \defined  \min \{t \geq 1: |\mc{C}_t| \leq \varepsilon \}. \label{eq:TauDef}
    \end{gather}
    %As a result, constructing any CS $(\mc{C}_t)$ for $m^*$, and stopping at time $\tau$ .
    The width of all nontrivial CSs converges to zero as $t \to N$, 
    and thus the above stopping time is well-defined, 
    and is usually smaller than $N$. For its relation to $\rlfa$ procedures, see~\Cref{remark:logical-cs-friendly-rlfa}. 
    
    Note that the only source of randomness in this problem 
    is the randomized sampling strategy $(q_t)_{t \in [N]}$, 
    used to select transactions for manual evaluation. 
    Hence, $(q_t)_{t \in [N]}$ is another design choice for us to make. 
    To summarize, our goal in  this paper is 
    to \textbf{(i)} design sampling strategies $(q_t)$, 
    and \textbf{(ii)} develop methods of aggregating 
    the information so collected with any available side information, 
    in order to construct CSs for $m^*$
    whose width decays rapidly to $0$. 
    
    Among existing works in the literature, the recent papers by~\citet{waudby2020estimating, waudby2020confidence} 
    are the most closely related to our work. 
    In these works, the authors considered 
    the problem of estimating the average value of $N$ items 
    via \wor sampling---however, they considered only uniform sampling, 
    and estimating only the unweighted mean of the population. 
    Our methods work with any sampling scheme, 
    and can estimate any weighted mean; 
    we recover their existing results 
    in Appendix~D. 

%    \begin{algorithm}[h]
%            \label{alg:AuditingProblem}
%            \caption{Procedure for estimating $m^*$ with additive accuracy $\varepsilon$ and confidence $1-\delta$, by designing sampling strategies $(q_t)_{t \in [N]}$ and a CS $(\mathcal{C}_t)_{t \in [N]}$.}
%            \begin{algorithmic}
%                \STATE \textbf{Input:} $(M_i)_{i \in [N]}$ proposed transaction costs. $(S(i))_{i \in [N]}$ side information (optional). Desired error level $\delta \in (0, 1)$, and interval width $\varepsilon \in (0, 1)$.
%                \STATE $F_0 \coloneqq \emptyset$
%                \FOR{$t \in 1, 2, \dots, N$}
%                \STATE Construct $q_t$ from $F_{t - 1}$ (and $(S(i))_{i \in [N]}$ if available).
%                    \STATE Sample $I_t \sim q_t$.
%                    \STATE Query $f(I_t)$.
%                    \STATE $F_t \coloneqq F_{t - 1} \cup \{(t, I_t, f_{I_t})\}$
%                    \STATE 
%                    \STATE Construct $\mathcal{C}_t$ from $F_t$ (and $(S(i))_{i \in [N]}$ if available).
%                    \STATE \textbf{if }$|\mathcal{C}_t| < \varepsilon$ \textbf{then break}
%                \ENDFOR
%                \STATE $\pi_i \coloneqq M_i / (\sum_{i \in [N]} M_i)$ for $i \in [N]$
%                \STATE $m^* \coloneqq \sum_{i \in [N]} \pi(i)f(i)$
%                \STATE \textbf{Ensures:} $\prob{m^* \in \mathcal{C}_t \text{ for all }t \in [N]} \geq 1 - \delta$.
%            \end{algorithmic}
%        \end{algorithm}
        %% related work on fixed sample size CI

        \paragraph{\wor confidence intervals for a fixed sample size.}~Most existing results on concentration inequalities for observations drawn via \wor sampling focus on the fixed sample size setting, starting with  \citet{hoeffding1963probability}, who bounded the probability of deviation of the unweighted empirical mean with \wor sampling in terms of the range of the observations. In particular, \citet{hoeffding1963probability} showed that for observations $X_{I_1}, \ldots, X_{I_n} \in [a,b]$ drawn uniformly \wor from $N$ values $(X_i)_{i \in [N]}$, we have
        \begin{align}
        {
            \label{eq:hoeffding-fixed-time}
            \mathbb{P}\lp \tfrac{\sum_{t=1}^n X_{I_t}}{n}  - \tfrac{\sum_{i=1}^N X_i}{N}  > \varepsilon \rp \leq \exp \lp - \tfrac{2 n \varepsilon^2}{(b-a)^2} \rp.}
        \end{align}
        In \wor sampling, as the sample size $n$ approaches $N$, the total number of items, we expect the empirical estimate to approximate the true average very accurately. This observation, not captured by the above bound, was made formal by \citet{serfling1974probability}, who showed that the $n$ in~\eqref{eq:hoeffding-fixed-time} can be replaced by $\frac{n}{1-(n-1)/N}$, thus highlighting the significant improvement possible for larger $n$ values.
        \citet{ben2018weighted} prove a Hoeffding-style concentration inequality for the unweighted sample mean around its own expectation, which is a different estimand than the weighted population mean.  Finally, in the unweighted case, \citet{bardenet2015concentration} obtained variance-adaptive Bernstein and empirical-Bernstein variants of Serfling's results, which are tighter in cases where the variance of the observations is small. 
        These results appear to be incomparable to those of~\citet{waudby2020confidence,waudby2020estimating}, 
        which have found successful application to auditing elections~\citep{waudby2021rilacs}.
        % When auditing elections, all ballots are equal, but in financial auditing, the transactions are unequal, and thus the need for the (sequential) weighted WoR methods we develop.
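
        To illustrate the size of Serfling's improvement with hypothetical numbers (chosen only for illustration), take $[a, b] = [0, 1]$, $N = 1000$, $n = 500$, and $\varepsilon = 0.05$. The right-hand side of~\eqref{eq:hoeffding-fixed-time} evaluates to $\exp(-2 \times 500 \times 0.05^2) = \exp(-2.5) \approx 0.082$. Replacing $n$ by
        \begin{align*}
            \frac{n}{1 - (n-1)/N} = \frac{500}{1 - 499/1000} = \frac{500}{0.501} \approx 998
        \end{align*}
        instead gives $\exp(-2 \times 998 \times 0.05^2) \approx \exp(-4.99) \approx 0.0068$, which is more than an order of magnitude smaller.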

    \subsection{Contributions}
    \label{subsec:overview}
    We introduce the concept of \emph{risk-limiting financial audits}~(RLFA) that generalizes 
    the notion of risk-limiting audits introduced by \citet{stark_conservative_statistical_2008a} for election auditing. 
    % \blue{Unlike risk-limiting audits, where the main concern is testing an announced result, the objective of an RLFA can be generalized to also  estimate the misstated monetary fraction of the reported financial transactions.} 
    In particular, we make the following key technical contributions:
        \begin{enumerate}[itemsep=0em, leftmargin=*]
            \item \emph{New CSs for weighted means with non-uniform sampling.} To design an $\rlfa$ procedure, we construct, in \Cref{sec:side information}, novel CSs for $m^*$ that are based on the betting method pioneered by \citet{waudby2020estimating}, as well as Hoeffding and empirical-Bernstein CSs in Appendix~C (which are looser but have a simple analytical form). Our results generalize previous methods in two ways: \textbf{(i)} they can estimate the weighted mean of $N$ items, and \textbf{(ii)} they work with adaptive, data-dependent sampling strategies.
            % 
            In particular, our betting CSs (which we show empirically to be the most powerful in Appendix~E) are based on simultaneously playing gambling games with the aim of disproving the possibility that $m^* = m$, for each $m \in [0, 1]$. Values of $m$ for which we accumulate much wealth are eliminated from the CS. Consequently, we develop a simple, lucrative betting strategy for this setting (\kelly); more lucrative bets translate into narrower CSs. 
            %
            \item \emph{Adaptive sampling strategies that minimize CS width.} In addition to designing CSs that are intrinsically narrow, we are also able to change the sampling distribution of the transactions at each time step, and develop a sampling strategy that will minimize CS width in concert with any valid CS construction. 
            We propose two sampling strategies, \propM and \propMS, the latter of which can incorporate approximately accurate scores $(S(i))_{i \in [N]}$ to improve the sample efficiency of our CSs. This is accomplished by choosing the sampling distribution, at each time step, that maximizes the wealth accumulated by the betting strategies that underlie our CSs. We find that this is approximately equivalent to choosing the sampling distribution with the minimal variance, and we show that our sampling strategies result in a noticeable improvement over uniform sampling through simulations in \Cref{sec:experiments}. 
            %
            \item \emph{Robust use of side information to tighten CSs.} Finally, in~\Cref{sec:side information}, we develop a principled way of leveraging any available side information, inspired by the idea of control variates used for variance reduction in  Monte Carlo sampling. Interestingly, our method adapts to the quality of the side information---if $(S(i))_{i \in [N]}$ and $(f(i))_{i \in [N]}$ are highly correlated, the resulting CSs are tighter, while in the case of uncorrelated $(S(i))$, we simply learn to discard the side information. 
            % 
            % the resulting CS. When the side information has no correlation, the resulting CS is simply the one that would have been produced if no side information was provided at all. Consequently, any informative side information generally improves our CSs.
\end{enumerate}



%     \begin{align}
%         \prob{m^* \not\in \mc{C}_{\tau}} \leq \delta \label{eq:GoalCI}
%     \end{align}
%      where  \(\tau = \tau(\varepsilon, \delta) \defined  \min \{t \geq 1: |\mc{C}_t| \leq \varepsilon \}\), i.e., the first time the first interval that has at most $\varepsilon$ additive error. This is equivalent to requiring that $\mc{C}_\tau$ is a $(1 - \delta)$-confidence interval (CI). Note that $\tau$ is a data-dependent stopping time, and not (necessarily) a fixed time. Typical CIs only possess their error controlling properties for a predetermined fixed time, and \textit{not} data-dependent stopping times.
    
%     \noindent \textbf{Confidence sequences for estimation and testing.} % We summarize the setup of the problem and a framework for algorithms solving the problem in \Cref{alg:AuditingProblem}.
%     Thus, the key technical tool we employ to tackle this problem are \emph{confidence sequences}~\citep{lai1976confidence, howard2021time} (CSs), or time-uniform confidence sets. Define a stopping time $T \in [N]$ as a random variable where $\ind{T = t}$ is predictable w.r.t.\ $\filtration_t$ for each $t \in [N]$ and let $\mc{T}$ be the universe of stopping times. CSs are sequences of intervals, $(\mc{C}_t)_{t \in [N]}$, that satisfy
%     \begin{align}
%         \label{eq:conf-seq}
%         \underset{T \in \mc{T}}{\sup}\ \prob{m^* \not\in \mc{C}_T} \leq \delta \Leftrightarrow
%         \mathbb{P} \lp \exists t \in [N]:  m^* \not \in \mc{C}_t \rp \leq \delta,
%     \end{align} where $\delta \in (0, 1)$ is a fixed error level. \citet{ramdas_admissible_anytime-valid_2020} showed the biimplication above, i.e., that any sequence of intervals $(\mc{C}_t)$ that satisfies one side of the implication will immediately satisfy the other as well. Hence, both sides of the implication are interchangeable definitions for a CS. As a result, any CS $(\mc{C}_t)$ would be sufficient to satisfy the requirement in \eqref{eq:GoalCI}.
%     Further, for all nontrivial CSs, we expect that their width $|\mc{C}_t|$ converges to zero as $t \to N$. Hence, we can stop auditing the transactions at the first time $\tau$ where the width of the CS is less than $\varepsilon$.
    
%     An important point to note is that the only source of randomness in the above discussion is the randomized sampling strategies $(q_t)_{t \in [N]}$, that is used for selecting which transactions to be evaluated by the human auditor. Hence, we can choose $q_t$ that make the estimation easier, e.g., by reducing variance in the outcomes, and is another tool in our arsenal for reducing the width of our CSs. Among existing works in literature, the recent papers by~\citet{waudby2020estimating, waudby2020confidence} are the most closely related to our work. In these works, the authors considered the problem of estimating the average value of $N$ items via \wor sampling --- however, they only considered uniform sampling, and only were interested in estimating the unweighted mean of the population. The methods we introduce in this paper are strictly more general, and we show how can recover these existing results in \Cref{sec:HoefEBComparison}. In contrast, our financial auditing problem can be thought of estimating the weighted mean of a finite population, using non-uniform sampling strategies.

% %    \begin{algorithm}[h]
% %            \label{alg:AuditingProblem}
% %            \caption{Procedure for estimating $m^*$ with additive accuracy $\varepsilon$ and confidence $1-\delta$, by designing sampling strategies $(q_t)_{t \in [N]}$ and a CS $(\mathcal{C}_t)_{t \in [N]}$.}
% %            \begin{algorithmic}
% %                \STATE \textbf{Input:} $(M_i)_{i \in [N]}$ proposed transaction costs. $(S(i))_{i \in [N]}$ side information (optional). Desired error level $\delta \in (0, 1)$, and interval width $\varepsilon \in (0, 1)$.
% %                \STATE $F_0 \coloneqq \emptyset$
% %                \FOR{$t \in 1, 2, \dots, N$}
% %                \STATE Construct $q_t$ from $F_{t - 1}$ (and $(S(i))_{i \in [N]}$ if available).
% %                    \STATE Sample $I_t \sim q_t$.
% %                    \STATE Query $f(I_t)$.
% %                    \STATE $F_t \coloneqq F_{t - 1} \cup \{(t, I_t, f_{I_t})\}$
% %                    \STATE 
% %                    \STATE Construct $\mathcal{C}_t$ from $F_t$ (and $(S(i))_{i \in [N]}$ if available).
% %                    \STATE \textbf{if }$|\mathcal{C}_t| < \varepsilon$ \textbf{then break}
% %                \ENDFOR
% %                \STATE $\pi_i \coloneqq M_i / (\sum_{i \in [N]} M_i)$ for $i \in [N]$
% %                \STATE $m^* \coloneqq \sum_{i \in [N]} \pi(i)f(i)$
% %                \STATE \textbf{Ensures:} $\prob{m^* \in \mathcal{C}_t \text{ for all }t \in [N]} \geq 1 - \delta$.
% %            \end{algorithmic}
% %        \end{algorithm}
%         %% related work on fixed sample size CI

%         \noindent \textbf{\wor confidence intervals valid at a fixed sample size.}~Unlike the aforementioned results, most other work on concentration inequalities for observations drawn via \wor sampling are valid only at a fixed sample size $n$, starting with  \citet{hoeffding1963probability}, who bounded the probability of deviation of the unweighted empirical mean with \wor sampling in terms of the range of the observations. In particular, \citet{hoeffding1963probability} showed that for observations $X_{I_1}, \ldots, X_{I_n} \in [a,b]$ drawn uniformly \wor from $N$ values $(X_i)_{i \in [N]}$, we have
%         \begin{align}
%         {
%             \label{eq:hoeffding-fixed-time}
%             \mathbb{P}\lp \tfrac{\sum_{t=1}^n X_{I_t}}{n}  - \tfrac{\sum_{t=1}^N X_i}{N}  > \varepsilon \rp \leq \exp \lp - \tfrac{2 n \varepsilon^2}{(b-a)^2} \rp.}
%         \end{align}
%         In \wor sampling, as the sample size $n$ approaches $N$, the total number of items, we expect the empirical estimate to approximate the true average very accurately. This observation, not captured by the above bound, was made formal by \citet{serfling1974probability}, who showed that the $n$ in~\eqref{eq:hoeffding-fixed-time} can be replaced by $\frac{n}{1-(n-1)/N}$, thus highlighting the significant improvement possible for larger $n$ values.
%         \citet{ben2018weighted} prove a Hoeffding style concentration inequality on the unweighted sample mean to its own expectation, which is a different estimand than the weighted population mean.  Finally, in the unweighted case, \citet{bardenet2015concentration} obtained variance adaptive Bernstein and empirical-Bernstein variants of Serfling's results, that are tighter in cases where the variance of the observations is small. These results were uniformly dominated by those of~\citet{waudby2020confidence,waudby2020estimating}, that have found successful application to auditing elections~\cite{waudby2021rilacs}. When auditing elections, all ballots are equal, but in financial auditing, the transactions are unequal, and thus the need for the (sequential) weighted WoR methods we develop.

%     \subsection{Contributions}
%     \label{subsec:overview}
%     Our main contributions are as follows:
%         \begin{enumerate}[itemsep=0em]
%             \item \emph{Novel CSs for weighted means with non-uniform sampling.} We construct CSs for $m^*$ that are based on a betting method that was pioneered in \citep{waudby2020estimating} in \Cref{sec:side information}, as well as Hoeffding and empirical-Bernstein CSs in \Cref{sec:hoeffding-empirical-bernstein} (which are looser but have a simple analytical form). Our results generalize previous methods to enable the estimation of weighted means (i.e., $m^*$ is the weighted mean of $(f(i))_i \in [N])$ with weights determined by $(\pi(i))_{i \in [N]}$), and under sampling distributions that are not uniform and adaptive to the $f(i)$ that have been observed so far. In particular, our betting CSs, which we show empirically are the most powerful in \Cref{sec:HoefEBExperiments}) are based on simultaneously playing a gambling game associated with each potential value in the support of $m^*$, i.e., $[0, 1]$. Values for games where we accumulate much wealth are eliminated from the CS. Consequently, we develop a simple, lucrative betting strategy for this setting (\kelly), which is equivalent to developing narrower CSs. As a result, we propose the first nontrivial CSs for estimation the weighted mean of a finite population, and under adaptive sampling strategies.
%             \item \emph{Adaptive sampling strategies that minimize CS width} In addition to designing CSes that are intrinsically narrow, we are also able to change the sampling distribution of the transactions at each time step, and develop a sampling strategy that will minimize CS width in concert with any valid CS construction. 
%             We propose two sampling strategies, \propM and \propMS, the latter of which can incorporate approximately accurate scores $(S(i))_{i \in [N]}$ to improve the sample efficiency of our CSs. This is accomplished by choosing the sampling distribution, at each time step, that maximizes the wealth accumulated by the betting strategies that underlie our CSs. We find that this is approximately equivalent to choosing the sampling distribution with the minimal variance, and we show that our sampling strategies result in a noticeable improvement over uniform sampling through simulations in \Cref{sec:experiments}. 
%             \item \emph{Robust use of side information to tighten CSs.} Finally, in~\Cref{sec:side information}, we show that we can use side information to reduce the width of our CSs. Our method is inspired by the idea of control variates used for variance reduction in  Monte Carlo sampling. Interestingly, our method adapts to the quality of the side information --- the higher the correlation is between the side information $(S(i))_{i \in [N]}$ and the misstated fractions $(f(i))_{i \in [N]}$, the tighter the resulting CS. When the side information has no correlation, the resulting CS is simply the one that would have been produced if no side information was provided at all. Consequently, any informative side information generally improves our CSs.
% \end{enumerate}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Betting-based CS construction}
    \label{subsec:betting-CS-no-side-info}
    We derive our CSs by designing sequential tests to simultaneously check the hypotheses that $m^*=m$, for all $m \in [0,1]$. 
    By the principle of \emph{testing by betting}~\citep{shafer_testing_betting_2021}, 
    this is equivalent to playing repeated gambling games 
    aimed at disproving the null $m^*=m$, for each $m \in [0,1]$. 
    Formally, for all $m \in [0,1]$, 
    we construct a process $(W_t(m))_{t \in [N]}$~(the wealth process), 
    such that \textbf{(i)} if $m=m^*$, 
    then $(W_t(m))$ is a \emph{test martingale}, 
    i.e., a nonnegative martingale with initial value $1$, 
    and \textbf{(ii)} if $m \neq m^*$,
    then $W_t(m)$ grows at an exponential rate.
    Recall that a process $(W_t)_{t \in [N]}$ adapted to $(\filtration_t)_{t \in [N]}$ is a supermartingale iff  $\expect[W_t \mid \filtration_{t - 1}] \leq W_{t - 1}$ for all $t \in [N]$,
    and a martingale if the inequality is replaced with an equality. 
    Assuming we can construct such a process, 
    we define the confidence set at any time $t$ 
    as the set of those $m \in [0,1]$ 
    for which $W_t(m)$ is `small', 
    because a nonnegative martingale 
    is unlikely to take large values. 
    
    As mentioned earlier, this approach requires us 
    to design sampling distributions $(q_t)$, 
    and a method for constructing a CS $(\mathcal{C}_t)$ 
    from the queried indices.  
    We begin by formally defining a sampling strategy.
    \begin{definition}[Sampling Strategy]
        \label{def:sampling-strategy}
        A sampling strategy consists of a sequence $(q_t)_{t \in [N]}$, where $q_t$ is a probability distribution on the set $\mc{N}_t \defined [N] \setminus \{I_1, \ldots, I_{t-1}\}$. Here $I_j$ denotes the index drawn according to the predictable~(i.e., $\filtration_{j-1}$-measurable) distribution $q_j$. 
    \end{definition}
%%
% \vspace{-10pt}
%%

    A natural baseline sampling strategy is to set $q_t$ to be uniform over  $\mc{N}_t$ for all $t \in [N]$. We will develop other, more powerful, sampling strategies that are more suited to our problem in~\Cref{sec:sampling-strategies}. 

    We now describe how to construct the wealth process for an arbitrary sampling strategy. First, define the following:
%%
% \vspace{-5pt}
%
    \begin{align}
        Z_t \coloneqq f(I_t) \tfrac{\pi(I_t)}{q_t(I_t)},\text{ and } \mu_t(m) \defined m - \sum_{j=1}^{t-1} \pi(I_j) f(I_j).
    \end{align} Note that $\mu_t(m)$ is the remaining misstated fraction after accounting for the first $t - 1$ queries to $f$ if $m$ is truly the total misstated fraction.
    Now, we can define the \emph{wealth process}:
    \begin{align}
        W_t(m) = W_{t-1}(m) \times \lp 1 + \lambda_t(m)\lp Z_t - \mu_t(m) \rp\rp, \label{eq:wealth-process-0}
    \end{align}
    with $W_0(m) = 1$ for all $m \in [0, 1]$. Here, $(\lambda_t(m))_{t \in [N]}$ is a predictable sequence with values in $[0,1/u_t(m)]$, where $u_t(m)$ is the largest value in the support of $Z_t - \mu_t(m)$, for each $t \in [N]$. 
     This constraint on $(\lambda_t(m))$ ensures that $W_t(m)$ is nonnegative for each $t \in [N]$. If we view the wealth process as the wealth we earn from gambling on the outcome of $Z_t - \mu_t(m)$, then $\lambda_t(m)$ determines how much money to gamble in round $t$. Hence, we refer to $(\lambda_t(m))$ as a \emph{betting strategy}.
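
    As a brief sketch of why this process is fair under the null (assuming that $q_t(i) > 0$ for every $i \in \mc{N}_t$, so that the ratio $\pi(i)/q_t(i)$ is well defined), conditioning on $\filtration_{t-1}$ and averaging over $I_t \sim q_t$ gives
    \begin{align*}
        \expect[Z_t \mid \filtration_{t-1}] &= \sum_{i \in \mc{N}_t} q_t(i)\, f(i)\, \tfrac{\pi(i)}{q_t(i)} = \sum_{i \in \mc{N}_t} \pi(i) f(i) \\
        &= m^* - \sum_{j=1}^{t-1} \pi(I_j) f(I_j) = \mu_t(m^*).
    \end{align*}
    Since $W_{t-1}(m^*)$ and $\lambda_t(m^*)$ are both determined by the first $t-1$ queries, it follows that
    \begin{align*}
        \expect[W_t(m^*) \mid \filtration_{t-1}] = W_{t-1}(m^*) \lp 1 + \lambda_t(m^*) \lp \expect[Z_t \mid \filtration_{t-1}] - \mu_t(m^*) \rp \rp = W_{t-1}(m^*).
    \end{align*}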

    It is easy to verify that $(W_t(m^*))$ is a nonnegative martingale for any sampling strategy $(q_t)$ and betting strategy $(\lambda_t(m^*))$. Hence, it is unlikely to take large values, as we describe next. 
    \begin{proposition}
       \label{prop:type-I}
       For any sampling and betting strategies $(q_t)$ and $(\lambda_t(m^*))$, the following holds:
       \begin{align}
           \mathbb{P}\lp \exists t \geq 1: W_t(m^*) \geq 1/\delta \rp \leq \delta.
       \end{align}
    \end{proposition}
    %%
    % \vspace{-10pt}
    %%
    This is a consequence of Ville's inequality, first obtained by~\citet{ville1939etude}, which is a time-uniform version of Markov's inequality for nonnegative supermartingales.
    This result immediately implies that  for any sampling strategy, and any betting strategy, the term $m^*$ must lie in the set
    \begin{align}
        \label{eq:conf-seq-def-1}
        \mc{C}_t = \{m : W_t(m) < 1/\delta\}
    \end{align}    with probability at least $1-\delta$, making $(\mc{C}_t)$ a $(1 - \delta)$-CS. 
    \begin{theorem}
        $(\mc{C}_t)$ is a $(1 - \delta)$-CS, where $\mc{C}_t$ is defined by \eqref{eq:conf-seq-def-1}. Hence, the associated stopping time $\tau$ is an $\rlfa$, for any sampling strategy $(q_t)$ and betting strategies $(\lambda_t(m))$ for each $m \in [0, 1]$. Recall that $\tau$ is defined in \eqref{eq:TauDef} as the first time where $|\mc{C}_t| \leq \varepsilon$.
        \label{thm:CSthm}
    \end{theorem}
    % \vspace{-5pt}
    This methodology gives us a flexible framework for constructing different $(\mc{C}_t)$ that result in different RLFAs. Now, we can turn our attention to finding betting strategies $(\lambda_t(m))$ that reduce the CS width quickly and minimize $\tau$.

    \begin{remark}
        \label{remark:betting-CS-width}
        Note that the set $\mc{C}_t$ in~\eqref{eq:conf-seq-def-1} does not admit a closed-form expression, and is computed numerically in practice by evaluating $W_t(m)$ for $m$ values on a sufficiently fine grid over $[0,1]$. In~Appendix~C, we design CSs based on nonnegative supermartingales~(instead of martingales) that do admit a closed-form representation. However, this analytical tractability comes at the price of empirical performance, as we demonstrate in~Appendix~E. 
    \end{remark}    

    \begin{remark}
        \label{remark:cs-optimality} 
        Ville's inequality~(Fact~1 in~Appendix~A.2), used for proving~\Cref{prop:type-I},  is known to be tight for continuous-time nonnegative martingales with infinite quadratic variation, and incurs a slight looseness as we move to the case of discrete-time martingales. As a result, the martingale-based CSs constructed in this section provide nearly tight coverage guarantees that are strictly better than the supermartingale-based closed-form CSs discussed in~Appendix~C. This near-tightness of the error probability of our betting-based CSs implies that there exists no other CS that is uniformly tighter than ours, while also controlling the error probability below $\delta$. In other words, our CSs satisfy a notion of admissibility or Pareto-optimality. 
    \end{remark}
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    \subsection{Powerful betting strategies}
    \label{sec:powerful-bet-strat}
        Besides validity, we also want the size of the CS to shrink rapidly. This depends on how quickly the values of $W_t(m)$ for $m \neq m^*$ grow with $t$.
        One criterion that captures this is the \emph{growth rate}, 
        i.e., the expected logarithm of the outcome of each bet.
        We can define the \emph{one-step growth rate} $D_n$, 
        for each $n \in [N]$ as follows:
        \begin{align}
            % \frac{1}{n} \log \lp W_n(m) \rp = \frac{1}{n} \sum_{t=1}^n \log \lp 1 + \lambda_t(m) \lp Z_t - \mu_t(m) \rp \rp.
            D_{n}(m, \lambda)&\defined  \log (1 + \lambda(Z_n - \mu_n(m))).
            %G_n(m, \lambda) &\defined \frac{1}{n}\sum_{t = 1}^n D_t(m, \lambda).
        \end{align}
        We are interested in maximizing the expected logarithm of the wealth process \citep{grunwald_safe_testing_2020,shafer_testing_betting_2021}, since it is equivalent to minimizing the expected time for a wealth process to exceed a fixed threshold (asymptotically, as the threshold grows larger) \citep{breiman_optimal_gambling_1961}. Thus, \textit{in the context of the auditing problem, 
        maximizing $\expect[D_t(m, \lambda) \mid \filtration_{t - 1}]$ approximately minimizes $\expect[\tau]$}. The one-step growth rate is a broadly studied objective known as the ``Kelly criterion'' \citep{kelly_new_interpretation_1956}.
        In general, finding the best sequence of bets $\lambda_t(m)$ for different values of $n$ is intractable. Instead, we consider the approximation $\log(1+x) \geq x - x^2$ for $|x| \leq 1/2$, and define the best constant bet in hindsight, $\lambda^*_n$, as
        \begin{align}
            B_t(m, \lambda) &\defined \lambda \lp Z_t - \mu_t(m)\rp - \lambda^2 \lp Z_t - \mu_t(m) \rp^2, \label{eq:Bn}\\
            \lambda^*_n &\defined \underset{\lambda \in [-\frac{1}{2c}, \frac{1}{2c}]}{\argmax}\ \frac{1}{n} \sum_{t = 1}^n B_t(m, \lambda),
            \label{eq:approx-lambda-star}
        \end{align}
        where  $c = \max \{|Z_t - \mu_t(m)|: t\in[n]\}$. We get the following result on $\lambda^*_n$ for each $n \in [N]$:
        \begin{gather}
            \lambda^*_n \propto \frac{\sum_{t = 1}^n (Z_t - \mu_t(m))}{\sum_{t = 1}^n (Z_t - \mu_t(m))^2} \defined \frac{A_n}{V_n}.
        \end{gather}
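        To sketch where this expression comes from, write $x_t \defined Z_t - \mu_t(m)$ (a shorthand used only in this derivation) and assume $V_n > 0$. The objective in \eqref{eq:approx-lambda-star} is then a concave quadratic in $\lambda$:
        \begin{align*}
            \frac{1}{n} \sum_{t = 1}^n B_t(m, \lambda) = \frac{1}{n} \lp \lambda \sum_{t=1}^n x_t - \lambda^2 \sum_{t=1}^n x_t^2 \rp = \frac{1}{n} \lp \lambda A_n - \lambda^2 V_n \rp.
        \end{align*}
        Setting the derivative $A_n - 2 \lambda V_n$ to zero gives the unconstrained maximizer $\lambda = A_n / (2 V_n)$. When this value falls outside the feasible interval, the constrained maximizer is the nearest endpoint, so the proportionality above should be read as describing the unclipped case.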
        Since $\lambda^*_n$ depends on the $n$th sample itself, $Z_n$, we cannot use this strategy in our CS construction. Instead, at any $n \in [N]$, we can use a predictable approximation of this strategy, that we shall refer to as the \kelly betting strategy. This strategy sets $\lambda_t(m)$ as follows:
        \begin{align}
            \lambda_t(m) = c_t \frac{A_{t - 1}}{V_{t -1}}, \label{eq:approx-kelly} \tag*{(\kelly)}
        \end{align}
        where the (predictable) factor $c_t$ is selected to ensure that $\lambda_t(m) \times \lp Z_t - \mu_t(m) \rp \in (-1, \infty)$, i.e., to satisfy the nonnegativity constraint of $(W_t(m))$.

        \begin{remark}
            \label{remark:other-betting-methods}
            Note that there exist several other betting schemes besides \kelly, such as those based on alternative approximations of $\log(1+x)$~\citep{fan_exponential_inequalities_2015, waudby2020estimating, ryu2022confidence}, or the ONS strategy that relies on the exp-concavity of the $\log$-loss~\citep{cutkosky2018black}. In practice, however, we did not observe a significant difference in their performance, and we focus on the \kelly\ strategy due to its conceptual simplicity.
        \end{remark}
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    \subsection{Logical CS}\label{subsec:logical-CS}
        Irrespective of the choice of the sampling and betting strategies, we can construct a CS that contains $m^*$ with probability 1, based on purely logical considerations. After sampling $t$ transactions, we know that $m^*$ is lower bounded by quantities derived from the misstatement fraction accumulated in the items we have sampled already. %We also know that $m^*$ is upper bounded by the maximum possible misstatement fraction if the remaining $f(i)$ values (i.e., $f(i)$ for each $i \in \mathcal{U}_t$) are at their maximum value of 1. 
        Hence, we can derive the following lower and upper deterministic bounds on $m^*$:
        % \vspace{-10pt}
        \begin{align}
            L_l(t) \defined \sum_{j=1}^t \pi(I_j)f(I_j), \quad  
            % \leq m^*\\
            U_l(t) \defined L_l(t) + \sum_{i \in \mc{U}_t} \pi(i).
            % \geq m^* .
        \end{align}
        Note that the $L_l(t)$~(resp. $U_l(t)$) values are obtained by noting that all the remaining unknown $f$ values must be at least $0$~(resp. at most $1$).
        Additionally, due to the time-uniform nature of confidence sequences,% we can discard any points in the set $C_t$ that were absent from any $C_{t'}$, for $t'<t$. Thus, 
         we can intersect the logical CS with a `probabilistic' CS constructed in~\eqref{eq:conf-seq-def-1}, and obtain the following CS:
        \begin{align}
            \label{eq:combined-conf-seq}
            \widetilde{\mc{C}}_t \defined \mc{C}_t \cap [L_l(t), U_l(t)] \cap \widetilde{\mc{C}}_{t - 1},
        \end{align}
        where $\widetilde{\mc{C}}_{0} \defined [0,1]$. Note that we may take the running intersection of a CS since it remains a CS, simply by definition. Consequently, the combined CS in \eqref{eq:combined-conf-seq} dominates the probabilistic CS.
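        As a small illustration with hypothetical numbers, take $N = 3$ with weights $\pi = (0.5, 0.3, 0.2)$, and read $\mc{U}_t$ as the set of indices not yet audited after $t$ queries (an assumption about the notation made for this example). Suppose the first query is $I_1 = 1$ and reveals $f(1) = 0.2$, and the second is $I_2 = 2$ and reveals $f(2) = 0$. Then
        \begin{align*}
            L_l(1) &= 0.5 \times 0.2 = 0.1, & U_l(1) &= 0.1 + (0.3 + 0.2) = 0.6, \\
            L_l(2) &= 0.1 + 0.3 \times 0 = 0.1, & U_l(2) &= 0.1 + 0.2 = 0.3,
        \end{align*}
        so the logical interval shrinks from $[0, 1]$ to $[0.1, 0.6]$ and then to $[0.1, 0.3]$ without any probabilistic argument.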
        \begin{remark}
           \label{remark:logical-cs-friendly-rlfa}
            Note that at any $t \geq 1$, the residual misstatement $m^*_t$ is equal to $m^* - L_l(t)$. Thus, if $m^* \in \widetilde{\mc{C}}_t$, and $|\widetilde{\mc{C}}_t| \leq \varepsilon$, then by definition, we must have $m^*_{t'} \leq \varepsilon$ for all $t' \geq t$. This means that the stopping time defined in~\eqref{eq:TauDef}, when the logical CS is incorporated, is an $\rlfa$ procedure. 
        \end{remark}

% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Sampling Strategies}
    \label{sec:sampling-strategies}
    The choice of the sampling strategy, $(q_t)$, is also critical to reducing uncertainty about $m^*$ quickly. Recall that $q_t$ is a probability distribution on the remaining indices $\remain_t$ for each $t \in [N]$. To motivate the choice of our sampling strategy, we first consider the following question: \emph{what is the randomized sampling strategy that leads to the fastest reduction in uncertainty about $m^*$?}
    % 
    In general, it is difficult to characterize this strategy in closed form~(beyond describing it as the solution of a multistage optimization problem). Thus, we consider a simplified question: finding the sampling strategy that maximizes the expectation of the one-step growth rate, $D_n(m, \lambda)$, for each $n \in [N]$. Specifically, we seek to maximize the expectation of its lower bound, $B_n(m, \lambda)$, introduced in~\eqref{eq:Bn}:
    \begin{align}
        q_n^* \defined \underset{q \in \Delta^{\mc{N}_{n}}}{\argmax}\ \expect_{I_n \sim q }\left[ B_n(m, \lambda) \right],  
        \label{eq:max-bound}
    \end{align}
    %
    where $\Delta^{\mc{N}_n}$ is the universe of distributions supported on $\mc{N}_n$. We now obtain a closed-form characterization of $q_n^*$. 
    \begin{proposition}
        \label{theorem:oracle-strategy}
        Note that $q_n^* = \argmin_{q \in \Delta^{\mc{N}_{n}}}\ \mathbb{V}_{I_n \sim q}[Z_n]$, which implies that $q_n^*(i) \propto \pi(i)f(i)$. Hence, for any valid betting strategy $(\lambda_t)$ and sampling strategy $(q_t)$, 
        % satisfying $|\lambda_t(Z_t - \mu_t(m))|\leq 1$ for all $m \in [0,1]$,
        we have $\expect_{I \sim q_t}[B_t(m, \lambda_t) ] \leq \expect_{I \sim q^*_t}[B_t(m, \lambda_t)]$.
    \end{proposition}
    % \vspace{-10pt}
    We defer the proof to Appendix~B.1, which proceeds by showing  that maximizing the lower bound on the one-step growth rate is equivalent to minimizing the variance of $Z_n$. It turns out that $q_n^*(i) \propto \pi(i)f(i)$ is the minimum~(in fact, zero) variance sampling distribution, and thus, $(q^*_t)$ dominates any other sampling strategy w.r.t.\ maximizing the expected bound on the one-step growth rate.
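    To see the zero-variance claim concretely (a sketch, assuming that at least one remaining index has $\pi(i) f(i) > 0$), write the oracle distribution with its normalizing constant as $q_n^*(i) = \pi(i) f(i) / \sum_{j \in \mc{N}_n} \pi(j) f(j)$. Then, for every index $i$ in the support of $q_n^*$, drawing $I_n = i$ yields
    \begin{align*}
        Z_n = f(i)\, \frac{\pi(i)}{q_n^*(i)} = \sum_{j \in \mc{N}_n} \pi(j) f(j) = \mu_n(m^*),
    \end{align*}
    which does not depend on $i$. Hence $Z_n$ is constant under $q_n^*$, and its variance is zero.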
    % 
    \begin{remark}
        The oracle strategy in \Cref{theorem:oracle-strategy} can be viewed as the solution to an alternative question: suppose there is an oracle who knows the true values of $f(i)$, and needs to convince an observer that the value $m^*$ is within an interval of width $\varepsilon$ with probability at least $1-\delta$. The oracle wishes to do so by revealing as few $f(i)$ values to the observer as possible. Clearly, any deterministic sampling strategy from the oracle will lead to skepticism from the observer (i.e., the observer will only be convinced once the $\pi(i)$ corresponding to the unrevealed $f(i)$ sum to at most $\varepsilon$). Hence, the sampling strategy used by the oracle must be random, and according to~\Cref{theorem:oracle-strategy}, it should draw transactions with probability $\propto \pi(i) \times f(i)$. 
    \end{remark}
    % \vspace{-10pt}
    % 
    \paragraph{Sampling without side information.} Since the $(f(i))$ values are unknown by definition of the problem, we cannot use $(q_t^*)$ in practice. Instead, we consider a sampling strategy that selects an index $i \in \mc{N}_t$ with probability proportional to its $\pi(i)$ value; we refer to this as the $\propM$ strategy. This strategy is also known as ``sampling proportional to size''~\citep{bickel1992inference} or ``dollar unit sampling''~\citep{neter1978dollar} in the auditing literature, and is similar to the best deterministic strategy, which queries indices in descending order w.r.t.\ $\pi(i)$.
    \begin{align}
        q_t(i) = \frac{\pi(i)}{\sum_{j \in \mc{N}_t} \pi(j)}, \tag*{(\propM)}
    \end{align} for each $i \in \mc{N}_t$.
    Sampling with \propM minimizes the ``worst case'' support range, and max value, of $Z_t$.
    %i.e., it is equivalent to $\argmin_{q_t \in \Delta^{\mc{N}_t}}\ \max_{(f(i))_{i \in \mc{n}_t}}\ \pi(i)f(i) / q_t(i) - \min_{(f(i))_{i \in \mc{N}_t}}\ \pi(i)f(i) / q_t(i)$.
    This allows for the largest possible choice of $\lambda_t$, i.e., our bet.
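    As a short sketch of this claim, note that under \propM we have
    \begin{align*}
        Z_t = f(I_t)\, \frac{\pi(I_t)}{q_t(I_t)} = f(I_t) \sum_{j \in \mc{N}_t} \pi(j) \in \Big[ 0, \sum_{j \in \mc{N}_t} \pi(j) \Big],
    \end{align*}
    regardless of the unknown $f$ values. For comparison, take any distribution $q$ with $q(i) > 0$ for all $i \in \mc{N}_t$. In the worst case where every remaining $f(i)$ equals $1$, the largest possible value of $Z_t$ is $\max_{i \in \mc{N}_t} \pi(i)/q(i)$, and
    \begin{align*}
        \sum_{j \in \mc{N}_t} \pi(j) = \sum_{j \in \mc{N}_t} q(j)\, \frac{\pi(j)}{q(j)} \leq \max_{i \in \mc{N}_t} \frac{\pi(i)}{q(i)},
    \end{align*}
    so no such $q$ can achieve a smaller worst-case maximum than \propM.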
% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    \paragraph{Using accurate side information for sampling.}
        \Cref{theorem:oracle-strategy} motivates a natural sampling strategy in situations where we have access to side information $(S(i))$ that is known to be a high-fidelity approximation of the true $(f(i))$ values---draw indices proportional to $\pi(i) \times S(i)$. We will refer to this strategy as the $\propMS$ strategy:
        %%
        \begin{align}
            q_t(i) = \frac{\pi(i) S(i)}{\sum_{j \in \mc{N}_t} \pi(j) S(j)} ~ .
            \tag*{(\propMS)}
        \end{align}
        %% 
        Under certain relative accuracy guarantees on the side information, we can characterize the performance achieved by the \propMS strategy as compared to the optimal strategy of~\Cref{theorem:oracle-strategy}, as we state next.  
        % In order to apply this strategy to practical problems, we require some guarantees on the relative accuracy of the side information, which we state next.
        \begin{corollary}
            \label{corollary:accurate-side-info}
            Assume that the side information, $(S(i))$, is an accurate prediction of $(f(i))$, i.e., there exists a known parameter $a \in [0,1)$, such that
            \begin{align}
                S(i) / f(i) \in [1 \pm a]
                \label{eq:accurate-side-info-assumption}
            \end{align} for all $i \in [N]$. With the \propMS strategy for $(q_t)$, we can ensure $\expect_{I_t \sim q_t}[B_t(m, \lambda_t)] \geq \expect_{I_t \sim q_t^*}[B_t(m, \lambda_t)]\lp \frac{1}{1+a} \rp^2$, where $(q^*_t)$ is the optimal sampling strategy of \Cref{theorem:oracle-strategy}.
        \end{corollary}
        % \vspace{-10pt}
        Next, we  develop an approach to properly incorporate side information  without any accuracy guarantees. 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Using possibly inaccurate side information}
\label{sec:side information}
    Often, we do not have a uniform guarantee on the accuracy of $(S(i))$, as we assumed in the previous section. In such cases, we cannot continue to use the \propMS strategy, as it requires 
    % the 
    knowledge of the range of $f(i)/S(i)$ 
    to  ensure the non-negativity of the process $(W_t(m))$. 
    We develop new techniques in this section 
    that can exploit the side information 
    without the uniform accuracy guarantees, 
    provided that the side information 
    is correlated with the unknown $(f(i))$ values. 
    In particular, the method developed in this section 
    for incorporating the side information 
    is orthogonal to the choice of the sampling strategy; 
    and thus, it can be combined with any sampling strategy 
    that ensures the non-negativity of the process $(W_t(m))$.  
    
    Our approach is based on the idea of control variates~\citep[\S~V.2]{asmussen2007stochastic} that are used to reduce the variance of Monte Carlo~(MC) estimates of an unknown quantity, using some correlated side information whose expected value is known.  More specifically, let $\mhat$ denote an unbiased estimate of an unknown parameter $m$, and let $\vhat$ denote another (possibly correlated to $\mhat$) statistic with zero mean. Then, the new statistic, $\mhat_\beta = \mhat + \beta \vhat$ is also an unbiased estimate of $m$, for all $\beta \in \mathbb{R}$. Furthermore, it is easy to check that $\var(\mhat_\beta) = \var(\mhat) + \beta^2\var(\vhat) + 2 \beta \cov(\mhat, \vhat)$, which implies that the variance of this new estimate is minimized at  $\beta=\beta^* \defined - \lp \cov(\mhat, \vhat) / \var(\vhat) \rp$. Finally, note that the variance of $\mhat_{\beta^*}$ cannot be larger than the variance of the original estimate $\mhat$, since $\var(\mhat_{\beta^*}) \leq \var(\mhat_0) = \var(\mhat)$ by the definition of $\beta^*$.
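    For concreteness, substituting $\beta^*$ into the variance expression above gives a closed form for the reduction. Here $\rho$ is our notation for the correlation coefficient between $\mhat$ and $\vhat$, and we assume that both variances are positive:
    \begin{align*}
        \var(\mhat_{\beta^*}) &= \var(\mhat) + \frac{\cov(\mhat, \vhat)^2}{\var(\vhat)} - 2\,\frac{\cov(\mhat, \vhat)^2}{\var(\vhat)} \\
                              &= \var(\mhat) - \frac{\cov(\mhat, \vhat)^2}{\var(\vhat)} = \var(\mhat) \lp 1 - \rho^2 \rp.
    \end{align*}
    Thus, the gain depends only on the strength of the correlation and not on its sign; for example, $\rho = 0.9$ would leave $19\%$ of the original variance.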

    %Motivated by the above discussion, we define a modified version of the wealth process first introduced in \eqref{eq:wealth-process-0}. 
    Returning to our problem, given some possibly inaccurate side information $(S(i))$,
    % a possibly inaccurate estimate of $(f(i))$,
     define the control variate~(that is, an analog of the term $\vhat$)  as 
    % \begin{align}
    $
        U_t \defined S(I_t) - \mathbb{E}_{I' \sim q_t}[S(I')], 
   $
    % \end{align}
    and let $(\beta_t)$ denote a sequence of predictable terms taking values in $[-1, 1]$ used to weigh the effect of $(U_t)$. Note that, similar to $\vhat$, the term  $U_t$ has zero mean for each $t \in [N]$. We now define the wealth process with control variates, denoted by $(\Wtilde_t(m))$, and its corresponding CS as follows: 
    % \vspace{-10pt}
     \begin{align}
         \Wtilde_n(m) &\defined \prod_{t=1}^n \lp 1 + \lambda_t(m) (Z_t + \beta_t U_t - \mu_t(m)) \rp,\\
         \mc{C}_n &= \{m \in [0,1]: \Wtilde_n(m) < 1/\delta \},
        \label{def:CS-control-variates}
     \end{align}
     where $(\lambda_t(m))$ is a betting strategy for each $m \in [0, 1]$.% are chosen according to some betting strategy~(such as~\kelly of~\Cref{def:approx-kelly}).
     \begin{theorem}
        For any set of side information $(S(i))$, sequence $(\beta_t)$, sampling strategy $(q_t)$, and betting strategies $(\lambda_t(m))$, $(\mc{C}_t)$ as defined in \eqref{def:CS-control-variates} is a $(1 - \delta)$-CS for $m^*$. Consequently, the stopping rule~$\tau(\epsilon, \delta)$ associated with $(\mc{C}_t)$ is an $\rlfa$.
        \label{thm:SideCSthm}
    \end{theorem}
    % \vspace{-10pt}
    The discussion above suggests that by a suitable choice of the parameters $(\beta_t)$, we can reduce the variance of the first term. To see why this is desirable, recall that the optimal value  of the approximate growth rate after $n$ steps of the new wealth process satisfies the following:
    \begin{gather}
        \begin{aligned}
            \widetilde{B}_n(\lambda, m)\coloneqq &\lambda \sum_{t = 1}^n (Z_t + \beta_t U_t  - \mu_t(m))\\
            &- \lambda^2 \sum_{t = 1}^n (Z_t + \beta_tU_t - \mu_t(m))^2,
        \end{aligned}\\
            \max_{\lambda} \widetilde{B}_n(\lambda, m)  \propto \frac{\lp \sum_{t = 1}^n (Z_t + \beta_tU_t - \mu_t(m)) \rp^2}{\sum_{t = 1}^n (Z_t + \beta_tU_t - \mu_t(m))^2}.
        \label{eq:control-variates-exponent-1}
    \end{gather}
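    To see where a ratio of this kind comes from, it may help to view the approximate growth rate as a concave quadratic in $\lambda$. With the shorthand (ours, introduced only for this illustration) $A_n \defined \sum_{t=1}^n (Z_t + \beta_t U_t - \mu_t(m))$ and $V_n \defined \sum_{t=1}^n (Z_t + \beta_t U_t - \mu_t(m))^2$, and assuming that $V_n > 0$ and that the unconstrained maximizer is an admissible bet, we have
    \begin{align*}
        \frac{d}{d\lambda} \lp \lambda A_n - \lambda^2 V_n \rp = 0
        \;\Longrightarrow\;
        \lambda = \frac{A_n}{2 V_n},
        \qquad
        \max_{\lambda} \lp \lambda A_n - \lambda^2 V_n \rp = \frac{A_n^2}{4 V_n}.
    \end{align*}
    For a fixed first-order term $A_n$, a smaller second-order term $V_n$ therefore yields a larger attainable growth rate.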
    Note that by setting $\beta_t=0$ for all $t \in [n]$, we recover $\widetilde{B}_n(\lambda, m) = B_n(\lambda, m)$, i.e., the  wealth lower bound with no side information. Next, we observe that $\sum_{t=1}^n \beta_t U_t$ concentrates strongly around its mean~($0$).
    % 
    \begin{proposition}
        \label{prop:control-variates-1}
        For any $\delta \in (0,1)$ and sequence $(\beta_t)$, the following statement is simultaneously true for all $n \in [N]$ with probability at least $1-\delta$
        \begin{align}
            \left \lvert \frac{1}{n}\sum_{t = 1}^n \beta_tU_t \right \rvert = \mc{O}\lp \sqrt{  \log(\log n /\delta)/n} \rp.
        \end{align}
    \end{proposition}
    % \vspace{-10pt}
    This result, proved in~Appendix~B.2, implies that in order to select the parameters $(\beta_t)$, we can focus on their effect on the second-order term in the denominator. In particular, the best value of $\beta$ for the first $n$ observations is the one that minimizes the denominator, and can be defined as follows:
    \begin{align}
        \beta^*_n &\defined  \underset{\beta \in [-1,1]}{\argmin} \; \sum_{t=1}^n {(Z_t - \mu_t(m) + \beta U_t)^2}\\
                  &\propto-\frac{\sum_{t = 1}^n(Z_t - \mu_t(m))U_t}{\sum_{t = 1}^nU_t^2}.
    \end{align}
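    The closed form follows from a short calculation, sketched here under the assumption that $\sum_{t=1}^n U_t^2 > 0$. The objective is a convex quadratic in $\beta$, and setting its derivative to zero gives
    \begin{align*}
        2 \sum_{t=1}^n \lp Z_t - \mu_t(m) + \beta U_t \rp U_t = 0
        \;\Longrightarrow\;
        \beta = -\frac{\sum_{t=1}^n (Z_t - \mu_t(m)) U_t}{\sum_{t=1}^n U_t^2}.
    \end{align*}
    Because the objective is convex, the minimizer over $[-1,1]$ equals this value whenever it lies in the interval, and the nearest endpoint ($-1$ or $+1$) otherwise.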
    The numerator of $\beta_n^*$ varies with $\sum_{t = 1}^n f(I_t)S(I_t)$; hence, the magnitude of $\beta_n^*$ increases with the amount of correlation between $f(i)$ and $S(i)$.
    Since $\beta^*_n$ is not predictable (it is $\mc{F}_n$-measurable rather than $\mc{F}_{n-1}$-measurable), we approximate $\beta_n^*$ at each $n \in [N]$ using only past observations:
    % \begin{align}
    $
        \beta_n \propto -\frac{\sum_{t = 1}^{n - 1}(Z_t - \mu_t(m))U_t}{\sum_{t = 1}^{n - 1}U_t^2},
    $        
    % \end{align}
    for $n \geq 2$ and we let $\beta_1 = 0$. This provides a principled way of incorporating side information even when the relationship between the side information and the ground truth is unclear.
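    To see why predictability matters, the following one-line check may be useful. It assumes, as in the definition of $U_t$, that $I_n$ is drawn from $q_n$ given the past and that $q_n$ is itself determined by $\mc{F}_{n-1}$. Since $\beta_n$ is $\mc{F}_{n-1}$-measurable, it can be pulled out of the conditional expectation:
    \begin{align*}
        \mathbb{E}\left[ \beta_n U_n \mid \mc{F}_{n-1} \right]
        = \beta_n \lp \mathbb{E}_{I_n \sim q_n}[S(I_n)] - \mathbb{E}_{I' \sim q_n}[S(I')] \rp = 0.
    \end{align*}
    If $\beta_n$ were instead allowed to depend on $I_n$, as $\beta_n^*$ does, this factorization would no longer be available and the weighted term would not need to have zero conditional mean.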

    \begin{remark}
        \label{remark:risk-vs-correlation}
        Our work is motivated by applications where the side information is generated by an ML model trained on historical data. In practice, ML models are trained via empirical risk minimization, and we expect that models with lower risk should result in side information with higher correlation. For some simple cases, such as least-squares linear regressors, we can obtain a precise relation between correlation and risk: $\rho^2 = 1 - \mathrm{MSE}$ (with the targets normalized to unit variance). Characterizing this relation for more general models is left for future work.
    \end{remark}
\section{Experiments}
\label{sec:experiments}
    We conduct simulations of our RLFA methods on a variety of scenarios for $\pi$ and $f$. For each simulation setup, we choose two positive integers $\Nlarge$ and $\Nsmall$ such that $\Nlarge+\Nsmall=N$. We generate the weight distribution $\pi$, consisting of $\Nlarge$ `large' values and $\Nsmall$ `small' values. The exact range of values taken by these terms is varied across experiments, but on average the ratio of `large' to `small' $\pi$ values lies between $10$ and $10^3$. We then generate the $f$ values in one of two ways: (1) $f \propto \pi$, where indices with large $\pi$ values take $f$ values in $[0.4, 0.5]$ and indices with small $\pi$ values take $f$ values in $[0.001, 0.01]$, or (2) $f \propto 1/\pi$, where the $f$ value ranges are swapped for large and small values. The simulations in this section focus on the different sampling strategies as well as the efficacy of control variates; we provide additional experiments comparing the betting CS with other types of CS in Appendix~E.
    %
    %
    \begin{figure}[t]
    % 
        \begin{subfigure}{\columnwidth}
            \def\figwidth{\columnwidth}
            \def\figheight{0.7\textwidth} % Feel free to change
        \centering
        \hspace*{-1em}
        % \input{Figures/copies/no_side/main.tex}
        \hspace{5pt}\begin{subfigure}{0.5\columnwidth}
        \begin{center}
        \input{Figures/NoSideInfoCSf_propto_M_large_40}
        \end{center}
        \end{subfigure}
        \begin{subfigure}{0.5\columnwidth}
        \begin{center}
            \input{Figures/NoSideInfoCSf_propto_M_large_160}\hspace*{10pt}
        \end{center}
        \end{subfigure}
        % \vspace{-25pt}
        \begin{subfigure}[T]{0.5\columnwidth}
        \begin{center}
        \input{Figures/NoSideInfoCSf_inv_propto_M_large_40}
        \end{center}
        \end{subfigure}\hfill
        \begin{subfigure}[T]{0.5\columnwidth}
        \begin{center}
        \input{Figures/NoSideInfoCSf_inv_propto_M_large_160}
        \end{center}
        \end{subfigure}
        % \vspace{-10pt}
        \caption{Width of CSs}
        % \vspace{-10pt}
        \label{fig:no-side-info-CS}
        %\end{figure}
        \end{subfigure}
        \begin{subfigure}{\columnwidth}
             \def\figwidth{\columnwidth}
             \def\figheight{0.7\columnwidth} % Feel free to change
             \centering
             \hspace*{-1em}
    
             \begin{subfigure}[T]{0.5\columnwidth}
             \centering
             \input{Figures/NoSideInfoHist_f_propto_M_large_40}
             \end{subfigure}\begin{subfigure}[T]{0.5\columnwidth}
             \centering
             \input{Figures/NoSideInfoHist_f_propto_M_large_160}\hspace*{4pt}
             \end{subfigure}
             % \vspace{-15pt}
    
    
             \begin{subfigure}[T]{0.5\columnwidth}
             \centering
             \input{Figures/NoSideInfoHist_f_inv_propto_M_large_40}
             \end{subfigure}\hfill
             \begin{subfigure}[T]{0.5\columnwidth}
             \centering
             \input{Figures/NoSideInfoHist_f_inv_propto_M_large_160}
             \end{subfigure}
         % \vspace{-5pt}
         \caption{Distribution of samples audited ($\tau$). We omit the histograms for the uniform CS~(without the logical CS), as they are concentrated entirely at $N$.}
         \label{fig:no-side-info-Hist}
    \end{subfigure}
        \caption{A comparison of \propM vs.\ uniform sampling, and the impact of intersecting with the logical CS (\Cref{subsec:logical-CS}), where $\varepsilon= \delta = 0.05$. The \propM strategy produces tighter CSs that result in fewer audited samples. Intersecting with the logical CS further reduces the width, particularly when few transactions are large ($\Nlarge/N = 0.2$).}
    \end{figure}

% 
    \noindent \textbf{No side information: uniform vs.\ \propM\ sampling.}
        In the first experiment, we compare the performance of the \propM strategy with the uniform baseline. In addition, we illustrate the significance of the logical CS~(introduced in \Cref{subsec:logical-CS}), especially in cases where there are only a few large $\pi$ values.
        From the widths of the CSs plotted in \Cref{fig:no-side-info-CS}, we can see that \propM outperforms the uniform baseline in all four cases. The gap in performance increases when $\Nlarge$ is small since $\pi$ deviates more significantly from the uniform weighting: it consists of a few large weights with the rest close to $0$. On the other hand, when $\Nlarge$ is large, the weights  resemble the uniform distribution, leading to the competitive performance of the uniform baseline. The logical CSs are most useful in the case of small $\Nlarge$, especially with $f \propto \pi$. This is because for small $\Nlarge$, every query to an index with large $\pi$ value leads to a significant reduction in the uncertainty about $m^*$.
    
         Next, in~\Cref{fig:no-side-info-Hist}, we plot the distribution of the stopping time $\tau$ for an RLFA with $\varepsilon = \delta=0.05$, over $500$ independent trials. The \propM strategy leads to a significant reduction in the sample size required to obtain an $\varepsilon$-accurate estimate of $m^*$ as compared to the uniform baseline, both with and without the logical CS. Furthermore, the distribution of $\tau$ under the \propM strategy often has less variability than under the uniform strategy. Hence, \propM proves empirically to be a better sampling strategy than simply sampling uniformly, as one would do when all the weights are equal.

    \noindent \textbf{Using \propMS\ with accurate side information.} 
        In the second experiment, we study the benefit of incorporating accurate side information in the design of our CSs, by comparing the performance of \propMS strategy with that of the \propM strategy. We generate $S$ randomly while ensuring $S(i) / f(i) \in [1 \pm a]$ (from \eqref{eq:accurate-side-info-assumption}) for some $a \in (0,1)$. Thus smaller values of $a$ imply that the scores $S(i)$ are more accurate approximations of $f(i)$ for all $i \in [N]$.

\begin{figure}[t]
\begin{subfigure}{\columnwidth}
     \def\figwidth{0.5\columnwidth}
     \def\figheight{0.35\columnwidth} % Feel free to change
     \centering
     \hspace*{-1em}
     \input{Figures/AccurateSideInfoCSf_propto_M_large_40_A_0_1}
     \input{Figures/AccurateSideInfoCSf_propto_M_large_160_A_0_1}
     \input{Figures/AccurateSideInfoCSf_inv_propto_M_large_40_A_0_1}
     \input{Figures/AccurateSideInfoCSf_inv_propto_M_large_160_A_0_1}
     %\caption{Plots showing the variation of the width of the betting CS using \propMS and \propM strategies in different data regimes. For all the plots, we set $b=0.1$.}
     % \vspace{-5pt}
     \caption{Width of CSs}
     \label{fig:accurate-side-info-CS}
%\end{figure}
\end{subfigure}
% \vspace{5pt}
\begin{subfigure}{\columnwidth}
         \def\figwidth{0.5\columnwidth}
         \def\figheight{0.35\columnwidth} % Feel free to change
         \centering
         \hspace*{-1em}
         \input{Figures/AccurateSideInfoHist_f_propto_M_large_40_A_0_1}
         \input{Figures/AccurateSideInfoHist_f_propto_M_large_160_A_0_1}
         \input{Figures/AccurateSideInfoHist_f_inv_propto_M_large_40_A_0_1}
         \input{Figures/AccurateSideInfoHist_f_inv_propto_M_large_160_A_0_1}
         \caption{Distribution of samples audited ($\tau$).}
     \label{fig:accurate-side-info-Hist}
    \end{subfigure}
    \caption{Comparison of \propMS vs.\ \propM with accurate side information $(S(i))$, i.e., $S(i) / f(i) \in [0.9, 1.1]$, where $\varepsilon= \delta = 0.05$. We see that \propMS outperforms \propM in both CS width and sample efficiency.}
\end{figure}
        In~\Cref{fig:accurate-side-info-CS}, we can see that the \propMS strategy with accurate side information dominates the \propM strategy. This is further reflected in the distribution of $\tau$ for an RLFA where $\varepsilon=\delta=0.05$ in~\Cref{fig:accurate-side-info-Hist}.
        Hence, in situations where we are confident in the accuracy of our side information, we should incorporate it directly into our sampling strategy to reduce the width of the CS.

    \noindent \textbf{Control variates from possibly inaccurate side information.}
        Finally, we consider the case in which we do not have prior information about the accuracy of the side information. Using the \propMS strategy directly in this scenario can lead to very conservative CSs, because in the absence of tight guarantees on the range of the $S/f$ ratio, we would have to use the worst-case range. Instead, we compare the performance of the \propM strategy with and without the control variates described in~\Cref{sec:side information}.
        In this case, we set $S(i) =  c\times f(i) +  (1-c)\times R_i$ for $c \in (0,1)$, where $(R_i)_{i \in [N]}$  are \iid random variables distributed uniformly over $[0,1]$. The parameter $c$ controls the level of correlation between $f$ and $S$ values, with small $c$ values indicating low correlation.
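        As a rough guide to how $c$ maps to correlation, consider the following population-level sketch. It treats the index $i$ as drawn uniformly from $[N]$, writes $\sigma_f^2 > 0$ for the resulting variance of $f(i)$ (our notation), and assumes that $R_i$ is independent of $f(i)$, so that $\var(R_i) = 1/12$. Then
        \begin{align*}
            \cov(f, S) = c\, \sigma_f^2,
            \qquad
            \var(S) = c^2 \sigma_f^2 + \frac{(1-c)^2}{12},
            \qquad
            \rho(f, S) = \frac{c\, \sigma_f}{\sqrt{c^2 \sigma_f^2 + (1-c)^2/12}}.
        \end{align*}
        This expression is increasing in $c$ on $(0,1)$ and tends to $1$ as $c \to 1$. The correlation that matters for the control variates is taken under the sampling distribution $q_t$ rather than the uniform one, so these values are only indicative.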

        We generate the data with  $\Nlarge =  40$ and $N=200$. In~\Cref{fig:CV-CS}, we compare the CSs and the distribution of $\tau$ for an RLFA~(with $\varepsilon=\delta=0.05$) for the \propM strategy with and without control variates, when the side information is generated with $c=0.9$. Due to the high correlation, there is a significant decrease in the number of samples needed to reach an accuracy of $\varepsilon$ when using control variates.

        \begin{figure}[t]
            \def\figwidth{0.5\columnwidth}
            \def\figheight{0.35\columnwidth} % Feel free to change
            \hspace*{-1em}
            \input{Figures/CV_CS_propto_M_large_100}\hfill
            \input{Figures/CVHist_f_propto_M_large_100}
        
            \hspace*{-1em}
            \input{Figures/CV_CS_inv_propto_M_large_100}\hfill
            \input{Figures/CVHist_f_inv_propto_M_large_100}
            % \vspace{-10pt}
            \caption{The plots above show the width of the CSs and the distribution of $\tau$ for the $f \propto \pi$ and the $f \propto 1/\pi$ cases, where  $\Nlarge/N=0.2$ and $c=0.9$. }
            \label{fig:CV-CS}
        \end{figure}

    Finally, in~\Cref{fig:CV-Gain}, we study the variation in sample efficiency as the correlation between $S$ and $f$ changes~(i.e., by varying $c$). In particular, for $9$ linearly spaced $c$ values in the range $[0.1, 0.9]$, we compute $\tau$ for an RLFA without ($\tau_{\text{no-CV}}$) and with control variates ($\tau_{\text{CV}}$) over $250$ trials, and then plot the mean of their ratio, $\tau_{\text{CV}} / \tau_{\text{no-CV}}$.
    \begin{figure}[t]
        \def\figwidth{0.5\columnwidth}
        \def\figheight{0.35\columnwidth} % Feel free to change
        \centering
        \hspace*{-1em}
        \input{Figures/CV_Gain_f_propto_M_large_40_eps_0_025.tex}
        \input{Figures/CV_Gain_f_inv_propto_M_large_40_eps_0_025.tex}
        \caption{Reduction in $\tau$ for an RLFA with $\varepsilon=0.025, \delta=0.05$, when the CS is constructed with control variates relative to without. The x-axis denotes the parameter $c \in [0.1, 0.9]$, which controls the amount of correlation between $S$ and $f$. As the amount of correlation between $S$ and $f$ increases, the CS with control variates takes a decreasing fraction of the time required by the CS without control variates.}
        \label{fig:CV-Gain}
    \end{figure}

    \Cref{fig:CV-Gain} highlights the key advantage of our CS construction using control variates --- this method automatically adapts to the correlation between the side information and the $f$ values. In cases where the side information is highly correlated~(i.e., larger $c$ values), the reduction in samples is large; whereas when the correlation is small, our approach automatically reduces the impact of the side information.

\section{Conclusion}
\label{sec:conclusion}
    In this paper, we defined the concept of an $\rlfa$ and devised RLFA procedures from confidence sequences (CSs) for the weighted average of $N$ terms~(denoted by $m^*$), using adaptive randomized sampling \wor. For arbitrary sampling strategies, we developed two methods of constructing CSs for $m^*$ using test martingales. We then addressed the question of improving CSs by incorporating side information, with or without guarantees on its accuracy.

    Our work opens up several interesting directions for future work. For instance, in~\Cref{theorem:oracle-strategy}, we derived the sampling strategy that optimizes a lower bound on the one-step growth rate. 
    Future work could investigate whether we can obtain a more complete characterization of the optimal policy, without relying on approximations.
    Another interesting issue, not addressed in our paper, is that of considering more general types of side information. As described in~\Cref{sec:introduction}, we have assumed that we have access to $[0,1]$-valued side information that is supposed to be a proxy for the true~(and unknown) $f$ values. However, in practical auditing problems, the side information is usually available as a collection of numeric, discrete, and categorical features that are correlated with the unknown $f$ values. Developing methods for incorporating these more realistic forms of side information into our framework for designing CSs is another important question for future work. Furthermore, another type of side information is any knowledge from a prior audit. For example, auditors may know before reviewing any data (transactions or AI-generated side information) that for this year, some accounts are likely to have smaller or bigger $f$ values than other accounts because of the specific performance incentives placed on the company managers by their supervisors or by the market conditions.

% \paragraph{Acknowledgements} This work was funded by a PricewaterhouseCoopers Research Grant on "Efficient AI-Enhanced Financial Statement Auditing with Statistical Guarantees".
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage 
\bibliography{ref}



\end{document}


  
