Stochastic doubly-efficient debate formalization
================================================

**Summary:** We formalize the correctness of the main stochastic oracle
doubly-efficient debate protocol from Anonymous et al. in Lean 4.

Debate (https://arxiv.org/abs/1805.00899) is one approach to AI alignment of
strong agents, using two agents ("provers") competing in a zero-sum game to
convince a human judge ("verifier") of the truth or falsity of a claim.  Theoretically, if
we model the judge as a polynomial time Turing machine, optimal play in the
debate game can convince the judge of any statement in PSPACE.  However, this
theoretical model is limited in several ways: the agents are assumed to have
unbounded computational power, which is not a realistic assumption for ML
agents, and the results consider only deterministic arguments.

Anonymous et al. improve the complexity-theoretic model of
debate to be "doubly-efficient": both the provers and the verifier have limited
computational power.  They also treat stochastic arguments: the provers try to
convince the judge of the result of a randomized computation involving calls to
a random oracle.  Concretely, the main formalized result is:

**Definition 6.1 (Lipschitz oracle machines).** An oracle Turing machine
$M^\mathcal{O}$ is $K$-Lipschitz if, for any other oracle $\mathcal{O}'$ s.t.
$|\Pr(\mathcal{O}(z)) - \Pr(\mathcal{O}'(z))| \le \epsilon$ for all $z$, we have
$|\Pr(M^\mathcal{O}(x)) - \Pr(M^{\mathcal{O}'}(x))| \le K \epsilon$.
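
As a small worked example of the definition (our own illustration, not taken
from the paper), suppose a hypothetical machine $M$ makes two oracle queries
$z_1, z_2$, receives independently sampled answers, and accepts iff both answers
are $1$.  Reading $\Pr(\mathcal{O}(z))$ as the probability that the oracle
answers $1$ on $z$, and writing $p_i = \Pr(\mathcal{O}(z_i))$ and
$p_i' = \Pr(\mathcal{O}'(z_i))$ with $|p_i - p_i'| \le \epsilon$, the two
acceptance probabilities are $p_1 p_2$ and $p_1' p_2'$, and

$$
|p_1 p_2 - p_1' p_2'|
  = |(p_1 - p_1') p_2 + p_1' (p_2 - p_2')|
  \le \epsilon \, p_2 + p_1' \, \epsilon
  \le 2 \epsilon,
$$

so under these assumptions this $M$ is $2$-Lipschitz.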

**Theorem 6.2 (doubly-efficient stochastic oracle debate).** Let $L$ be a
language decidable by a $K$-Lipschitz probabilistic oracle Turing machine
$M^\mathcal{O}$ in time $T = T(n)$, where only oracle queries count toward
time.  Then there is a debate protocol with cross-examination, with
$O(K^2 T \log T)$ prover time and $O(K^2)$ verifier time, deciding $L$ with
completeness $3/5$ and soundness $3/5$.

The formalized result differs from the paper result slightly in that we focus
only on correctness: we do not yet formalize space complexity, count only oracle
queries for time complexity, and represent those queries only via the code for
the protocol.  We also define Lipschitz oracle machines differently: the paper
fixes $M$ and lets both $\mathcal{O}$ and $\mathcal{O}'$ vary, while we fix both
$M$ and $\mathcal{O}$ and let only $\mathcal{O}'$ vary.  This weakens the
Lipschitz hypothesis and so slightly strengthens the resulting theorem.
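
The difference between the two Lipschitz conditions can be sketched in Lean 4 as
follows.  This is a simplified illustration, not the definitions used in
`Debate/Oracle.lean`: it assumes Mathlib is available, models an oracle as a
function giving the probability of answering `true` on each query, and models a
machine as a function giving its acceptance probability.  All names
(`DebateSketch`, `Oracle`, `Oracle.Close`, `LipschitzAt`, `Lipschitz`) are
hypothetical.

```lean
import Mathlib.Data.Real.Basic

namespace DebateSketch

/-- Hypothetical simplified oracle: the probability of answering `true` on each query. -/
abbrev Oracle := List Bool → ℝ

/-- Two oracles are `ε`-close if their answer probabilities differ by at most `ε`
    on every query. -/
def Oracle.Close (o o' : Oracle) (ε : ℝ) : Prop :=
  ∀ z, |o z - o' z| ≤ ε

/-- Fixed-oracle variant (the shape described for the formalization): `M` and `o`
    are fixed and only `o'` varies.  `M o x` is the probability that the machine
    accepts input `x` when run with oracle `o`. -/
def LipschitzAt (M : Oracle → List Bool → ℝ) (o : Oracle) (K : ℝ) : Prop :=
  ∀ (o' : Oracle) (ε : ℝ) (x : List Bool),
    Oracle.Close o o' ε → |M o x - M o' x| ≤ K * ε

/-- Paper-style variant: `M` is fixed and both oracles vary. -/
def Lipschitz (M : Oracle → List Bool → ℝ) (K : ℝ) : Prop :=
  ∀ o, LipschitzAt M o K

/-- The paper-style condition implies the fixed-oracle condition at every oracle,
    so a theorem that assumes only `LipschitzAt` has the weaker hypothesis. -/
theorem Lipschitz.lipschitzAt {M : Oracle → List Bool → ℝ} {K : ℝ}
    (h : Lipschitz M K) (o : Oracle) : LipschitzAt M o K :=
  h o

end DebateSketch
```

In this sketch the final lemma is immediate by unfolding the definitions, which
is the sense in which assuming only the fixed-oracle condition gives a slightly
stronger theorem.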

The formalization is organized as follows:

1. `Prob/Defs.lean`: Finitely supported probability distributions, representing stochastic computations.
2. `Debate/Oracle.lean`: Our computation model, including the definition of Lipschitz oracles.
3. `Debate/Protocol.lean`: The debate protocol, honest players, and the definition of correctness.
4. `Debate/Correct.lean`: The final correctness theorems.
5. `Debate/Details.lean`: Proof details.

## Installation

1. Install Lean 4 and Lake via elan: https://github.com/leanprover/elan
2. Run `lake build` within the directory.

## Citing this work

We will add this after the paper is reviewed.

## License and disclaimer

We will add this after the paper is reviewed.
