	\label{sec:conf-continuum}
We now look deeper into the theory of learners whose confidence domain is a continuum 
(i.e., a connected, totally ordered, one-dimensional manifold with two endpoints). 
% There are two particularly important confidence domains in this setting are the fractional domain $[0,1]$ and the additive domain $[0,\infty]$.
%
%
% \vnew{
% \subsection{Update Flows}
% }
%
%joe3: bad story. Just say that you want additivity.
%oli3: bringing up text about additivity, and reduing stress on scale.
% Recall that the number of training iterations $n$ in \cref{ex:classifier}
% and Shafer's weight of evidence $w$ in \cref{ex:shafer}
% are measurements of confidence that do not lie in $[0,1]$, but
%  rather in $[0,\infty]$.
\commentout{
Most quantities used in science and everyday life can be measured additively:
if one starts with seven minutes/meters/votes/dollars,
and then gains six
 % additional (distinct) ones, 
more,
one has thirteen altogether.
}%
% We would like a measure of confidence that also works this way.
% What is measure of confidence that works this way?
% We would like to be able to measure confidence in the same way.
% Wanting to measure confidence the same way gives us the domain $[0,\infty]$. 
% To measure confidence in the same way, we must use the domain $[0,\infty]$. 

With the domain $[0,\infty]$, \cref{ax:combinativity} means $\Lrn$ is \emph{additive},
making it amenable to analogies of weight (e.g., the weight of evidence $w$ in \cref{ex:shafer})
and time (e.g., the number of training iterations $n$ in \cref{ex:classifier}).
Indeed, an additive learner can be implemented so that confidence really does coincide with time: imagine a machine with state space $\Theta$, controlled by buttons labeled by $\Phi$, that, while $\phi$ is pressed, evolves from initial state $\theta_0$ according to $\theta(t) = \Lrn(\phi, t, \theta_0)$. 
Conversely, this interpretation is coherent only if $\Lrn$ is additive; otherwise there would exist $t_1,t_2$ such that the machine's state after pressing $\phi$ for $t_1$ seconds followed by $t_2$ additional seconds would differ from its state after holding down $\phi$ for $t_1+t_2$ seconds.
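
In symbols, the coherence condition is $\Lrn(\phi, t_2, \Lrn(\phi, t_1, \theta)) = \Lrn(\phi, t_1{+}t_2, \theta)$ for all $t_1, t_2 \ge 0$.
To see how it can fail, consider a purely illustrative map (a hypothetical example of ours, not one of the learners above): take $\Theta = \mathbb R$ and $\tilde f(t,\theta) := \theta/(1+t)$.
Then, for any $\theta \ne 0$,
\[
	\tilde f\big(1, \tilde f(1,\theta)\big) = \frac{\theta}{4}
	\;\ne\;
	\frac{\theta}{3} = \tilde f(2, \theta),
\]
so pressing the button for one second twice differs from holding it for two seconds, and $t$ cannot be read as elapsed time for $\tilde f$.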

% This temporal analogy may not always be appropriate 
Temporal analogies may not always be appropriate
% For instance, 
(as they may clash with other, truer conceptions of ``time''),
yet they have such intuitive force that 
% yet the intuition 
a function
$f: [a,b] \times \Theta \to \Theta$  
	% (with $a\le 0 < b$)
	(with $0 \in [a,b] \subseteq \mathbb R$)
satisfying \cref{ax:zero,ax:cont-and-smooth,ax:combinativity}
is known generically as a \emph{flow} \parencite{lee2013smooth}.
\commentout{
    Beyond \cref{ax:diffble,ax:additivity},
    $F$ need only handle full-confidence
    appropriately (i.e., satisfy \cref{ax:idemp,ax:cont})
    in order to satisfy all of our axioms thus far.}%
% Due to \cref{prop:additivity-implications} and the fact that 
%     \cref{ax:diffble} is stronger than \cref{ax:cont}, 
% the flow axioms 
% (\cref{ax:additivity},\cref{ax:diffble})
% imply all of our axioms so far
% (\cref{ax:zero,ax:idemp,ax:cont,ax:diffble,ax:seq-for-more,ax:nopause,ax:additivity}).
\commentout{%% THIS MAY NOT BE TRUE??
	\begin{linked}{prop}{continuum-seqacyc}
		When $\confdom$ is a continuum, \cref{ax:zero,ax:cont-and-smooth,ax:combinativity} imply \cref{ax:seq-for-more} and \cref{ax:acyclic}.
	\end{linked}
	}%
Since \cref{ax:combinativity} implies \crefrange{ax:seq-for-more}{ax:acyclic} for this domain, 
the only additional requirement of a commitment function is that $\Lrn_{(\theta,\phi)}(\chi)$ have a well-defined limit as $\chi \to \infty$. 
This highlights the strength of the assumption that confidence lies in $[0,\infty]$ and combines additively, so one might understandably worry that this could limit applicability---but this is not the case.
While 
the additive domain $([0,\infty], +)$ certainly restricts
how confidence can be measured, it has little effect on what confidence can express.

\commentout{%%%%%%%%%%%%%% THIS VERSION OF THE THEOREM IS RATHER BROKEN %%%%%%%%%%%%
\begin{linked}{theorem}{add-reparam}
	If $\Lrn 
	% : \Phi \times \confdom \times \Theta \to \Theta
	$ 
	% satisfies \cref{ax:zero,ax:combinativity,ax:cont-and-smooth,??}
	satisfies \cref{ax:zero,ax:cont-and-smooth,ax:seq-for-more,??}
	(but possibly not \cref{ax:combinativity})
	then there exists
	a flow update function
	$^+\!\Lrn$
	for the additive confidence domain $[0,\infty]$
	that does, 
	and a continuous function
	$g : \Phi \times [\bot,\!\top] \times \Theta \to [0,\infty]$
	% and a function $g$
	such that
	% for all $\theta,\phi,$ and $\chi$,
	% for all $\theta\in\Theta,\phi\in\Phi,$ and $\chi\in[\bot,\!\top]$,
	% \[
	\[
		\forall \theta,\phi,\chi.\qquad
		\Lrn( \phi,
			\chi,
		 \theta )
		 =
		{^+}\!\Lrn(\phi,~
		% \beta,
		g(\phi,\chi,\theta),~
		 \theta)
		 \qquad\text{and}\qquad
		{^+}\!\Bel(\phi,\theta) = g(\phi, \Bel(\phi,\theta),\theta)
		.
	\]
	% Furthermore, $^+\!F$ and $g$ are unique up to a multiplicative factor,
	% and so
	% there is a unique choice of $(^+\!F,g)$
	% such that $^+\!F$ and $F$ handle smallconfidences in the same way,
	% i.e., $\frac{\partial g}{\partial \chi} = 1$.
	Furthermore,
	% $^+\!F$ and $g$ are unique up to a multiplicative factor
	% the pair
	 $(^+\!F, g)$ is unique up to a multiplicative factor
	% in the additve representation of confidence.
	% in the additive representation of confidence (i.e., the output of $g$).
	in the output of $g$.
	% (that may depend on $\phi$ and somewhat on $\theta$).
	% (that can depend on $\phi$, and partially on $\theta$).
	% (that can depend on $\phi$ and $\theta$).
		% \footnote{ can depend on $\phi$, 
		% and $[\theta] \in \Theta / (\theta_1 \sim \theta_2 \text {if})$}.
	\end{linked}
\begin{coro}
There is a unique pair $(^+\!\Lrn, \beta)$
% such that $^+\!F$ and $F$ handle small confidences the same way,
% i.e., $\frac{\partial g}{\partial \chi}|_{\chi=0} = 1$.
such that $^+\!\Lrn$ and $\Lrn$ have the same effect on observations
made with sufficiently low confidence, 
i.e., $\frac{\partial \beta}{\partial \chi}\big|_{\chi=\bot} = 1$.
% that behaves like $F$ for low confience updates
% (and is also additive: \cref{ax:additivity}).
% Furthermore, there exists a function
\end{coro}
}%

\begin{linked}{theorem}{add-reparam}
	% If $\Lrn$ is a learner for setting $(\Theta, \Phi, \confdom)$,
	%%% STRONGER VERSION; COULDN'T PROVE IN TIME.
	% Let $(D, \le, \bot, \top)$ be an ordered connected manifold with a greatest and least element, and suppose $\Lrn : \Phi \times D \times \Phi \to \Theta$ satisfies \cref{ax:zero,ax:cont-and-smooth,ax:seq-for-more}, FC. Then, for every $d \in D$, there exists a one-dimensional submanifold $D_d \subseteq D$ containing $d$, a continuous function $g : D \times \Phi \times \Theta \to [0,\infty]$,
	% and a commitment flow $^+\!\Lrn$ such that
	% $
	% 	\forall \theta, \phi, \chi \in D_d.~
	% 	^+\!\Lrn(\phi, g(\chi,\phi,\theta), \theta) = \Lrn(\phi, \chi, \theta). 
	% $
	If $\confdom$ is a 
	% totally ordered 
	continuum and $F : \confdom \times \Theta \to \Theta$ is a commitment function 
	(i.e., satisfies \cref{ax:zero,ax:cont-and-smooth,ax:seq-for-more,ax:acyclic,ax:combinativity}
	% (i.e., satisfies \cref{ax:zero,ax:cont-and-smooth,ax:combinativity}
	% , \cref{ax:idemp}
	\unskip),
	then there exists a continuous ``translation'' function $g : \confdom \times \Theta \to [0,\infty]$,
	and a commitment flow $^+\!F$ such that
	% and a commitment flow $^+\!\Lrn$ such that
	$
		\forall \theta, \chi 
		% \in D_d
		.~
		% ^+\!\Lrn(\phi, g(\chi,\phi,\theta), \theta) = \Lrn(\phi, \chi, \theta). 
		^+\!F(g(\chi,\theta), \theta) = F(\chi, \theta). 
	$
\end{linked}

Thus, updates performed with $F$ are equivalent
to updates performed with ${^+}\!F$ (its \emph{additive form}), 
	if confidences are translated (via $g$) appropriately.
% We call $^+\!F$ the \emph{additive form of $F$}.
% and $\beta(\phi, \chi, \theta)$ the additive form of 
% confidence $\chi$. 
% This quantity might, unfortuna tely, depend on $\theta$, and $\chi$.
% Unless $F$ is strangely parameterized,
% If confidence to $F$ is meaningful independent of $\theta$ and $\phi$,
% then so too should 
% knowing 
% Ideally,
% the translation $g$ to an additive scale should not depend
% on our current beliefs $\theta$ or observation $\phi$.
%
% It would be  dependence of $g$ on $\theta$ and $\phi$ is
% somewhat unsavory, and would $F$ 
%
\commentout{%%% uniformity
\begin{defn}
We call an update function $F$ \emph{uniform} if 
the additive form
$g(\phi,\chi,\theta)$
of its confidence depends only on $\chi$
(and not on $\theta$ or $\phi$). 
\end{defn}

\cref{ax:additivity} implies uniformity, as then $^+\!F = F$ 
and
$g(\phi,\chi,\theta) = \chi$.
}%%% end uniformity
%
% In fact, if there is such a $g$ that does not depend on $\theta$, then there is a unique such representation, up to a multiplicative constant in the output of $g$.
When the original domain $\confdom$ is isomorphic to one of the canonical domains $[0,\infty]$ or $[0,1]$, the translation $g$ need not depend on $\theta$, and there is a unique such representation up to a multiplicative constant in the output of $g$. 
However, by allowing for a belief-state-dependent translation of confidence, our construction provides in principle an additive representation even for very different confidence domains, such as when $\cseq$ is not invertible (e.g., $\cseq = \max$)---provided the points of non-differentiability can be handled appropriately, which is sometimes but not always possible.
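
For a concrete instance, consider (as an illustrative assumption of ours) the fractional domain $\confdom = [0,1]$, a convex set $\Theta \subseteq \mathbb R^d$, and a function that mixes the current state with a fixed hypothetical target $\theta^* \in \Theta$, namely $F(\chi,\theta) := (1-\chi)\theta + \chi\theta^*$, which we take to satisfy the hypotheses of \cref{theorem:add-reparam}.
Choosing $g(\chi,\theta) := -\log(1-\chi)$, which does not depend on $\theta$, and ${^+}\!F(\beta,\theta) := e^{-\beta}\theta + (1-e^{-\beta})\theta^*$, we obtain
\[
	{^+}\!F\big(g(\chi,\theta), \theta\big)
	= (1-\chi)\,\theta + \chi\,\theta^*
	= F(\chi,\theta),
\]
and a direct calculation gives ${^+}\!F(\beta_2, {^+}\!F(\beta_1,\theta)) = {^+}\!F(\beta_1{+}\beta_2, \theta)$, so ${^+}\!F$ is additive.
Replacing $g$ by $k g$ and ${^+}\!F(\beta,\theta)$ by ${^+}\!F(\beta/k,\theta)$, for any constant $k > 0$, yields another valid pair; this is the multiplicative freedom in the output of $g$.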
% , and extending well beyond other standard additive representation theorems [CITATION NEEDED].

% At a high level, t
The key to proving \cref{theorem:add-reparam} is realizing that commitment flows can be equivalently represented by vector fields. This view, which we now unpack, confers other benefits as well. 

% \subsection{Vector Field Representations}
% \subsection{Update Fields}
% \subsection{Order of Obsevation and the Vector Fields of Commitment Functions}
% \subsection{The Vector Fields of Commitment Functions, and Orderless Combination}
% \subsection{The Vector Field Representation, and Orderless Combination}
\subsection{Orderless Combination and the Vector Field Representation}
\label{sec:vecrep}

% Suppose we learn $\phi_1$, and then $\phi_2$ (for simplicity, say with the same confidence $\chi$). 
% Suppose we learn $\phi_1$ 
%  (with confidence $\chi_1$), 
% and then $\phi_2$
%  (with confidence $\chi_2$). 
% \unskip.
% Is this the same as learning $\phi_2$ and then $\phi_1$? 
% Is this the same as learning them in the opposite order?
Is it the same to learn $\phi_1$ and then $\phi_2$ as it is to learn them in the opposite order? 
It is for belief functions 
% ((\cref{ex:shafer}, perhaps a reason to find belief functions attractive)
(\cref{ex:shafer}) and when conditioning. 
% Our symmetry assumption ensures it is also the case if $\phi_1=\phi_2$ are the same.
But, in general, 
% observing inputs in different orders yields different results.
the order of observations can have a significant impact on the result.
Humans tend to have a recency bias: more recent observations have a stronger influence on beliefs.
\Cref{ex:prob-simple,ex:kalman1d} are not commutative either.
% But often pieces of information at once, we would like to update using that combined information.
% Nevertheless, even if prioritizing recent information is appropriate, 
% Still, 
But if the order matters for our update, what should we do if we receive two pieces of information simultaneously?
% it would seem problematic to prioritize one piece of information over another if they are recieved simultenously. 
	% if one were two pieces of information together, 
	% we would like to update using that combined information.
% It turns out that there is already a natural way to do this, even
% It turns out that we already have the tools to do this in a natural way, even if $\phi_1$ and $\phi_2$ do not commute.  
There  is a natural way to do this
	% represent flow learners as vector fields, and add them together. 
	with the techniques used to prove \cref{theorem:add-reparam}. 

% if the order in which one observes $\phi_1$ and $\phi_2$ matters.
% if $\phi_1$ and $\phi_2$ do not commute.  

\commentout{
% We now investigate 
We now turn to an equivalent representation of 
flow
% or fractional 
update functions, which, among
other things, will ultimately
yield a natural way of
% orderlessly combine observations, and 
orderlessly learning $\phi_1$ and $\phi_2$ together, and weighted by relative
confidence. 
%
At a technical level, we show how to
extend an arbitrary update function $F$, that handles inputs $\Phi$,
% to a set $\ext\Phi \supseteq \Phi$ with some algebraic operations.
to handle a more expressive set of inputs $\ext\Phi \supseteq \Phi$
closed under new operations of
orderless combination ($\cseq$), and rescaling by relative confidence ($\cdot$).
}%

% In \cref{ax:diffble}, we assumed that $\Theta$ has a differentiable structure; thus, 
Since $\Theta$ carries a differentiable structure,
it makes sense to talk about its tangent space
%joe3: please add parens!
%oli3: parens are non-standard in this context. See Lee 2013,
%   which is the standard reference on manifolds, or even the
%   wikipedia page.
$T\Theta$,
which consists of pairs $(\theta, \mat v)$ where
$\theta \in \Theta$, and $\mat v$,
% is intuitively an infinitessial direction rooted at $\theta$.
intuitively, is a direction that one can travel in $\Theta$ beginning at $\theta$
% tangent to $\Theta$
% rooted at $\theta$
\parencite[\S3]{lee2013smooth}.
% structure; for ease of presentation, suppose that it is an $n$-dimensional manifold \parencite{lee2013smooth}.
% For a smooth manifold $M$ (such as the space $\Delta \X$ of distributions over $\X$),
% and a point $p \in M$, we follow convention by writing $T_p M$ for the tangent space to $M$ at point $p$ \parencite{lee2013smooth}, and % $TM := \sqcup_{p \in M} (p, T_p M)$
% $TM := \sum_{p \in M} T_p M$ for the full tangent bundle over $M$.
%
% A vector field over $M$ is a smooth map $\mat v : M \to T M$ assigning a tangent vector $\mat v(p) \in T_p M$, to every point $p \in M
%
A \emph{vector field} $X \in \mathfrak X\Theta$ is a
% smooth
differentiable
map $X : \Theta \to T \Theta$
assigning to each  $\theta \in \Theta$
a vector $X(\theta) = (\theta, \mat v) \in T\Theta$
tangent to $\Theta$ at $\theta$.
\commentout{
	The set of all vector fields over $\Theta$ is denoted $\mathfrak X(\Theta)$
	 % and is closed under linear combination
	 and forms a vector space.
	\parencite[\S8]{lee2013smooth}
	\unskip.
	}%
%
% There is a close relationship between additive confidence and such vector fields.
% There is a close relationship between additive confidence and such vector fields.
% A vector field is called \emph{complete} if it generates a global flow.
% , or equivalently, a smooth section of the projection map $\pi : T M \to M$, where $\pi((p, v)) = p$.
% Additivity (\cref{ax:combinativity})
% \Cref{ax:combinativity}
\Cref{ax:seq-for-more}
% (and indeed even \cref{ax:seq-for-more})
implies that the behavior of $\Lrn$ is generated by the
way it handles updates of small confidence.
So, in a sense, all we need to know about
$\Lrn$
is how it handles infinitesimal confidences
% \unskip, which is to say, its derivative at zero confidence
\unskip---which can be
viewed as a vector field.
More precisely, in most cases (such as when using either the fractional or additive confidence domains),
% given a commitment function 
% $F$, and observation $\phi$,
a commitment function
$\Lrn_\phi$
% we can define the \emph{vector field of $\phi$} by
% emph{vector field representation} given by
can be represented by
the vector field
% differential of $F_\phi$
% intuitively represents an update with infinitessimal confidence,
% and is a vector field
\begin{equation}
	% F'_\phi
	% F'_\phi
	% \mathrm{d}\Lrn_\phi
	\Lrn'_\phi
	:=
	\theta \mapsto
	% \frac{\partial}{\partial \chi} F_{\theta}^{\chi} \Big|_{\chi=0}
	% \frac{\partial}{\partial \chi} \Lrn_{(\theta,\phi)}(\chi) \Big|_{\chi=0}
	\frac{\partial}{\partial \chi} \Lrn(\phi, \chi, \theta) \Big|_{\chi=\bot}
	\qquad\in 
	\mathfrak X \Theta
	.
	\label{eq:f-field}
\end{equation}
(To handle edge cases involving the zero field, we may need a more complex but closely related definition; see the proof of \cref{theorem:add-reparam} for details.)
% (or, in some corner cases, something equivalent to it).
We can then recover $^+\!\Lrn_\phi$ as the integral curves of $\Lrn'_\phi$
% In other words, if we knew only the vector field $F'_\phi$,
% we could
% because $F_\phi$ is the unique function satsifying \eqref{eq:f-field}.
% is a direct corolary of \citeauthor{lee2013smooth} [Thm 9.12, \citeyear{lee2013smooth}].
\parencite[Thm 9.12]{lee2013smooth}.
	% as we explain in the appendix. 
\commentout{%
\begin{fact}[{\citeauthor[Thm 9.12]{lee2013smooth}}]
	If $X \in \mathfrak X(\Theta)$,
	there is at most one
	 function
	$f : [0,\infty) \times \Theta \to \Theta$
	satisfying
	$
	% \[
		f(a, f(b, \theta)) = f(a+b,\theta)
			~~\text{and}~~
		\frac{\partial}{\partial t}
			 f(t,\theta)
			% \underset{\chi=0}|
			|_{t{=}0}
			\!\!= X(\theta)
		% \]
	$
	for all $\theta \in \Theta$ and $a,b\ge 0$.
	\label{fact:unique-integral-curves}
\end{fact}
\begin{coro}
	% Suppose $X_\phi \in \mathfrak X(\Theta)$ be a vector field.
	% Let $F$ be a flow-update rule.
	% Suppose $X_\phi \in \mathfrak X(\Theta)$ be a vector field.
	% Then there is at most one flow-update function satisfying \eqref{eq:f-field}.
	% Fix the vector field $X := F'_{\phi}$.
	% $F$ is the only flow-update rule satisfying \eqref{eq:f-field}.
	%%%v1
	% Let $F$ be a flow update function, and fix the vector field $F'_{\phi}$.
	% Then $F_\phi$ is the only flow update function satisfying \eqref{eq:f-field}.
	%%%v2
	If $\Lrn_{\phi_1}$ and $\Lrn_{\phi_2}: \Theta \times [0,1] \to \Theta$ are distinct,
	then so are $\Lrn'_{\phi_1}$ and $\Lrn'_{\phi_2}
	 % \in \mathfrak X(\Theta)
	$.
	%%%v3
	% If $F$ is the uni
	\label{fact:unique-flow-for-vfield}
\end{coro}
}%
% Therefore, \cref{theorem:add-reparam}
% The upshot is that every 
% % flow update function $F$
% commitment flow $\Lrn_\phi$ can be equivalently represented
% by its differential $\Lrn'_\phi$.
% \begin{prop}
% 	% Let $F$ be a flow-update rule.
% 	% Then, there is a bijective correspondence between
% 	There is a biective correspondence between
% 	flow-update rules
% 	% $F : \Phi \times[0,\infty] \times \Theta \to \Theta$.
% 	and
% 	$\Phi$-indexed families of vector fields $X : \Phi \to \mathfrak X(\Theta)$.
% % Every update rule $F : \Phi \times \mathbb R \to (\Theta  \to \Theta)$
% % satisfying \cref{ax:zero,ax:additivity,ax:diffble} corresponds to a unique
% % $\Phi$-indexed collection of vector fields
% %     $F' : \Phi \times \Theta \to T\Theta$
% \[
% 	X()
% \]
% \end{prop}
% \begin{coro}\label{thm:vecrep}
%     There is a natural bijection between
%     % update rules $F : \Phi \times \mathbb R \to \Delta \X \to \Delta \X$
%     update rules $F : \Phi \times \mathbb R \to (\Theta  \to \Theta)$
%         satisfying \cref{ax:zero,ax:additivity,ax:diffble},
%     and $\Phi$-indexed collections of complete vector fields
%         % $\{ F'_\phi : \Delta X \to T \Delta X \}_{\phi \in \Phi}$%
%         % $\{ F'_\phi : \Theta \to T \Theta \}_{\phi \in \Phi}$%
%         $ F' :  \Phi \times \Theta \to T \Theta$%
%         % $F' : \Phi \to \Delta\X \to T\Delta \X$%
%     .
% \end{coro}
% In the language of
%
% Not all vector fields can be integrated to get an update function
%
% \begin{coro}\label{thm:vecrep}
% There is a bijective correspondence between udpate rules satisfying \cref{ax:zero,ax:additivity,ax:diffble} and $\Phi$-indexed collections of \textbf{complete} vector fields.
% \end{coro}
% We call $F'$ the \emph{vector field representation} of an update function $F$.
% This vector field representation of an update function
It may seem counter-intuitive that the vector field $\Lrn'_\phi$,
which does not mention confidence at all, allegedly captures confidence.
Intuitively, it does so by specifying
everything about the learning process {except} for the degree of confidence itself.
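
To illustrate under simple assumptions (ours, for exposition only), let $\Theta = \mathbb R^d$ and suppose that, on the fractional domain, $\Lrn^{\chi}_\phi(\theta) = (1-\chi)\theta + \chi\theta^*_\phi$, where $\theta^*_\phi$ is a hypothetical target state associated with $\phi$.
Then \eqref{eq:f-field} gives
\[
	\Lrn'_\phi(\theta)
	= \frac{\partial}{\partial \chi} \Big[ (1-\chi)\,\theta + \chi\,\theta^*_\phi \Big] \Big|_{\chi = 0}
	= \theta^*_\phi - \theta .
\]
The integral curve through $\theta$ solves $\dot\theta(t) = \theta^*_\phi - \theta(t)$, so $\theta(t) = e^{-t}\theta + (1-e^{-t})\theta^*_\phi$, which is additive in $t$.
The field records only the direction and relative speed of travel toward $\theta^*_\phi$; how far along the curve one moves is exactly what the confidence supplies.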
\commentout{
This vector field representation is useful for two reasons:
at a practical level, it gives us a natural extension of $\Phi$
that allows us deal with ``mixtures'' of observations and commonly arise.
At a deeper level, it will enable us to describe and classify
the flow update functions on $\Theta$.
}%
% Having separated the confidence from the mechanics of the update,
% this vector field representation allows us to describe and
% classify update functions on $\Theta$

We now return to orderless combination of observations.
% \subsection{Orderless Combination of Observations}
% \textbf{Orderless Combination of Observations.}
One key property of vector fields is their closure under linear combination, and since
% Therefore, flow update functions,
% Since in the presence of a flow update function,
% observations correspond to vector fields,
commitment flows and vector fields are equivalent, 
% observations also inherrit this linear structure.
% we can allow the observations themselves to inherit this linear structure.
we can extend this linear structure to observations themselves.
%
\commentout{%
%
There are two aspects of linearity: scalar multiplication, and addition.
% The first way of combining
From scalar mutliplication, we get a way of rescaling
inputs
% $\phi$
by a ``relative confidence'' $k
 % \in [0,\infty)
$.
% \begin{prop}
	% Suppose $F$ is a flow udpate rule.
% For $\phi \in \Phi$, we can extend $F$
Concretely, given $\phi \in \Phi$ and $k \in (0,\infty)$,
 % given $k$ and $\phi$,
define a new observation
% $k\cdot\phi \in \ext\Phi$
$k\cdot\phi$
% and extend $F$ to handle it by:
and extend $F$ to
a function $\ext \Lrn$ that handles it by:
% \[
$
	\ext \Lrn^{\chi}_{k\cdot\phi}(\theta) := \Lrn^{k\chi}_{\phi}(\theta)
$
	% ,\quad\text{or equivalently,}\quad
\unskip, or equivalently, $
	\Lrn'_{k\cdot \phi} := k \Lrn'_{\phi}
	.
% \]
$
% \end{prop}
The rescaled input
$k\cdot \phi$ behaves the same way that $\phi$
does for extreme values of confidence,
since $k 0 = 0$ and $k\infty = \infty$.
%
% It is no accident that these rescalings 
This is precisely the same degree of freedom as exposed in  \cref{prop:az-iso}.
}%
% In this way, the set $\Phi$ inherits
% the additivity of the update rule in the form of scalar multiplication.
% It turns out more is possible: updates inherit the entire vector space structure.
%
This gives us a natural way to combine observations in parallel.
% This can be quite powerful.
% In particular, given \cofunc s $F, G : \mathbb R \to \Theta$, we can define
% $F \oplus G$ via the vector field $(F \oplus G)' = F' + G'$.
% \begin{defn}
% 	For $\phi_1, \phi_2 \in \Phi$, we extend $F$ to
% 	$\phi_1 \oplus \phi_2$
% \end{defn}
Concretely,
given $\phi_1, \phi_2 \in \Phi$, we can form a new input
$\phi_1 \oplus \phi_2$ and extend $\Lrn$ to handle it by taking
$\Lrn'_{\phi_1 \oplus \phi_2} := \Lrn'_{\phi_1} + \Lrn'_{\phi_2}$.
% This commitment flow may not have a simple closed-form flow representation, 
% even if both $\phi_1$ and $\phi_2$ do.
% Still, 
% 	there is a unique 
% 	such function, if it exists.
% % We now prove that it does, except possibly for full confidence.
% 
Standard existence (and uniqueness) theorems for ordinary differential equations then apply.
Nevertheless, there are two wrinkles: 
$\lim_{t \to \infty} \Lrn^{t}_{\phi_1 \oplus \phi_2}$ may not exist,
in which case we cannot continuously extend $\Lrn_{\phi_1 \oplus \phi_2}$ to handle full confidence,
and $\Lrn_{\phi_1\oplus\phi_2}$ might not satisfy \cref{ax:acyclic}.
% ---that is,
% $\lim_{\beta \to \infty} F^{\beta}_{\phi_1 \oplus \phi_2}$ may not exist.
% Thus, it may not be meaningful to observe $\phi_1 \oplus \phi_2$ with full confidence.
We leave $\phi_1 \oplus \phi_2$ undefined in such cases, 
% when $\lim_{\beta \to \infty} \Lrn^{\beta}_{\phi_1 \oplus \phi_2}$ does not exist, 
but point out that having a loss representation for $\Lrn$ (the subject of \cref{sec:loss-repr})
suffices to avoid both problems. 
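
As a sketch of a case in which neither problem arises (an illustrative assumption of ours), let $\Theta = \mathbb R^d$ and $\Lrn^{t}_{\phi_i}(\theta) = e^{-t}\theta + (1-e^{-t})\theta^*_i$ for hypothetical targets $\theta^*_1, \theta^*_2$, so that $\Lrn'_{\phi_i}(\theta) = \theta^*_i - \theta$.
The combined field is $\Lrn'_{\phi_1\oplus\phi_2}(\theta) = \theta^*_1 + \theta^*_2 - 2\theta$, and solving $\dot\theta = \theta^*_1 + \theta^*_2 - 2\theta$ gives
\[
	\Lrn^{t}_{\phi_1\oplus\phi_2}(\theta)
	= e^{-2t}\,\theta + \big(1 - e^{-2t}\big)\, \frac{\theta^*_1 + \theta^*_2}{2} .
\]
Here the limit as $t \to \infty$ exists and equals the midpoint of the two targets.
By contrast, sequential full-confidence updates end at whichever target is observed last: $\Lrn^{\infty}_{\phi_2} \circ \Lrn^{\infty}_{\phi_1}$ maps every $\theta$ to $\theta^*_2$, while the opposite order yields $\theta^*_1$.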
	% natural class of learners for which this limit is guaranteed to exist. 
	% is a natural way to ensure that it does.
% but we will soon see that the representation introduced in \cref{sec:loss-repr}
% suffices to ensure that it does. 
%
% Inputs $\phi_1$ and $\phi_2$ are said to \emph{commute} if
% $F_{\phi_1}^{\chi_1} \circ F_{\phi_2}^{\chi_2} \ne  F_{\phi_2}^{\chi_2} \circ F_{\phi_1}^{\chi_1}$ for all $\chi_1, \chi_2$.
%
\commentout{%
\begin{prop}
	% For $\bot < \chi_1, \chi_2  < \top$,
	If $\Lrn$ is a flow update function
	% , and $\chi_1, \chi_2, \chi_1', \chi_2' \in (\bot, \top)$,
	then the following are equivalent:
	\begin{enumerate}
		\item $\Lrn_{\phi_1}^{\chi_1} \circ \Lrn_{\phi_2}^{\chi_2} =  \Lrn_{\phi_2}^{\chi_2} \circ \Lrn_{\phi_1}^{\chi_1}$
		for some $\chi_1, \chi_2 \notin \{\bot,\top\}$.
		% \item $F_{\phi_1}^{\chi_1'} \circ F_{\phi_2}^{\chi_2'} =  F_{\phi_2}^{\chi_2'} \circ F_{\phi_1}^{\chi_1'}$
		\item $\Lrn_{\phi_1}^{\chi_1} \circ \Lrn_{\phi_2}^{\chi_2} =  \Lrn_{\phi_2}^{\chi_2} \circ \Lrn_{\phi_1}^{\chi_1}$
		for all $\chi_1, \chi_2 \notin \{\bot,\top\}$.

		\item The vector fields $\Lrn'_{\phi_1}$ and $\Lrn'_{\phi_2}$ commute.
		% i.e.,
		% $F'_{\phi_1}(F'_{\phi_2}(f)) = F'_{\phi_2}(F'_{\phi_1}(f))$ for every smooth function $f$.

		\item
			% $\phi_1\oplus\phi_2$ is defined and
			For all $\chi \in \mathbb R$, 
			$\Lrn^{\chi}_{\phi_1} \circ \Lrn^{\chi}_{\phi_2} = \Lrn^\chi_{\phi_1\oplus\phi_2}$.
	\end{enumerate}
	If this condition holds, then $\phi_1$ and $\phi_2$ are said to \emph{commute}.
\end{prop}
}%

Observations $\phi_1$ and $\phi_2$ \emph{commute} iff $\Lrn_{\phi_1}^{\chi_1} \circ \Lrn_{\phi_2}^{\chi_2} =  \Lrn_{\phi_2}^{\chi_2} \circ \Lrn_{\phi_1}^{\chi_1}$
for all $\chi_1, \chi_2 \neq \top$. 
% \notin \{\bot,\top\}$.
%
Clearly $\phi_1 \oplus \phi_2 = \phi_2 \oplus \phi_1$ when either is
defined, so $\oplus$ provides a way of combining observations
orderlessly, even in cases where $\phi_1$ and $\phi_2$ do not commute%
% (that is, ).
% And when $\phi_1$ and $\phi_2$
% already do not depend on order, $\phi_1\oplus \phi_2$ has the same effect
% as $\phi_1$ followed by $\phi_2$.
---and when they do, $\phi_1\oplus \phi_2$
is equivalent to observing $\phi_1$ and $\phi_2$ in either order.
%
\commentout{%
\begin{prop}
	% If $\phi_1$ and $\phi_2$ commute
	% (i.e., $F^{\chi}_{\phi_1} \circ F^{\chi}_{\phi_2} =
	%  	F^{\chi}_{\phi_2} \circ F^{\chi}_{\phi_1}$ for all $\chi$)
	% \unskip, then both are equal to $F^{\chi}_{\phi_1 \oplus \phi_2}$
	% for all $\chi
	%  % \in [0,\infty]
	%  $.
	If $\Lrn^{\chi}_{\phi_1} \circ \Lrn^{\chi}_{\phi_2} =
		\Lrn^{\chi}_{\phi_2} \circ \Lrn^{\chi}_{\phi_1}$,
	% both equal
	then both updates are equal to
	 $\Lrn^{\chi}_{\phi_1 \oplus \phi_2}$. % can add \! before period to fit on one line.
	\commentout{That is,
	\[
		F^{\chi}_{\phi_1}( F^{\chi}_{\phi_2}(\theta))
		=
		F^{\chi}_{\phi_2 \oplus \phi_1} (\theta)
		=
		F^{\chi}_{\phi_1 \oplus \phi_2} (\theta)
		=
		F^{\chi}_{\phi_1}( F^{\chi}_{\phi_2}(\theta))
		.
	\]}
\end{prop}
}%
%
% In a sense, this is because $\phi_1 \oplus \phi_2$ is a ``mixture'' containing
% one part $\phi_1$ and one part $\phi_2$. This intuition is made
% precise by the following proposition, which
The following proposition shows that $\phi_1\oplus\phi_2$ is equivalent to an infinitely
fine interleaving of $\phi_1$ and $\phi_2$ updates.
%


\begin{linked}{prop}{linterleave}
	Suppose $\Lrn_{\phi_1}$ and $\Lrn_{\phi_2}$ are commitment flows.
	For $t \in [0, \infty]$, 
	let
	$L_t := \Lrn_{\phi_2}^t \circ \Lrn_{\phi_1}^t
	% : \Theta \to \Theta
	$
 	denote
	% a confidence-$t$ update $\phi_1$ followed
	% by a confidence-$t$ update of $\phi_2$,
	learning $\phi_1$ followed by $\phi_2$ (both with confidence $t$), 
	% an update with $\phi_1$ followed by an update with $\phi_2$,
	% both made with confidence $t$
	and
	for $n \in \mathbb N$, let
	% $L_t^{(n)}(\theta) := L_t \circ\cdots\circ L_t(\theta)$
	$L_t^{(n)}(\theta) := (L_t \circ\cdots\circ L_t)(\theta)$
	denote $n$ repeated applications of $L_t$.
	% denote the result starting with $\theta$ and applying $L_t$ $n$t$.
	% Symmetrically, let $v_t$ and $v_t^{(n)}$ represent the
	Then
	$
	% \[
		\Lrn_{\phi_1 \oplus \phi_2}^\chi(\theta) =
			\lim\limits_{n \to \infty} L_{\nicefrac\chi n}^{(n)}(\theta)
		%%%v1
		% F_{\phi_1 \oplus \phi_2}^\chi =
		% \lim_{n \to \infty}~~
		% \overbrace{u_{\nf \chi n}\circ u_{\nf \chi n} \circ\cdots\circ
		% 	u_{\nf \chi n}}^{\text{$n$ times}}
		%%%v2
		%  (F^{\frac\chi n}_{\phi_1} \circ F^{\frac\chi n}_{\phi_2})
		%  \circ
		%  (F^{\frac\chi n}_{\phi_1} \circ F^{\frac\chi n}_{\phi_2})
		%  \circ
		%  \cdots
		%  \circ
		%  (F^{\frac\chi n}_{\phi_1} \circ F^{\frac\chi n}_{\phi_2})
		.
	% \]
	$
	\onlyfirsttime{\unskip\footnote{\vnew{
	For completeness, note that \cref{prop:linterleave} is closely related to the \emph{Lie-Trotter product formula} \citep{trotter1959product,cohen1982eigenvalue}, and can be viewed as an interpreted instantiation of it.
	}}}
\end{linked}
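
To see \cref{prop:linterleave} at work in a simple case (an illustrative assumption of ours), let $\Theta = \mathbb R^d$ and $\Lrn^{t}_{\phi_i}(\theta) = e^{-t}\theta + (1-e^{-t})\theta^*_i$ for hypothetical targets $\theta^*_1, \theta^*_2$, and write $a := e^{-t}$ with $t > 0$.
Then
\[
	L_t(\theta) = a^2\,\theta + a(1-a)\,\theta^*_1 + (1-a)\,\theta^*_2 ,
\]
an affine contraction with fixed point $p_t = (a\,\theta^*_1 + \theta^*_2)/(1+a)$, so that $L_t^{(n)}(\theta) = a^{2n}\theta + (1-a^{2n})\,p_t$.
Setting $t = \chi/n$ gives $a^{2n} = e^{-2\chi}$ for every $n$, while $p_{\chi/n} \to (\theta^*_1 + \theta^*_2)/2$ as $n \to \infty$.
Hence
\[
	\lim_{n\to\infty} L_{\chi/n}^{(n)}(\theta)
	= e^{-2\chi}\,\theta + \big(1 - e^{-2\chi}\big)\, \frac{\theta^*_1 + \theta^*_2}{2} ,
\]
which is the flow of the summed field $\theta \mapsto \theta^*_1 + \theta^*_2 - 2\theta$.
For finite $n$, the fixed point $p_t$ weights $\theta^*_2$ more heavily, since $\phi_2$ is observed last in each round; this recency bias vanishes in the limit.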


\commentout{%
% \paragraph{What Distinguishes This from Control Theory?}
\paragraph{Vector Field Representations and Control Theory.}
In many ways, 
	the assumptions we have made in \cref{sec:vecrep}
	have lead our framework to resemble a dynamical system. 
We have a continuous manifold of states $\Theta$, a set of 
	of inputs (``control signals'') $\Phi$, which cause $\Theta$ 
	to evolve ``over time''. 
% Indeed, the math behind the mo
% However, there are two critical differences
% However, there is a critical difference:
However, there are two critical differences. 
%
% \begin{itemize}
% 	\item 
First, control theory does not require the analogue of a ``full-confidence'' update; there may be no limit as $t \to \infty$.
	% This allows conrol theory to talk about a far more general class of dynamical systems without fixed points. 
% This enables control theory to talk about a far more general class of dynamical systems without fixed points. 
Thus, while control theory must apply to arbitrary dynamical systems, 
	the theory of confidence needs only describe those which describe motion that uniformly approaches a fixed point. % But going in circles does not count as 
%
	% \item 
Second, while ``time'' has a single clear interpretation in control theory, our analogue of additive confidence is only well-defined up to a multiplicative constant.
In some cases, the analogy can break down significantly---%
	observing $\phi_2$ after learning $\phi_1$ with full confidence, for instance, 
	extends ``time'' past $t=\infty$.
% Finally, we mention that confidence may not always be best thought of in these terms. 
Finally, we mention that in some contexts, it is clearer to think of confidence in the range $[0,1]$, never adopting a temporal analogy at all. 
	% It is also sometimes helpful to think in terms of the reparameterized setting of $[0,1]$.
% \end{itemize}
}%
%
% Moreover,
\commentout{
Going back through our examples:
\begin{description}
	\item[{\bf[\cref{ex:prob-simple}]}]
		$g(\mu, \alpha, \phi) = - \log(1-\alpha)$.
		This means that
		$^+\!F(\mu, \beta, \phi) = e^{-\beta} \mu + (1-e^{-\beta}) (\mu\mid \phi)$.
		
	\item [{\bf[\cref{ex:shafer}]}] 
		Weight of evidence $w$ is already additive, so
			$g(m, w, \phi) = w$, and $^+\!F = F$. 
		Meanwhile, degree of support $\alpha$ is translated
		the same way as $\alpha$ in the first example: in this case,
			$g(m, \alpha, \phi) = - \log(1-\alpha)$. 
		
		\commentout{
		As noted in the introduction,
		the restriction of this update rule to belief states
		that are probabilities, gives an update rule}
		% from the $\alpha$ of \cref{ex:prob-simple}, but they differ'
		
				
	\item [{\bf[\cref{ex:classifier}]}] 
		($n$ is already additive).
	% \item [{\bf[\cref{ex:classifier}]}] 
\end{description}
}%



% Recall that \cref{ax:seq-for-more} impilies that the behavior of updates
% is generated by low-confidence updates; we saw a particularly nice
% way of doing that in \cref{ax:additivity},
% which has the feature that confidence behaves the same way no matter what your initial beliefs are.


% Even restricting to , additivity is a particularly natural.
%
% \begin{prop}
% 	If $F$ is a differentiable \cofunc\ with confidence domain $\Rplus$,then there is a unique update rule $G$ with the same confidence domain, that behaves approximately like $F$ for small increments of confidence, and is also additive (\cref{ax:additivity}).
% \end{prop}
% \input{sections/vecfield-repr}
