\section{Introduction}  \label{sec-intro}

In this paper we study sublanguages of Plotkin's functional programming language ${\mathrm{PCF}}$,
which we here take to be the simply typed $\lambda$-calculus over a single base type $\nat$,
with constants 
\[ \begin{array}{rlcrl}
\num{n} & : \nat \mbox{~~for each $n \in \mathbb{N}$} \;, & &
{\mathit{suc}},{\mathit{pre}} & : \nat\rightarrow\nat \;, \\
{\mathit{ifzero}} & : \nat\rightarrow\nat\rightarrow\nat\rightarrow\nat \;, & &
Y_\sigma & : (\sigma\rightarrow\sigma)\rightarrow\sigma \mbox{~~for each type $\sigma$} \;.
\end{array} \]
As usual, we will consider this language to be endowed with a certain (call-by-name) operational semantics,
which in turn gives rise to a notion of \emph{observational equivalence} for ${\mathrm{PCF}}$ programs.

We define the \emph{level} ${\mathrm{lv}}(\sigma)$ of a type $\sigma$ inductively by
\[ {\mathrm{lv}}(\nat) ~=~ 0 \;, ~~~~~~ {\mathrm{lv}}(\sigma\rightarrow\tau) ~=~ \max\,({\mathrm{lv}}(\sigma)+1,{\mathrm{lv}}(\tau)) \;, \]
and define the \emph{pure type} $\pure{k}$ of level $k \in \mathbb{N}$ by
\[ \pure{0} ~=~ \nat \;, ~~~~~~ \pure{k+1} ~=~ \pure{k} \rightarrow \nat \;. \] 
Modifying the definition of ${\mathrm{PCF}}$ so that the constants $Y_\sigma$ are admitted only
for types $\sigma$ of level $\leq k$, we obtain a sublanguage ${\mathrm{PCF}}_k$ for any $k \in \mathbb{N}$.
Our main result will be that for each $k$, the expressive power of ${\mathrm{PCF}}_{k+1}$ strictly
exceeds that of ${\mathrm{PCF}}_k$: in particular, there is no closed term of ${\mathrm{PCF}}_k$ that is observationally equivalent to $Y_{\pure{k+1}}$.
(Fortunately, `observational equivalence' has the same meaning for all the 
languages in question here, as will be explained in Section~\ref{sec-background}.)
This answers a question posed explicitly by Berger in \cite{Berger-min-recursion}, 
but present in the folklore at least since the early 1990s.
It is worth remarking that the situation is quite different for various extensions of ${\mathrm{PCF}}$
considered in the literature, in which one may restrict to recursions at level~1 types
without loss of expressivity (see Subsection~\ref{subsec-embed}).

We can phrase our result more denotationally in terms of the type structure
${\mathsf{SF}}$ of \emph{(PCF-)sequential functionals} or its effective substructure ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$.
As will be reviewed in Section~\ref{sec-background},
the latter may be conveniently characterized up to isomorphism as the closed term model for ${\mathrm{PCF}}$
modulo observational equivalence.
Our result can therefore be understood as saying that more elements of ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$ (and hence of ${\mathsf{SF}}$)
are denotable in ${\mathrm{PCF}}_{k+1}$ than in ${\mathrm{PCF}}_k$.
From this we may easily infer that there is no finite `basis'
$B \subseteq {\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$ relative to which all elements of ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$ are $\lambda$-definable 
(see Corollary~\ref{no-finite-basis-cor} below).

The models ${\mathsf{SF}}$ and ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$ are \emph{extensional}:
elements of type $\sigma \rightarrow \tau$ can be considered as mathematical functions mapping 
elements of type $\sigma$ to elements of type $\tau$.
Whilst our theorem is naturally stated in terms of these extensional models, 
its proof will make deep use of a more intensional model which yields ${\mathsf{SF}}$
as its extensional quotient.
This intensional model of PCF has been considered many times before in the literature,
for instance as the \emph{${\mathrm{PCF}}$ B\"ohm tree} model of Amadio and Curien \cite{Amadio-Curien},
and it is known to be isomorphic to the game models of PCF given by 
Abramsky, Jagadeesan and Malacaria \cite{AJM} and by Hyland and Ong \cite{Hyland-Ong}.
In this paper, we choose to work with the \emph{nested sequential procedure} (NSP) 
presentation of this model, as studied in detail in the recent book of Longley and Normann 
\cite{Longley-Normann}. 
(We will touch briefly on the possible use of other presentations for this purpose
in Section~\ref{sec-further-work}.)
We shall denote the NSP model by ${\mathsf{SP}}^0$; 
its construction will be reviewed in Section~\ref{sec-background},
but in the meantime, let us offer a high-level overview of our proof method
without assuming detailed knowledge of this model.

As a motivating example, fix $k \in \mathbb{N}$, and consider the ${\mathrm{PCF}}$ term
\[  \Phi_{k+1} ~:~ (\nat \rightarrow \pure{k+1} \rightarrow \pure{k+1}) \rightarrow \nat \rightarrow \pure{k+1}  \]
given informally by
\[ \Phi_{k+1}\;g\;n ~=~ g\;n\;(\Phi_{k+1}\;g\;(n+1)) ~=~ g\;n\;(g\;(n+1)\;(g\;(n+2)\;(g\;\cdots))) \;,
\]
or more formally by
\[ \Phi_{k+1} ~=~ \lambda g n.\; Y_{\nat \rightarrow \pure{k+1}} \,
   (\lambda f m.\,g\,m\,(f(m+1)))\,n  \;, \]
where $f$ has type $\nat \rightarrow \pure{k+1}$.
Clearly, $\Phi_{k+1}$ is a term of ${\mathrm{PCF}}_{k+1}$; however, we will be able to
show that the element of ${\mathsf{SF}}$ that $\Phi_{k+1}$ defines is not denotable in ${\mathrm{PCF}}_k$. 

What is the essential feature of this function that puts it beyond the reach of ${\mathrm{PCF}}_k$?
To get a hint of this, we may observe from the informal definition above that
$\Phi_{k+1}$ seems implicitly to involve an infinite nested sequence of calls to its argument $g$, 
and indeed the NSP model makes this idea precise.
Furthermore, each call to $g$ involves an argument of type level $k+1$ resulting from another such call.
Broadly speaking, we shall refer to such a sequence of nested calls (subject to certain other conditions)
as a \emph{$k\!+\!1$-spine} in the NSP associated with $\Phi_{k+1}$.
As a second example, we can see from the natural recursive definition of $Y_{\pure{k+1}}$ itself 
that this too involves a spine of this kind:
\[  Y_{\pure{k+1}} \; h ~=~ h(Y_{\pure{k+1}}\;h) ~=~ h(h(h(\cdots))) \;.  \]
where $h$ has type $\pure{k+1} \rightarrow \pure{k+1}$.
In fact, we may view $Y_{\pure{k+1}}$ as a `special case' of $\Phi_{k+1}$, since
$Y_{\pure{k+1}} \,h$ is observationally equivalent to $\Phi_{k+1} \,(\lambda n.h) \,\num{0}$.

A suitable general definition of $k\!+\!1$-spine turns out to be quite delicate to formulate;
but having done this, it will be possible to prove that no ${\mathrm{PCF}}_k$-denotable NSP contains a
$k\!+\!1$-spine (Theorem~\ref{no-gremlin-thm}).
This will be proved by induction on the generation of such NSPs: in essence, we have to show that none of
the generating operations for the interpretations of ${\mathrm{PCF}}_k$ terms are capable of manufacturing 
spinal NSPs out of non-spinal ones.  This already suffices to show that within the
intensional model ${\mathsf{SP}}^0$, the evident procedure for $Y_{\pure{k+1}}$ is denotable in ${\mathrm{PCF}}_{k+1}$
but not in ${\mathrm{PCF}}_k$.

However, this does not yet establish our main theorem, which concerns not ${\mathsf{SP}}^0$ but its extensional quotient ${\mathsf{SF}}$. For this purpose, we undertake a closer analysis of the function $\Phi_{k+1}$ defined above: 
we show that
not only the NSP arising from the above definition, but any \emph{extensionally equivalent} NSP, 
must necessarily involve a $k\!+\!1$-spine. 
This shows that, within ${\mathsf{SF}}$ (or ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$), the functional given by $\Phi_{k+1}$ is
not denotable in ${\mathrm{PCF}}_k$.
This establishes Berger's conjecture that the languages ${\mathrm{PCF}}_k$ form a strict hierarchy.

Since $\Phi_{k+1}$ is definable from $Y_{\nat \rightarrow \pure{k+1}}$, the above shows that
the element $Y_{\nat \rightarrow \pure{k+1}} \in {\mathsf{SF}}$ is not ${\mathrm{PCF}}_k$-denotable.
To complete the picture, however, we would also like to know that the simpler element 
$Y_{\pure{k+1}} \in {\mathsf{SF}}$ is not ${\mathrm{PCF}}_k$-denotable.
We show this via a more refined version of the above analysis which is of some interest in its own right.
Just as ${\mathrm{PCF}}$ is `stratified' into sublanguages ${\mathrm{PCF}}_k$, we show that each ${\mathrm{PCF}}_k$ may be further
stratified into sublanguages ${\mathrm{PCF}}_{k,1}, {\mathrm{PCF}}_{k,2}, \ldots$ on the basis of the `width' of the types $\sigma$
for which $Y_\sigma$ is permitted (see Definition~\ref{width-def}).
An easy adaptation of our earlier proofs then shows that, for any $l$, 
there are operators $Y_\sigma$ in ${\mathrm{PCF}}_{k,l+2}$ that are not denotable in ${\mathrm{PCF}}_{k,l}$
(the appearance of $l+2$ is admittedly a curiosity here).
Since all these $Y_\sigma$ are themselves readily definable from $Y_{\pure{k+1}}$,
it follows that $Y_{\pure{k+1}} \in {\mathsf{SF}}$ itself is not denotable in ${\mathrm{PCF}}_{k,l}$ for any $l$,
and hence not in ${\mathrm{PCF}}_k$.
This finer analysis illustrates the remarkable richness of structure that ${\mathsf{SF}}$ has to offer.

The paper is organized as follows.
In Section~\ref{sec-background} we recall the necessary technical background on ${\mathrm{PCF}}$ 
and on the models ${\mathsf{SP}}^0$ and ${\mathsf{SF}}$, fleshing out many of the ideas outlined above.
In Section~\ref{sec-denotations} we obtain a convenient inductive characterization 
of the class of procedures denotable in (the `oracle' version of) ${\mathrm{PCF}}_k$,
framed in terms of constructions on the procedures themselves.
In Section~\ref{sec-Y-k+1} we introduce the central concept of a $k\!+\!1$-spinal procedure,
and show using our inductive characterization that no ${\mathrm{PCF}}_k$-denotable procedure can be 
$k\!+\!1$-spinal (this is the most demanding part of the proof).
As noted above, this already shows that ${\mathrm{PCF}}_{k+1}$ denotes more elements of ${\mathsf{SP}}^0$ 
than ${\mathrm{PCF}}_k$ does.
In Section~\ref{sec-extensional} we obtain the corresponding result for ${\mathsf{SF}}$, showing that the element 
$\Phi_{k+1} \in {\mathsf{SF}}$, and hence $Y_{\nat \rightarrow \pure{k+1}} \in {\mathsf{SF}}$, is not ${\mathrm{PCF}}_k$-denotable.
In Section~\ref{sec-pure-type} we adapt our methods to the more fine-grained hierarchy of languages
${\mathrm{PCF}}_{k,l}$ that takes account of the widths of types; this enables us to show also that
$Y_{\pure{k+1}} \in {\mathsf{SF}}$ is not ${\mathrm{PCF}}_k$-denotable.
We conclude in Section~\ref{sec-further-work} with a discussion of related and future work.

The present paper is a revised, corrected and expanded version
of a University of Edinburgh technical report
from July 2015, the most significant changes being 
the addition of the material on the pure type $\pure{k+1}$ in Section~\ref{sec-pure-type},
and a simplified approach to characterizing a suitable substructure of ${\mathsf{SP}}^0$ in 
Section~\ref{sec-denotations}.

I am grateful to Ulrich Berger, Mart\'{i}n Escard\'{o}, Dag Normann and Alex Simpson
for valuable discussions and correspondence, and to Luke Ong and Colin Stirling for helping me to
navigate the existing literature on $\lambda Y$ and recursion schemes
(as discussed in Section~\ref{sec-further-work}).
Many of the participants in the Domains XII workshop in Cork and the Galop XI workshop in Eindhoven
also offered valuable comments; thanks in particular to Neil Jones for drawing his work to my attention.
Finally, I thank the anonymous referees for their careful work on the paper and their valuable suggestions;
these have resulted in significant improvements in both the overall architecture and 
the formal details.


\section{Background}  \label{sec-background}

We here summarize the necessary definitions and technical background from 
\cite{Longley-Normann}, especially from Chapters~6 and 7.

\subsection{The language ${\mathrm{PCF}}$}  \label{subsec-PCF}

In \cite{Scott-TCS}, Scott introduced the language LCF for computable functionals of simple type.
This language is traditionally called ${\mathrm{PCF}}$ when equipped with a standalone
operational semantics as in Plotkin \cite{LCF-considered}.
We will work here with the same version of ${\mathrm{PCF}}$ as in \cite{Longley-Normann},
with the natural numbers as the only base type.
Our types $\sigma$ are thus generated by
\[  \sigma ~::=~ \nat ~\mid~ \sigma \rightarrow \sigma \;, \]
and our terms will be those of the simply typed $\lambda$-calculus
constructed from the constants
\[ \begin{array}{rcll}
  \num{n} & : & \nat & \mbox{for each $n \in \mathbb{N}$} \;, \\
  \mathit{suc},\;\mathit{pre} & : & \nat\rightarrow\nat \;, \\
  \mathit{ifzero} & : & \nat\rightarrow\nat\rightarrow\nat\rightarrow\nat \;, \\
  Y_\sigma & : & (\sigma\rightarrow\sigma) \rightarrow \sigma &
  \mbox{for each type $\sigma$} \;.
\end{array} \]
We often abbreviate the type $\sigma_0 \rightarrow\cdots\rightarrow \sigma_{r-1} \rightarrow \nat$
to $\sigma_0,\ldots,\sigma_{r-1} \rightarrow \mathbb{N}$ or just $\vec{\sigma} \rightarrow \nat$.
As usual, we write $\Gamma \vdash M:\sigma$ to mean that $M$ is a well-typed
term in the environment $\Gamma$ (where $\Gamma$ is a finite list of typed variables).
Throughout the paper, we shall regard the type of a variable $x$ as intrinsic to $x$, and will
often write $x^\sigma$ to indicate that $x$ carries the type $\sigma$.
For each $k \in \mathbb{N}$, the sublanguage ${\mathrm{PCF}}_k$ is obtained by admitting the constants
$Y_\sigma$ only for types $\sigma$ of level $\leq k$.

We endow the class of closed ${\mathrm{PCF}}$ terms with the following small-step reduction rules:
\[ \begin{array}{rclcrcl}
  (\lambda x.M)N & \rightsquigarrow & M[x \mapsto N]   \;, & &
  \mathit{ifzero}\;\num{0} & \rightsquigarrow & \lambda xy.x \;, \\
  \mathit{suc}\;\num{n} & \rightsquigarrow & \num{n+1} \;, & &
  \mathit{ifzero}\;\num{n+1} & \rightsquigarrow & \lambda xy.y \;, \\
  \mathit{pre}\;\num{n+1} & \rightsquigarrow & \num{n} \;, & &
  Y_\sigma M & \rightsquigarrow & M(Y_\sigma M) \;. \\
  \mathit{pre}\;\num{0}   & \rightsquigarrow & \num{0} \;,
\end{array} \]
We furthermore allow these reductions to be applied in certain term contexts.
Specifically, the relation $\rightsquigarrow$ is inductively generated by the rules above along with the clause:
if $M \rightsquigarrow M'$ then $E[M] \rightsquigarrow E[M']$,
where $E[-]$ is one of the contexts
\[ [-]N \;,      ~~~~~~~~ {\mathit{suc}}\,[-] \;, ~~~~~~~~ 
   {\mathit{pre}}\,[-] \;, ~~~~~~~~ {\mathit{ifzero}}\,[-] \;. \]
We write $\rightsquigarrow^*$ for the reflexive-transitive closure of $\rightsquigarrow$.
If $Q$ is any closed ${\mathrm{PCF}}$ term of type $\nat$, it is easy to check that either
$Q \rightsquigarrow^* \num{n}$ for some $n \in \mathbb{N}$ 
or the (unique) reduction path starting from $Q$ is infinite.

This completes the definition of the languages ${\mathrm{PCF}}$ and ${\mathrm{PCF}}_k$.
Whilst the language ${\mathrm{PCF}}_0$ is too weak for programming purposes
(it cannot even define addition), it is not hard to show that even ${\mathrm{PCF}}_1$ is Turing-complete:
that is, any partial computable function $\mathbb{N} \rightharpoonup \mathbb{N}$ is representable 
by a closed ${\mathrm{PCF}}_1$ term of type $\nat \rightarrow \nat$.

We will also refer to the non-effective language ${\mathrm{PCF}}^\Omega$
(or \emph{oracle ${\mathrm{PCF}}$}) obtained by extending the definition of ${\mathrm{PCF}}$ with a
constant $C_f: \nat\rightarrow\nat$ for every set-theoretic partial function $f : \mathbb{N} \rightharpoonup \mathbb{N}$,
along with a reduction rule $C_f \num{n} \rightsquigarrow \num{m}$ for every $n,m$ such that
$f(n)=m$. (In ${\mathrm{PCF}}^\Omega$, the evaluation of a closed term $Q:\nat$ may fail to
reach a value $\num{n}$ either because it generates an infinite computation, or because
it encounters a subterm $C_f(n)$ where $f(n)$ is undefined.)
The languages ${\mathrm{PCF}}^\Omega_k$ are defined analogously.

If $M,M'$ are closed ${\mathrm{PCF}}^\Omega$ terms of the same type $\sigma$, 
and ${\mathcal{L}}$ is one of our languages ${\mathrm{PCF}}_k$, ${\mathrm{PCF}}_k^\Omega$, ${\mathrm{PCF}}$, ${\mathrm{PCF}}^\Omega$,
we say that $M,M'$ are \emph{observationally equivalent in ${\mathcal{L}}$}, 
and write $M \simeq_{\mathcal{L}} M'$,
if for all closed program contexts $C[-^\sigma] : \nat$ of ${\mathcal{L}}$ and all $n \in \mathbb{N}$, we have
\[  C[M] \rightsquigarrow^* \num{n} \mbox{~~iff~~} C[M'] \rightsquigarrow^* \num{n} \;. \]
(It makes no difference to the relation $\simeq_{\mathcal{L}}$ whether we take $C[-]$ to range over
single-hole or multi-hole contexts.)

Fortunately, it is easy to show that all of the above languages give rise to exactly the
same relation $\simeq_{\mathcal{L}}$.
Indeed, it is immediate from the definition that if ${\mathcal{L}},{\mathcal{L}}'$ are two of our languages
and ${\mathcal{L}} \supseteq {\mathcal{L}}'$, then $\simeq_{{\mathcal{L}}} \,\subseteq\, \simeq_{{\mathcal{L}}'}$;
it therefore only remains to verify that $M \simeq_{{\mathrm{PCF}}_0} M'$ implies
$M \simeq_{{\mathrm{PCF}}^\Omega} M'$.
We may show this by a `syntactic continuity' argument,
exploiting the idea that any of the constants $Y_\sigma$ or $C_f$ in ${\mathrm{PCF}}^\Omega$
can be `approximated' as closely as necessary by terms of ${\mathrm{PCF}}_0$.
Specifically, let us write $\bot$ for the non-terminating program $Y_0(\lambda x^0.x) : \nat$
(a term of ${\mathrm{PCF}}_0$), and for any type $\sigma$ write $\bot_\sigma$ for the term of type $\sigma$
of the form $\lambda \vec{x}.\bot$.
For any $j \in \mathbb{N}$, we may then define ${\mathrm{PCF}}_0$ terms
\begin{eqnarray*} 
   Y_\sigma^{(j)} & = & \lambda f^{\sigma\rightarrow\sigma}.\,f^j(\bot_\sigma) \;,  \\
   C_f^{(j)} & = & \lambda n.\,\mathit{case}~n~\mathit{of}~
   (0 \Rightarrow \num{f(0)} \mid \cdots \mid j-1 \Rightarrow \num{f(j-1)}) \;,
\end{eqnarray*}
where we use some evident syntactic sugar in the definition of $C_f^{(j)}$.
For any ${\mathrm{PCF}}^\Omega$ term $M$, let $M^{(j)}$ denote the `approximation' obtained from
$M$ by replacing all occurrences of constants 
$Y_\sigma, C_f$ by $Y_\sigma^{(j)}, C_f^{(j)}$ respectively. 
It is then not hard to show that for closed $Q:\nat$, we have
\[ Q \rightsquigarrow^* \num{n} \mbox{~~~iff~~~} 
   \exists j.\; Q^{(j)} \rightsquigarrow^* \num{n} \;. \]
From this it follows easily that if $C[-]$ is an observing context of ${\mathrm{PCF}}^\Omega$
that distinguishes $M,M'$, then some approximation $C^{(j)}[-]$ (a context of ${\mathrm{PCF}}_0$)
also suffices to distinguish them. This establishes that 
$\simeq_{{\mathrm{PCF}}_0} \,\subseteq\; \simeq_{{\mathrm{PCF}}^\Omega}$.
We may therefore write $\simeq$ for observational equivalence without ambiguity.

In fact, an even more restricted class of observing contexts suffices for ascertaining
observational equivalence of ${\mathrm{PCF}}^\Omega$ terms. 
The well-known \emph{(equational) context lemma}, due to Milner \cite{Milner-PCF},
states that $M \simeq M' : \sigma_0,\ldots,\sigma_{r-1} \rightarrow \nat$
iff $M,M'$ have the same behaviour in all \emph{applicative contexts} of ${\mathrm{PCF}}$---%
that is, if for all closed ${\mathrm{PCF}}$ terms $N_0:\sigma_0,\ldots,N_{r-1}:\sigma_{r-1}$, we have
\[ M N_0 \ldots N_{r-1} \rightsquigarrow^* n \mbox{~~iff~~}
   M' N_0 \ldots N_{r-1} \rightsquigarrow^* n \;. \]
Furthermore, using the above idea of approximation, it is easy to see that we obtain
exactly the same equivalence relation if we allow the $N_i$ here to range only over
closed ${\mathrm{PCF}}_0$ terms---this gives us the notion of 
\emph{${\mathrm{PCF}}_0$ applicative equivalence}, which we shall denote by $\sim_{0}$.

We have concentrated so far on giving a purely operational description of ${\mathrm{PCF}}$.
We are now able to express the operational content of our main theorems as follows.
As in Section~\ref{sec-intro}, we define the type $\pure{k}$ by $\pure{0} = \nat$,
$\pure{k+1} = \pure{k} \rightarrow \nat$;
we shall write $\pure{k}$ simply as $k$ where there is no risk of confusion.

\begin{theorem}  \label{operational-main-thm}
For any $k \geq 1$, there are functionals definable in ${\mathrm{PCF}}_{k+1}$ but not in ${\mathrm{PCF}}^\Omega_k$.
More specifically: 

(i) There is no closed term $M$ of ${\mathrm{PCF}}^\Omega_k$ such that 
$M \simeq Y_{\nat \rightarrow (k+1)}$ (or equivalently $M \sim_{0} Y_{\nat \rightarrow (k+1)}$).

(ii) There is even no closed $M$ of ${\mathrm{PCF}}^\Omega_k$ such that
$M \simeq Y_{k+1}$.
\end{theorem}

We shall obtain part~(i) of this theorem at the end of Section~\ref{sec-extensional},
then in Section~\ref{sec-pure-type} resort to a more indirect method to obtain the stronger 
statement (ii).
The theorem also holds when $k=0$, but this rather uninteresting case does not require
the methods of this paper; it will be dealt with in Subsection~\ref{subsec-power-pcf1}.

Theorem~\ref{operational-main-thm} can be construed as saying that in a suitably 
pure fragment of a functional language such as Haskell,
the computational strength of recursive function definitions increases strictly as the admissible
type level for such recursions is increased.
The point of the formulation in terms of $\sim_{0}$ is to present our result in 
as manifestly strong a form as possible:
there is no $M \in {\mathrm{PCF}}_k$ that induces the same partial function as $Y_{k+1}$ 
even on closed ${\mathrm{PCF}}_0$ terms.

A more denotational formulation of our theorem can be given in terms of
the model ${\mathsf{SF}}$ of \emph{sequential functionals}, which we here define as the
type structure of closed ${\mathrm{PCF}}^\Omega$ terms modulo observational equivalence.
Specifically, for each type $\sigma$, let ${\mathsf{SF}}(\sigma)$ denote the set of closed
${\mathrm{PCF}}^\Omega$ terms $M:\sigma$ modulo $\simeq$.
It is easy to check that application of ${\mathrm{PCF}}^\Omega$ terms
induces a well-defined function 
$\cdot : {\mathsf{SF}}(\sigma\rightarrow\tau) \times {\mathsf{SF}}(\sigma) \rightarrow {\mathsf{SF}}(\tau)$
for any $\sigma,\tau$;
the structure ${\mathsf{SF}}$ then consists of the sets ${\mathsf{SF}}(\sigma)$ along with these application operations.
Using the context lemma, it is easy to see that ${\mathsf{SF}}(\nat) \cong \mathbb{N}_\bot = \mathbb{N} \sqcup \{ \bot \}$,
and also that ${\mathsf{SF}}$ is \emph{extensional}:
if $f,f' \in {\mathsf{SF}}(\sigma\rightarrow\tau)$ satisfy
$f \cdot x = f' \cdot x$ for all $x \in {\mathsf{SF}}(\sigma)$, then $f = f'$.
Thus, up to isomorphism, each ${\mathsf{SF}}(\sigma\rightarrow\tau)$ may be considered simply as a 
certain set of functions from ${\mathsf{SF}}(\sigma)$ to ${\mathsf{SF}}(\tau)$.

Any closed ${\mathrm{PCF}}^\Omega$ term $M:\sigma$ naturally has a denotation $\sem{M}^{\mathsf{SF}}$
in ${\mathsf{SF}}(\sigma)$, namely its own equivalence class. We may therefore restate
Theorem~\ref{operational-main-thm} as:

\begin{theorem}  \label{SF-main-thm}
Suppose $k \geq 1$.

(i) The element $\sem{Y_{\nat\rightarrow(k+1)}}^{\mathsf{SF}}$ is not denotable in ${\mathrm{PCF}}^\Omega_k$.

(ii) Even $\sem{Y_{k+1}}^{\mathsf{SF}}$ is not ${\mathrm{PCF}}^\Omega_k$-denotable.
\end{theorem}

It follows immediately that in any other adequate, compositional model of ${\mathrm{PCF}}^\Omega$
(such as Scott's continuous model or Berry's stable model), the element
$\sem{Y_{k+1}}$ is not ${\mathrm{PCF}}^\Omega_k$-denotable,
since the equivalence relation on ${\mathrm{PCF}}^\Omega$ terms induced by such a model must be contained
within $\simeq$.

By taking closed terms of ${\mathrm{PCF}}$ rather than ${\mathrm{PCF}}^\Omega$ modulo observational
equivalence, we obtain the type structure ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$ of 
\emph{effective sequential functionals}, which can clearly be seen as a substructure of ${\mathsf{SF}}$.
Although the above constructions of ${\mathsf{SF}}$ and ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$ are syntactic,
there are other more mathematical constructions 
(for instance, involving game models \cite{AJM,Hyland-Ong})
that also give rise to these structures, 
and experience suggests that these are mathematically natural classes of 
higher-order functionals.
We now see that Theorem~\ref{SF-main-thm}(i)
implies an interesting absolute property of ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$,
not dependent on any choice of presentation for this structure or any selection of language primitives:

\begin{corollary}[No finite basis]   \label{no-finite-basis-cor}
There is no finite set $B$ of elements of ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$ such that all
elements of ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$ are $\lambda$-definable relative to $B$.
In other words, the cartesian category of ${\mathrm{PCF}}$-computable functionals is not finitely generated.
\end{corollary}

\begin{proof}
Suppose $B = \{ b_0, \ldots, b_{n-1} \}$ were such a set.
For each $i$, take a closed ${\mathrm{PCF}}$ term $M_i$ denoting $b_i$.
Then the terms $M_0,\ldots,M_{n-1}$ between them contain only finitely many
occurrences of constants $Y_\sigma$, 
so these constants are all present in ${\mathrm{PCF}}_k$ for large enough $k$.
But this means that $b_0,\ldots,b_{n-1}$, and hence all elements of ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$, 
are ${\mathrm{PCF}}_k$-denotable, contradicting Theorem~\ref{SF-main-thm}(i).
$\Box$
\end{proof}

\subsection{The model ${\mathsf{SP}}^0$}  \label{sec-SP0}

We turn next to an overview of the 
\emph{nested sequential procedure} (or NSP) model, denoted by ${\mathsf{SP}}^0$.
Further details and motivating examples are given in \cite{Longley-Normann}.
In some respects, however, our presentation here will be more formal than that of \cite{Longley-Normann}:
in particular, our discussion of bound variables and $\alpha$-conversion issues will be somewhat 
more detailed, in order to provide a solid foundation for the delicate syntactic arguments that follow.

The ideas behind this model have a complex history.
The general idea of sequential computation via nested oracle calls 
was the driving force behind Kleene's later papers
(e.g.\ \cite{Kleene-revisited-IV}), although the concept did not receive a 
particularly transparent or definitive formulation there.
Many of the essential ideas of NSPs can be found in early work of
Sazonov \cite{Sazonov-early}, in which a notion of 
\emph{Turing machine with oracles} was used to characterize the 
`sequentially computable' elements of the Scott model.
NSPs as we study them here were first explicitly introduced
in work on game semantics for ${\mathrm{PCF}}$--
both by Abramsky, Jagadeesan and Malacaria \cite{AJM} 
(under the name of \emph{evaluation trees}) 
and by Hyland and Ong \cite{Hyland-Ong} (under the name of \emph{canonical forms}).
In these papers, NSPs played only an ancillary role;
however, it was shown by Amadio and Curien \cite{Amadio-Curien} how 
(under the name of \emph{${\mathrm{PCF}}$ B\"ohm trees}) they
could be made into a model of ${\mathrm{PCF}}$ in their own right.
Similar ideas were employed again by Sazonov
\cite{Sazonov-PCF} to give a standalone characterization of
the class of sequentially computable functionals.
More recently, Normann and Sazonov \cite{Normann-Sazonov} 
gave an explicit construction of the NSP model in 
a somewhat more semantic spirit than \cite{Amadio-Curien}, 
using the name \emph{sequential procedures}.
As in \cite{Longley-Normann}, we here add the epithet `nested'
to emphasize the contrast with other flavours of sequential computation.%
\footnote{A major theme of \cite{Longley-Normann} is that NSPs serve equally well to 
capture the essence of PCF computation and that of Kleene's S1--S9 computability;
this is one reason for preferring a name that is not biased towards PCF.}


As in \cite{Longley-Normann}, our NSPs are generated by means of the following
infinitary grammar, interpreted coinductively.
Here $\bot$ is a special atomic symbol and $n$ ranges over natural numbers.
\begin{eqnarray*}
\mbox{\it Procedures:}~~~~
p,q & ::= & \lambda x_0 \cdots x_{r-1}.\,e \\
\mbox{\it Expressions:}~~~~
d,e & ::= & \bot ~\mid~ n ~\mid~ {\mathtt{case}}~a~\,{\mathtt{of}}~ 
            (i \Rightarrow e_i \mid i \in \mathbb{N}) \\
\mbox{\it Applications:}~~~~~~~
a   & ::= & x\, q_0 \cdots q_{r-1}
\end{eqnarray*}

Here we write $(i \Rightarrow e_i \mid i \in \mathbb{N})$ to indicate an infinite sequence of `branches':
$(0 \Rightarrow e_0 \mid 1 \Rightarrow e_1 \mid 2 \Rightarrow e_2 \mid \cdots)$.

We will use vector notation to denote finite (possibly empty) lists of variables or procedures:
$\vec{x}$, $\vec{q}$.
Our convention will be that a list $\vec{x}$ must be non-repetitive, though a list $\vec{q}$ need not be.
We may use $t$ to range over \emph{NSP terms} of any of the above three kinds;
note that a `term' is formally a (possibly infinite) syntax tree as generated by the above grammar. 
A procedure $\lambda \vec{x}.\bot$ will often be abbreviated to $\bot$.

For the most part, we will be working with terms modulo (infinitary) $\alpha$-equivalence,
and most of the concepts we introduce will be stable under renamings of bound variables.
Thus, a statement $t=t'$, appearing without qualification, will mean that $t,t'$ are $\alpha$-equivalent
(although we will sometimes write $=$ as $=_\alpha$ if we wish to emphasize this).
When we wish to work with terms on the nose rather than up to $=_\alpha$, we shall refer to them
as \emph{concrete terms}.

If each variable is assigned a simple type over $\nat$, then 
we may restrict our attention to \emph{well-typed}\index{well-typed} terms.
Informally, a term will be well-typed unless a typing violation occurs
at some particular point within its syntax tree. 
Specifically, within any term $t$, occurrences of procedures $\lambda \vec{x}.e$
(of any type), applications $x \vec{q}$ (of the ground type $\nat$) and expressions $e$
(of ground type) have types that are related to the types of their constituents and of variables
as usual in type $\lambda$-calculus extended by ${\mathtt{case}}$ expressions of type $\nat$.
We omit the formal definition here since everything works in the expected way;
for a more precise formulation see \cite[Section~6.1.1]{Longley-Normann}.

If $\Gamma$ is any environment (i.e.\ a finite non-repetitive list of variables),
we write $\Gamma \vdash e$ and $\Gamma \vdash a$ to mean that 
$e,a$ respectively are well-typed with free variables in $\Gamma$. 
We also write $\Gamma \vdash p:\tau$ when $p$ is well-typed in $\Gamma$ and
of type $\tau$. 

We shall often refer to variable environments that arise from combining several lists of variables,
which may be represented by different notations, e.g.\ $\Gamma,V,\vec{x}$.  
Since such environments are required to be non-repetitive, we take it to be
part of the content of a typing judgement such as 
$\Gamma,V,\vec{x} \vdash t \,(:\tau)$ that the entire list $\Gamma,V,\vec{x}$ is non-repetitive.
However, the \emph{order} of variables within an environment will typically be of much less concern to us 
(clearly our typing judgements $\Gamma \vdash T \,(:\tau)$ are robust under permutations of $\Gamma$),
and we will sometimes abuse notation by identifying a finite \emph{set} $Z$ of variables
with the list obtained from some arbitrary ordering of it.

It will also be convenient to place another condition on concrete
well-typed terms (not imposed in \cite{Longley-Normann}) in order to exclude \emph{variable hiding}.
Specifically, we shall insist that if $\Gamma \vdash t \,(:\tau)$
then no variable of $\Gamma$ appears as a bound variable within $t$,
nor are there any \emph{nested} bindings within $t$ of the same variable $x$.
(Clearly any concrete term not satisfying this restriction is $\alpha$-equivalent to one that does.)
This will help to avoid confusion in the course of some delicate arguments in which questions of
the identity of variables are crucial.

With these ideas in place, we may take ${\mathsf{SP}}(\sigma)$ to be the set of
well-typed procedures of type $\sigma$ modulo $=_\alpha$, 
and ${\mathsf{SP}}^0(\sigma) \subseteq {\mathsf{SP}}(\sigma)$ the subset constituted by the \emph{closed} procedures
(i.e.\ those that are well-typed in the empty environment).
By inspection of the grammar for procedures, it is easy to see that ${\mathsf{SP}}^0(\nat) \cong \mathbb{N}_\bot$.


As in \cite{Longley-Normann}, we shall need to work not only with NSPs
themselves, but with a more general calculus of NSP \emph{meta-terms}
designed to accommodate the intermediate forms that arise in the course of computations:
\begin{eqnarray*}
\mbox{\it Meta-procedures:} ~~~~
P,Q & ::=   & \lambda \vec{x}.\,E \\
\mbox{\it Meta-expressions:} ~~~~
D,E & ::=   & \bot ~\mid~ n ~\mid~ 
      \caseof{G}{i \Rightarrow E_i \mid i \in \mathbb{N}} \\
\mbox{\it Ground meta-terms:} ~~~~~~~~
G   & ::=   & E ~\mid~ x\, \vec{Q} ~\mid~ P \vec{Q}
\end{eqnarray*}

Here again, $\vec{x}$ and $\vec{Q}$ denote finite lists.
We shall use $T$ to range over meta-terms of any of the above three kinds;
once again, a meta-term is formally a syntax tree as generated by the above grammar.
(Unless otherwise stated, we use uppercase letters for general meta-terms and lowercase
ones for terms.)
Once again, we will normally work with meta-terms up to (infinitary) $\alpha$-equivalence,
but may also work with concrete meta-terms when required.

The reader is warned of an ambiguity arising from the above grammar: a surface form
$\lambda.E$ may be parsed either as a meta-procedure $\lambda \vec{x}.E$ with $\vec{x}$ empty,
or as a ground meta-term $(\lambda \vec{x}.E)\vec{Q}$ with both $\vec{x},\vec{Q}$ empty.
Formally these are two quite distinct meta-terms, bearing in mind that meta-terms are officially syntax trees.
To remedy this ambiguity, we shall therefore in practice write `$()$' to indicate the presence of 
an empty argument list $\vec{Q}$ to a meta-procedure $P$, 
so that the ground meta-term above will be written as $(\lambda.E)()$.
In the absence of `$()$', a surface form $\lambda.E$ should always be interpreted as a meta-procedure.

Meta-terms are subject to the expected typing discipline, leading to typing judgements
$\Gamma \vdash P:\sigma$, $\Gamma \vdash E$, $\Gamma \vdash G$
for meta-procedures, meta-expressions and ground meta-terms respectively.
Again we omit the details: see \cite[Section~6.1.1]{Longley-Normann}.
We shall furthermore require that well-typed concrete meta-terms are subject to the 
no-variable-hiding condition.
We will sometimes write e.g.\ $\Gamma \vdash P$ to mean that $P$ is a well-typed meta-procedure
in environment $\Gamma$, if the type itself is of no particular concern to us.

There is an evident notion of simultaneous capture-avoiding \emph{substitution} 
$T[\vec{x}\mapsto\vec{Q}]$ for well-typed concrete terms. 
Specifically, given $\Gamma,\vec{x} \vdash T \,(:\tau)$ and 
$\Gamma,\vec{y} \vdash Q_i : \sigma_i$ for each $i<r$,
where $\vec{x} = x_0^{\sigma_0},\ldots,x_{r-1}^{\sigma_{r-1}}$, 
we will have $\Gamma,\vec{y} \vdash T[\vec{x}\mapsto\vec{Q}] \,(:\tau)$.
Note that this may entail renaming of bound variables both within $T$ (in order to avoid capture of
variables in $\vec{y}$) and in the $Q_i$ (in order to maintain the no-hiding condition for variables
bound within $T$).
The details of how this renaming is performed will not matter, provided that for each $T,\vec{x},\vec{Q}$ as above we have a determinate choice of a suitable concrete term $T[\vec{x}\mapsto\vec{Q}]$,
so that multiple appearances of the same substitution will always yield the same result.
We also note that substitution is clearly well-defined on $\alpha$-equivalence classes.
Finally, we will say a substitution $[\vec{x}\mapsto\vec{Q}]$ \emph{covers}
a set $V$ of variables if $V$ consists of precisely the variables $\vec{x}$.

As a mild extension of the concept of meta-term, we have an evident notion of a
\emph{meta-term context} $C[-]$: essentially a meta-term containing a `hole' $-$,
which may be of meta-procedure, meta-expression or ground meta-term type
(and in the case of meta-procedures, will carry some type $\sigma$).
Our convention here is that a meta-term context $C[-]$ is permitted to contain only a single 
occurrence of the hole $-$.
Multi-hole contexts $C[-_0,-_1,\ldots]$ will occasionally be used, but again, each hole $-_i$ may
appear only once.

By the \emph{local variable environment} associated
with a concrete meta-term context $\Gamma \vdash C[-]$,
we shall mean the set $X$ of variables $x$ bound within $C[-]$ whose scope includes the hole, 
so that the environment in force at the hole is $\Gamma,X$.
(The no-variable-hiding convention ensures that $X$ and indeed $\Gamma,X$ is non-repetitive.)
Although in principle local variable environments pertain to particular choices of concrete contexts,
most of the concepts that we define using such environments will be easily seen to be invariant
under renamings of bound variables.

Next, there is a concept of \emph{evaluation} 
whereby any concrete meta-term $\Gamma \vdash T\, (:\sigma)$ evaluates to 
an ordinary concrete term $\Gamma \vdash\, \dang{\!T\!} (:\sigma)$.
To define this, the first step is to introduce a \emph{basic reduction}\index{basic reduction} relation $\rightsquigarrow_b$ for concrete ground meta-terms, which we do by the following rules:
\begin{description}
\item[(b1)] \( (\lambda \vec{x}.E) \vec{Q} \rightsquigarrow_b 
         E [\myvec{x} \mapsto \myvec{Q}] ~~~\) (\emph{$\beta$-rule}). 
\item[(b2)] \( \caseof{\bot}{i \Rightarrow E_i} \rightsquigarrow_b \bot \).
\item[(b3)] \( \caseof{n}{i \Rightarrow E_i} \rightsquigarrow_b  E_n \).
\item[(b4)] \( \caseof{(\caseof{G}{i \Rightarrow E_i})}{j \Rightarrow F_j} \rightsquigarrow_b \) \\
 \( \mbox{~~~~~~~~~} \caseof{G}{i \Rightarrow \caseof{E_i}{j \Rightarrow \!F_j}} \).
\end{description}
Note that the $\beta$-rule applies even when $\vec{x}$ is empty: 
thus, $(\lambda.2)() \rightsquigarrow_b 2$.

From this, a \emph{head reduction} relation $\rightsquigarrow_h$ 
on concrete meta-terms is defined inductively:
\begin{description}
\item[(h1)] If $G \rightsquigarrow_b G'$ then $G \rightsquigarrow_h G'$.
\item[(h2)] If $G \rightsquigarrow_h G'$ and $G$ is not a ${\mathtt{case}}$ meta-term, then 
\[ \caseof{G}{i \Rightarrow E_i} ~\rightsquigarrow_h~ \caseof{G'}{i \Rightarrow E_i} \;. \]
\item[(h3)] If $E \rightsquigarrow_h E'$
then $\lambda \myvec{x}.E ~\rightsquigarrow_h~ \lambda \myvec{x}.E'$.
\end{description}

Clearly, for any meta-term $T$, there is at most one $T'$ with $T \rightsquigarrow_h T'$.
We call a meta-term a \emph{head normal form} if it cannot be further 
reduced using $\rightsquigarrow_h$. 
The possible shapes of head normal forms are 
$\bot$, $n$, $\caseof{y \vec{Q}}{i \Rightarrow E_i}$
and $y \vec{Q}$, the first three optionally prefixed by $\lambda \myvec{x}$
(where $\vec{x}$ may contain $y$).

We now define the \emph{general reduction} relation $\rightsquigarrow$ inductively as follows:
\begin{description}
\item[(g1)] If $T \rightsquigarrow_h T'$ then $T \rightsquigarrow T'$.
\item[(g2)] If $E \rightsquigarrow E'$ then 
$\lambda \myvec{x}.E ~\rightsquigarrow~ \lambda \myvec{x}.E'$.
\item[(g3)] If $Q_j = Q'_j$ except at $j=k$ where $Q_k \rightsquigarrow Q'_k$, then
\begin{eqnarray*}
x \vec{Q} & \rightsquigarrow & x \vec{Q}\,' \;,  \\
\caseof{x\, \vec{Q}}{i \Rightarrow E_i} & \rightsquigarrow &
\caseof{x\, \vec{Q}\,'}{i \Rightarrow E_i} \;.
\end{eqnarray*}
\item[(g4)] If $E_i = E'_i$ except at $i=k$ where $E_k \rightsquigarrow E'_k$, then
\[ \caseof{x\, \vec{Q}}{i \Rightarrow E_i} ~\rightsquigarrow~
   \caseof{x\, \vec{Q}}{i \Rightarrow E'_i} \;. \]
\end{description}
It is easy to check that this reduction system is sound with respect to the typing rules.
We emphasize that the relation $\rightsquigarrow$ is defined on concrete meta-terms,
although it is clear that it also gives rise to a well-defined reduction relation on their
$\alpha$-classes.
An important point to note is that modulo the obvious inclusion of terms into meta-terms,
terms are precisely meta-terms in \emph{normal form}, i.e.\ those that cannot be reduced using $\rightsquigarrow$.
(For example, the meta-procedure $\lambda.2$ is in normal form, though the ground meta-term
$(\lambda.2)()$ is not.)
We write $\rightsquigarrow^*$ for the reflexive-transitive closure of $\rightsquigarrow$.

The above reduction system captures the finitary aspects of evaluation.
In general, however, since terms and meta-terms may be infinitely deep,
evaluation must be seen as an infinite process. 
To account for this infinitary aspect, we use some familiar domain-theoretic ideas.

We write $\sqsubseteq$
for the evident syntactic orderings on concrete meta-procedures and on ground 
meta-terms: thus, $T \sqsubseteq U$ iff $T$ may be obtained from $U$ 
by replacing zero or more ground subterms (perhaps infinitely many) by $\bot$.
It is easy to see that for each $\sigma$, the set of all concrete procedure terms of type $\sigma$
forms a directed-complete partial order under $\sqsubseteq$. 

By a \emph{finite} (concrete) term $t$, we shall mean one generated by the following
grammar, this time construed inductively:
\begin{eqnarray*}
\mbox{\it Procedures:}~~~~
p,q & ::= & \lambda x_0 \ldots x_{r-1}.\,e \\
\mbox{\it Expressions:}~~~~
d,e & ::= & \bot ~\mid~ n ~\mid~ {\mathtt{case}}~a~\,{\mathtt{of}}~ 
            (0 \Rightarrow e_0 \mid \cdots \mid r-1 \Rightarrow e_{r-1}) \\
\mbox{\it Applications:}~~~~~~~
a   & ::= & x\, q_0 \ldots q_{r-1}
\end{eqnarray*}
We regard finite terms as ordinary NSP terms 
by identifying the conditional branching
$(0 \Rightarrow e_0 \mid \cdots \mid r-1 \Rightarrow e_{r-1})$ with 
\[ (0 \Rightarrow e_0 \mid \cdots \mid r-1 \Rightarrow e_{r-1} \mid 
    r \Rightarrow \bot \mid r+1 \Rightarrow \bot \mid \cdots) \;.
\]

We may now explain how a general meta-term $T$ \emph{evaluates} 
to a term $\dang{\!T\!}$.
This will in general be an infinite process,
but we can capture the value of $T$ as the limit of the finite portions that
become visible at finite stages in the reduction.
To this end, for any concrete meta-term $T$ we define
\[ \Downarrow_{\mbox{\scriptsize \rm fin}} T ~=~
   \{ t \mbox{~finite~} \mid \exists T'.\; T \rightsquigarrow^* T' ~\wedge~ 
   t \sqsubseteq T' \} \;. 
\]
It is not hard to check that for any meta-term $T$, the set $\Downarrow_{\mbox{\scriptsize \rm fin}} T$
is directed with respect to $\sqsubseteq$ (see \cite[Proposition~6.1.2]{Longley-Normann}).
We may therefore define 
$\dang{\!T\!}$, the \emph{value} of $T$, to be the ordinary concrete term
\[ \dang{\!T\!} ~=~ \bigsqcup \,(\Downarrow_{\mbox{\scriptsize \rm fin}} T) \;. \]
Note in passing that the value $\dang{\!G\!}$ of a ground meta-term $G$
may be either an expression or an application. 
In either case, it is certainly a ground meta-term.
It is also easy to see that
$\dang{\lambda \myvec{x}.E} \,=\, \lambda \myvec{x}.\dang{\!E\!}$,
and that if $T \rightsquigarrow^* T'$ then $\dang{\!T\!} \,=\, \dang{\!T'\!}$.

Whilst we have defined our evaluation operation $\dang{\!-\!}$ for concrete meta-terms,
it is clear that this induces a well-defined evaluation operation on their $\alpha$-classes,
and for the most part this is all that we shall need.
We also note that the syntactic ordering $\sqsubseteq$ on concrete terms induces a partial order
$\sqsubseteq$ on their $\alpha$-classes, and that each ${\mathsf{SP}}(\sigma)$ and ${\mathsf{SP}}^0(\sigma)$
is directed-complete with respect to this ordering.

In the present paper, an important role will be played by the tracking of variable occurrences
(and sometimes other subterms) through the course of evaluation.
By inspection of the above rules for $\rightsquigarrow$, it is easy to see that if $T \rightsquigarrow T'$,
then for any occurrence of a (free or bound) variable $x$ within $T'$, we can identify a
unique occurrence of $x$ within $T$ from which it originates (we suppress the formal
definition). The same therefore applies whenever $T \rightsquigarrow^* T'$.
In this situation, we may say that the occurrence of $x$ within $T'$ is a \emph{residual} of the
one within $T$, or that the latter is the \emph{origin} of the former.
Note, however, that these relations are relative to a particular reduction path
$T \rightsquigarrow^* T'$: there may be other paths from $T$ to $T'$ for which the origin-residual relation
is different.  

Likewise, for any occurrence of $x$ within $\dang{\!T\!}$, we may pick some finite
$t \sqsubseteq \dang{\!T\!}$ containing this occurrence, and some $T' \sqsupseteq t$
with $T \rightsquigarrow^* T'$; this allows us to identify a unique occurrence of $x$ within $T$
that originates the given occurrence in $\dang{\!T\!}$.
It is routine to check that this occurrence in $T$ will be independent of the choice of
$t$ and $T'$ and of the chosen reduction path $T \rightsquigarrow^* T'$;
we therefore have a robust origin-residual relationship between variable occurrences in $T$
and those in $\dang{\!T\!}$.

A fundamental result for NSPs is the \emph{evaluation theorem}, which says that
the result of evaluating a meta-term is unaffected if we choose to evaluate certain subterms
`in advance':

\begin{theorem}[Evaluation theorem]  \label{eval-thm}
If $C[-_0,-_1,\ldots]$ is any meta-term context with countably
many holes and $C[T_0,T_1,\ldots]$ is well-formed, then
\[ \dang{C[T_0,T_1,\ldots]} ~=~ 
   \dang{C[\dang{\!T_0\!},\dang{\!T_1\!},\ldots]}. \]
\end{theorem}

The proof of this is logically elementary but administratively complex: 
see \cite[Section~6.1.2]{Longley-Normann}.



One further piece of machinery will be useful: 
the notion of \emph{hereditary $\eta$-expansion}, which enables us to convert 
a variable $x$ into a procedure term (written $x^\eta$).
Using this, the restriction that variables may appear only at the head of applications
can be seen to be inessential: e.g.\ the `illegal term' $f x$ may be replaced by the legal term $f x^\eta$.
The definition of $x^\eta$ is by recursion on the type of $x$: 
if $x:\sigma_0,\ldots,\sigma_{r-1} \rightarrow \nat$, then
\[ x^\eta ~=~ \lambda z_0^{\sigma_0} \ldots z_{r-1}^{\sigma_{r-1}} . \;
   \caseof{x z_0^\eta \ldots z_{r-1}^\eta}{i \Rightarrow i} \;.  \]
In particular, if $x:\nat$ then $x^\eta = \lambda.\,\caseof{x}{i \Rightarrow i}$.
The following useful properties of $\eta$-expansion are proved in 
\cite[Lemma~6.1.14]{Longley-Normann}:

\begin{lemma}  \label{eta-properties}
$\dang{x^\eta \vec{q}} \;=\, \caseof{x \vec{q}}{i \Rightarrow i}$, and
$\dang{\lambda \vec{y}.\,p \vec{y}\,^\eta} \;=\, p$.
\end{lemma}
The sets ${\mathsf{SP}}(\sigma)$ may now be made into a total applicative
structure ${\mathsf{SP}}$ by defining
\[ (\lambda x_0 \cdots x_r.e) \cdot q ~=~ 
    \lambda x_1 \cdots x_r.\dang{e[x_0 \mapsto q]} \;. \]
Clearly the sets ${\mathsf{SP}}^0(\sigma)$ are closed under this application 
operation, so we also obtain an applicative substructure ${\mathsf{SP}}^0$ of ${\mathsf{SP}}$.
It is easy to check that application in ${\mathsf{SP}}$ is monotone and continuous
with respect to $\sqsubseteq$.
It is also shown in \cite[Section~6.1.3]{Longley-Normann} that both ${\mathsf{SP}}$ and ${\mathsf{SP}}^0$
are typed \emph{$\lambda$-algebras}: 
that is, they admit a compositional interpretation of typed $\lambda$-terms that
validates $\beta$-equality. 
(The relevant interpretation of pure $\lambda$-terms is in fact given by three of the
clauses from the interpretation of ${\mathrm{PCF}}^\Omega$ as defined in Subsection~\ref{subsec-interp} below.)

\subsection{Interpretation of ${\mathrm{PCF}}$ in ${\mathsf{SP}}^0$}  \label{subsec-interp}

A central role will be played by certain procedures 
$Y_\sigma \in {\mathsf{SP}}^0((\sigma\rightarrow\sigma)\rightarrow\sigma)$
which we use to interpret the ${\mathrm{PCF}}$ constants $Y_\sigma$
(the overloading of notation will do no harm in practice).
If $\sigma = \sigma_0,\ldots,\sigma_{r-1} \rightarrow \nat$,
we define $Y_\sigma = \lambda g^{\sigma\rightarrow\sigma}. F_\sigma[g]$,
where $F_\sigma[g]$ is specified corecursively up to $\alpha$-equivalence by:
\[ F_\sigma[g] ~=_\alpha~ \lambda x_0^{\sigma_0} \ldots x_{r-1}^{\sigma_{r-1}}.\;
                              \caseof{g\;(F_\sigma[g])\; x_0^\eta \cdots x_{r-1}^\eta}{i \Rightarrow i} \;. \]
(A concrete representative of $F_\sigma[g]$ satisfying the no-hiding condition will of course feature 
a different choice of bound variables $x_0,\ldots,x_{r-1}$ at each level.)
We may now give the standard interpretation of ${\mathrm{PCF}}^\Omega$ in ${\mathsf{SP}}$.
To each ${\mathrm{PCF}}^\Omega$ term $\Gamma \vdash M:\sigma$ we associate
a procedure-in-environment $\Gamma \vdash \sem{M}^{{\mathsf{SP}}}_\Gamma : \sigma$
(denoted henceforth by $\sem{M}_\Gamma$)
inductively as follows:
\begin{eqnarray*}
\sem{x^\sigma}_\Gamma & = & x^{\eta} \\
\sem{\num{n}}_\Gamma & = & \lambda.n \\
\sem{{\mathit{suc}}}_\Gamma & = & \lambda x.\,\caseof{x}{i \Rightarrow i+1} \\
\sem{{\mathit{pre}}}_\Gamma & = & \lambda x.\,\caseof{x}{0 \Rightarrow 0 \mid i+1 \Rightarrow i} \\
\sem{{\mathit{ifzero}}}_\Gamma & = & 
\lambda xyz.\,{\mathtt{case}}~{x}~{\mathtt{of}}~(0 \Rightarrow \caseof{y}{j \Rightarrow j} \\
& & ~~~~~~~~~~~~~~~\mid~i+1 \Rightarrow \caseof{z}{j \Rightarrow j}) \\
\sem{Y_\sigma}_\Gamma & = & Y_\sigma \\
\sem{C_f}_\Gamma & = & \lambda x.\,\caseof{x}{i \Rightarrow f(i)} \\
\sem{\lambda x^\sigma.M}_\Gamma & = & \lambda x.\sem{M}_{\Gamma,x} \\
\sem{MN}_\Gamma & = & \sem{M}_\Gamma \cdot \sem{N}_\Gamma
\end{eqnarray*}
(In the clause for $C_f$, we interpret $f(i)$ as $\bot$ whenever $f(i)$ is undefined.)

The following key property of $\sem{-}^{\mathsf{SP}}$ is shown as Theorem~7.1.16 in \cite{Longley-Normann}:

\begin{theorem}[Adequacy]   \label{SP0-adequacy}
For any closed ${\mathrm{PCF}}^\Omega$ term $M:\nat$,
we have $M \rightsquigarrow^* \num{n}$ iff $\sem{M} = \lambda.n$.
\end{theorem}

We may now clarify the relationship between ${\mathsf{SP}}^0$ and ${\mathsf{SF}}$.
First, there is a natural `observational equivalence' relation $\approx$ on each ${\mathsf{SP}}^0(\sigma)$,
defined by
\[ q \approx q' \mbox{~~~iff~~~} \forall r \in {\mathsf{SP}}^0(\sigma\rightarrow\nat).~~
   r \cdot q  ~=~ r \cdot q' \;. \]
It is not hard to see that if $p \approx p' \in {\mathsf{SP}}^0(\sigma\rightarrow\tau)$ and $q \approx q' \in {\mathsf{SP}}^0(\sigma)$
then $p \cdot q \approx p' \cdot q \approx p' \cdot q' \in {\mathsf{SP}}^0(\tau)$.
Explicitly, the first of these equivalences holds because for any $r \in {\mathsf{SP}}^0(\tau\rightarrow\nat)$ we have
(using Lemma~\ref{eta-properties}) that
\[ r \cdot (p \cdot q) ~=~ (\lambda x.r(\lambda \vec{z}.xq\vec{z}^{\,\eta})) \cdot p 
   ~=~ (\lambda x.r(\lambda \vec{z}.xq\vec{z}^{\,\eta})) \cdot p' ~=~ r \cdot (p' \cdot q) \;, \]
while the second equivalence holds because for any $r \in {\mathsf{SP}}^0(\tau\rightarrow\nat)$ we have
\[ r \cdot (p' \cdot q) ~=~ (\lambda y.r(\lambda \vec{z}.p'y^\eta \vec{z}^{\,\eta})) \cdot q 
   ~=~ (\lambda y.r(\lambda \vec{z}.p'y^\eta \vec{z}^{\,\eta})) \cdot q' ~=~ r \cdot (p' \cdot q') \;. \]
We thus obtain a well-defined applicative structure ${\mathsf{SP}}^0 /\!\approx$ as a quotient of ${\mathsf{SP}}^0$;
we write $\theta : {\mathsf{SP}}^0 \rightarrow {\mathsf{SP}}^0 /\!\approx$ for the quotient map.

It turns out that up to isomorphism, this structure ${\mathsf{SP}}^0 /\!\approx$ is none other than ${\mathsf{SF}}$.
Indeed, in \cite{Longley-Normann} this was taken as the definition of ${\mathsf{SF}}$, and the
characterization as the closed term model of ${\mathrm{PCF}}^\Omega$ modulo observational equivalence
proved as a consequence.
In order to fill out the picture a little more,
we will here exhibit the equivalence of these two definitions as a consequence of 
the following non-trivial fact, given as Corollary~7.1.34 in \cite{Longley-Normann}:%
\footnote{What is actually shown in \cite{Longley-Normann} is that every element of ${\mathsf{SP}}^0$ is denotable
on the nose in a language ${\mathrm{PCF}} + \mathit{byval}$, with a certain choice of denotation for the
constant $\mathit{byval}$. 
Since the latter satisfies
$\sem{\mathit{byval}} \approx \sem{\lambda f^{\nat\rightarrow\nat}x^\nat.\,{\mathit{ifzero}}\;x\,(fx)(fx)}$,
the present Theorem~\ref{SP0-completeness-thm} follows easily.}

\begin{theorem}  \label{SP0-completeness-thm}
For every $p \in {\mathsf{SP}}^0(\sigma)$, there is a closed ${\mathrm{PCF}}^\Omega$ term $M: \sigma$ such that
$\sem{M} \approx p$.
\end{theorem}

\begin{proposition}  \label{SF-char}
$({\mathsf{SP}}^0/\!\approx) \cong{\mathsf{SF}}$, via an isomorphism that identifies $\theta(\sem{M}^{\mathsf{SP}})$ 
with $\sem{M}^{\mathsf{SF}}$ for any closed ${\mathrm{PCF}}^\Omega$ term $M$.
\end{proposition}

\begin{proof}
For any element $x \in {\mathsf{SP}}^0/\!\approx$, we may take $p \in {\mathsf{SP}}^0$ with $\theta(p)=x$,
then by Theorem~\ref{SP0-completeness-thm} take $M$ closed with $\sem{M} \approx p$;
we then have that $\theta(\sem{M}) = x$.
In this way, each ${\mathsf{SP}}^0(\sigma)/\!\approx$ corresponds bijectively to the set of 
closed ${\mathrm{PCF}}^\Omega$ terms $M:\sigma$ modulo some equivalence relation $\sim$. 

Recall now that $\simeq$ denotes observational equivalence in ${\mathrm{PCF}}^\Omega$.
To see that $\sim \,\subseteq\, \simeq$, suppose $M \sim M'$, and let $C[-]$ be any suitable observing
context of ${\mathrm{PCF}}^\Omega$. By the compositionality of $\sem{-}^{\mathsf{SP}}$, we obtain some $\sem{C} \in {\mathsf{SP}}^0$
such that $\sem{C[M]} = \sem{C} \cdot \sem{M}$ and similarly for $M'$. 
But $M \sim M'$ means that $\sem{M} \approx \sem{M'}$, whence by the definition of $\approx$
we conclude that $\sem{C[M]} = \sem{C[M']}$ at type $\nat$.
So by Theorem~\ref{SP0-adequacy} we have 
$C[M] \rightsquigarrow^* \num{n}$ iff $C[M'] \rightsquigarrow^* \num{n}$.
Since $C[-]$ was arbitrary, we have shown that $M \simeq M'$.

To see that $\simeq \,\subseteq\, \sim$, suppose $M \simeq M' : \sigma$. 
It will suffice to show that $\sem{M} \approx \sem{M'}$.
So suppose $r \in {\mathsf{SP}}^0(\sigma\rightarrow\nat)$, and 
using Theorem~\ref{SP0-completeness-thm}, take a ${\mathrm{PCF}}^\Omega$ term $R$ such that
$\sem{R} \approx r$.
Then $r \cdot \sem{M} = \sem{R} \cdot {M} = \sem{RM}$ at type $\nat$, and similarly for $M'$.
But since $M \simeq M'$, we have $RM \rightsquigarrow^* n$ iff $RM' \rightsquigarrow^* n$,
whence $\sem{RM} = \sem{RM'}$ by Theorem~\ref{SP0-adequacy}.
Thus $r \cdot \sem{M} = r \cdot \sem{M'}$, and we have shown $\sem{M} \approx \sem{M'}$.

Since each ${\mathsf{SF}}(\sigma)$ consists of closed ${\mathrm{PCF}}^\Omega$ terms $M:\sigma$ modulo $\simeq$,
we have established a bijection $({\mathsf{SP}}^0(\sigma) /\!\approx) \cong {\mathsf{SF}}(\sigma)$ for each $\sigma$.
Moreover, both $\sem{-}^{\mathsf{SP}}$ and $\theta$ respect application, so it follows that 
$({\mathsf{SP}}^0 /\!\approx) \cong {\mathsf{SF}}$, and it is immediate by construction that $\theta(\sem{M}^{\mathsf{SP}})$
is identified with $\sem{M}^{\mathsf{SF}}$.
$\Box$
\end{proof}

\vspace*{1.0ex}
As we have already seen in Subsection~\ref{subsec-PCF}, 
Milner's context lemma for ${\mathrm{PCF}}^\Omega$ implies that
${\mathsf{SF}}$ is extensional.
From this and Proposition~\ref{SF-char}, we may now read off the following
useful characterization of the equivalence $\approx$:

\begin{lemma}[NSP context lemma]  \label{nsp-eq-context-lemma}
Suppose $p,p' \in {\mathsf{SP}}^0(\sigma_0,\ldots,\sigma_{r-1} \rightarrow \nat)$. 
Then $p \approx p'$ iff
\[ \forall q_0 \in {\mathsf{SP}}^0(\sigma_0), \ldots, q_{r-1} \in {\mathsf{SP}}^0(\sigma_{r-1}). ~~
   p \cdot q_0 \cdot \ldots \cdot q_{r-1} ~=~ p' \cdot q_0 \cdot \ldots \cdot q_{r-1} \;. \]
\end{lemma}


We shall also make use of the \emph{observational ordering} on ${\mathsf{SF}}$
and the associated preorder on ${\mathsf{SP}}$.
Let $\sqsubseteq$ be the usual information ordering on ${\mathsf{SF}}(\nat) \cong \mathbb{N}_\bot$,
and let us endow each ${\mathsf{SF}}(\sigma)$ with the partial order $\preceq$ defined by
\[  x \preceq x' \mbox{~~~iff~~~} \forall h \in {\mathsf{SF}}(\sigma\rightarrow\nat),\, n \in \mathbb{N}.~~
    h \cdot x ~\sqsubseteq~ h \cdot x' \;.  \]
It is easy to see that the application operations $\cdot$ are monotone with respect to $\preceq$.
Moreover, Milner's context lemma also exists in an \emph{inequational} form which says, in effect, that
if $f,f' \in {\mathsf{SF}}(\sigma_0,\ldots,\sigma_{r-1}\rightarrow\nat)$ then
\[ f \preceq f' \mbox{~~~iff~~~} \forall y_0 \in {\mathsf{SF}}(\sigma_0),\ldots,y_{r-1} \in {\mathsf{SF}}(\sigma_{r-1}).~~
   f \cdot y_0 \cdot \ldots \cdot y_{r-1} ~\sqsubseteq f' \cdot y_0 \cdot \ldots \cdot y_{r-1} \;. \]
Thus, if elements of ${\mathsf{SF}}(\sigma\rightarrow\tau)$ are considered as functions ${\mathsf{SF}}(\sigma) \rightarrow {\mathsf{SF}}(\tau)$,
the partial order $\preceq$ coincides with the pointwise partial order on functions.

We write $\preceq$ also for the preorder on each ${\mathsf{SP}}^0(\sigma)$
induced by $\preceq$ on ${\mathsf{SF}}$: that is, $p \preceq p'$ iff $\theta(p) \preceq \theta(p')$.
Furthermore, we extend the use of the notations $\approx,\,\preceq$ in a natural way to open terms
(in the same environment), and indeed to meta-terms: 
e.g.\ we may write $\vec{x} \vdash P \preceq P'$ to mean
$\dang{\lambda \vec{x}.P} \;\preceq\; \dang{\lambda \vec{x}.P'}$.

Clearly $p \approx p'$ iff $p \preceq p' \preceq p$.
The following is also now immediate:

\begin{lemma}  \label{nsp-ineq-context-lemma}
Suppose $p,p' \in {\mathsf{SP}}^0(\sigma)$ where $\sigma = \sigma_0,\ldots,\sigma_{r-1} \rightarrow \nat$. 
Then the following are equivalent:

(i) $p \preceq p'$.

(ii) \( \forall r \in {\mathsf{SP}}^0(\sigma\rightarrow\nat).~ r \cdot p \sqsubseteq r \cdot p' \).

(iii) \( \forall q_0 \in {\mathsf{SP}}^0(\sigma_0), \ldots, q_{r-1} \in {\mathsf{SP}}^0(\sigma_{r-1}). ~~
       p \cdot q_0 \cdot \ldots \cdot q_{r-1} \,\sqsubseteq\, p' \cdot q_0 \cdot \ldots \cdot q_{r-1} \).
\end{lemma}

We conclude this subsection by reformulating some of the major milestones in our proof
using the notation now available.
Specifically, in Sections~\ref{sec-denotations} and \ref{sec-Y-k+1} we will show the following:

\begin{theorem}  \label{SP0-main-thm}
For any $k \geq 1$, the element $\sem{Y_{k+1}} \in {\mathsf{SP}}^0$ is not ${\mathrm{PCF}}^\Omega_k$-denotable
(whence neither is  $\sem{Y_{0 \rightarrow (k+1)}}$).
\end{theorem}
In Section~\ref{sec-extensional} we will go on
to show that no $Z \approx \sem{Y_{0 \rightarrow (k+1)}}$ can be ${\mathrm{PCF}}^\Omega_k$-denotable,
establishing Theorem~\ref{SF-main-thm}(i).
In Section~\ref{sec-pure-type} we will resort to a more refined version of our methods 
to show the same for $Y_{k+1}$;
this will establish Theorem~\ref{SF-main-thm}(ii).


\subsection{The embeddability hierarchy}  \label{subsec-embed}

The following result will play a crucial role in this paper:

\begin{theorem}[Strictness of embeddability hierarchy]  \label{no-retraction-thm}
In ${\mathsf{SF}}$, no type $\pure{k+1}$ can be a \emph{pseudo-retract}
of any finite product $\Pi_i \sigma_i$ where each $\sigma_i$ 
is of level $\leq k$.
More formally, if $z$ is a variable of type $\pure{k+1}$ and each $x_i$ a variable of type $\sigma_i$,
there cannot exist procedures
\[  z \vdash t_i : \sigma_i \;, ~~~~~~ \vec{x} \vdash r : \pure{k+1}  \]
such that $z \vdash {r[\vec{x} \mapsto \vec{t}\,]} \,\succeq z^\eta$.
\end{theorem}

If in the above setting we had $z \vdash {r[\vec{x} \mapsto \vec{t}\,]} \,\approx z^\eta$,
we would call $\pure{k+1}$ a \emph{retract} of $\Pi_i \sigma_i$.
In Appendix A we will show that the notions of retract and pseudo-retract actually coincide,
since $z \vdash p \succeq z^\eta$ implies $z \vdash p \approx z^\eta$.
However, this fact will not be needed for the main results of this paper.

In our statement of Theorem~\ref{no-retraction-thm},
we have referred informally to a product $\Pi_i \sigma_i$ which we have
not precisely defined (although the formal statement of the theorem gives 
everything that is officially necessary).
One may readily make precise sense of this product notation within the
\emph{Karoubi envelope} ${\mathbf{K}}({\mathsf{SF}})$ as studied in \cite[Chapter~4]{Longley-Normann}:
for instance, it is not hard to show that any finite product of level $\leq k$ types can
be constructed as a retract of the pure type $\pure{k+1}$.
In the present paper, however, references to product types may be taken to be
purely informal and motivational.


The proof of Theorem~\ref{no-retraction-thm} appears in
\cite[Section~7.7]{Longley-Normann}, but because of its crucial role in the paper 
we reprise it here with some minor stylistic improvements.

\vspace*{1.0ex}
\begin{proof}
By induction on $k$. 
For the case $k=0$, we note that 
$\nat\rightarrow\nat$ cannot be a pseudo-retract of any $\nat^r$, since (for example)
the set of maximal elements in ${\mathsf{SF}}(\nat\rightarrow\nat)$ is of larger cardinality than
the set of all elements of ${\mathsf{SF}}(\nat)^r$.
(Alternatively, one may note that $\nat\rightarrow\nat$ is not a \emph{retract} of $\nat^r$,
since the former contains strictly ascending chains of length $r+2$ while the latter does not;
then use the method of Appendix~A in the easy case $k=1$ to show that 
any pseudo-retraction of the relevant type would be a retraction.)


Now assume the result for $k-1$, and suppose 
for contradiction that $z \vdash t_i$ and $\vec{x} \vdash r$ exhibit $\pure{k+1}$
as a pseudo-retract of $\Pi_i \sigma_i$ where each $\sigma_i$ is of level $\leq k$.
Let $v =\, \dang{r[\vec{x} \mapsto \vec{t}\,]}$, 
so that $z \vdash v \succeq z^\eta$,
whence $\dang{v[z \mapsto u]}\, \succeq u$ for any $u \in {\mathsf{SP}}^0(k+1)$.
We first check that any $v$ with this latter property must have the syntactic form 
$\lambda f^k.\,\caseof{zp}{\cdots}$ for some $p$ of type $\pure{k}$.
Indeed, it is clear that $v$ does not have the form 
$\lambda f.n$ or $\lambda f.\bot$, and the only other alternative form
is $\lambda f.\,\caseof{fp'}{\cdots}$. 
In that case, however, we would have 
\[ \dang{v[z \mapsto \lambda w^k.0]} \cdot\; (\lambda y^{k-1}.\bot) ~=~ \bot \;, \]
contradicting
$\dang{v[z \mapsto \lambda w^k.0]} \cdot\; (\lambda y^{k-1}.\bot) \,\succeq\, (\lambda w.0)(\lambda y.\bot) = 0$.

We now focus on the subterm $p$ in $v = \lambda f^k.\,\caseof{zp}{\cdots}$.
The general direction of our argument will be to show that
$\lambda f^k.p$ represents a function of type $\pure{k} \rightarrow \pure{k}$
that dominates the identity, and that moreover our construction
of $v$ as $\dang{r[\vec{x} \mapsto \vec{t}\,]}$ can be used to split this into functions
$\pure{k} \rightarrow \Pi_j \rho_j$ and $\Pi_j \rho_j \rightarrow \pure{k}$ where the
$\rho_j$ are of level $\leq k-1$, contradicting the induction hypothesis.
An apparent obstacle to this plan is that $z$ as well as $f$ may appear
free in $p$; however, it turns out that we still obtain all the properties
we need if we specialize $z$ here (somewhat arbitrarily) to $\lambda w.0$.

Specifically, we claim that $\lambda f.\!\dang{p[z \mapsto \lambda w.0]} \,\succeq {\mathit{id}}_k$.
By Lemma~\ref{nsp-ineq-context-lemma}, it will suffice to show that
$\dang{p[f \mapsto q, z \mapsto \lambda w.0]} \,\succeq q$ for any $q \in {\mathsf{SP}}^0(k)$.
The idea is that if it is not, then (ignoring the presence of $z$ in $p$ for now)
we may specialize $z$ to some $u$ that will detect the difference
between $p[f \mapsto q]$ and $q$, so that the subterm `$zp$' within $v$ will yield $\bot$,
contradicting that $z \vdash v \succeq z^\eta$.
We can even allow for the presence of $z$ in $p$ by a suitably careful choice of $u$.

Again by Lemma~\ref{nsp-ineq-context-lemma}, it suffices to show that
$\dang{p[f \mapsto q, z \mapsto \lambda w.0]} \cdot\; s \succeq q \cdot s$ for any $s$.
So suppose $q \cdot s = \lambda.n$ whereas 
$\dang{p[f \mapsto q, z \mapsto \lambda w.0]} \cdot\; s \neq \lambda.n$
for some $s \in {\mathsf{SP}}^0(k-1)$ and $n \in \mathbb{N}$.
Take $u = \lambda g.\,\caseof{gs}{n \Rightarrow 0}$,
so that $u \cdot q' = \bot$ whenever $q' \cdot s \neq \lambda.n$.
Then $u \preceq \lambda w.0$ by Lemma~\ref{nsp-ineq-context-lemma},
so we have $\dang{p[f \mapsto q, z \mapsto u]} \cdot\; s \neq \lambda.n$
since $\lambda.n$ is maximal in ${\mathsf{SP}}^0(\nat)$.
By the definition of $u$, it follows that
$\dang{(zp)[f \mapsto q, z \mapsto u]}\, = \bot$, whence
$\dang{v[z \mapsto u]} \cdot\, q = \bot$, whereas $u \cdot q=0$,
contradicting $\dang{v[z \mapsto u]}\, \succeq u$.
This completes the proof that
$\lambda f.\!\dang{p[z \mapsto \lambda w.0]} \; \succeq {\mathit{id}}_{k}$.

Next, we show how to split the function represented by this procedure
so as to go through some $\Pi_j \rho_j$ as above.
Since $\dang{r[\vec{x} \mapsto \vec{t}\,]} \,= \lambda f.\,\caseof{zp}{\cdots}$, we have that
$r[\vec{x}\mapsto\vec{t}\,]$ reduces in finitely many steps to a head normal form
$\lambda f.\,\caseof{zP}{\cdots}$ where $\dang{\!P\!}\,=p$.
By working backward through this reduction sequence, we may locate
the ancestor within $r[\vec{x}\mapsto\vec{t}\,]$ of this head occurrence of $z$.
Since $z$ does not appear free in $r$, this occurs within some $t_i$, and clearly it must 
appear as the head of some subterm $\caseof{zp'}{\cdots}$.
Now since $t_i$ has type $\sigma_i$ of level $\leq k$,
and $z:\pure{k+1}$ is its only free variable, 
it is easy to see that all \emph{bound}
variables within $t_i$ have pure types of level $<k$.
Let $x'_0,x'_1,\ldots$ denote the finitely many bound variables that are in scope
at the relevant occurrence of $zp'$, and suppose each $x'_j$ has type
$\rho_j$ of level $<k$.
By considering the form of the head reduction sequence
$r[\vec{x}\mapsto\vec{t}\,] \rightsquigarrow^*_h \lambda f.\,\caseof{zP}{\cdots}$,
we now see that $P$ has the form $p'[\vec{x}\,' \mapsto \vec{T}]$
where each $T_j: \rho_j$ contains at most $f$ and $z$ free.%
\footnote{The reader wishing to see a more formal justification for this step may consult the
proof of Lemma~\ref{g-lemma-1}(i) below.}

Writing $^*$ for the substitution $[z \mapsto \lambda w.0]$,
define procedures
\[ f^k \vdash t'_j  =\,  \dang{T_j^*} \,: \rho_j \;, ~~~~~~
\vec{x}\,' \vdash r'  =\,  \dang{p'^*} \,: \pure{k} \;. \]
Then $\dang{r'[\vec{x}\,' \mapsto \vec{t}\,'\,]}$ coincides with the term 
$\dang{\lambda f.\,P^*} \,= \lambda f.\!\dang{\!p^*\!}$,
which dominates the identity as shown above.
Thus $\pure{k}$ is a pseudo-retract of $\Pi_j \rho_j$,
which contradicts the induction hypothesis.
So $\pure{k+1}$ is not a pseudo-retract of $\Pi_i \sigma_i$ after all,
and the proof is complete.
$\Box$
\end{proof}

\vspace*{1.0ex}
As an aside, we remark that for several extensions of ${\mathrm{PCF}}$ studied in the literature,
the situation is completely different, in that the corresponding fully abstract and universal models
possess a \emph{universal} simple type $\upsilon$ of which all simple types are retracts.
It follows easily in these cases that one can indeed bound the type levels of recursion operators
without loss of expressivity. For example:

\begin{itemize}
\item In the language ${\mathrm{PCF}}+{\mathit{por}}+{\mathit{exists}}$ considered by Plotkin \cite{LCF-considered},
the type $\nat\rightarrow\nat$ is universal, and the proof of this shows that every program in this
language is observationally equivalent to one in ${\mathrm{PCF}}_1 + {\mathit{por}}+{\mathit{exists}}$.
(This latter fact was already noted in \cite{LCF-considered}.)
\item In ${\mathrm{PCF}}+{\mathit{catch}}$ (a slight strengthening of Cartwright and Felleisen's language SPCF
\cite{Cartwright-Felleisen}), the type $\nat\rightarrow\nat$ is again universal, and again
the sublanguage ${\mathrm{PCF}}_1 + {\mathit{catch}}$ has the same expressive power.
\item In the language ${\mathrm{PCF}}+H$ of Longley \cite{SRF}, the type $(\nat\rightarrow\nat)\rightarrow\nat$
is universal, but even here, all constants $Y_\sigma$ with ${\mathrm{lv}}(\sigma)>1$ are dispensable.
\end{itemize}

Further details of each the above scenarios may be found in \cite{Longley-Normann}.
These facts may offer some indication of why a `cheap' proof of our present results 
in the setting of pure ${\mathrm{PCF}}$ is not to be expected.%
\footnote{That PCF manifests greater structural complexity than many stronger languages
is also a moral of Loader's undecidability theorem for finitary PCF \cite{Loader-PCF}.
However, the complexity we explore here seems quite orthogonal to that exhibited by Loader:
we are concerned purely with `infinitary' aspects of definability, the entire finitary structure
being already represented by our ${\mathrm{PCF}}_0$.}


\subsection{Other sublanguages of ${\mathrm{PCF}}$}  \label{subsec-power-pcf1}

Our main theorems establish a hierarchy of languages ${\mathrm{PCF}}_1 < {\mathrm{PCF}}_2 < \cdots\,$.
Before proceeding further, however, we pause to clarify the relationship between ${\mathrm{PCF}}_0$ and ${\mathrm{PCF}}_1$,
and also to survey some of the interesting territory that lies between them, 
in order to situate our theorems within a wider picture.

On the one hand, ${\mathrm{PCF}}_0$ is a rather uninteresting language.
As regards the elements of ${\mathsf{SF}}$ that it denotes, it is equivalent in expressivity to ${\mathrm{PCF}}_\bot$,
a variant of ${\mathrm{PCF}}_0$ in which we replace $Y_0$ by a constant $\bot$ (denoting $\bot \in {\mathsf{SF}}(\nat)$).
This is clear since $Y_0$ and $\bot$ are interdefinable: we have $\bot = Y_0(\lambda x.x)$,
and it is easy to see that $Y_0 = \lambda f. f\bot$ (in ${\mathsf{SF}}$).
By a syntactic analysis of the possible normal forms of type $\pure{1}$ in ${\mathrm{PCF}}_\bot$,
one can show that these are very weak languages that do not even define addition.

However, such an analysis is unnecessary for our purposes,
since there are more interesting languages that clearly subsume ${\mathrm{PCF}}_0$
but are known to be weaker than ${\mathrm{PCF}}_1$.
For instance, Berger \cite{Berger-min-recursion} considered the language ${\mathrm{T}}_0 +{\mathit{min}}$,
where ${\mathrm{T}}_0$ (a fragment of G\"odel's System~T) is the $\lambda$-calculus with
first-order primitive recursion over the natural numbers:
\[  \num{0} ~: \nat \;,  ~~~~~~ {\mathit{suc}} ~: \mathbb{N} \rightarrow \mathbb{N} \;, ~~~~~~
    {\mathit{rec}}_0 ~: \nat \rightarrow (\nat \rightarrow \nat \rightarrow \nat) \rightarrow (\nat \rightarrow \nat) \;, \]
and ${\mathit{min}}$ is the classical \emph{minimization} (i.e.\ unbounded search)
operator of type $(\nat\rightarrow\nat)\rightarrow\nat$.
On the one hand, it is an easy exercise to define $\bot$ in ${\mathrm{T}}_0 + {\mathit{min}}$,
and to define both ${\mathit{rec}}_0$ and ${\mathit{min}}$ in ${\mathrm{PCF}}_1$.
On the other hand, Berger showed that the ${\mathrm{PCF}}_1$-definable functional
$\Phi_0: (\nat\rightarrow\nat\rightarrow\nat) \rightarrow (\nat \rightarrow \nat)$ given by
\[ \Phi_0\;g\;n ~=~ g\;n\;(\Phi_0\;g\;(n+1))  \;, \]
is not expressible in ${\mathrm{T}}_0 + {\mathit{min}}$.%
\footnote{Berger actually considered denotability in the Scott model, 
but his argument applies equally to ${\mathsf{SF}}$.}
As already indicated in Section~\ref{sec-intro},
this functional and its higher-type analogues will play a crucial role in the present paper.

This situation is revisited in \cite[Section~6.3]{Longley-Normann} from the perspective of substructures
of ${\mathsf{SP}}^0$. It is shown that ${\mathrm{T}}_0 + {\mathit{min}}$, and indeed the whole of ${\mathrm{T}} + {\mathit{min}}$, 
can be modelled within 
the substructure ${\mathsf{SP}}^{0,{\mbox{\scriptsize \rm lwf}}}$ of \emph{left-well-founded} procedures, whereas the above functional 
$\Phi_0$ is not representable by any such procedure; thus $\Phi_0$ is not expressible in ${\mathrm{T}} + {\mathit{min}}$.
(The reader may wish to study these results and proofs before proceeding further, since they provide
simpler instances of the basic method that we will use in this paper.)
At third order, there are even `hereditarily total' functionals definable in ${\mathrm{PCF}}_1$ but not
by higher-type iterators, one example being the well-known \emph{bar recursion} operator
(see \cite{Longley-bar-recursion}).

Even weaker than ${\mathrm{T}}_0 + {\mathit{min}}$ is the language of \emph{(strict) Kleene primitive recursion
plus minimization}, denoted by ${\mathsf{Klex}}^{\mbox{\scriptsize \rm min}}$ in \cite{Longley-Normann};
this again subsumes ${\mathrm{PCF}}_0$.
It is shown in \cite{Longley-Normann} that the computational power of ${\mathsf{Klex}}^{\mbox{\scriptsize \rm min}}$
coincides with that of computable \emph{left-bounded} procedures;
this is used to show, for example, that even ${\mathit{rec}}_0$ is not computable in ${\mathsf{Klex}}^{\mbox{\scriptsize \rm min}}$.
We find it reasonable to regard left-bounded procedures
as embodying the weakest higher-order computability notion of natural interest that is still Turing complete.


\section{Sequential procedures for ${\mathrm{PCF}}_k$ terms}   \label{sec-denotations}

For the remainder of the paper, we take $k$ to be some fixed natural number greater than $0$.

In this section we give a direct inductive characterization of the ${\mathrm{PCF}}^\Omega_k$-denotable 
elements of ${\mathsf{SP}}$ by making explicit how our interpretation works for terms of ${\mathrm{PCF}}^\Omega_k$.
The first point to observe is that we may restrict attention to ${\mathrm{PCF}}^\Omega_k$ terms in 
\emph{long $\beta\eta$-normal form}: that is, terms in $\beta$-normal form in which every
variable or constant $z$ of type $\sigma_0,\ldots,\sigma_{r-1} \rightarrow \nat$ is fully applied
(i.e.\ appears at the head of a subterm $z N_0 \ldots N_{r-1}$ of type $\nat$).
Moreover, an inductive characterization of the class of such terms is easily given.

\begin{proposition}  \label{long-beta-eta-prop}
(i) A procedure $\Gamma \vdash p : \sigma$ is denotable by a ${\mathrm{PCF}}^\Omega_k$ term 
$\Gamma \vdash M: \sigma$ iff it is denotable by one in long $\beta\eta$-normal form.

(ii) The class of long $\beta\eta$-normal forms of ${\mathrm{PCF}}^\Omega_k$ is inductively generated by 
the following clauses:
\begin{enumerate}
\item If $\Gamma \vdash N_i : \sigma_i$ is a normal form for each $i<r$ 
and $x^{\sigma_0,\ldots,\sigma_{r-1} \rightarrow \nat} \in \Gamma$, 
then $\Gamma \vdash x N_0 \ldots N_{r-1} : \nat$ is a normal form (note that $r$ may be $0$ here).
\item If $\Gamma, x^\sigma \vdash M : \tau$ is a normal form then so is $\Gamma \vdash \lambda x.M : \sigma \rightarrow \tau$.
\item The numeric literals $\Gamma \vdash \num{n} : \nat$ are normal forms.
\item If $\Gamma \vdash M : \nat$ is a normal form then so are $\Gamma \vdash {\mathit{suc}}\;M : \nat$,
$\Gamma \vdash {\mathit{pre}}\;M : \nat$ and $\Gamma \vdash C_f\,M : \nat$ for any $f: \mathbb{N} \rightharpoonup \mathbb{N}$.
\item If $\Gamma \vdash M : \nat$, $\Gamma \vdash N : \nat$ and $\Gamma \vdash P : \nat$ are
normal forms, then so is $\Gamma \vdash {\mathit{ifzero}}\;M\;N\;P : \nat$.
\item If $\sigma = \sigma_0,\ldots,\sigma_{r-1} \rightarrow \nat$ is of level $\leq k$
and $\Gamma \vdash M : \sigma\rightarrow\sigma$ and $\Gamma \vdash N_i : \sigma_i$ are normal forms,
then $\Gamma \vdash Y_\sigma M N_0 \ldots N_{r-1} : \nat$ is a normal form.
\end{enumerate}
\end{proposition}

\begin{proof}
(i) It is a well-known property of simply typed $\lambda$-calculi that every term $M$ is $\beta\eta$-equivalent
to one in long $\beta\eta$-normal form: indeed, we may first compute the $\beta$-normal form of $M$
and then repeatedly apply the $\eta$-rule to expand any subterms that are not already fully applied.
Moreover, it is shown in \cite[Theorem~6.1.18]{Longley-Normann} that ${\mathsf{SP}}$ is a $\lambda\eta$-algebra,
so that if $\Gamma \vdash M =_{\beta\eta} M'$ then $\sem{M}_\Gamma = \sem{M'}_\Gamma$
in ${\mathsf{SP}}$. This establishes the claim.

(ii) This is clear from the fact that no application may be headed by a $\lambda$-abstraction and that
all occurrences of variables and constants must be fully applied.
$\Box$
\end{proof}

\vspace*{1.0ex}
It follows that the class of ${\mathrm{PCF}}^\Omega_k$-denotable procedures may be generated inductively
by a set of clauses that mirror the above formation rules for long $\beta\eta$-normal ${\mathrm{PCF}}^\Omega_k$ terms.
We now consider each of these formation rules in turn in order to spell out the corresponding
operation at the level of NSPs. In Section~\ref{sec-Y-k+1} we will show that these operations cannot give rise
to \emph{$k\!+\!1$-spinal} procedures, from which it will follow that no ${\mathrm{PCF}}^\Omega_k$-denotable 
procedure can be $k\!+\!1$-spinal.

For the first three formation rules, the effect on NSPs is easily described:

\begin{proposition}  \label{NSP-nf-prop}
(i) If $\Gamma \vdash x N_0 \ldots N_{r-1} : \nat$ in ${\mathrm{PCF}}^\Omega$, then
$\sem{x N_0 \ldots N_{r-1}}_\Gamma = \caseof{x \sem{N_0}_\Gamma \cdots \sem{N_{r-1}}_\Gamma}{j \Rightarrow j}$.

(ii) If $\Gamma \vdash \lambda x.M : \sigma \rightarrow \tau$ in ${\mathrm{PCF}}^\Omega$, then
$\sem{\lambda x.M}_\Gamma = \lambda x.\,\sem{M}_{\Gamma,x}$.

(iii) $\sem{\num{n}}_\Gamma = \lambda.n$.
\end{proposition}

\begin{proof}
(i) Easy using the definition of $\sem{-}$ and Lemma~\ref{eta-properties}.

(ii), (iii) are part of the definition of $\sem{-}$.
$\Box$
\end{proof}

\vspace*{1.0ex}
As regards the formation rules for ${\mathit{suc}}$, ${\mathit{pre}}$, $C_f$ and ${\mathit{ifzero}}$,
the situation is again fairly straightforward, although a little more machinery is needed:

\begin{definition}  \label{rightward-leaf-def}
(i) The set of \emph{rightward} (occurrences of) \emph{numeral leaves} within
a term $t$ is defined inductively by means of the following clauses:
\begin{enumerate}
\item A term $n$ is a rightward numeral leaf within itself.
\item Every rightward numeral leaf within $e$ is also one within $\lambda \vec{x}.e$.
\item Every rightward numeral leaf in each $e_i$ is also one in $\caseof{a}{i \Rightarrow e_i}$.
\end{enumerate}

(ii) If $t$ is a term and $e_i$ an expression for each $i$, 
let $t[i \mapsto e_i]$ denote the result of replacing each rightward leaf occurrence $i$
in $t$ by the corresponding $e_i$.
\end{definition}

\begin{lemma}  \label{rightward-leaf-prop}
$\dang{\caseof{d}{i \Rightarrow e_i}} \;= d[i \mapsto e_i]$
for any expressions $d,e_i$.
\end{lemma}

\begin{proof}
For each $c \in \mathbb{N}$, define a `truncation' operation $-^{(c)}$ on expressions as follows:
\begin{eqnarray*} 
    n^{(c)} ~=~ n \;, ~~~~  \bot^{(c)} & = & \bot \;, \\
    \caseof{a}{i \Rightarrow e_i}^{(0)} & = & \bot \;, \\
    \caseof{a}{i \Rightarrow e_i}^{(c+1)} & = & \caseof{a}{i \Rightarrow e_i^{(c)}} \;. 
\end{eqnarray*}
Then clearly $d = \bigsqcup_c d^{(c)}$ and 
$d[i \mapsto e_i] = \bigsqcup_c d^{(c)}[i \mapsto e_i]$.
Moreover, we may show by induction on $c$ that
\[ \dang{\caseof{d^{(c)}}{i \Rightarrow e_i}} ~=~ d^{(c)}[i \mapsto e_i] \;. \]
The case $c=0$ is trivial since $d^{(0)}$ can only have the form $n$ or $\bot$.
For the induction step, the situation for $d=n,\bot$ is trivial, so let us suppose
$d = \caseof{a}{j \Rightarrow f_j}$. Then
\begin{eqnarray*}
&  & \dang{\caseof{d^{(c+1)}}{i \Rightarrow e_i}} \\
&=& \dang{\caseof{(\caseof{a}{j \Rightarrow f_j^{(c)}})}{i \Rightarrow e_i}} \\
&=& \caseof{a}{j \Rightarrow\, \dang{\caseof{f_j^{(c)}}{i \Rightarrow e_i}}} \\
&=& \caseof{a}{j \Rightarrow (f_j^{(c)}[i \mapsto e_i])} \mbox{~~by the induction hypothesis} \\
&=& (\caseof{a}{j \Rightarrow f_j^{(c)}})[i \mapsto e_i] \\
&=& d^{(c+1)}[i \mapsto e_i] \;.
\end{eqnarray*}
Since $\dang{\!-\!}$ is continuous, the proposition follows by taking the supremum over $c$.
$\Box$
\end{proof}

\vspace*{1.0ex}
From this lemma we may now read off the operations on NSPs that correspond to clauses~4 and 5 
of Proposition~\ref{long-beta-eta-prop}(ii):

\begin{proposition}  \label{NSP-arith-prop}
(i) If $\Gamma \vdash M : \nat$ in ${\mathrm{PCF}}^\Omega$, then 
$\sem{C_f\,M}_\Gamma =  \sem{M}_\Gamma [i \mapsto f(i)]$
(understanding $f(i)$ to be $\bot$ when $i \not\in {\mathrm{dom}}\;f$); similarly for ${\mathit{suc}}$ and ${\mathit{pre}}$.

(ii) If $\Gamma \vdash M : \nat$, $\Gamma \vdash N : \nat$ and $\Gamma \vdash P : \nat$, then
$\sem{{\mathit{ifzero}}\;M\;N\;P}_\Gamma = \sem{M}_\Gamma [0 \mapsto d,\, i+1 \mapsto e]$
where $\sem{N}_\Gamma = \lambda.d$ and $\sem{P}_\Gamma = \lambda.e$.
\end{proposition}

\begin{proof}
(i) The definition of $\sem{-}$ yields 
\[ \sem{C_f\,M}_\Gamma ~=
   \dang{\lambda.\,\caseof{\sem{M}_\Gamma}{i \Rightarrow f(i)}} \, , \] 
and by Lemma~\ref{rightward-leaf-prop} this evaluates to $\sem{M}_\Gamma [i \mapsto f(i)]$.
Likewise for ${\mathit{suc}}$ and ${\mathit{pre}}$.

(ii) The definition of $\sem{-}$ yields 
\[ \sem{{\mathit{ifzero}}\;M\;N\;P}_\Gamma ~=~ 
   \dang{\lambda.\,\caseof{\sem{M}_\Gamma}{0 \Rightarrow d \mid i+1 \Rightarrow e}} \, , \] 
and by Lemma~\ref{rightward-leaf-prop} this evaluates to
$\sem{M}_\Gamma [0 \mapsto d,\, i+1 \mapsto e]$.
$\Box$
\end{proof}

\vspace*{1.0ex}
It remains to consider the formation rule involving $Y_\sigma$.
It will be convenient to regard the NSP for $Y M N_0 \ldots N_{r-1}$ as a result of
plugging some simpler NSPs together, in the sense indicated by the following definition.
Here and later, we shall follow the convention that Greek capitals $\Gamma,\Delta$ denote arbitrary
environments, while Roman capitals $Z,X,V$ denote lists of variables of type level $\leq k$.
(Of course, the idea of plugging can be formulated without any restrictions on types,
but we wish to emphasize at the outset that only pluggings at level $\leq k$
will feature in our proof.)

\begin{definition}[Plugging]  \label{plugging-def}
Suppose given the following data:
\begin{itemize}
\item a variable environment $\Gamma$,
\item a finite list $Z$ of `plugging variables' $z$ of level $\leq k$, 
disjoint from $\Gamma$,%
\item a \emph{root expression} $\Gamma,Z \vdash e$,
\item a substitution $\xi$ assigning to each $z^\sigma \in Z$ a procedure $\Gamma,Z \vdash \xi(z) : \sigma$.
\end{itemize}

In this situation, we define the \emph{($k$-)plugging} $\Pi_{\Gamma,Z}(e,\xi)$ 
(often abbreviated to $\Pi(e,\xi)$) 
to be the meta-term obtained from $e$ by repeatedly expanding variables $z \in Z$ to $\xi(z)$.
To formalize this, let $T^\circ$ denote the meta-term obtained from $T$ by replacing
each ground type subterm $z \vec{Q}$ (where $z \in Z$) by $\bot$.
We may now define, up to $\alpha$-equivalence,
\begin{eqnarray*} 
   \Pi^0(e,\xi) & = & e \;, \\
   \Pi^{m+1}(e,\xi) & = & \Pi^m(e,\xi)[z \mapsto \xi(z) \mbox{~for all $z \in Z$}] \;, \\
   \Pi(e,\xi) & = & \bigsqcup_m \Pi^m(e,\xi)^\circ \;, 
\end{eqnarray*}
where $\bigsqcup$ denotes supremum with respect to the syntactic order on meta-terms.
\end{definition}

It is easy to see that $\Pi_{\Gamma,Z}(e,\xi)$ is well-typed in environment $\Gamma$.
Note that some renaming of bound variables will typically be necessary
in order to realize $\Pi_{\Gamma,Z}(e,\xi)$ as a concrete term conforming to the no-variable-hiding condition;
we will not need to fix on any one particular way of doing this.

The operation on NSPs corresponding to clause~6 of Proposition~\ref{long-beta-eta-prop}(ii)
may now be described as follows:

\begin{proposition}  \label{NSP-Y-prop}
Suppose that $\sigma = \sigma_0,\ldots,\sigma_{r-1} \rightarrow \nat$ is of level $\leq k$
and that $\Gamma \vdash Y_\sigma M N_0 \ldots N_{r-1}$ in ${\mathrm{PCF}}^\Omega_k$,
where $\sem{M}_\Gamma = \lambda z^\sigma.p = \lambda z^{\sigma} x_0^{\sigma_0} \cdots x_{r-1}^{\sigma_{r-1}}.\,e$ 
and $\sem{N_i}_\Gamma = q_i$ for each $i$.
Then 
\[ \sem{Y_\sigma M N_0 \ldots N_{r-1}}_\Gamma ~=~ \lambda. \dang{\Pi_{\Gamma,Z}(e,\xi)}
\]
where $Z = z,x_0,\ldots,x_{r-1}$, $\xi(z) = p$, and
$\xi(x_i) = q_i$ for each $i$.
\end{proposition}

\vspace*{1.0ex}
\begin{proof}
Note that in this instance of plugging, the repeated substitutions are needed only for the sake
of the term $\xi(z)$ which may contain $z$ free---only a single substitution step is needed for the
plugging variables $x_i$, since the $q_i$ contain no free variables from $Z$.
We may thus rewrite $\Pi_{\Gamma,Z}(e,\xi)$ as 
\[ \Pi_{\Gamma,\vec{x},Z'}(e,\xi')\;[\vec{x}\mapsto{q}] \,, \] 
where $Z'=\{z\}$ and $\xi'(z)=p$.
The proposition will therefore follow easily (with the help of Theorem~\ref{eval-thm}) once we know that
\[ \sem{Y_\sigma M}_\Gamma ~=~ 
   \lambda \vec{x}.\, \dang{\Pi_{\Gamma,\vec{x},Z'}(e,\xi')} \;. \]
To see this, write $Y_\sigma = \lambda g.F_\sigma[g]$ where 
$F_\sigma[g] = \lambda \vec{x}.\,\caseof{g\,(F_\sigma[g])\,\vec{x}^{\,\eta}}{i \Rightarrow i}$ 
as at the start of Section~\ref{subsec-interp}.
Then clearly 
\[ \sem{Y_\sigma M}_\Gamma ~=~ (\lambda g. F_\sigma[g]) \cdot (\lambda z.p) ~=~
   \dang{F_\sigma[\lambda z.p]}  \;. \]
Here the meta-term $F_\sigma[\lambda z.p]$ is specified corecursively (up to $\alpha$-equivalence) by
\[ F_\sigma[\lambda z.p] ~=~ \lambda \vec{x}.\,\caseof{(\lambda z. p)\,(F_\sigma[\lambda z.p])\,\vec{x}^{\,\eta}}{i \Rightarrow i} \]
That is, $F_\sigma[\lambda z.p]$ coincides with the meta-term $G = \bigsqcup_m G^m$, where
\[  G^0 ~=~ \bot_\sigma \;, ~~~~~~
    G^{m+1} ~=~ \lambda \vec{x}.\,\caseof{(\lambda z.p)\,G^m\,\vec{x}^{\,\eta}}{i \Rightarrow i} \;. \]
We may now compare this with the meta-term $H = \bigsqcup_m H^m$, where
\[  H^0 ~=~ \bot_\sigma \;, ~~~~~~
    H^{m+1} ~=~ \lambda \vec{x}.\, e [z \mapsto H^m] \;. \]
Noting that $(\lambda z.p)\,G^m\,\vec{x}^{\,\eta} \rightsquigarrow e[z \mapsto G^m,\,\vec{x} \mapsto \vec{x}^{\,\eta}]$,
we have by Lemmas~\ref{eta-properties} and \ref{rightward-leaf-prop} that
\[ \dang{G^{m+1}} ~=~ \dang{\lambda \vec{x}.\, e[z \mapsto G^m]}  \]
whence by Theorem~\ref{eval-thm} and an easy induction we have $\dang{G^m} \,=\, \dang{H^m}$ 
for all $m$. Hence $\dang{\!G\!} \,=\, \dang{\!H\!}$.

Moreover, it is immediate from the definition that $H$ coincides with the meta-term
$\lambda \vec{x}.\, \Pi_{\Gamma,\vec{x},Z'}(e,\xi')$ mentioned earlier.
We thus have
\[ \sem{Y_\sigma M}_\Gamma ~=~ \dang{F_\sigma[\lambda z.p]} ~=~ \dang{\!G\!} ~=~ \dang{\!H\!} 
   ~=~ \lambda \vec{x}.\, \dang{\Pi_{\Gamma,\vec{x},Z'}(e,\xi')} \]
and the proof is complete.
(We have glossed over some fine details of variable renaming here, but these are easily attended to.)
$\Box$
\end{proof}

\vspace*{1.0ex}
Combining Propositions~\ref{NSP-nf-prop}, \ref{NSP-arith-prop} and \ref{NSP-Y-prop} 
with Proposition~\ref{long-beta-eta-prop}, the results of this section may be summarized as follows.

\begin{theorem}   \label{denotable-inductive-thm}
The class of ${\mathrm{PCF}}^\Omega_k$-denotable procedures-in-environment $\Gamma \vdash p$
is the class generated inductively by the following rules:
\begin{enumerate}
\item If $\Gamma \vdash q_i$ is denotable for each $i<r$ and $x \in \Gamma$, then
\[ \Gamma \vdash \lambda.\,\caseof{x q_0 \ldots q_{r-1}}{j \Rightarrow j} \] is denotable.
\item If $\Gamma,x \vdash p$ is denotable, then $\Gamma \vdash \lambda x.p$ is denotable.
\item Each $\Gamma \vdash \lambda.n$ is denotable.
\item If $\Gamma \vdash p$ is denotable and $f : \mathbb{N} \rightharpoonup \mathbb{N}$, then
$\Gamma \vdash p[i \mapsto f(i)]$ is denotable. (The constructions for ${\mathit{suc}}$ and ${\mathit{pre}}$
are special cases of this).
\item If $\Gamma \vdash p$,  $\Gamma \vdash \lambda.d$ and $\Gamma \vdash \lambda.e$ are denotable,
then $\Gamma \vdash p[0 \mapsto d, i+1 \mapsto e]$ is denotable.
\item If $\Gamma \vdash \lambda z^\sigma x_0^{\sigma_0} \cdots x_{r-1}^{\sigma_{r-1}}.e$ is denotable
where $\sigma = \sigma_0,\ldots,\sigma_{r-1}\rightarrow\nat$ is of level $\leq k$,
and $\Gamma \vdash q_i:\sigma_i$ is denotable for each $i<r$, then
\[ \Gamma \vdash \lambda. \dang{\Pi_{\Gamma,Z}(e,\xi)} \] 
is denotable, where $Z = z,\vec{x}$ is disjoint from $\Gamma$, 
$\xi(z) = \lambda \vec{x}.e$, and $\xi(x_i)=q_i$ for each $i$.
\end{enumerate}
\end{theorem}


To conclude this section,
we introduce a useful constraint on NSPs which, although not satisfied by all ${\mathrm{PCF}}^\Omega_k$-denotable
procedures, will hold for all those that we will need to consider in the course of our main proofs.
As we shall see, this constraint will interact well with the inductive rules just presented.

Referring back to the examples in Section~\ref{sec-intro}, we see that the recursive definitions of both $Y_{k+1}$ and $\Phi_{k+1}$ involved a variable $g$ of type level $k+2$.
It is therefore natural that our analysis will involve the consideration of terms in which such a variable $g$
appears free. However, it will turn out that apart from this one designated variable, our terms need never
involve any other variables of level $>k$, and this has a pleasant simplifying effect on our arguments.
This motivates the following definition:


\begin{definition}  \label{regular-def}
Suppose $g$ is a variable of type level $k+2$. 

(i) An environment $\Gamma$ is \emph{($g$-)regular}
if $\Gamma$ contains $g$ but all other variables in $\Gamma$ are of type level $\leq k$.

(ii) A meta-term $T$ is regular if all free and bound variables within $T$ are of level $\leq k$,
except possibly for free occurrences of $g$.

(iii) A meta-term-in-environment $\Gamma \vdash T$ is regular if both $\Gamma, T$ are regular.
\end{definition}

There is a useful alternative characterization of regularity in the case of normal forms:

\begin{proposition}  \label{regular-term-prop}
A term-in-environment $\Gamma \vdash t$ is regular iff $\Gamma$ is regular and $t$ is
not a procedure of type level $\geq k+2$.
\end{proposition}

\begin{proof}
The left-to-right implication is trivial, since a procedure of level $\geq k+2$ would have the form
$\lambda \vec{x}.\cdots$ where at least one of the $x_i$ was of level $\geq k+1$.
For the converse, suppose $\Gamma$ is regular and $t$ is not a procedure of level $\geq k+2$.
Then $t$ contains no free variables of level $\geq k$ other than $g$, so we just need to show
that all variables bound by a $\lambda$-abstraction within $t$ are of level $\leq k$.
Suppose not, and suppose that $\lambda \vec{x}.e$ is some outermost subterm of $t$
with ${\mathrm{lv}}(\vec{x}) > k$. Then $\lambda \vec{x}.e$ cannot be the whole of $t$, 
since $t$ would then be a procedure of level $>k+1$. 
Since $t$ is a normal form, the subterm $\lambda \vec{x}.e$ (of level $>k+1$)
must therefore occur as an argument to some variable $w$
of level $> k+2$. But this is impossible, since $\Gamma$ contains no
such variables, nor can such a $w$ be bound within $t$, since the relevant subterm
$\lambda \vec{w}.d$ would then properly contain $\lambda \vec{x}.e$, contradicting
the choice of the latter.
$\Box$
\end{proof}

\vspace*{1.0ex}
Let us now consider how the inductive clauses of Theorem~\ref{denotable-inductive-thm} may be used to 
generate \emph{regular} procedures-in-environment $\Gamma \vdash t$.
The following gives a useful property of derivations involving these clauses:

\begin{proposition}  \label{regular-generation-prop}
If $\Gamma \vdash t$ is regular and ${\mathrm{PCF}}^\Omega_k$-denotable,
then any inductive generation of the denotability of $\Gamma \vdash t$ via the clauses of
Theorem~\ref{denotable-inductive-thm} will consist entirely of regular procedures-in-environment.
\end{proposition}

\begin{proof}
It suffices to observe that for each of the six inductive clauses (regarded as rules), if the conclusion
is a regular procedure-in-environment then so are each of the premises.
For clause 1, this is clearly the case because the $q_i$ are subterms of the procedure in the conclusion.
For clause 2, we note that if $\lambda x.p$ is regular then so is $p$, and moreover $x$ has
level $\leq k$ so that $\Gamma,x$ is regular.
Clauses 3 and 4 are trivially handled.
For clause 5, we not that if $\Gamma \vdash p[0 \vdash d, i+1 \mapsto e]$ is regular
then $\Gamma \vdash p$, $\Gamma \vdash \lambda.d$ and $\Gamma \vdash \lambda.e$
are immediately regular by Proposition~\ref{regular-term-prop}
(regardless of whether any leaves $0$ or $i+1$ appear in $p$).
Likewise, for clause 6, we note that under the given hypotheses, both 
$\lambda z \vec{x}.e$ and each $q_i$ are of level $\leq k+1$; hence if $\Gamma$ is regular
then immediately $\Gamma \vdash \lambda z \vec{x}.e$ and $\Gamma \vdash q_i$ are
regular by Proposition~\ref{regular-term-prop}.
$\Box$
\end{proof}


\vspace*{1.0ex}
In particular, let us consider again the construction of the procedure $Y_{k+1}$ as
$\lambda g.F_{k+1}[g]$, where
\[ F_{k+1}[g] ~=~ \lambda x^k.\;\caseof{g\,(F_{k+1}[g])\,x^\eta}{i \Rightarrow i} \;. \]
It is clear by inspection that $g \vdash F_{k+1}[g]$ is regular;
hence, if it were ${\mathrm{PCF}}^\Omega_k$-denotable, then Proposition~\ref{regular-generation-prop} would apply.
We shall show, however, that a purely regular derivation via the clauses of 
Theorem~\ref{denotable-inductive-thm} cannot generate `spinal' terms such as $F_{k+1}[g]$;
hence $g \vdash F_{k+1}[g]$ is not ${\mathrm{PCF}}^\Omega_k$-denotable.
This will immediately imply that $Y_{k+1}$ itself is not ${\mathrm{PCF}}^\Omega_k$-denotable
(Theorem~\ref{SP0-main-thm}),
since the only means of generating non-nullary $\lambda$-abstractions is via clause 2 of Theorem~\ref{denotable-inductive-thm}.


\section{${\mathrm{PCF}}^\Omega_k$-denotable procedures are non-spinal}  \label{sec-Y-k+1}

In this section, we will introduce the crucial notion of a \emph{($k\!+\!1$-)spinal term}, and
will show that the clauses of Theorem~\ref{denotable-inductive-thm} (in the regular case)
are unable to generate spinal terms from non-spinal ones. 
Since the procedure $F_{k+1}[g]$ will be easily seen to be spinal, this will 
establish Theorem~\ref{SP0-main-thm}.

More specifically, we will actually introduce the notion of a \emph{$g$-spinal term}, 
where $g$ is a free variable which we treat as fixed throughout our discussion.
We shall do this first for the case 
\[ g ~:~ (k+1) \rightarrow (k+1) \] 
as appropriate for the analysis of $F_{k+1}[g]$ and hence of $Y_{k+1}$.
Later we will also consider a minor variation for $g$ of type $\nat \rightarrow (k+1) \rightarrow (k+1)$,
as appropriate to the definition of $\Phi_{k+1}$ given in Section~\ref{sec-intro}.
In both cases, we shall be able to dispense with the term `$k\!+\!1$-spinal', since the type level $k+1$ 
may be read off from the type of $g$.

Some initial intuition for the concept of spinality was given in Section~\ref{sec-intro}.
We now attempt to provide some further motivation by examining a little more closely
the crucial difference between $Y_{k+1}$ and $Y_k$ (say) that we are trying to capture.

The most obvious difference between these procedures is that
$Y_{k+1}$ involves an infinite sequence of nested calls to a variable $g: (k+1) \rightarrow (k+1)$,
whereas $Y_k$ does not.
One's first thought might therefore be to try and show that no procedure
involving an infinite nesting of this kind can be constructed using the means at our disposal
corresponding to ${\mathrm{PCF}}^\Omega_k$ terms.

As it stands, however, this is not the case.
Suppose, for example, that ${\mathit{up}}_k: {k} \rightarrow {k+1}$ and
${\mathit{down}}_k: {k+1} \rightarrow {k}$ are ${\mathrm{PCF}}_0$ terms defining a
standard retraction ${k} \lhd {k+1}$. 
More specifically, let us inductively define
\[ \begin{array}{rclrcl}
{\mathit{up}}_0   & = & \lambda x^0.\lambda z^0.x \;, &
{\mathit{down}}_0 & = & \lambda y^1.\,y\,\num{0} \;, \\
{\mathit{up}}_{k+1}   & = & \lambda x^{k+1}.\lambda z^{k+1}.\,x({\mathit{down}}_k\;z) \;, &
{\mathit{down}}_{k+1} & = & \lambda y^{k+2}.\lambda w^{k}.\,y({\mathit{up}}_k\;w) \;.
\end{array} \]
Now consider the ${\mathrm{PCF}}_k$ program 
\[ Z_{k+1} ~=~ 
   \lambda g: (k+1) \rightarrow (k+1).\; {\mathit{up}}_k\,(Y_k\,({\mathit{down}}_k \circ g \circ {\mathit{up}}_k)) \;. \]
This is essentially just a representation of $Y_k$ modulo our encoding of type $k$ in type $k+1$.
A simple calculation shows that the NSPs for $Y_{k+1}$ and $Z_{k+1}$ are
superficially very similar in form, both involving an infinite sequence
of nested calls to $g: (k+1) \rightarrow (k+1)$.
(These NSPs are shown schematically in Figure~1 for the case $k=2$.)
We will therefore need to identify some more subtle property of NSPs that differentiates
between $Y_{k+1}$ and $Z_{k+1}$.

\begin{figure}  \label{Y_Z_NSPs}
\begin{center}
\includegraphics[scale=.60]{Y_Z_NSPs.pdf}
\end{center}
\caption{The NSPs for $Y_3$ and $Z_3$. Here $\lambda.z$ abbreviates
$\lambda.\,\caseof{z}{i \Rightarrow i}$.}
\vspace*{1.0ex}
\end{figure}

The intuitive idea will be that in the NSP for $Z_{k+1}$, the full potency of 
$g$ as a variable of type ${k+1} \rightarrow {k+1}$ is not exploited,
since both the input and output of $g$ are `funnelled' through the simpler type ${k}$. 
Such funnelling will inevitably entail some loss of information, 
as Theorem~\ref{no-retraction-thm} tells us that the type ${k}$ cannot
fully represent the structure of the type ${k+1}$.
A useful mental picture here is that of a $(k+1)$-dimensional space being `flattened' down to a $k$-dimensional one.
Broadly speaking, then, we shall want to define a $g$-spinal term to be one containing an infinite sequence
of nested calls to $g$ but with no essential `flattening' of the arguments.
It will then be the case that $Y_{k+1}$ is $g$-spinal, but $Z_{k+1}$ is not.

We now approach the formal definition of a $g$-spinal term, 
generalizing the structure exhibited by the terms $F_{k+1}[g]$.
To get our bearings, let us examine the form of these terms one more time.
Note that $F_{k+1}[g]$ has the form $\lambda x^k. H[g,x]$, where
\[  H[g,x] ~=~ \caseof{g(\lambda x'.H[g,x']) x^\eta}{\cdots} \;. \]
Spinal expressions of this kind, in which the topmost $g$ of the spine appears at the very head of
the expression, will be referred to as \emph{head-spinal}.
In fact, we shall say that $H[g,x]$ is head-spinal with respect to the variable $x$, since as noted above,
it is significant here that $x$ is passed to $g$ with no `flattening' (in the form of the procedure $x^\eta$).
As a first attempt, then, one might hazard that we should define a concept of head-spinality relative to
a type $k$ variable coinductively as follows: an expression $e$ is $g$-head-spinal w.r.t.\ $x$ if it is of the
form
\[ \caseof{g(\lambda x'.e')x^\eta}{\cdots} \]
where $e'$ is itself $g$-head-spinal w.r.t.\ $x$.

In fact, in order for the set of non-spinal terms to have appropriate closure properties,
we shall need to relax this definition in two ways.
Firstly, we allow $\lambda x'.e'$ to be replaced by $\lambda x'.E[e']$ where $E[-]$ is any expression context:
that is, we allow the head-spinal subterm $e'$ to appear at positions other than the head of this procedure.
Secondly, we allow $x^\eta$ to be replaced by a procedure term $o$ that can be \emph{specialized} to
(something close to) $x^\eta$: the intuition is that any such $o$ will embody the whole content of $x$ 
with no flattening.

This leads us, at last, to the following definition. Note that this makes reference to the technical notion of
an \emph{$x,V$-closed substitution}, the explanation of which we shall defer to Definition~\ref{xV-closed-def} below. This entails that the notion of head-spinality needs to be defined relative to a certain set $V$ of
variables as well as a type $k$ variable $x$.
We shall adopt the convention that any environment denoted by $\Gamma$ will be $g$-regular
(and hence will contain $g$); recall that Roman letters such as $V,X,Z$ 
always denote lists of variables of level $\leq k$ (which may also contribute to the
environments we consider).


\begin{definition}[Spinal terms]  \label{spinal-def}
Suppose $g$ has type $(k+1) \rightarrow (k+1)$.
Suppose $\Gamma \vdash e$ is $g$-regular, 
and that $x^k \in \Gamma$ and $V \subseteq \Gamma$.

(i) In this situation, we coinductively declare $e$ to be \emph{$g$-head-spinal with respect to $x,V$} 
iff $e$ has the form
\[  \caseof{g(\lambda {x'}. E[e']) o}{\cdots} \]
where $E[-]$ is an expression context, and
\begin{enumerate}
\item for some $x,\!V$-closed substitution $^\circ$ covering the free variables of $o$ other than $x$,
we have $o^\circ \succeq x^\eta$,
\item $e'$ is $g$-head-spinal with respect to $x',V'$, where $V'$ is the local variable environment for $E[-]$.
(Clearly $e'$ will automatically be $g$-regular in some $\Gamma'$ that contains both $x'$ and $V'$.)
\end{enumerate}
In other words, we take `$e$ is $g$-head-spinal w.r.t.\ $x,V$' to be the largest relation
that satisfies the above statement.

(ii) In the above setting, we may also refer to the application $g(\lambda {x'}. E[e']) o$ itself
as $g$-head-spinal w.r.t.\ $x,V$. 

(iii) We say a term $t$ is \emph{$g$-spinal} if it contains a subexpression that is $g$-head-spinal
w.r.t.\ some $x,V$.
\end{definition}

Whilst this definition makes use of local variable environments which in principle pertain to concrete terms,
it is easily seen that the notion of $g$-spinal term is $\alpha$-invariant.
Since we are taking $g$ to be fixed throughout the discussion, we will usually omit mention of it
and speak simply of spinal and head-spinal terms, and of regular (meta-)terms and environments.

In condition~1, one might have expected to see
$o^\circ \approx x^\eta$, but it turns that the argument goes through most smoothly
with $\succeq$ in place of $\approx$.
In Appendix~A we will see that $o^\circ \succeq x^\eta$
is actually equivalent to $o^\circ \approx x^\eta$, 
although this is somewhat non-trivial to show and is not needed for our main proof.

It remains to define the notion of an $x,V$-closed substitution.
Suppose that $^\circ = [\vec{w} \mapsto \vec{r}\,]$ is some substitution proposed for use in
condition 1 of Definition~\ref{spinal-def}(i).
Since we are wishing to compare $o^\circ$ with $x^\eta$, it is natural to require that the $\vec{r}$
contain no free variables other than $x$.
However, what we want to ensure here is intuitively that the whole unflattened content of $x$ is present
in $o$ itself rather than simply being introduced by the substitution.
This can be ensured if we allow $x$ as a free variable only in procedures $r_i$ of type level $<k$:
such procedures can only introduce `flattened' images of $x$, since the $x$ is here being funnelled through
a type of level $\leq k-1$.

For technical reasons, we furthermore need to restrict such occurrences of $x$ to those $r_i$
substituted for variables $w_i$ in a certain set $V$, 
which in practice will consist of variables
bound between one spinal occurrence of $g$ and the next 
(as can be seen from the specification of $V'$ above).
The necessity for the set $V$ is admittedly difficult to motivate at this point:
it is simply what the details of the proof seem to demand
(see the last page of the proof of Lemma~\ref{g-lemma-2}).

\begin{definition}  \label{xV-closed-def}
If $x$ is a variable of type $k$ and $V$ a set of variables, 
a substitution $^\circ = [\vec{w} \mapsto \vec{r}\,]$ is called
\emph{$x,\!V$-closed} if the $r_i$ contain no free variables, except that if $w_i \in V$
and ${\mathrm{lv}}(w_i)<k$ then $r_i$ may contain $x$ free.
\end{definition}

It is worth remarking that if we were only interested in
showing the non-definability of $Y_{k+1}$ as an element of ${\mathsf{SP}}^0$, one
could do without the notion of $x,\!V$-closedness altogether, 
and more simply require in Definition~\ref{spinal-def} that $^\circ$ is closed 
(and moreover that $\dang{\!o^\circ\!}\, = x^\eta$ on the nose).
The weaker definition we have given is designed with the proof of non-definability 
in ${\mathsf{SF}}$ in mind: we will be able to show in Section~\ref{sec-extensional} that every (simple) procedure 
representing the functional $\Phi_{k+1} \in {\mathsf{SF}}$ is spinal in this weaker sense.%
\footnote{It can be shown using Theorem~\ref{no-retraction-thm} that if $o^\circ \succeq x^\eta$
where $^\circ$ is $x,\!V$-closed, then at least one $x$ in $\dang{\!o^\circ\!}$ must originate from
$o$ rather than from $^\circ$.
We have not actually settled the question of whether there are procedures $o$ such that
$o^\circ \succeq x^\eta$ for some $x,\!V$-closed $^\circ$ but not for any closed $^\circ$;
fortunately this is not necessary for the purpose of our proof.}


We now digress briefly to explain the small modification of this machinery 
that we will need in Section~\ref{sec-extensional}.
Since our purpose there will be to analyse the functional $\Phi_{k+1}$ which we defined
in Section~\ref{sec-intro}, we shall be working in a setup in which
the global variable $g$ has the slightly different type ${0}\rightarrow(k+1)\rightarrow(k+1)$.
In this setting, we may vary the above definition
by coinductively declaring $e$ to be $g$-head-spinal w.r.t.\ $x,V$ iff $e$ has the form
\[  \caseof{gb(\lambda x'. E[e']) o}{\cdots} \]
where $b$ is a procedure term of type ${0}$ and conditions~1 and 2 above are also satisfied.
Subject to this adjustment, all the results and proofs of the present section go through
in this modified setting, with the extra argument $b$ playing no active role.
For the remainder of this section, we shall work with
a global variable $g$ of the simpler type $(k+1) \rightarrow (k+1)$, on the understanding that
the extra arguments $b$ can be inserted where needed to make formal
sense of the material in the modified setting.
We do not expect that any confusion will arise from this.

Clearly $g \vdash F_{k+1}[g]$ is spinal. 
The main result of this section will be that every ${\mathrm{PCF}}^\Omega_k$-denotable procedure
$\Gamma \vdash p$ is non-spinal (Theorem~\ref{no-gremlin-thm}).
We shall establish this by induction on the generation of denotable terms
as in Theorem~\ref{denotable-inductive-thm}, the only challenging case being the one for rule~6,
which involves plugging.
Here we require some technical machinery whose purpose is to show that 
if the result of a plugging operation is spinal,
then a spinal structure must already have been present in one of the components of the plugging:
there is no way to `assemble' a spinal structure from material in non-spinal fragments.

The core of the proof will consist of some lemmas developing the machinery 
necessary for tackling rule 6.
We start with some technical but essentially straightforward facts concerning evaluation and 
the tracking of subterms and variable substitutions. 

\begin{lemma} \label{g-lemma-1}
Suppose that
\[ \Gamma \;\vdash\; \dang{K[d]} \;=_\alpha\; K'[c] \]
where $K[-],K'[-]$ are concrete meta-term contexts with local environments $\vec{v},\vec{v}\,'$ respectively,
and $\Gamma,\vec{v} \vdash d = \caseof{gpq}{\cdots}$,
$\Gamma,\vec{v}\,' \vdash c = \caseof{gp'q'}{\cdots}$ are concrete expressions.
Suppose also that:
\begin{enumerate}
\item $\Gamma \vdash K[d]$ is regular;
\item in the evaluation above, the head $g$ of $c$ originates from that of $d$.
\end{enumerate}
Then:

(i) There is a substitution $^\dag = [\vec{v}\mapsto\vec{s}\,]$ of level $\leq k$
arising from the $\beta$-reductions in the above evaluation,
with $\Gamma,\vec{v}\,' \vdash \vec{s}$ regular, such that
$\Gamma,\vec{v}\,' \vdash gp'q' =_\alpha \,\dang{\!(gpq)^\dag\!}$,
whence $\dang{\!d^\dag\!}$ is of form $\caseof{gp'q'}{\cdots}$ up to $=_\alpha$.%
\footnote{Note that although both $\dang{\!d^\dag\!}$ and $c$ have the form $\caseof{gp'q'}{\cdots}$,
they will in general have different case-branches, for instance when $K[-]$ is of the form
$\caseof{-}{\cdots}$.}

(ii) If furthermore 
$c$ is head-spinal w.r.t.\ some $x,V$,
then also $\dang{d^\dag}$ is head-spinal w.r.t.\ $x,V$.

(iii) If $K[-]$ contains no $\beta$-redexes $P \vec{Q}$ with $P$ of type level $k+1$,
then $^\dag$ is \emph{trivial for level $k$ variables}: 
that is, there is an injection $\iota$ mapping each level $k$ variable $v_i \in \vec{v}$ to
a variable $\iota(v_i) \in \vec{v}\,'$ such that $s_i = \iota(v_i)^\eta$.
\end{lemma}

In reference to part~(iii), recall that substitutions $v \mapsto v^\eta$ have no effect on the meaning
of a term, as established by Lemma~\ref{eta-properties}.
Note that the environments $\vec{v},\vec{v}\,'$, and hence the injection $\iota$, will in general
depend on the concrete choice of $K[d]$ and $K'[c]$. However, for the purpose of proving the theorem,
it is clearly harmless to assume that $K'[c]$ is, on the nose, the concrete term obtained by evaluating $K[d]$.
In this case, we will see from the proof below that each $\iota(v_i)$ will be either $v_i$ itself or a renaming
of $v_i$ arising from the evaluation.

\vspace*{1.0ex}
\begin{proof}
(i) We first formulate a suitable property of terms that is preserved under all individual reduction steps.
Let $K[-],d,p,q$ and $\vec{v}$ be fixed as above, and suppose that
\[ K^0[\caseof{gP^0Q^0}{\cdots}] ~\rightsquigarrow~ K^1[\caseof{gP^1Q^1}{\cdots}] \]
via a single reduction step,
where the $g$ on the right originates from the one on the left, 
and moreover $K^0,P^0,Q^0$ enjoy the following properties 
(we write $\vec{v}\,^0$ for the local environment for $K^0[-]$):
\begin{enumerate}
\item $\Gamma \vdash K^0[\caseof{gP^0Q^0}{\cdots}]$ is regular.
\item There exists a substitution $^{\dag0} = [\vec{v}\mapsto\vec{s}\,^0]$
(with $\Gamma,\vec{v}\,^0 \vdash \vec{s}\,^0$ regular) 
such that $\dang{gP^0Q^0} \,=_\alpha\, \dang{(gpq)^{\dag 0}}$.
\end{enumerate}

We claim that $K^1,P^1,Q^1$ enjoy these same properties w.r.t.\ the
local environment $\vec{v}\,^1$ for $K^1[-]$.
For property~1, clearly $K^1[\caseof{gP^1Q^1}{\cdots}]$ cannot contain
variables of level $>k$ other than $g$, because $K^0[\caseof{gP^0Q^0}{\cdots}]$ does not.
For property~2, we define the required substitution $^{\dag1} = [\vec{v}\mapsto\vec{s}\,^1]$
by cases according to the nature of the reduction step:
\begin{itemize}
\item If the subexpression $\caseof{gP^0Q^0}{\cdots}$ is unaffected by the
reduction (so that $P^0=P^1$ and $Q^0=Q^1$),
or if the reduction  is internal to $P^0,Q^0$ or to the rightward portion $(\cdots)$,
or if the reduction has the form
\begin{eqnarray*}
\caseof{(\caseof{gP^0Q^0}{i \Rightarrow E^0_i})}{j \Rightarrow F_j} & \rightsquigarrow & \\
\caseof{gP^0Q^0}{i \Rightarrow \caseof{E^0_i}{j \Rightarrow F_j}} 
\end{eqnarray*}
then the conclusion is immediate, noting that $\vec{v}\,^1 = \vec{v}\,^0$ 
and taking $^{\dag 1}=\,^{\dag 0}$.
\item If the reduction is for a $\beta$-redex $(\lambda \vec{x}.E)\vec{R}$
where the indicated subexpression $\caseof{gP^0Q^0}{\cdots}$ lies within some $R_i$,
we may again take $^{\dag 1}$ to be $^{\dag 0}$, with obvious adjustments to compensate for
any renaming of bound variables within $R_i$ or $\vec{s}\,^0$.
In this case $\vec{v}\,^1$ may contain more variables than $\vec{v}\,^0$,
but we will still have that $\Gamma,\vec{v}\,^1 \vdash \vec{s}\,^1$
once these renamings have been effected.
\item If the reduction is for a $\beta$-redex $(\lambda \vec{x}.E)\vec{R}$
where $\caseof{gP^0Q^0}{\cdots}$ lies within $E$,
then $P^1 = P^0[\vec{x} \mapsto \vec{R}\,']$ and $Q^1 = Q^0[\vec{x} \mapsto \vec{R}\,']$
for some $\vec{R}\,' =_\alpha \vec{R}$.
In this case, the local environment $\vec{v}\,^1$ for $K^1[-]$ will be $\vec{v}\,^0 - \vec{x}$
(perhaps modulo renamings of the $v^0_i$),
so that the conclusion follows if we take $^{\dag 1} = [\vec{v}\mapsto\vec{s}\,^1]$ 
where $s_i^1 = \dang{s_i^0 [\vec{x}\mapsto\vec{R}\,']}$ for each $i$
(modulo the same renamings).
Note here that $\Gamma,\vec{v}\,^1 \vdash \vec{s}\,^1$ is regular since $\vec{R}$ is regular
by condition~1 of the hypothesis.
\end{itemize}

Now in the situation of the lemma we will have some 
finite reduction sequence
\[  K[\caseof{gpq}{\cdots}] ~\rightsquigarrow^*~ K''[\caseof{gP'Q'}{\cdots}] \;, \]
where, intuitively, $K''[-]$ is fully evaluated down as far as the hole.
More formally, there is a finite normal-form context $t[-] \sqsubseteq K''[-]$
containing the hole in $K''[-]$ such that $t[-] \sqsubseteq K'[-]$;
from this we may also see that $\dang{\!P'\!}\,=p'$, $\dang{\!Q'\!}\,=q'$
and $\dang{\!K''[-]\!}\,=K'[-]$. 
Moreover, we now see that $K,p,q$ themselves trivially satisfy the above invariants
if we take $^\dag = [\vec{v} \mapsto \vec{v}\,^\eta]$ (Lemma~\ref{eta-properties} is used here).
We therefore infer by iterating the argument above that $K'',P',Q'$ also satisfy these invariants with respect to 
some $^\dag = [\vec{v} \mapsto \vec{s}\,]$ with $\Gamma,\vec{v}\,' \vdash \vec{s}$ regular.
(The environment $\Gamma,\vec{v}\,'$ is correct here, as $K'[-],K''[-]$ have the same
local environment.)
We now have $gp'q' =\, \dang{gP'Q'} \,=_\alpha\, \dang{(gpq)^\dag}$.
That the $\vec{v}$ are of level $\leq k$ is automatic, because $K[d]$ is regular.
It also follows immediately that $\dang{\!d^\dag\!}$ has the stated form.

(ii) If $c$ is head-spinal w.r.t.\ $x,V$, then we see from Definition~\ref{spinal-def} that $gp'q'$ 
and hence $\dang{d^\dag} \,= \caseof{gp'q'}{\cdots}$ are head-spinal w.r.t.\ $x,V$.

(iii) From the proof of (i), we see that in the reduction of $K[\caseof{gpq}{\cdots}]$
to $K''[\caseof{gP'Q'}{\cdots}]$, any $v_i \in \vec{v}$ 
can be tracked through the local environments for the intermediate contexts $K^0[-],K^1[-],\ldots$
until (if ever) it is a substitution variable for a $\beta$-reduction.
For those $v_i$ that never serve as such a variable, it is clear from the construction that
$v_i$ gives rise to some variable $\iota(v_i) \in \vec{v}\,'$ (either $v_i$ itself or a renaming thereof),
and that $s_i = v_i^\dag = \iota(v_i)^\eta$.
We wish to show that all $v_i \in \vec{v}$ of level $k$ are in this category.

Recalling that $\vec{v}$ is the local environment for $K[-]$,
any $v_i \in \vec{v}$ of level $k$ is bound by the leading $\lambda$ of some
meta-procedure $P$ within $K[-]$ of level $k+1$. 
By hypothesis, this $P$ does not occur in operator position; nor can it occur
as an argument to another $\lambda$-abstraction within $K[-]$, 
since this would require a bound variable of level $\geq k+1$.
It must therefore occur as a level $k+1$ argument to $g$,
so that we have a subterm $g(\lambda v_i.E[-])\cdots$.
But this form of subterms is stable under reductions, since $g$ is a global variable;
it follows easily that this subterm has a residual $g(\lambda v_i'.E'[-])\cdots$ in each of the intermediate
reducts (where $v_i'$ is either $v_i$ or a renaming thereof),
and thus that $v_i$ and renamings thereof never serve as substitution variables for $\beta$-reductions.
$\Box$
\end{proof}

\vspace*{1.0ex}
Thus, in the setting of the above lemma, 
if $c$ is head-spinal then $d$ can be specialized and evaluated to yield a head-spinal term via the substitution
$[\vec{v} \mapsto \vec{s}\,]$.
However, we wish to show more, namely that in this setting, $d$ itself is already a spinal term,
so that the $\vec{s}$ make no essential contribution to the spinal structure.
(This will give what we need in order to show that $k$-pluggings cannot manufacture spinal terms
out of non-spinal ones.) 
This is shown by the next lemma, whose proof forms the
most complex and demanding part of the entire argument.
The main challenge will be to show that all the head-spinal occurrences of $g$ in
$\dang{d[\vec{v} \mapsto \vec{s}\,]}$ originate from $d$ rather than from $\vec{s}$.
The reader is advised that great care is needed as regards which variables can appear free where,
and for this reason we shall make a habit of explicitly recording the variable environment 
for practically every term or meta-term that we mention.

\begin{lemma}  \label{g-lemma-2}
Suppose we have regular terms
\[ \Gamma,\vec{v} \;\vdash~ d ~=~ \caseof{gpq}{\cdots} \;, ~~~~~~
   \Gamma,\vec{v}\,' \vdash \vec{s} \;,  ~~~~~~
   {\mathrm{lv}}(\vec{v}), {\mathrm{lv}}(\vec{v}\,') \leq k \;, \] 
where $\Gamma,\vec{v}\,' \vdash\, \dang{d[\vec{v}\mapsto\vec{s}\,]}$ 
is head-spinal with respect to some $x,V$.
Then $d$ itself is spinal.
\end{lemma}

\begin{proof}
We begin with some informal intuition.
The term $\dang{d[\vec{v}\mapsto\vec{s}\,]}$, being head-spinal, will be of the form
\[ \Gamma,\vec{v}\,' \;\vdash~ t ~=~ 
    \caseof{g\,(\lambda x'.\,E[\caseof{gF'o'}{\cdots}])\,o}{\cdots} \;, \]
where ${o'}^\circ \succeq {x'}^\eta$ for some $^\circ$ (and likewise for $o$ and $x$).
Here the head $g$ of $t$ clearly originates from that of $d$;
likewise, the $\lambda x'$ originates from the leading $\lambda$ of $p$ within $d$, 
rather than from $\vec{s}$.
Suppose, however, that the second displayed spinal occurrence of $g$ in $t$
originated from some $s_i$ rather than from $d$. 
In order to form the application of this $g$ to $o'$, 
the whole content of ${x'}^\eta$ would in effect need to be passed in to $s_i$
when $d$ and $\vec{s}$ are combined. But this is impossible, since the arguments to
$s_i$ are of level $<k$, so by Theorem~\ref{no-retraction-thm} 
we cannot funnel the whole of ${x'}^\eta$ through them:
that is, the interface between $d$ and $\vec{s}$ is too narrow 
for the necessary interaction to occur.
(The situation is made slightly more complex by the fact that 
some components of $^\circ$ might also involve $x'$, but the same idea applies.)
It follows that the second spinal $g$ in $t$ originates from $d$ after all.
By iterating this argument, we can deduce that all the spinal occurrences of $g$,
and indeed the entire spinal structure, comes from $d$.

We now proceed to the formal proof.
By renaming variables if necessary,
we may assume for clarity that the same variable is never bound in two places within the entire
list of terms $d,\vec{s}$,
and that all bound variables within $d,\vec{s}$ are distinct from those of $\vec{v}$ and $\vec{v}\,'$.

Let $^\dag = [\vec{v}\mapsto\vec{s}\,]$,
and let us write the subterm $p$ appearing within $d$ as $\Gamma,\vec{v} \vdash \lambda {x'}^k.e$,
where $\Gamma,\vec{v},x' \vdash e$ is regular. Then
\[ \dang{d^\dag} ~=~ \caseof{g\,(\lambda x'.\!\dang{e^\dag})\dang{q^\dag}}{\cdots} \;, \]
and since $\dang{\!d^\dag\!}$ is head-spinal by hypothesis,
$\dang{\!{e}^\dag\!}$ will be some term 
$\Gamma,x',\vec{v}\,' \vdash E[c]$, where
$\Gamma,x',\vec{v}\,',\vec{y}\,' \vdash c~=~ \caseof{gF'o'}{\cdots}$ is itself head-spinal 
with respect to $x'$ and $\vec{y}\,'$.
(Here $\vec{y}\,'$ denotes the local environment for $E[-]$, so that
$\Gamma,x',\vec{v}\,',\vec{y}\,'$ contains no repetitions.)
We will first show that the head $g$ of $c$ comes from $e$ rather than from $^\dag$;
we will later show that the same argument can be repeated for 
lower spinal occurrences of $g$.

\vspace*{1.0ex}
\emph{Claim 1: In the evaluation $\dang{\!e^\dag\!} \,= E[c]$, the head $g$ of $c$
originates from $e$.}

\vspace*{1.0ex}
\emph{Proof of Claim 1:} Suppose for contradiction that the head $g$ of $c$ originates from some
substituted occurrence of an $s_i$ within ${e}^\dag$, 
say as indicated by ${e}^\dag = D[s_i]$ and $s_i = L[d']$, 
where $\Gamma,x',\vec{v}\,' \vdash D[-]$, $\Gamma,\vec{v}\,' \vdash s_i$,
and $\Gamma,\vec{v}\,',\vec{z} \vdash d' = \caseof{gp'q'}{\cdots}$.
(Here $\vec{z}$ is the local variable environment for $L[-]$;
note that $\vec{z}$ is disjoint from $\Gamma,x',\vec{v}\,'$, but may well overlap with $\vec{y}\,'$.)
Then
\[ \Gamma,x',\vec{v}\,' \;\vdash~
   \dang{\!e^\dag\!} ~=~ \dang{D[L[d']]} ~=~ E[c] \;, \]
where the head $g$ in $d'$ is the origin of the head $g$ in $c$.
We will use this to show that a head-spinal term may be obtained from $d'$ 
via a substitution of level $< k$;
this will provide the bottleneck through which ${x'}^\eta$ is unable to pass.

We first note that the above situation satisfies the conditions of Lemma~\ref{g-lemma-1}, 
where we take the $\Gamma,K,d,K',c$ of the lemma 
to be respectively $(\Gamma,x',\vec{v}\,'),$ $D[L[-]],d',E,c$.
Condition~1 of the lemma holds because $\Gamma,\vec{v},x' \vdash e$ and 
$\Gamma,\vec{v}\,' \vdash \vec{s}$ are clearly regular, 
whence so is $\Gamma,x',\vec{v} \vdash e^\dag = D[L[d']]$;
condition~2 is immediate in the present setup.


We conclude that there is a substitution $[\vec{y}\mapsto\vec{t}\,]$ 
(called $[\vec{v}\mapsto\vec{s}\,]$ in the statement of Lemma~\ref{g-lemma-1})
with $\vec{y}$ the local environment for $D[L[-]]$ 
and $\Gamma,x',\vec{v}\,',\vec{y}\,' \vdash \vec{t}$
(recalling that $\vec{y}\,'$ are the local variables for $E[-]$),
such that $\dang{d'[\vec{y}\mapsto\vec{t}\,]}$ is head-spinal
and indeed of the form $\caseof{gF'o'}{\cdots}$ with $F',o'$ as above.
Furthermore, the only $\beta$-redexes in $e^\dag$ are those arising from the
substitution $^\dag$, with some $s_j$ of level $k$ as operator.
There are therefore no $\beta$-redexes in $e^\dag$ with a substitution variable of level $k$,
so by Lemma~\ref{g-lemma-1}(iii), the substitution $[\vec{y} \mapsto \vec{t}\,]$ 
is trivial for variables of level $k$.
Note also that $\vec{y}$ (the environment for $D[L[-]]$) subsumes $\vec{z}$
(the environment for $L[-]$); it is disjoint from $\Gamma,x,\vec{v}\,'$ but may well overlap with $\vec{y}\,'$.

Let us now split the substitution $[\vec{y} \mapsto \vec{t}\,]$ as
$[\vec{y}\,^+ \mapsto \vec{t}\,^+, \, \vec{y}\,^- \mapsto \vec{t}\,^-]$,
where $\vec{y}\,^+$ consists of the variables in $\vec{y}$ of level $k$, 
and $\vec{y}\,^-$ consists of those of level $<k$.
As we have noted, the substitution for $\vec{y}\,^+$ is trivial:
that is, there is a mapping associating with each $y_j \in \vec{y}\,^+$ a variable $\iota(y_j) \in \vec{y}\,'$
such that $t_j = \iota(y_j)^\eta$.


Taking stock, we have that
\begin{eqnarray*} 
   \Gamma,x',\vec{v}\,',\vec{y}\,'  & \vdash &
   \dang{d'[\vec{y} \mapsto \vec{t}\,]} ~=~ \caseof{gF'o'}{\cdots} \;,  \\
   \Gamma,\vec{v}\,',\vec{z} & \vdash & d' = \caseof{gp'q'}{\cdots} \;, \\
    \Gamma,x',\vec{v}\,',\vec{y}\,' & \vdash & \vec{t} \;,
\end{eqnarray*}
where $[\vec{y}\mapsto\vec{t}\,]$ is trivial for level $k$ variables, 
and $gF'o'$ is head-spinal w.r.t.\ $x,\vec{y}\,'$.
From this we may read off that
\[ \Gamma,x',\vec{v}\,',\vec{y}\,' \;\vdash\; \dang{q'[\vec{y} \mapsto \vec{t}\,]} ~=~ o' \;. \]
Since $\vec{y}$ subsumes $\vec{z}$,
we may henceforth regard $q'$ as a term in environment $\Gamma,\vec{v}\,',\vec{y}$.
(This is compatible with the no-variable-hiding condition:
our conventions ensure that the variables of $\vec{y}-\vec{z}$ come from $d$ rather than $s_i$
and so do not appear bound in $q'$.)
We may harmlessly write $q'[\vec{y} \mapsto \vec{t}\,]$ as above, 
even though there are variables of $\vec{y}$ that cannot appear in $q'$.

Since $x'$ does not occur free in $q'$, each free occurrence of $x'$ in $o'$ above must originate from some $t_j \in \vec{t}$, 
which must furthermore have some type $\rho_j$ of level $<k$, since if $t_j$ had level $k$ then 
we would have $t_j = \iota(y_j)^\eta$ which does not contain $x'$ free.  
In fact, we may decompose the substitution $[\vec{y} \mapsto \vec{t}\,]$ as
$[\vec{y}\,^+ \mapsto \iota(\vec{y}\,^+)^\eta]$ followed by $[\vec{y}\,^- \mapsto \vec{t}\,^-]$,
since none of variables of $\vec{y}\,^-$ appear free in the $\iota(y_j)^\eta$ for $y_j \in \vec{y}\,^+$.
Setting $q'^* =\, \dang{q[\vec{y}\,^+ \mapsto \iota(\vec{y}\,^+)^\eta]}$ (so that $q'^*$ is just $q'$
with the variables in $\vec{y}\,^+$ rewritten via $\iota$),
we therefore have $\dang{q'^*[\vec{y}\,^- \mapsto \vec{t}\,^-]}\, = o'$.
Thus:
\begin{eqnarray*}
\Gamma,\vec{v}\,',\iota(\vec{y}\,^+),\vec{y}\,^- & \vdash & q'^* : \pure{k} \;, \\
\Gamma,x,\vec{v}\,',\vec{y}\,' & \vdash & t_j : \rho_j \mbox{~~~for $t_j \in \vec{t}\,^-$} \;, \\
\Gamma,x,\vec{v}\,',\vec{y}\,' & \vdash & \dang{q'^* [\vec{y}\,^- \mapsto \vec{t}\,^-]} ~=~ o' \;.
\end{eqnarray*}


Since $o'^\circ \succeq {x'}^\eta$ for a suitable $x',\vec{y}\,'$-closed substitution $^\circ$
(as part of the fact that $\dang{\!d^\dag\!}$ is head-spinal),
the above already comes close to exhibiting $\pure{k}$ as a pseudo-retract of a level $<k$ product type, 
contradicting Theorem~\ref{no-retraction-thm}.
To complete the argument, we must take account of the effect of $^\circ$,
which we here write as $[\vec{w}\mapsto\vec{r}\,]$
(we may assume that $\vec{w}$ is exactly $\Gamma,\vec{v}\,',\vec{y}\,'$). 
Reordering our variables, we may now write $x,\vec{w} \vdash \vec{t}\,^-$.


Next, let us split $^\circ$ into two independent parts: a substitution $[\vec{w}\,^+\mapsto\vec{r}\,^+]$ 
covering the variables in $\Gamma,\vec{v}\,',\vec{y}\,'$ of level $\geq k$, 
and $[\vec{w}\,^-\mapsto\vec{r}\,^-]$ covering those of level $<k$.
Since $^\circ$ is $x',\vec{y}\,'$-closed, we have $\vdash \vec{r}\,^+$ and $x' \vdash \vec{r}\,^-$.
Now set ${q'}^\wr =\, \dang{{q'^*}[\vec{w}\,^+\mapsto\vec{r}\,^+]}$, 
so that $\vec{u}\,^- \vdash {q'}^\wr$ 
where $\vec{u}\,^-$ consists of the variables of $\Gamma,\vec{v}\,',\vec{y}$ of level $<k$.
The idea is that $\vec{u}\,^- \vdash {q'}^\wr$ may now serve as one half of a suitable pseudo-retraction.
For the other half, let $[\vec{u}\,^- \mapsto \vec{a}\,^-]$ denote the effect of the substitution
$[\vec{y}\,^- \mapsto \vec{t}\,^-]$ followed by $^\circ =[\vec{w} \ \mapsto \vec{r}\,]$
(the order is important here as $\vec{y}\,^-$ and $\vec{w}$ may overlap).
Since $\vec{u}\,^- \subseteq \vec{y} \cup \vec{w}$ and $x,\vec{w} \vdash \vec{t}\,^-$ 
and $x \vdash \vec{r}$,
this substitution does indeed cover at least the variables of $\vec{u}\,^-$ and we have $x \vdash \vec{a}\,^-$.
We may now verify that  $\vec{u}\,^- \vdash {q'}^\wr$ and $x \vdash \vec{a}\,^-$ 
constitute a pseudo-retraction as follows:
\begin{eqnarray*}
x' & \vdash & \dang{{q'}^\wr [\vec{u}\,^- \mapsto \vec{a}\,^-]} \\
   & = & \dang{q'^* [\vec{w}\,^+ \mapsto \vec{r}\,^+] [\vec{y}\,^- \mapsto \vec{t}\,^-] [\vec{w} \ \mapsto \vec{r}\,]} \\
   & = & \dang{(q'^* [\vec{y}\,^- \mapsto \vec{t}\,^-]) [\vec{w} \ \mapsto \vec{r}\,] } \\
   & = & \dang{{o'}^\circ} ~\succeq~ {x'}^\eta \;.
\end{eqnarray*}
As regards the second equation here, the first substitution $[\vec{w}\,^+ \mapsto \vec{r}\,^+]$ may be safely
omitted as $\vec{w}\,^+$ and $\vec{y}\,^-$ are disjoint and the terms $\vec{r}\,^+$ do not contain
any of the $\vec{w}\,^+$ or $\vec{y}\,^-$ free.
We therefore have $\pure{k}$ as a pseudo-retract of a product of level $<k$ types.
This contradicts Theorem~\ref{no-retraction-thm}, so the proof of Claim 1 is complete.

\vspace*{1.0ex}
We may therefore suppose that in the evaluation $\dang{\!e^\dag\!} = E[c]$,
the originating occurrence of the head $g$ in $c$
is as indicated by $\Gamma,x',\vec{v} \vdash e = C[d']$,
where $\Gamma,x',\vec{v},\vec{v}\,'' \vdash d' =\caseof{gp'q'}{\cdots}$.
(Here $\vec{v}\,''$ is the local environment for $C[-]$.
The symbols $d',p',q'$ are available for recycling now that the proof of Claim~1 is complete.)

In order to continue our analysis to greater depth, note that we may write
\[ \Gamma,x',\vec{v}\,' \vdash~
    \dang{\!e^\dag\!} ~=~ \dang{(\lambda \vec{v}.\,C[d'])\vec{s}} ~=~ E[c] \;, \] 
where $c$ is head-spinal w.r.t.\ $x',\vec{y}\,'$, and the head $g$ of $d'$ is the origin of the head $g$ of $c$.
(Recall that $\vec{y}\,'$ is the local environment for $E[-]$ and that 
$\Gamma,x',\vec{v} \vdash e = C[d']$ is regular, whence ${\mathrm{lv}}(\vec{v}\,'') \leq k$.)

We claim that once again we are in the situation of Lemma~\ref{g-lemma-1},
taking $\Gamma,K,d,K',c$ of the lemma to be respectively
$(\Gamma,x',\vec{v}\,')$, $(\lambda \vec{v}.\,C[-])\vec{s}$, $d', E, c$.
Condition~2 of the lemma is immediate in the present setup;
for condition~1, we again note that
$\Gamma,x',\vec{v}\,' \vdash C[d'] = e$ and $\Gamma,\vec{v}\,' \vdash \vec{s}$ are regular,
so by Proposition~\ref{regular-term-prop} contain no bound variables of level $>k$;
hence the same is true for $(\lambda \vec{v}.\,C[d'])\vec{s}$.
Applying Lemma~\ref{g-lemma-1}, we obtain a substitution 
$^{\dag'} = [\vec{v}\,^+ \mapsto \vec{s}\,^+]$ of level $\leq k$
(with $\vec{v}\,^+ = \vec{v},\vec{v}\,''$),
where $\Gamma,x',\vec{v}\,',\vec{v}\,^+ \vdash d'$ 
and $\Gamma,x',\vec{v}\,',\vec{y}\,' \vdash \vec{s}\,^+$ are regular, such that 
\[ \Gamma,x',\vec{v}\,',\vec{y}\,' \;\vdash\; \dang{d'[\vec{v}\,^+\!\mapsto\!\vec{s}\,^+]} \]
is head-spinal w.r.t.\ $x',\vec{y}\,'$, and indeed of the form $\caseof{gF'o'}{\cdots}$ with $F',o'$ as above.
(We may in fact write just $\Gamma,x',\vec{v}\,^+ \vdash d'$
since, by assumption, the variables of $\vec{v}\,'$ do not overlap with the free or bound variables of $d$
so do not appear in $d$.)
We may also read off that $\dang{(p')^{\dag'}} \,= F'$ and $\dang{(q')^{\dag'}} \,= o'$.
As regards the substitution $^{\dag'} =  [\vec{v}\,^+ \mapsto \vec{s}\,^+]$, it is clear that this
extends $ [\vec{v} \mapsto \vec{s}\,]$ since the evaluation of $(\lambda \vec{v}.\,C[d'])\vec{s}$
starts by $\beta$-reducing this term. 
Moreover, the argument of Lemma~\ref{g-lemma-1}(iii) shows that $^{\dag'}$ is trivial for
any level $k$ variables in $\vec{v}\,''$, as $C[-]$ is in normal form.

We are now back precisely where we started, in the sense that $d',\vec{v}\,^+,\vec{s}\,^+$
themselves satisfy the hypotheses of Lemma~\ref{g-lemma-2}, 
with $(\Gamma,x')$ now playing the role of $\Gamma$
and $(\vec{v}\,',\vec{y}\,')$ that of $\vec{v}\,'$.
Explicitly, we have regular terms
\[  \Gamma,x,\vec{v}\,^+ \vdash d' = \caseof{gp'q'}{\cdots} \;, ~~~~~~~~
    \Gamma,x,\vec{v}\,',\vec{y}\,' \vdash \vec{s}\,^+  \]
(so that ${\mathrm{lv}}(\vec{v}\,^+), {\mathrm{lv}}(\vec{v}\,',\vec{y}\,') \leq k$) 
where $\Gamma,x,\vec{v}\,',\vec{y}\,' \vdash \dang{d' [\vec{v}\,^+ \mapsto \vec{s}\,^+]}$
is head-spinal w.r.t.\ $x',\vec{y}\,'$.
We can therefore iterate the whole of the above argument to obtain an infinite descending chain of subterms
\[ \begin{array}{rrlrl}
\Gamma,\vec{v} \vdash &
d = & \caseof{gpq}{\cdots} \;, & 
p = & \lambda x'.\,C[d'] \;, \\
\Gamma,\vec{v},x',\vec{v}\,'' \vdash &
d' = & \caseof{gp'q'}{\cdots} \;, & 
p' = & \lambda x''.\,C'[d''] \;, \\
\Gamma,\vec{v},x',\vec{v}\,'',x'',\vec{v}\,'''' \vdash &
d'' = & \caseof{gp''q''}{\cdots} \;, &
p'' = & \lambda x'''.\,C''[d'''] \;, \\
& \cdots & & \cdots &
\end{array} \]
along with associated substitutions $^\dag$, $^{\dag'}$, $^{\dag''}, \ldots$
applicable to $d,d',d'',\ldots$ respectively, such that 
$\dang{\!p^\dag\!}$, $\dang{(p')^{\dag'}\!}$, $\dang{(p'')^{\dag''}\!}, \ldots$
coincide with the successive procedure subterms $F,F',F'',\ldots$ 
from the spine of the original term $\dang{\!d^\dag\!}$,
and likewise $\dang{\!q^\dag\!}$, $\dang{(q')^{\dag'}\!}$, $\dang{(q'')^{\dag''}\!}, \ldots$
coincide with $o,o',o''\ldots$.

We cannot quite conclude that $d$ is head-spinal, because the critical $x$ in 
$q^\dag$ might originate not from $q$ but from a level $k$ term in $\vec{s}$ (for example).
However, we can show that this problem does not arise for $q',q'',\ldots$, essentially because
$x',x'',\ldots$ are bound locally within $p$. 
We will in fact show that $d'$ is head-spinal w.r.t.\ $x',\vec{v}\,''$,
where $\vec{v}\,''$ is the local environment for $C[-]$; 
this will imply that $d$ is spinal.
In the light of Definition~\ref{spinal-def}, 
it will be sufficient to show that $(q')^{\circ'} \succeq {x'}^\eta$ for some $x',\vec{v}\,''$-closed
specialization $^{\circ'}$ covering the free variables of $q'$ except $x'$
(namely those of $\Gamma,\vec{v},\vec{v}\,''$);
the same argument will then obviously apply also to $q'',q''',\ldots$.

Recall that $\Gamma,\vec{v},\vec{v}\,'',x' \vdash q'$ 
and $\Gamma,\vec{v}\,',\vec{y}\,',x' \vdash o'$.
Since $gF'o'$ is head-spinal w.r.t.\ $x,\vec{y}\,'$,
we may as before take $^\circ = [\vec{w}\mapsto\vec{r}\,]$
$x',\vec{y}\,'$-closed such that ${o'}^\circ \succeq {x'}^\eta$,
where $\vec{w} = \Gamma,\vec{v}\,',\vec{y}\,'$ and $x' \vdash \vec{r}$.
Now define
\[ ^{\circ'} ~=~ [\vec{v} \mapsto \vec{s}\,^\circ, \; 
                          \vec{v}\,'' \mapsto (\vec{s}\,'')^\circ, \; 
                          \vec{w}\,^\Gamma\mapsto\vec{r}\,^\Gamma] \]
(where we write $\vec{s}\,^+$ as $\vec{s},\vec{s}\,''$, 
and $\vec{w}\,^\Gamma \mapsto \vec{r}\,^\Gamma$ denotes the restriction of $^\circ$ to $\Gamma$).
This covers the free variables of $q'$ except $x'$, and we have
$x' \vdash \vec{s}\,^\circ, (\vec{s}\,'')^\circ, \vec{r}\,^\Gamma$ because 
$\vec{w} \vdash \vec{s}, \vec{s}\,''$ and $x \vdash \vec{r}$.
Moreover, we have
\[  q'^{\circ'} ~=~ (q' [\vec{v}\,^+ \mapsto (\vec{s}\,^+)^\circ])[\vec{w}\,^\Gamma \mapsto \vec{r}\,^\Gamma] ~=~ (q'^{\dag'})^\circ ~\approx~ o^\circ ~\succeq~ x'^\eta \]
since $(\vec{s}\,^+)^\circ$ contains no free variables except $x$.
To check that $^{\circ'}$ is $x',\vec{v}\,''$-closed, it remains to show that
that $u^{\circ'}$ may contain $x'$ free only when $u \in \vec{v},''$ and $u$ is of level $<k$.
(Indeed, it is because of the possibility of $x'$ occurring free in these terms that 
the machinery of $x,V$-closed substitutions is necessary at all.)
The remaining cases are handled as follows:
\begin{itemize}
\item The terms $\vec{s}$ exist in environment $\Gamma,\vec{v}\,'$, so do not involve $x'$
or any of the variables of $\vec{y}\,'$. 
Since $^\circ$ is $x',\vec{y}\,'$-closed, it follows that the terms $\vec{s}\,^\circ$ do not involve $x'$.
\item For any variables $v \in \vec{v}\,''$ of level $k$, we have $v^{\dag'} = v^\eta$ which contains
no free variables of level $<k$, so that $(v^{\dag'})^\circ$ cannot involve $x'$.
\item The $\vec{r}\,^\Gamma$ cannot involve $x'$, since $^\circ$ is $x',\vec{y}\,'$-closed
and $\Gamma$ is disjoint from $\vec{y}\,'$.
\end{itemize}
This completes the verification that $d$ is spinal.
$\Box$
\end{proof}



\vspace*{1.0ex}
From the above lemma we may immediately conclude, for example, 
that in the setting of Lemma~\ref{g-lemma-1}(ii) the term $d$ is spinal.

We are now ready for the main result of this section:

\begin{theorem}  \label{no-gremlin-thm}
Every ${\mathrm{PCF}}^\Omega_k$-denotable procedure $\Gamma \vdash p$ is non-$g$-spinal
where $g: (k+1) \rightarrow (k+1)$.
\end{theorem}

\begin{proof}
In the light of Section~\ref{sec-denotations}, it will suffice to show that the 
clauses of Theorem~\ref{denotable-inductive-thm}
cannot generate spinal terms from non-spinal ones.
For clauses 1--5 this is very easily seen.
For clause 6, it will be sufficient to show that non-spinal terms are closed under $k$-plugging,
and it is here that the machinery of Lemmas~\ref{g-lemma-1} and \ref{g-lemma-2} comes into play.

Suppose that $t =\, \dang{\Pi_{\Gamma,Z}(e,\xi)}$ where $\Gamma,Z \vdash e$
and $\Gamma,Z \vdash \xi(z)$ for each $z \in Z$.
For later convenience, to each $z_i \in Z$ let us associate the procedure
$\Gamma \vdash r_i =\, \dang{\Pi_{\Gamma,Z}(\xi(z_i),\xi)}$;
it is then clear from the definition of plugging and the evaluation theorem
that $t =\, \dang{e[\vec{z}\mapsto\vec{r}\,]}$
and that $r_i =\, \dang{\xi(z_i)[\vec{z}\mapsto\vec{r}\,]}$ for each $i$.

It will be natural to frame the argument contrapositively.
Suppose that $t$ is spinal, i.e.\ $t$ contains some head-spinal expression $c$ 
at position $K[-]$. We shall focus on the head occurrence of $g$ in $c$.
Clearly this occurrence must originate from one of the ingredients
$\Gamma,Z \vdash e$ or $\Gamma,Z \vdash \xi(z)$ of the plugging $\Pi_{\Gamma,Z}(e,\xi)$;
let us denote this ingredient by $\Gamma,Z \vdash t_0$.
We will show that $t_0$ itself is spinal.

Suppose that the relevant occurrence of $g$ in $t_0$ is as indicated by
\[ \Gamma,Z \;\vdash\; t_0 ~=~ L[d] \;, ~~~~~~ 
    \Gamma,Z,\vec{v} \;\vdash\; d ~=~ \caseof{g p q}{\cdots} \;, \]
where $\vec{v}$ is the local environment for $L[-]$.
Writing $^\star$ for $[\vec{z}\mapsto\vec{r}\,]$, we have
$\dang{\Pi(t_0,\xi)} \,=\, \dang{t_0^\star}$; 
and if $C[-]$ is the context encapsulating the remainder of the plugging $\Pi(e,\xi)$, 
then we may write
\[ \Gamma \vdash~ K[c] ~=~ t ~=~ \dang{C[\Pi(t_0,\xi)]} ~=~ \dang{C[t_0^\star]} 
   ~=~ \dang{C[L^\star[\dang{\!d^\star\!}]]} \;, \]
where 
\[ \dang{\!d^\star\!} ~=~ 
   \caseof{g \dang{\!p^\star\!} \dang{\!q^\star\!}\!}{\cdots} \;. \]
We claim that we are in the situation of Lemma~\ref{g-lemma-1},
taking $\Gamma,K,d,K',c$ of the lemma to be respectively
$\Gamma,C[L^\star[-]],\dang{\!\!d^\star\!\!},K,c$.
Condition~1 of the lemma holds because $C[t_0^\star]$ is constructed by substitution
from normal-form terms of level $\leq k$,
and condition~2 is immediate in the present setup.

By Lemma~\ref{g-lemma-1}, 
we may therefore conclude that
for a suitable substitution $[\vec{v} \mapsto \vec{s}\,]$
with $\Gamma, \vec{v}\,' \vdash \vec{s}$ regular, 
$\Gamma \vdash\, \dang{\dang{\!d^\star\!}[\vec{v}\mapsto\vec{s}\,]}$ is head-spinal.
(Note that the local variables of $C[-]$ do not appear in $d^\star$, because $\Gamma,Z \vdash t_0$
and $\Gamma \vdash \vec{r}$.)
Equivalently, we may say that $\dang{d[\vec{v}\,^+\!\mapsto\vec{s}\,^+]}$ 
is head-spinal, where
\[ [\vec{v}\,^+\mapsto\vec{s}\,^+] ~=~ 
   [\vec{z}\mapsto\vec{r}, ~\vec{v}\mapsto\vec{s}\,]
   
\;, \]
so that $\Gamma, \vec{v}\,' \vdash \vec{s}\,^+$ 
and ${\mathrm{lv}}(\vec{v}\,^+),{\mathrm{lv}}(\vec{v}\,') \leq k$.
(Note that the $\vec{z}$ do not appear free in $\vec{s}$, nor the $\vec{v}$ in $\vec{r}$.)

Since $\Gamma,Z,V \vdash d$ and $\Gamma,\vec{v}\,' \vdash \vec{s}\,^+$ are regular,
we are in the situation of Lemma~\ref{g-lemma-2}, so may conclude
that $d$ itself is spinal, and hence that $t_0$ is spinal.
We have thus shown that $k$-plugging cannot assemble spinal terms from non-spinal ones,
and this completes the proof.
$\Box$
\end{proof}

\vspace*{1.0ex}
In particular, since the procedure $g \vdash F_{k+1}[g]$ mentioned at the start of the section
is spinal, we may conclude that this procedure is not ${\mathrm{PCF}}^\Omega_k$-denotable,
and hence neither is the procedure $Y_{k+1}$. This establishes Theorem~\ref{SP0-main-thm}.

We conclude the section by mentioning some minor variations on Theorem~\ref{no-gremlin-thm}
that we will require below.
First, as already indicated, the whole of the above proof goes through for the modified notion 
of spinal term appropriate to a variable $g: 0 \rightarrow (k+1) \rightarrow (k+1)$.
Secondly, the theorem also holds for an innocuous extension of
${\mathrm{PCF}}^\Omega_k$ with a constant ${\mathit{byval}} : (\nat\rightarrow\nat) \rightarrow \nat \rightarrow \nat$,
whose denotation in ${\mathsf{SP}}^0$ we take to be 
\[ \lambda fx.\;\caseof{x}{i \Rightarrow \caseof{fi}{j \Rightarrow j}} \;. \]
To see that the proof of Theorem~\ref{no-gremlin-thm} goes through in the presence of ${\mathit{byval}}$,
it suffices simply to add an extra clause to the inductive proof noting that the procedure for ${\mathit{byval}}$
is non-spinal.
This mild extension will allow a significant simplification of the
forms of procedures that we need to consider in Section~\ref{sec-extensional}.%
\footnote{The operator ${\mathit{byval}}$ plays a major role in \cite[Section~7.1]{Longley-Normann},
where it is shown that every element of ${\mathsf{SP}}^0$ is denotable in ${\mathrm{PCF}}^\Omega + {\mathit{byval}}$.
The sense in which it is innocuous is that its denotation in ${\mathsf{SF}}$ coincides with that of
$\lambda fx.\,{\mathit{ifzero}}\,x\,(fx)(fx)$; thus ${\mathit{byval}}$ adds nothing to the expressivity of ${\mathrm{PCF}}^\Omega_k$
as regards ${\mathsf{SF}}$.}


\section{Non-definability in the extensional model}  \label{sec-extensional}

To obtain corresponding non-definability results for ${\mathsf{SF}}$ rather than ${\mathsf{SP}}^0$,
one must show not only that the canonical procedures $Y_\tau$ considered above are not
${\mathrm{PCF}}^\Omega_k$-denotable, but also that no extensionally equivalent procedures are.
It is easy to see that there are indeed many other procedures $Z$
with the same extension as $Y_\tau$.
To give a trivial example, we may present the canonical procedure $Y_{k+1}$ as
$\lambda gx.C[g,x]$, where
\[ C[g,x] ~=~ \caseof{A[g,x]}{i \Rightarrow i} \;, ~~~~~~~~
   A[g,x] ~=~ g(\lambda x'.C[g,x'])x^\eta \;. \]
However, another candidate for the fixed point operator is
\[  Z_0 ~=~ \lambda gx.\,\caseof{A[g,x]}{i \Rightarrow {C[g,x]}} \;. \]
Intuitively, this computes the desired value twice, discarding the first result.

As a slightly more subtle example, consider the procedure
\[  Z_1 ~=~ \lambda gx.\,\caseof{g(\lambda x'.\caseof{A[g,x]}{i \Rightarrow {C[g,x']}}) x^\eta}{k \Rightarrow k} \;. \]
Here, within the $\lambda x'$ subterm, we have smuggled in a repetition of the 
top-level computation $A[g,x]$ before proceeding to evaluate what is really required.
The effect is that
$\lambda x'.\caseof{A[g,x]}{i \Rightarrow C[g,x']}$ may be 
extensionally below $\lambda x'.C[g,x']$,
and this may indeed affect the result when $g$ is applied.
However, this can only happen when $Y_{k+1}gx$ is undefined anyway,
so it is easy to see that $Z_1$ as a whole will have the same extension as $Y_{k+1}$.

Yet another way to construct procedures extensionally equivalent to $Y_{k+1}$
is to vary the subterms of the form $x^\eta$ (where $x$ has type $k$).
For instance, in the case $k=1$, we could replace $x^\eta$ by an
extensionally equivalent term such as
\[ X_0 ~=~ \lambda y^0.\,
   x(\lambda.\,\caseof{y}{0 \Rightarrow \caseof{x(\lambda.0)}{j \Rightarrow 0} \mid i+1 \Rightarrow i+1}) \;. \]
This is different in character from the previous examples: 
rather than simply repeating the computation of $xy^\eta$, we are performing the specific
computation $x(\lambda.0)$ which we can see to be harmless given that
this point in the tree has been reached.
Clearly, such `time-wasting' tricks as the above may be combined and elaborated 
to yield more complex examples of procedures equivalent to $Y$.

However, all of the above are rather innocuous variations
and do not really yield a fundamentally different method of computing fixed points.
For example, the bodies of both $Z_0,Z_1$ are still head-spinal terms, 
and it is essentially the spines that are really computing the desired fixed point
by the canonical method.
This suggests that we should try to show that every procedure extensionally equivalent
to $Y_{k+1}$ is spinal;
from this it would follow easily by Theorem~\ref{no-gremlin-thm} that
the fixed point functional $Y_{k+1}$ in ${\mathsf{SF}}$ 
is not denotable in ${\mathrm{PCF}}^\Omega_k$.

Unfortunately, we are currently unable to show this in the case of $Y_{k+1}$:
indeed, the syntactic analysis of procedures $Z \approx Y_{k+1}$
appears to present considerable technical difficulties.
We shall establish the result for $Y_{0 \rightarrow (k+1)}$, 
although even here, it is simplest to concentrate not on $Y_{0 \rightarrow (k+1)}$ itself, 
but on a certain functional that is readily definable from it,
namely the functional $\Phi_{k+1}$ introduced in Section~\ref{sec-intro}.
Nonetheless, the above examples of `time-wasting' procedures illustrate some of the 
situations that our proof will need to deal with, and they may help to motivate
some of the technical machinery that follows.

Specifically, within ${\mathrm{PCF}}_{k+1}$, let us define
\begin{eqnarray*}
\Phi_{k+1} & : & ({0} \rightarrow (k+1) \rightarrow (k+1)) 
                         \rightarrow ({0} \rightarrow (k+1))  \\
\Phi_{k+1}\,g^{0\rightarrow(k+1)\rightarrow(k+1)} & = & 
   Y_{{0}\rightarrow(k+1)}\, (\lambda f^{0\rightarrow(k+1)}.\lambda n.\,g\,n\,(f({\mathit{suc}}\;n))) \;,
\end{eqnarray*}
so that informally
\[ \Phi_{k+1}\,g\,n ~=~ g\,n\,(g\,(n+1)\,(g\,(n+2)\, (\cdots))) \;. \]
For the rest of this section we will write $\Phi_{k+1}$ simply as $\Phi$.
For each $n \in \mathbb{N}$, let $g \vdash \Phi^n[g] = \Phi\,g\,\num{n} : {k+1}$,
and let $p_n \in {\mathsf{SP}}(k+1)$ be the canonical NSP for $\Phi^n[g]$
(that is, the one arising from the above ${\mathrm{PCF}}$ definition
via the standard interpretation in ${\mathsf{SP}}^0$).
These procedures may be defined simultaneously by:
\[ g~ \vdash~ p_n ~=~ \lambda x^k.\,\caseof{g\,(\lambda.n)\,p_{n+1}\,x^\eta}{i \Rightarrow i}
   ~:~ k+1 \;. \]

By a syntactic analysis of the possible forms of (simple) procedures $g \vdash q \approx p_n$,
we will show that any such $q$ is necessarily spinal.
Here we have in mind the modified notion of spinal term that is applicable to
terms involving a global variable $g : \rho$, where $\rho = {0} \rightarrow (k+1) \rightarrow (k+1)$
(see the explanation following Definition~\ref{spinal-def}).
Using Theorem~\ref{no-gremlin-thm} (adapted to this modified setting), 
it will then be easy to conclude that within ${\mathsf{SF}}$, the element
$\sem{\lambda g.\,\Phi^0[g]}$, and hence $Y_{{0}\rightarrow(k+1)}$ itself, 
is not ${\mathrm{PCF}}^\Omega_k$-denotable in ${\mathsf{SF}}$.

To show that any $q \approx p_n$ is head-spinal, our approach will be as follows.
First, we show that any such $q$ must broadly resemble $p_n$ in at least its top-level
structure, in that $q$ must have the form $\lambda x.\,\caseof{garo}{\cdots}$,
where the arguments $a,r,o$ are closely related to the corresponding
arguments $(\lambda.n), p_{n+1}, x^\eta$ occurring within $p_n$.
We do this by showing that if $q$ were to deviate in any way from this prescribed form, 
we would be able to cook up procedures $G \in {\mathsf{SP}}^0(\rho)$ and $X \in {\mathsf{SP}}^0(k)$ 
manifesting an extensional difference between $q$ and $p_n$, i.e.\ such that
$q[g \mapsto G]X \not\approx p_n[g \mapsto G]X$.
(Contrary to our usual convention, we will here use the uppercase letters $G,X$ to range over
normal-form closed procedures that may be substituted for $g,x$ respectively.)
In particular, we shall establish a sufficiently close relationship between $r$ and $p_{n+1}$
that the same analysis can then be iterated to arbitrary depth, showing that $q$ has
a spinal structure as required.

The main complication is that $r$ need not superficially resemble $p_{n+1}$,
since within $r$, the crucial application of $g$ that effectively computes the value of $p_{n+1}$
may be preceded by other `time-wasting' applications of $g$ (the idea is illustrated by the
example $Z_1$ above). However, it turns out that such time-wasting subterms
$g a^1 r^1 o^1$ must be of a certain kind if the extensional behaviour $q \approx p_n$
is not to be jeopardized: in particular, the first argument $a^1$ must evaluate to some 
$i < n$. (As in the example of $Z_1$, the idea is that if the subterm $g a^1 r^1 o^1$
merely repeats some `outer' evaluation, it will make no overall difference to the extension
if the evaluation of this subterm does not terminate.)
In order to formulate the relationship between $r$ and $p_{n+1}$, we therefore need a
means of skipping past such time-wasting applications in order to reach the application of $g$
that does the real work.
We achieve this with the help of a \emph{masking} operator $\mu_{n,n'}$,
which (for any $n \leq n'$) overrides the behaviour of $g$ on numerical arguments $n \leq i < n'$
with a trivial behaviour returning the dummy value $0$.

We now proceed to our formal development.
As a brief comment on notation, we recall from Section~\ref{sec-background}
that the relations $\approx$ and $\preceq$ of observational equivalence and inequality
make sense not just for elements of ${\mathsf{SP}}^0$ but for arbitrary meta-terms (including applications), closed or otherwise. 
Throughout this section, for typographical convenience, 
we will tend to express the required relationships mostly at the level of meta-terms,
writing for instance $pq \approx \lambda.n$ rather than the equivalent $\dang{pq}\,=\lambda.n$
or $p \cdot q = \lambda.n$.
We shall also perpetrate other mild abuses of notation, such as writing 
a procedure $\lambda.n$ simply as $n$ (except for special emphasis),
$\lambda \vec{x}.\bot$ as $\bot$, $x^\eta$ as $x$,
a meta-expression $\caseof{A}{i \Rightarrow i}$ just as $A$,
and abbreviating a substitution $[g \mapsto G,\, x \mapsto X]$ to $[G,X]$.

We shall say that $G \in {\mathsf{SP}}^0(0\rightarrow(k+1)\rightarrow(k+1))$ is \emph{strict} 
if $G\bot ro \approx \bot$ for any $r,o$. Clearly, $G$ is strict iff
$G \approx \lambda izx.\,\caseof{i}{j \Rightarrow G (\lambda.j) z^\eta x^\eta}$.
In connection with meta-terms with free variable $g$, we shall write
$T \approx^\prime T'$ to mean that $T[g \mapsto G] \approx T'[g \mapsto G]$
for all \emph{strict} $G$; the notation $\preceq^\prime$ is used similarly.
We shall actually analyse the syntactic forms of procedures $g \vdash q$ based on
the assumption that $q \succeq^\prime p_n$,
where $p_n$ is the canonical procedure for $\Phi^n[g]$ as above.

We shall say a procedure $g \vdash q$ is \emph{simple} if for every application
$garo$ appearing within $q$, the first argument $a$ is just a numeral $\lambda.n$.
The following observation simplifies our analysis of terms considerably;
it uses the operator ${\mathit{byval}}$ and its NSP interpretation,
as introduced at the end of Section~\ref{sec-Y-k+1}.

\begin{proposition}  \label{simple-prop}
If there is a procedure $g \vdash q \succeq^\prime p_n$-denotable in ${\mathrm{PCF}}^\Omega_k$,
then there is a simple procedure $g \vdash q' \succeq^\prime p_n$
denotable in ${\mathrm{PCF}}^\Omega_k + {\mathit{byval}}$.
\end{proposition}

\begin{proof}
Suppose $g \vdash q$ is ${\mathrm{PCF}}^\Omega_k$-denotable where $q \succeq^\prime p_n$, and write 
\[ S[g] ~=~ \lambda izx.\,\caseof{i}{j \Rightarrow g(\lambda.j)z^\eta x^\eta} \;. \]
It is easy to see that
\[ S[g] ~=~ \sem{\lambda izx.\,{\mathit{byval}}\,(\lambda j. gjzx)\,i\,}_g \;. \]
Take $g \vdash q' =\, \dang{q[g \mapsto S[g]]}$,
so that $q'$ is denotable in ${\mathrm{PCF}}^\Omega_k + {\mathit{byval}}$.
Then $q \approx^\prime q'$ since $S[G] \approx G$ for all strict $G$, so 
$q' \succeq^\prime p_n$. Finally, $q'$ is clearly simple: every occurrence of $g$ within
$q[g \mapsto S[g]]$ has a numeral as its first argument, so the same will be true of
$\dang{q[g \mapsto S[g]]}$.
$\Box$
\end{proof}

\vspace*{1.0ex}
For any $n \leq n'$, let us define the \emph{masking} $\mu_{n,n'}(g)$ of $g$
to be the following procedure term (here $\rho = 0 \rightarrow (k+1) \rightarrow (k+1)$):
\[ g^\rho ~\vdash~ \mu_{n,n'}(g) ~=~ 
   \lambda izx.\, 
   \caseof{i}{n \Rightarrow 0 \mid \cdots \mid n'-1 \Rightarrow 0 \mid - \Rightarrow gizx} ~:~ \rho \;. 
\]
(The wildcard symbol `$-$' covers all branch indices not covered by the preceding clauses.)
We may also write $\mu_{n,n'}(P)$  for $\mu_{n,n'}(g)[g \mapsto P]$ if $P$ is any
meta-procedure of type $\rho$.
We write $\mu_{n,n+1}$ simply as $\mu_n$; note also that $\mu_{n,n}(g) \approx^\prime g$.
Clearly $\mu_n(\mu_{n+1}(\cdots(\mu_{n'-1} (g)) \cdots)) \approx \mu_{n,n'}(g)$.

We shall say that a closed meta-term $\vdash P : \rho$ is \emph{trivial at $n$} if 
$P(\lambda.n)\bot\bot \approx 0$.
Note that $\mu_{n,n'}(G)$ is trivial at each of $n,\ldots,n'-1$ for any closed $G$;
indeed, $G$ is trivial at $n,\ldots,n'-1$ iff $G \succeq^\prime \mu_{n,n'}(G)$.

The following lemma now implements our syntactic analysis of the top-level structure of simple procedures $q \succeq^\prime p_n$.

\begin{lemma}  \label{Qn-lemma}
Suppose $g \vdash q$ is simple and $q \succeq^\prime p_n$. Then $q$ has the form 
$\lambda x^k.\,\caseof{g a r o}{\cdots}$, where:
\begin{enumerate}
\item $a = \lambda.n$,
\item $o[g \mapsto G] \succeq x^\eta$ whenever $\vdash G$ is trivial at $n$,
\item $r[g \mapsto \mu_n(g),\, x \mapsto X] \succeq^\prime p_{n+1}$ for any $X$.
\end{enumerate}
\end{lemma}

\begin{proof}
Suppose $q = \lambda x^k.e$.
Clearly $e$ is not constant since $q \succeq p_n$;
and if $e$ had head variable $x$, we would have
$q[g \mapsto \lambda izx.\,\caseof{i}{- \Rightarrow 0}](\lambda w.\bot) = \bot$, 
whereas
$p_n[g \mapsto \lambda izx.\,\caseof{i}{- \Rightarrow 0}](\lambda w.\bot) = 0$,
contradicting $q \succeq^\prime p_n$.
So $e$ has the form $\caseof{g a r o}{\cdots}$.

For claim~1, we have $a = \lambda.m$ for some $m \in \mathbb{N}$
because $q$ is simple. Suppose that $m \neq n$,
and consider
\[ G' ~=~ \lambda izx.\,\caseof{i}{n \Rightarrow 0 \mid - \Rightarrow \bot} \;. \]
Then for any $X$, clearly $q[G']X \approx \bot$, whereas
$p_n[G']X \approx G'(\lambda.n)(\cdots)X \approx 0$,
contradicting $q \succeq^\prime p_n$. Thus $a = \lambda.n$.

For claim~2, suppose that $G (\lambda.n) \bot \bot \approx 0$
but not $o[G] \succeq x^\eta$;
then we may take $X \in {\mathsf{SP}}^0({k})$ and $u \in {\mathsf{SP}}^0({k-1})$
such that $Xu \approx l \in \mathbb{N}$ but $o[G,X]u \not\approx l$. Now define
\[ G' ~=~ \lambda izx.\,\caseof{i}{j \Rightarrow \caseof{xu}{l \Rightarrow Gjzx \mid - \Rightarrow \bot}} \;. \]
Then $G' \preceq G$, so $o^* u \not\approx l$ where $^* = [G',X]$;
hence $G' a^* r^* o^* \approx \bot$ and so $q[G']X \approx \bot$.
On the other hand, we have
\begin{eqnarray*}
p_n[G']X & \approx & \caseof{G'(\lambda.n)p_{n+1}^*X}{i \Rightarrow i} \\
& \approx & \caseof{Xu}{l \Rightarrow G(\lambda.n)p_{n+1}^*X} ~\approx~ 0 \;,
\end{eqnarray*}
contradicting $q \succeq^\prime p_n$ (note that $G'$ is strict). Thus $o[G] \succeq x^\eta$.

For claim~3, suppose that $p_{n+1}[G]X' \approx l$ for some strict $G \in {\mathsf{SP}}^0(\rho)$
and $X' \in {\mathsf{SP}}^0(k)$.
We wish to show that $r[\mu_n(G), X]X' \approx l$ for any $X$.
Suppose not, and consider
\[ G' ~=~ 
\lambda izx.\,\caseof{i}{n \Rightarrow \caseof{zX'}{l \Rightarrow 0 \mid - \Rightarrow \bot} \mid - \Rightarrow Gizx} \;. \]
Then $G' \preceq \mu_n(G)$, so $r[G',X]X' \not\approx l$.
Moreover, since $a=\lambda.n$ by claim~1, 
we see that $G' a^* r^* o^* \approx \bot$, where $^* = [G',X]$,
so that $q[G']X \approx \bot$.
On the other hand, we have
\begin{eqnarray*}
p_n[G']X & \approx & \caseof{G'(\lambda.n)p_{n+1}^*X}{i \Rightarrow i} \\
& \approx & \caseof{p_{n+1}^*X'}{l \Rightarrow 0} \;.
\end{eqnarray*}
Here, since $p_{n+1}$ does not contain $x$ free, we have $p_{n+1}^* = p_{n+1}[G']$.
But it is easy to see that $p_{n+1}[G'] \approx p_{n+1}[G]$, since
every occurrence of $g$ within $p_{n+1}$ is applied to $\lambda.n'$ for some $n'>n$,
and for all such $n'$ we have $G'(\lambda.n') \approx G(\lambda.n')$.
(Since $p_{n+1}$ contains infinitely many applications of $g$, an appeal to continuity is
formally required here.)
But $p_{n+1}[G]X' \approx l$ by assumption; thus $p_{n+1}^*X' \approx l$,
allowing us to complete the proof that $p_n[G']X \approx 0$.
Once again, this contradicts $q \succeq^\prime p_n$, so claim~3 is established.
$\Box$
\end{proof}

\vspace*{1.0ex}
In the light of Appendix~A, one may strengthen claim~2 of the above lemma 
by writing $o[g \mapsto G] \approx x^\eta$.
This gives a fuller picture of the possible forms of terms $q \approx p_n$,
but is not needed for showing that such $q$ are spinal.

We have now almost completed a circle, in the sense that claim~3 tells us that 
$\dang{r[g \mapsto \mu_n(g), x \mapsto X]}$ itself satisfies the hypothesis for $q$
(with $n+1$ in place of $n$).
However, there still remains a small mismatch between the hypothesis and the conclusion,
in that claim~3 concerns not $r$ itself but rather $\dang{r[g \mapsto \mu_n(g)]}$. 
(In the light of claim~3, the variable $x$ may be safely ignored here.)
This mismatch is repaired by the following lemma, which lends itself to iteration to any depth.
Note the entry of a term context $E[-]$ here.

\begin{lemma}  \label{q-structure-lemma}
Suppose $g \vdash q$ is simple and 
$q[g \mapsto \mu_{n,n'}(g)] \succeq^\prime p_{n'}$.
Then $q$ has the form $\lambda x^k.\,E[\caseof{garo}{\cdots}]$, where:
\begin{enumerate}
\item $E[-]$ has empty local variable environment,
\item $a = \lambda.n'$,
\item $o[g \mapsto G] \succeq x^\eta$ whenever $\vdash G$ is trivial at $n,\cdots,n'$,
\item $r[g \mapsto \mu_{n,n'+1}(g),\,x \mapsto X] \succeq^\prime p_{n'+1}$ 
for any $X$.
\end{enumerate}
\end{lemma}

\begin{proof}
Let $q' =\, \dang{q[g \mapsto \mu_{n,n'}(g)]}$.
Under the above hypotheses, we have by Lemma~\ref{Qn-lemma} that $q'$ is of the form 
$\lambda x.\,\caseof{ga'r'o'}{\cdots}$, where $a'=\lambda.n'$,
$o'[g \mapsto G] \succeq x^\eta$ whenever $G$ is trivial at $n'$,
and $r'[g \mapsto \mu_{n'}(g)] \succeq^\prime p_{n'+1}$.
Write $q$ as $\lambda x.\,E[\caseof{garo}{\cdots}]$ where the displayed
occurrence of $g$ originates the head $g$ of $q'$ via the substitution
$^\dag = [g \mapsto \mu_{n,n'}(g)]$.

Suppose that the hole in $E[-]$ appeared within an abstraction $\lambda y.-$;
then the hole in $E[g \mapsto \mu_{n,n'}(g)][-]$ would likewise appear within such an abstraction.
Moreover, the evaluation of $q[g \mapsto \mu_{n,n'}(g)]$ consists simply of the contraction of
certain expressions $\mu_{n,n'}(g)(\lambda.m)r''o''$ to either $0$ or $g(\lambda.m)r''o''$,
followed by some reductions $\caseof{0}{i \Rightarrow e_i} \rightsquigarrow e_0$;
thus, any residue in $q'$ of the critical $g$ identified above will likewise appear underneath $\lambda y$.
But this is impossible, because the head $g$ of $q'$ is a residue of this $g$ by assumption.
This establishes condition~1.

In the light of this, by Lemma~\ref{g-lemma-1}(i) we have
$a' \approx a^\dag$, $o' \approx o^\dag$ and $r' \approx r^\dag$.
But since $q$ is simple, $a$ is a numeral, so $a=\lambda.n'$, giving condition~2.
For condition~3, suppose $G$ is trivial at $n,\ldots,n'$.
Then $G \succeq \mu_{n,n'+1}(G)$, so that
\[ o[g \mapsto G] ~\succeq~ o[g \mapsto \mu_{n,n'}(\mu_{n'}(G))]
   ~\approx~ o^\dag[g \mapsto \mu_{n'}(G)] ~\approx~ o'[g \mapsto \mu_{n'}(G)]
   ~\succeq~ x^\eta  \;, \]
since $\mu_{n'}(G)$ is trivial at $n'$.
Condition~4 also holds since 
$r'[g \mapsto \mu_{n'}(g)] \approx r^\dag[g \mapsto \mu_{n'}(g)] \approx r[g \mapsto \mu_{n,n'+1}(g)]$, where
$r'[g \mapsto \mu_{n'}(g)] \succeq^\prime p_{n'+1}$.
$\Box$
\end{proof}


\begin{corollary}  \label{spinal-cor}
If $g \vdash q$ is simple and $q \succeq^\prime p_n$, then $q$ is spinal
(in the modified sense).
\end{corollary}

\begin{proof}
Since condition~3 of the above lemma matches its hypotheses, 
starting from the assumption that 
$q \approx^\prime q[g \mapsto \mu_{n,n}(g)] \succeq^\prime p_n$,
we may apply the lemma iteratively to obtain a spinal structure as
prescribed by Definition~\ref{spinal-def} 
(subject to the relevant adjustments for $g : 0 \rightarrow (k+1) \rightarrow (k+1)$).
Note that at each level, a suitable substitution $^\circ$ will be the closed one that
specializes $g$ to $\lambda izx.0$ (which is trivial at all $n$), and all variables $x^k$
other than the innermost-bound one to $\bot$.
$\Box$
\end{proof}

\vspace*{1.0ex}
Thus, if there exists a ${\mathrm{PCF}}^\Omega_k$-denotable procedure $t \approx \lambda g.p_0$, for instance,
then by Proposition~\ref{simple-prop} there is also a simple such procedure $t'$
denotable in ${\mathrm{PCF}}^\Omega_k + {\mathit{byval}}$,
and by Corollary~\ref{spinal-cor}, this $t'$ will be spinal in the modified sense.
But this contradicts Theorem~\ref{no-gremlin-thm} 
(understood relative to the modified setting, and applied to ${\mathrm{PCF}}^\Omega_k + {\mathit{byval}}$ as indicated
at the end of Section~\ref{sec-Y-k+1}).
We conclude that no $t \approx \lambda g.p_0$ can be ${\mathrm{PCF}}^\Omega_k$-denotable.
Since the interpretation of ${\mathrm{PCF}}^\Omega_k$ in ${\mathsf{SF}}$ factors through ${\mathsf{SP}}^0$,
this in turn means that within the model ${\mathsf{SF}}$, the element $\sem{\lambda g.\,\Phi^0[g]}$
is not denotable in ${\mathrm{PCF}}^\Omega_k$.
On the other hand, this element is obviously denotable
relative to $Y_{{0}\rightarrow (k+1)} \in {\mathsf{SF}}$ even in ${\mathrm{PCF}}_1$,
so the proof of Theorems~\ref{operational-main-thm}(i) and \ref{SF-main-thm}(i) is complete.
This establishes Berger's conjecture, and also suffices for the proof of Corollary~\ref{no-finite-basis-cor}.


\section{Extensional non-definability of $Y_{k+1}$}  \label{sec-pure-type}

We have so far shown that the element $Y_{0 \rightarrow (k+1)} \in {\mathsf{SF}}$ is not ${\mathrm{PCF}}^\Omega_k$-denotable.
We shall now refine our methods slightly to show that 
even $Y_{k+1} \in {\mathsf{SF}}$ is not ${\mathrm{PCF}}^\Omega_k$-denotable. 
Since $\pure{k+1}$ is clearly a ${\mathrm{PCF}}_0$-definable retract of every level $k+1$ type,
this will establish that no $Y_\sigma \in {\mathsf{SF}}$ where ${\mathrm{lv}}(\sigma)=k+1$ is denotable in
${\mathrm{PCF}}^\Omega_k$.

The idea is as follows. Within each type level $k \geq 1$, we can stratify the types into 
\emph{sublevels} $(k,l)$ where $l = 1,2,\ldots$, essentially by taking account of the `width' of the type 
as well as its depth. 
We thus obtain a sublanguage ${\mathrm{PCF}}^\Omega_{k,l}$ of ${\mathrm{PCF}}^\Omega_k$ by admitting $Y_\sigma$
only for types $\sigma$ of sublevel $(k,l)$ or lower.
We show that, roughly speaking, all our previous methods can be adapted to show that for a given $k$,
the languages ${\mathrm{PCF}}^\Omega_{k,l}$ for $l = 1,2,\ldots$ form a strict hierarchy.
(This is true as regards definability in ${\mathsf{SP}}^0$; for ${\mathsf{SF}}$, we will actually show only that
${\mathrm{PCF}}^\Omega_{k,l}$ is strictly weaker than ${\mathrm{PCF}}^\Omega_{k,l+2}$.)
This more refined hierarchy is of some interest in its own right, and illustrates that the structure of ${\mathsf{SF}}$
is much richer than that manifested by the pure types.%
\footnote{This contrasts with the situation for extensional \emph{total} type structures over $\mathbb{N}$, for example.
There, under mild conditions, every simple type is isomorphic to a pure type: 
see Theorem~4.2.9 of \cite{Longley-Normann}.}

Any term of ${\mathrm{PCF}}^\Omega_k$ will be a term of some ${\mathrm{PCF}}^\Omega_{k,l}$.
Previously we showed only that no such term could define $Y_{0 \rightarrow (k+1)} \in {\mathsf{SF}}$;
however, we now see that there is actually plenty of spare headroom between ${\mathrm{PCF}}^\Omega_{k,l}$
and the pure type $k+1$.
Specifically, there are operators $Y_\sigma$ in ${\mathrm{PCF}}^\Omega_{k,l+2}$
that are not ${\mathrm{PCF}}^\Omega_{k,l}$-denotable; and since all such $\sigma$ are of level $\leq k$
and are thus easily seen to be retracts of $\pure{k+1}$,
we may conclude that $Y_{k+1} \in {\mathsf{SF}}$ is not ${\mathrm{PCF}}^\Omega_k$-denotable.

The following definition sets out the more fine-grained stratification of types that we shall use.

\begin{definition}  \label{width-def}
(i) The \emph{width} $w(\sigma)$ of a type $\sigma$ is defined inductively as follows:
\[  w(\nat) ~=~ 0 \;, ~~~~~~~~ 
w(\sigma_0,\ldots,\sigma_{r-1} \rightarrow \nat) ~=~ \max(r,w(\sigma_0),\ldots,w(\sigma_{r-1})) \;. \]
For $k,l \geq 1$, we say $\sigma$ has \emph{sublevel} $(k,l)$ if ${\mathrm{lv}}(\sigma)=k$ and $w(\sigma)=l$.
If $\sigma=\nat$, we simply say that $\sigma$ has \emph{sublevel} $0$.
We order sublevels in the obvious way: $0$ is the lowest sublevel,
and $(k,l) < (k',l')$ if either $k<k'$ or $k=k'$ and $l<l'$.

(ii) For each $k,l$ we define a type $\rho_{k,l}$ by:
\[ \rho_{0,l} ~=~ \nat \;, ~~~~~~~~
   \rho_{k+1,l} ~=~ \rho_{k,l},\ldots,\rho_{k,l} \rightarrow \nat \mbox{~~($l$ arguments)} \;. \]
When $k,l \geq 1$, we may call $\rho_{k,l}$ the \emph{homogeneous} type of sublevel $(k,l)$.
\end{definition}

The following facts are easily established. Here we shall say that $\sigma$ is a \emph{simple retract} of $\tau$
if there is a ${\mathrm{PCF}}_0$-definable retraction $\sigma \lhd \tau$ within ${\mathsf{SP}}^0$.

\begin{proposition}  \label{sublevel-facts}
(i) For $k \geq 1$, every type of sublevel $(k,l)$ or lower is a simple retract of $\rho_{k,l}$.
Hence, for all $k \geq 0$, for every finite list of types $\sigma_i$ of level $\leq k$, there is some $l$ such that
each $\sigma_i$ is a simple retract of $\rho_{k,l}$.

(ii) Every finite product of level $\leq k$ types is a simple retract of $\pure{k+1}$.

(iii) If $\sigma$ is a simple retract of $\tau$, then $Y_\sigma$ is ${\mathrm{PCF}}_0$-definable from $Y_\tau$ in ${\mathsf{SP}}_0$.
\end{proposition}

\begin{proof}
(i) The first claim (for $k \geq 1$) is easy by induction on $k$, and the second claim 
(which is trivial when $k=0$) follows easily.

(ii) By induction on $k$. The case $k=0$ is easy. For $k \geq 1$,
suppose $\sigma_0,\ldots,\sigma_{m-1}$ are level $\leq k$ types, 
where $\sigma_i = \tau_{i0},\ldots,\tau_{i(n_i-1)} \rightarrow \nat$ for each $i$.
Here the $\tau_{ij}$ are of level at most $k-1$, so by (i), 
we may choose $l$ such that each $\tau_{ij}$ is a simple retract of $\rho_{k-1,l}$.
Taking $n = \max_i n_i$, we then have that each $\sigma_i$ is a simple retract of
$\rho_{k-1,l},\ldots,\rho_{k-1,l} \rightarrow \nat$ (with $n$ arguments).
The product $\Pi_i \sigma_i$ is therefore a simple retract of the type
$\sigma = \nat,\rho_{k-1,l},\ldots,\rho_{k-1,l} \rightarrow \nat$.
But by the induction hypothesis, the product of $\nat,\rho_{k-1,l},\ldots,\rho_{k-1,l}$
is a simple retract of $\pure{k}$ whence $\sigma$ itself is a simple retract of $\pure{k+1}$.

We leave (iii) as an exercise.
$\Box$
\end{proof}

\vspace*{1.0ex}
Next, we adapt the proof of Theorem~\ref{no-retraction-thm} to establish the crucial gap
between $\rho_{k,l}$ and $\rho_{k,l+1}$. 
This gives an indication of how our methods may be used to map out the embeddability relation 
between types in finer detail, although we leave an exhaustive investigation of this to future work.

\begin{theorem}  \label{kl-no-retraction-thm}
Suppose $k \geq 1$. Within ${\mathsf{SF}}$, the type $\rho_{k,l+1}$ is not a pseudo-retract of any finite product
of types of sublevel $\leq (k,l)$ or lower.
\end{theorem}

\begin{proof}
In view of Proposition~\ref{sublevel-facts}(i), it will suffice to show that $\rho_{k,l+1}$ is not 
a pseudo-retract of a finite power of $\rho_{k,l}$.
We argue by induction on $k$. The arguments for both the base case $k=1$ and the step case $k>1$ 
closely parallel the argument for the step case in Theorem~\ref{no-retraction-thm}, 
so we treat these two cases together as far as possible, omitting details that are easy adaptations
of those in the earlier proof.

Suppose for contradiction that there were procedures
\[ z^\rho \vdash t_i : \sigma  ~~(i<m) \;, ~~~~~~~~
   x_0^{\sigma},\ldots,x_{m-1}^{\sigma} \vdash r : \rho \;, \]
 where $\rho = \rho_{k,l+1}$, $\sigma = \rho_{k,l}$,
 such that $z \vdash r[\vec{x} \mapsto \vec{t}\,] \succeq z^\eta$.
 Let $v =\, \dang{r[\vec{x} \mapsto \vec{t}\,]}$, so that $z \vdash v \succeq z^\eta$.
 As in the proof of Theorem~\ref{no-retraction-thm}, one may show that
 $v$ has the form $\lambda f_0 \ldots f_l.\,\caseof{zp_0 \ldots p_l}{\cdots}$
 where $p_i[z \mapsto \lambda \vec{w}.0] \succeq f_i$ for each $i$.   
Next, we note that $r[\vec{x} \mapsto \vec{t}\,]$ reduces in finitely many steps to a head normal form
$\lambda f_0 \ldots f_l.\, \caseof{zP_0\ldots P_l}{\cdots}$ where $\dang{P_i} \,= p_i$ for each $i$;
moreover, the ancestor of the leading $z$ here must lie within some $t_i$, say at the head of
some subterm $z\vec{P}\,'$, where $\vec{P}$ is an instance of $\vec{P}\,'$ via some substitution $^\dag$.

At this point, the arguments for the base case and step case part company.
In the base case $k=1$, we have that $z^\rho \vdash t_i : \sigma$ 
where $\rho = \nat^{l+1} \rightarrow \nat$ and $\sigma = \nat^l \rightarrow \nat$;
thus there are no bound variables within $t_i$ except the top-level ones---say $w_0,\ldots,w_{l-1}$,
all of type $\nat$. So in fact $^\dag$ has the form $[\vec{w} \mapsto \vec{W}]$ for certain meta-terms
$f_0^\nat,\ldots,f_l^\nat,z \vdash W_j : \nat$. 
Now consider the terms $f_0,\ldots,f_l \vdash \vec{W}^*$ and $\vec{w} \vdash \vec{P}\,' {}^*$,
writing $^*$ for the substitution $[z \mapsto \lambda w.0]$.
These compose to yield $\vec{f} \, \vdash \, \dang{\vec{P}\,^*} \,=\, \dang{\vec{p}\,^*} \,\succeq f$,
so we have expressed $\nat^{l+1}$ as a pseudo-retract of $\nat^l$ within ${\mathsf{SF}}$.
As already noted in the course of the proof of Theorem~\ref{no-retraction-thm}, this is impossible.

For the step case $k>1$, we proceed much as in the proof of Theorem~\ref{no-retraction-thm}, 
using the substitution $^\dag$ to express $\rho_{k-1,l+1}$ as a retract of a finite product of types
of sublevel $(k-1,l)$ or lower, contrary to the induction hypothesis. We leave the remaining details
as an exercise.
$\Box$
\end{proof}

\vspace*{1.0ex}
We now outline how the ideas of Sections~\ref{sec-denotations}, \ref{sec-Y-k+1} and \ref{sec-extensional}
may be adapted to show that $Y_{\rho_{k,l+1}} \in {\mathsf{SP}}^0$ is not ${\mathrm{PCF}}^\Omega_{k,l}$-denotable.
We assume that $k \geq 2$ for the time being (the case $k=1$ will require special treatment).
The adaptations are mostly quite systematic: the type $\rho_{k,l+1}$ 
now plays the role of $\pure{k+1}$; types of sublevel $\leq (k,l)$ play the role of types of level $\leq k$;
$\rho_{k-1,l+1}$ plays the role of $\pure{k}$; and types of sublevel $\leq (k-1,l)$ play the role of types
of level $<k$. Since the proof we are adapting is quite lengthy, we leave many routine details to be checked
by the reader.

First, the evident adaptation of Definition~\ref{plugging-def} yields a notion of
\emph{$k,l$-plugging} where the plugging variables are required to be of sublevel $\leq (k,l)$,
and we thus obtain an inductive characterization of the ${\mathrm{PCF}}^\Omega_{k,l}$-denotable procedures
analogous to Theorem~\ref{denotable-inductive-thm}.
We also adapt the notion of \emph{regular} meta-term as follows:

\begin{definition}
Suppose $g : \rho_{k,l+1} \rightarrow \rho_{k,l+1}$ for $k,l \geq 1$.
An environment $\Gamma$ is \emph{$g$-($k,l$-)regular} if $\Gamma$ contains $g$ but no other variables
of sublevel $> (k,l)$. A meta-term $T$ is $g$-regular if it contains no variables of sublevel $> (k,l)$
except possibly for free occurrences of $g$. We say $\Gamma \vdash T$ is $g$-regular
if both $\Gamma$ and $T$ are $g$-regular.
\end{definition}

Next, we proceed to the ideas of Section~\ref{sec-Y-k+1}.
Our convention here will be that $\Gamma$ ranges over regular environments, and Roman capitals $V,Z$
range over sets of variables of sublevel $\leq (k,l)$. 
The notions of $x,V$-closed substitution and spinal term carry over as follows:

\begin{definition}  \label{kl-spinal-def}
Suppose $g : \rho_{k,l+1} \rightarrow \rho_{k,l+1}$ where $k \geq 2$ and $l \geq 1$.

(i) If $\vec{x}$ is a list of variables of type $\rho_{k-1,l+1}$ and $V$ a set of variables of sublevel
$\leq (k,l)$, a substitution $^\circ = [\vec{w} \mapsto \vec{r}\,]$ is called \emph{$\vec{x},V$-closed}
if the $r_i$ contain no free variables, except that if $w_i \in V$ and $w_i$ is of sublevel $\leq (k-1,l)$
then $r_i$ may contain the $\vec{x}$ free.

(ii) Suppose $\Gamma \vdash e$ is $g$-$k,l$-regular and $\vec{x},V \subseteq \Gamma$.
We coinductively declare $e$ to be \emph{$g$-head-spinal} 
w.r.t.\ $\vec{x},V$ iff $e$ has the form 
$\caseof{g(\lambda \vec{x}\,'.E[e'])\vec{o}}{\cdots}$, where $E[-]$ is an expression context, and 
\begin{itemize}
\item for some $\vec{x},V$-closed specialization $^\circ$ covering the free variables of $\vec{o}$ 
other than those of $\vec{x}$, we have $\vec{o}\,^\circ \succeq \vec{x}^{\,\eta}$,
\item $e'$ is $g$-head-spinal w.r.t.\ $\vec{x}\,',V'$, where $V'$ is the local variable environment for $E[-]$.
\end{itemize}

(iii) We say a regular term $\Gamma \vdash t$ is \emph{$g$-spinal} 
if it contains a $g$-head-spinal subexpression w.r.t.\ some $\vec{x},V$.
\end{definition}

Lemma~\ref{g-lemma-1} and its proof go through with the above adaptations;
here the local environments $\vec{v}, \vec{v}\,'$ are now of sublevel $\leq (k,l)$, and part~(iii)
of the lemma now states that if $K[-]$ contains no redexes with operator of sublevel $> (k,l)$, then
the substitution $^\dag$ is trivial for variables of sublevel $\geq (k-1,l+1)$.
The crucial Lemma~\ref{g-lemma-2}, which forms the heart of our proof, now translates as follows:

\begin{lemma}  \label{kl-lemma-2}
Suppose $g: \rho_{k,l+1} \rightarrow \rho_{k,l+1}$ and we have $g$-regular terms
\[ \Gamma,\vec{v} \;\vdash~ d ~=~ \caseof{gpq}{\cdots} \;, ~~~~~~
   \Gamma,\vec{v}\,' \vdash \vec{s} \,, \] 
where $\vec{v},\vec{v}\,'$ are of sublevel $\leq (k,l)$, and
$\Gamma,\vec{v}\,' \vdash\, \dang{d[\vec{v}\mapsto\vec{s}\,]}$ 
is $g$-head-spinal with respect to some $\vec{x},V$.
Then $d$ itself is $g$-spinal.
\end{lemma}

The entire proof of this lemma translates systematically according to the correspondences we have indicated,
invoking Theorem~\ref{kl-no-retraction-thm} for the fact that
$\rho_{k-1,l+1}$ is not a pseudo-retract of a product of sublevel $\leq (k-1,l)$ types.
The analogue of Theorem~\ref{no-gremlin-thm} now goes through readily, so we obtain:

\begin{theorem}
If $k \geq 2$ and $l \geq 1$, every ${\mathrm{PCF}}^\Omega_{k,l}$-denotable procedure is non-$g$-spinal
where $g: \rho_{k,l+1} \rightarrow \rho_{k,l+1}$.
$\Box$
\end{theorem}

As in our original proof, we will actually use a version of this theorem for the modified notion of 
spinal term that incorporates the extra argument $b$, and for the extension of ${\mathrm{PCF}}^\Omega_{k,l}$
with the operator ${\mathit{byval}}$.

To adapt the material of Section~\ref{sec-extensional}, we now take $g$ to be a variable of type
$0 \rightarrow \rho_{k,l+1} \rightarrow \rho_{k,l+1}$, and argue that the ${\mathrm{PCF}}_{k,l+2}$-denotable element
$\Phi = \lambda g.\, Y_{0 \rightarrow \rho_{k,l+1}} (\lambda fn.\,g\,n\,(f({\mathit{suc}}\,n)))$ within ${\mathsf{SF}}$
is not ${\mathrm{PCF}}^\Omega_{k,l}$-denotable. The proof is a completely routine adaptation of that in
Section~\ref{sec-extensional}. Since $Y_{0 \rightarrow \rho_{k,l+1}}$ is readily definable from $Y_{k+1}$
by Proposition~\ref{sublevel-facts}, this implies that $Y_{k+1} \in {\mathsf{SF}}$ is not ${\mathrm{PCF}}^\Omega_{k,l}$-denotable.
We have thus shown:

\begin{theorem}
(i) For $k \geq 2$ and $l \geq 1$, the element $Y_{0 \rightarrow \rho_{k,l+1}} \in {\mathsf{SF}}$ is denotable
in ${\mathrm{PCF}}_{k,l+2}$ but not in ${\mathrm{PCF}}^\Omega_{k,l}$.

(ii) For $k \geq 2$, the element $Y_{k+1} \in {\mathsf{SF}}$ is not denotable in ${\mathrm{PCF}}^\Omega_k$.
\end{theorem}

A slightly different approach is needed for the case $k=1$. This is because at level $0$ our only
type is $\nat$, so we are unable to make a distinction between sublevels $l$ and $l+1$.
To establish Lemma~\ref{kl-lemma-2} in this case, we again wish to show that we cannot pass in
the content of the relevant $\vec{x}\,'$ to the relevant $s_i$, but now the idea is to appeal to the
fact that $\vec{x}\,'$ consists of $l+1$ variables of type $\nat$, whereas $s_i$ accepts at most $l$ 
arguments of type $\nat$. (We have already seen that $\nat^{l+1}$ cannot be a retract of $\nat^l$.)
However, we also need to exclude the possibility that the substitution $^\circ$
is being used to import some components of $\vec{x}\,'$.
We can achieve this if we require $^\circ$ to be actually closed rather than just $\vec{x}\,',V'$-closed, 
and it turns out that this is permissible if we also tighten our notion of spinal term slightly, 
essentially to ensure that no intermediate applications
of $g$ appear in between those declared to constitute the spine of the term:

\begin{definition}  \label{strongly-spinal-def}
Suppose $g : \rho_{1,l+1} \rightarrow \rho_{1,l+1}$ where $l \geq 1$.


Suppose $\Gamma \vdash e$ is $g$-$1,l$-regular and $\vec{x} \subseteq \Gamma$.
We coinductively declare $e$ to be \emph{strongly $g$-head-spinal} w.r.t.\ $\vec{x}$
iff $e$ has the form $\caseof{g(\lambda \vec{x}\,'.E[e'])\vec{o}}{\cdots}$, 
where $E[-]$ is an expression context, and 
\begin{itemize}
\item the hole within $E[-]$ does not itself occur within an application $gpq$,
\item for some closed substitution $^\circ$ covering the free variables of $\vec{o}$ other than
those of $\vec{x}$, we have $\vec{o}\,^\circ \succeq \vec{x}^{\,\eta}$,
\item $e'$ is strongly $g$-head-spinal w.r.t.\ $\vec{x}\,'$.
\end{itemize}
The notion of strongly $g$-spinal term follows suit.
\end{definition}

The counterpart of Lemma~\ref{g-lemma-1} goes through as expected, although without part~(iii):
the relevant sublevel distinction does not exist at type level $0$, and we cannot conclude that the
substitution in question is trivial for all variables of type $\nat$.
We may now indicate the required changes to Lemma~\ref{kl-lemma-2} and its proof:

\begin{lemma}
Suppose $g: \rho_{1,l+1} \rightarrow \rho_{1,l+1}$ and we have $1,l$-regular terms
\[ \Gamma,\vec{v} \;\vdash~ d ~=~ \caseof{gpq}{\cdots} \;, ~~~~~~
   \Gamma,\vec{v}\,' \vdash \vec{s} \,, \] 
where $\vec{v},\vec{v}\,'$ are of sublevel $\leq (1,l)$, and
$\Gamma,\vec{v}\,' \vdash\, \dang{d[\vec{v}\mapsto\vec{s}\,]}$ 
is strongly $g$-head-spinal with respect to some $\vec{x}$.
Then $d$ itself is strongly $g$-spinal.
\end{lemma}

\begin{proof}
The proof of Lemma~\ref{g-lemma-2} up to the end of the proof of Claim~1 adapts straightforwardly, 
and is somewhat simplified by the fact that the substitution $^\circ$ is closed.
As sketched above, the crucial contradiction is provided by the fact that $\nat^{l+1}$ is not a pseudo-retract
of $\nat^l$ in ${\mathsf{SF}}$.

For the remainder of the proof, the key point to note is that $\vec{v}\,''$ (the local environment for $C[-]$)
is actually empty in this case. This is because $C[-]$ is in normal form and contains no free variables
of level $\geq 2$ except $g$, so any $\lambda$-term containing the hole would need to appear as an
argument to $g$. It would then follow that the hole within $E[-]$ lay within an argument to some occurrence of $g$, as precluded by the definition of strongly spinal term. 

It follows trivially that the $\vec{x}\,',\vec{v}\,''$-closed substitution $^{\circ'}$ constructed at the very end
of the proof is actually closed. Moreover, the spinal structure of $d'$ identified by the proof cannot contain
any intermediate applications of $g$, since these would give rise under evaluation to intermediate applications
of $g$ in the spine of $\dang{d[\vec{v}\mapsto\vec{s}\,]}$ as precluded by 
Definition~\ref{strongly-spinal-def}.
Thus, the identified spinal structure in $d'$ is actually a strongly spinal structure, 
and the argument is complete.
$\Box$
\end{proof}

\vspace*{1.0ex}
A trivial adaptation of the proof of Theorem~\ref{no-gremlin-thm} now yields:

\begin{theorem}
No ${\mathrm{PCF}}^\Omega_{1,l}$-denotable procedure can be strongly $g$-spinal
where $g: \rho_{1,l+1} \rightarrow \rho_{1,l+1}$. 
$\Box$
\end{theorem}

As before, this adapts easily to a variable $g$ of type $0 \rightarrow \rho_{1,l+1} \rightarrow \rho_{1,l+1}$.
From here on, we again follow the original proof closely. 
The only additional point to note is that in place of Corollary~\ref{spinal-cor}
we now require that any simple $g \vdash q$ with $q \succeq' p_n$ must be \emph{strongly} spinal,
but this is already clear from the proof of Lemma~\ref{q-structure-lemma}.
We therefore have everything we need for:

\begin{theorem}
(i) For any $l \geq 1$, $Y_{\nat^{l+2} \rightarrow \nat} \in {\mathsf{SF}}$ is
not denotable in ${\mathrm{PCF}}^\Omega_{1,l}$.

(ii) $Y_2 \in {\mathsf{SF}}$ is not denotable in ${\mathrm{PCF}}^\Omega_1$. 
$\Box$
\end{theorem}

The proof of Theorems~\ref{operational-main-thm} and \ref{SF-main-thm} is now complete.


\section{Related and future work}  \label{sec-further-work}

\subsection{Other hierarchies of Y-combinators}

There have been a number of previous results from various research traditions showing that in some sense
the power of level $k$ recursions increases strictly with $k$.
Whilst many of these results look tantalizingly similar to ours,
it turns out on inspection that their mathematical substance is quite different, and 
we do not expect any substantial technical connections with our own work to be forthcoming.
Nonetheless, it is interesting to see how strikingly different ideas and methods arising in other contexts 
can lead to superficially similar results.

Previous results to the effect that $Y$-combinators for level $k+1$ are not definable
from those for level $k$ have been obtained by Damm \cite{Damm-IO} and 
Statman \cite{Statman-lambda-Y}. It is convenient to discuss the latter of these first.
Statman works in the setting of the simply typed \emph{$\lambda Y$-calculus}, essentially the
pure $\lambda$-calculus extended with constants $Y_\sigma : (\sigma\rightarrow\sigma)\rightarrow\sigma$
and reduction rules $Y_\sigma M \rightsquigarrow M(Y_\sigma M)$. 
He gives a succinct proof that $Y_{k+1}$ is not definable from $Y_k$ 
up to computational equality, based on the following idea. 
If $Y_{k+1}$ were definable from $Y_k$, it would follow that 
the recursion equation $Y_{k+1} g = g(Y_{k+1} g)$ could be derived with only finitely many
uses of the equation $Y_k M = M(Y_k M)$ (say $m$ of them).
It would then follow, roughly speaking, that $mn$ recursion unfoldings for $Y_k$ 
would suffice to fuel $n$ recursion unfoldings for $Y_{k+1}$. 
On the other hand, it can be shown that the size of normal-form terms definable 
using $n$ unfoldings of $Y_{k+1}$ (as a function of the size of the starting term) grows faster than
can be accounted for with $mn$ unfoldings of $Y_k$.

The language $\lambda Y$ is seemingly less powerful than ${\mathrm{PCF}}$,%
\footnote{One might consider translating ${\mathrm{PCF}}$ into $\lambda Y$ by representing natural numbers as Church numerals; 
however, it appears that predecessor is not $\lambda Y$-definable for this representation.}
although this is perhaps not the most essential difference between Statman's work and ours.
More fundamentally, Statman's method establishes the non-definability only up to computational
equality (that is, the equivalence relation generated by the reduction rules), 
whereas we have been concerned with non-definability modulo observational 
(or extensional) equivalence.
Even for non-denotability in ${\mathsf{SP}}^0$, an approach along Statman's lines 
would be unlikely to yield much information,
since there is no reason why the number of unfoldings of $Y_k$ required to generate 
the NSP for $Y_{k+1}$ to depth $n$ should not grow dramatically as a function of $n$.

A result very similar to Statman's was obtained earlier in Damm \cite{Damm-IO}, 
but by a much more indirect route as part of a far-reaching investigation of the 
theory of tree languages.
In Damm's setting, programs are \emph{recursion schemes}---%
essentially, families of simultaneous (and possibly mutually recursive) defining
equations in typed $\lambda$-calculus---but in essence these can be considered as terms of $\lambda Y$
relative to some signature consisting of typed constants. 
(Actually, Damm's $\lambda$-terms involve a restriction to \emph{derived types}, 
which has the effect of limiting attention to what are elsewhere called \emph{safe} recursion schemes.)
Any such program can be expanded to an
infinite tree (essentially a kind of B\"ohm tree), and Damm's result (Theorem~9.8 of \cite{Damm-IO}) 
is that if programs are considered up to \emph{tree equality}, 
then safe level $k+1$ recursions give more expressive power than safe level $k$ ones.
Damm's result is thus distinguished from Statman's in two ways: by the restriction to safe recursion schemes,
and by the use of tree equality in place of the stricter computational equality.
This latter point brings Damm's work somewhat closer in spirit to our work:
indeed, in the case of pure $\lambda Y$, tree equality agrees with equality of innocent strategies
if the base type is interpreted as a certain trivial game---or equivalently with equality in a variant of 
our ${\mathsf{SP}}^0$ with no ground type values.
However, in the case of a signature for ${\mathrm{PCF}}$, tree equality will still be considerably more fine-grained than
equality in ${\mathsf{SP}}^0$, let alone in ${\mathsf{SF}}$, since in effect the ${\mathrm{PCF}}$ constants are left uninterpreted.
It therefore seems unlikely that an approach to our theorems via Damm's methods is viable.

The strictness of the recursion scheme hierarchy was further investigated by Ong
\cite{Ong-model-checking}, who used innocent game semantics to show that the complexity of
certain model checking problems for trees generated by level $k$ recursions increases strictly with $k$.
It was also shown by Hague, Murawski, Ong and Serre \cite{Collapsible-PDA} 
that the trees generated by level $k$ recursion schemes (with no safety restriction) 
were precisely those that could be generated by \emph{collapsible pushdown automata} of order $k$.
Later, Kartzow and Parys \cite{Kartzow-Parys} used pumping lemma techniques to show that
the collapsible pushdown hierarchy was strict, hence so was the recursion scheme hierarchy
with no safety restriction.
Again, despite the intriguing parallels to our work, these results appear to be manifesting something 
quite different:
in our setting, the power of level $k$ recursion has nothing to do with the `difficulty of computing' the relevant
NSPs in the sense of automata theory, since the full power of Turing machines is required even at level~1.

Also of interest is the work of Jones \cite{Jones-cons-free} from the functional programming community.
Jones's motivation is close to ours in that he seeks to give mathematical substance to the programming
intuition that some (combinations of) language features yield 
`more expressive power' than others.
As Jones notes, an obvious obstacle to obtaining such results is that all programming languages
of practical interest are Turing complete, so that no such expressivity distinctions are visible at the level of the
(first-order) computable functions.
Whereas we have responded to this by considering the situation at higher types, where genuine 
expressivity differences do manifest themselves, Jones investigates the power of (for example)
recursion at different type levels in the context of a restricted language of `read-only' or `cons-free' programs.
Amongst other results, he shows that in such a language, if data values and general recursion of type level 
$k \geq 1$ are admitted, then the computable functions from lists of booleans to booleans are exactly the
{\sc{exp}$^k$\sc{time}} computable ones.
These results inhabit a mathematical territory very different from ours, although yet again,
the basic moral that the power of recursion increases strictly with its type level shines through.


\subsection{Relationship to game semantics}

Next, we comment briefly how our work relates
to the known \emph{game models} of PCF \cite{AJM,Hyland-Ong}.
It is known that these models are in fact isomorphic to our ${\mathsf{SP}}^0$,
although the equivalence between the game-theoretic definition of application and our own
is mathematically non-trivial (see \cite[Section~7.4]{Longley-Normann}).
This raises the obvious question of whether our proofs could be conducted equally well, or better, in a game-semantical setting. 

Whilst a direct translation is presumably possible, our present impression 
is that the sequential procedure presentation, and our calculus of meta-terms in particular, 
allows one to see the wood for the trees more clearly, 
and also to draw more easily on familiar intuitions from $\lambda$-calculus.
However, a closer look at game-semantical approaches would be needed in order to judge
whether either approach really offers some genuine mathematical or conceptual advantage over the other.

\subsection{Sublanguages of ${\mathrm{PCF}}$: further questions}  \label{sublang-questions}

We now turn our attention to some potential extensions and generalizations of our work.


So far, we have worked mainly with a coarse stratification of types in terms of their levels, 
although we have illustrated in Section~\ref{sec-pure-type} how finer stratifications are also possible.
Naturally, there is scope for a still more fine-grained analysis of types and the relative strength of their
$Y$-combinators; this is of course closely related to the task of mapping out the embeddability relation 
between types (as in Subsection~\ref{subsec-embed}) in finer detail.

Even at level~1, there is some interest here.
Our analysis in Section~\ref{sec-pure-type} has shown that, for $l \geq 1$,
\begin{itemize}
\item the element $Y_{\nat^{l+1} \rightarrow \nat} \in {\mathsf{SP}}^0$ is not ${\mathrm{PCF}}^\Omega_0$-definable
from $Y_{\nat^l \rightarrow \nat}$,
\item  the element $Y_{\nat^{l+2} \rightarrow \nat} \in {\mathsf{SF}}$ is not ${\mathrm{PCF}}^\Omega_0$-definable
from $Y_{\nat^l \rightarrow \nat}$.
\end{itemize}

However, this leaves us with a small gap for ${\mathsf{SF}}$: e.g.\ we have not shown either that
$Y_{\nat\rightarrow\nat}$ is strictly weaker than $Y_{\nat^2\rightarrow\nat}$ or that
$Y_{\nat^2\rightarrow\nat}$ is strictly weaker than $Y_{\nat^3\rightarrow\nat}$,
although according to classical logic, at least one of these must be the case.
(This is reminiscent of some well-known situations from complexity theory.)
We expect that a more delicate analysis will allow us to fill this gap.
One can also envisage an even more fine-grained hierarchy obtained by admitting other base types
such as the booleans or unit type.

A closely related task is to obtain analogous results for the \emph{call-by-value} 
interpretation of the simple types (as embodied in Standard~ML, for example).
As is shown in \cite[Section~4.3]{Longley-Normann}, a call-by-value (partial) type structure
${\mathsf{SF}}_v$ can be constructed by fairly general means from ${\mathsf{SF}}$:
here, for example, ${\mathsf{SF}}_v(\pure{1})$ consists of all partial functions $\mathbb{N} \rightharpoonup \mathbb{N}$
rather than (monotone) total functions $\mathbb{N}_\bot \rightarrow \mathbb{N}_\bot$, 
and ${\mathsf{SF}}_v(\pure{2})$ consists of partial functions $\mathbb{N}_\bot^\mathbb{N} \rightharpoonup \mathbb{N}$.
From known results on the interencodability of call-by-name and call-by-value models
(see \cite[Section~7.2]{Longley-Normann}), it is easy to read off the analogue of 
Corollary~\ref{no-finite-basis-cor} for ${\mathsf{SF}}_v^{\mbox{\scriptsize \rm eff}}$; 
however, more specific results on the relative strengths of various $Y$-combinators
within ${\mathsf{SF}}_v$ would require some further reworking of our arguments.%

Of course, one can also pose relative definability questions for other elements besides 
$Y$-combinators.
For instance, it is natural to consider the \emph{higher-order primitive recursors} ${\mathit{rec}}_\sigma$
of System~T, as well as the closely related \emph{iterators} ${\mathit{iter}}_{\sigma\tau}$:
\begin{eqnarray*}
{\mathit{rec}}_\sigma & : & 
  \sigma \rightarrow (\sigma \rightarrow \nat \rightarrow \sigma) \rightarrow \nat \rightarrow \sigma \;, \\
{\mathit{iter}}_{\sigma\tau} & : & (\sigma\rightarrow (\sigma + \tau)) \rightarrow \sigma \rightarrow \tau \;.
\end{eqnarray*}
The idea behind the latter is to embody the behaviour of a \emph{while} construct for
imperative-style loops with state $\sigma$ and exit type $\tau$.%
\footnote{The sum type $\sigma + \tau$ is not officially part of our system, but can 
(for any given $\sigma,\tau$) be represented as a retract of some existing simple type.}


It is shown in \cite[Section~6.3]{Longley-Normann} that each ${\mathit{rec}}_\sigma$ may be
interpreted by a \emph{left-well-founded} procedure (cf.\ Subsection~\ref{subsec-power-pcf1}),
and it is not hard to check that the same is true for each ${\mathit{iter}}_{\sigma\tau}$.
Furthermore, it is clear that all left-well-founded procedures are non-spinal,
so it requires only the addition of an easy base case in the proof of Theorem~\ref{no-gremlin-thm}
to show that $Y_{k+1}$ cannot be defined 
even in ${\mathrm{PCF}}_k^\Omega$ extended with all primitive recursors and iterators.
(Actually, we can dispense with the primitive recursors here, 
as it is straightforward to define them from suitable iterators.)

The dual question, roughly speaking, is whether any or all of the recursors ${\mathit{rec}}_\sigma$ 
or iterators ${\mathit{iter}}_{\sigma\nat}$ for types $\sigma$ of level $k+1$ are definable in ${\mathrm{PCF}}_k$.
We conjecture that they are not, and that this could be shown by suitably 
choosing a substructure of ${\mathsf{SP}}^0$ so as to exclude such ${\mathit{iter}}_{\sigma\nat}$.
(This would incidentally answer Question~2 of \cite[Section~5]{Berger-min-recursion}.)
One could also look for substructures that more precisely determine the strength of 
various \emph{bar recursion} operators or the \emph{fan functional}.
All in all, our experience leads us to expect that many further substructures of ${\mathsf{SP}}^0$ 
should be forthcoming, leading to a harvest of non-definability results exhibiting a rich
`degree structure' among ${\mathrm{PCF}}$-computable functionals.

Another very natural kind of question is the following: given a particular sublanguage ${\mathcal{L}}$
of ${\mathrm{PCF}}$, what is the \emph{simplest} possible type for an element of ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$ that is 
not denotable in ${\mathcal{L}}$? Or to look at it another way: given a type $\sigma$, what is the
smallest natural sublanguage of ${\mathrm{PCF}}$ that suffices for defining all elements of 
${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}(\sigma)$?
Here the analysis of \cite[Section~7.1]{Longley-Normann} yields several
positive definability results, whilst the analysis of the present paper provides ammunition on the
negative side. The current state of our knowledge is broadly as follows.
As in \cite{Longley-Normann}, we write ${\Klex^\smallmin}$ of the language of \emph{Kleene $\mu$-recursion}:
this is equivalent (in its power to denote elements of ${\mathsf{SF}}$) 
to ${\mathrm{PCF}}_0$ extended with a strict primitive recursor for ground type and a strict iterator for ground type, 
but with no form of general recursion.

\begin{itemize}
\item For first-order types $\sigma$, even ${\Klex^\smallmin}$ suffices for defining
all elements of ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}(\sigma)$; likewise, the oracle version ${\Klex^\smallmin}^\Omega$
suffices for ${\mathsf{SF}}(\sigma)$.
\item For second-order types $\sigma$ of the special form $(\nat\rightarrow\nat)^r \rightarrow \nat$,
${\Klex^\smallmin}^\Omega$ still suffices for ${\mathsf{SF}}(\sigma)$; however, this result is non-constructive,
and ${\Klex^\smallmin}$ does not suffice for ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}(\sigma)$.
(We draw here on some recent work of Dag Normann \cite{Normann-sequential-dcpo}.)
\item For general second-order types, ${\Klex^\smallmin}^\Omega$ no longer suffices, but 
the languages ${\mathrm{PCF}}_1$, ${\mathrm{PCF}}_1^\Omega$ suffice for ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$, ${\mathsf{SF}}$ respectively---%
indeed, even the single recursion operator $Y_{\nat\rightarrow\nat}$ is enough here.
\item For third-order types, we do not know whether ${\mathrm{PCF}}_1$ suffices (for ${\mathsf{SF}}^{\mbox{\scriptsize \rm eff}}$).
We do know that ${\mathrm{PCF}}_2$ suffices, and that $Y_{\nat\rightarrow\nat}$ alone is not enough.
\item For types of level $k \geq 4$, ${\mathrm{PCF}}_{k-3}$ does not suffice,
but ${\mathrm{PCF}}_{k-2}$ does.
\end{itemize}
Again, there is scope for a more fine-grained view of the hierarchy of types. 





\subsection{Other languages and models}

We have so far concentrated almost entirely on ${\mathrm{PCF}}$-style sequential computation.
To conclude, we briefly consider which other notions of higher-order computation
are likely to present us with an analogous situation.

As already noted at the end of Subsection~\ref{subsec-embed}, 
several extensions of ${\mathrm{PCF}}$ studied in the literature present a strikingly different picture:
in these settings, universal types exist quite low down, and as a consequence, 
only $Y$-combinators of low type (along with the
other constants of the language) are required for full definability. 
There is, however, one important language which appears to be more analogous 
to pure ${\mathrm{PCF}}$ in these respects, namely an extension with \emph{local state} 
(essentially Reynolds's \emph{Idealized Algol}).
This language was studied in \cite{Game-semantics}, where a fully abstract game model 
was provided (consisting of well-bracketed but possibly non-innocent strategies).
Unpublished work by Jim Laird has shown that there is no universal type 
in this setting.
We would conjecture also that the recursion hierarchy for this language is strict,
where we consider expressibility modulo observational equivalence in Idealized Algol.
This would constitute an interesting variation on our present results.

Related questions also arise in connection with \emph{total} functionals.
Consider, for example, the type structure ${\mathsf{Ct}}$ of total continuous functionals,
standardly constructed as the extensional collapse (relative to $\mathbb{N}$) of 
the Scott model ${\mathsf{PC}}$ of partial continuous functionals.
It is shown by Normann \cite{Normann2000} that every effective element of ${\mathsf{Ct}}$
is represented by a \emph{${\mathrm{PCF}}$-denotable} element of ${\mathsf{PC}}$, and the proof actually
shows that ${\mathrm{PCF}}_1$ suffices here. (The further generalization of these ideas by
Longley \cite{Ubiquity} makes some use of second-order recursions as in ${\mathrm{PCF}}_2$;
we do not know whether these can be eliminated.) Thus, in this setting,
only recursions of low type are needed to obtain all computable functionals.
Similar remarks apply to the total type structure ${\mathsf{HEO}}$, obtained as the extensional collapse
of ${\mathsf{PC}}^{\mbox{\scriptsize \rm eff}}$.

On the other hand, one may consider the \emph{Kleene computable} functionals over ${\mathsf{Ct}}$,
or over the full set-theoretic type structure ${\mathsf{S}}$, as defined by the schemes S1--S9.
As explained in \cite[Chapter~6]{Longley-Normann}, sequential procedures can be
seen as abstracting out the algorithmic content common to both ${\mathrm{PCF}}$-style and
Kleene-style computation (note that Kleene's S9 in some sense does duty for the 
$Y$-combinators of ${\mathrm{PCF}}$). 
This naturally suggests that our strict hierarchy for ${\mathrm{PCF}}$ may have an analogue for
the Kleene computable functionals (say over ${\mathsf{S}}$ or ${\mathsf{Ct}}$), where at level $k$ we consider
the evident restriction of S9 to types of level $\leq k$.
We conjecture that this is indeed the case, although the required counterexamples may be
more difficult to find given that we are limited to working with total functionals.


\section*{Appendix A: Super-identity procedures}

In the course of our proof, we have frequently encountered assertions of the form
$p \succeq x^\eta$ for various procedures $x^k \vdash p$.
Although not necessary for our main argument, it is natural to ask whether there are
any such procedures other than those for which $p \approx x^\eta$.
The following theorem shows that the answer is no: in other words,
no procedure $\lambda x.p : \pure{k} \rightarrow \pure{k}$  
can extensionally `improve on' the identity function.
We present this as a result of some interest in its own right,
whose proof is perhaps less trivial than one might expect.

Recall that $\preceq$ denotes the extensional order on ${\mathsf{SF}}$,
as well as the associated preorder on ${\mathsf{SP}}^0$.
Within ${\mathsf{SF}}$, we will write $f \prec f'$ to mean $f \preceq f'$ but $f \neq f'$;
we also write $f \sharp f'$ to mean that $f,f'$ have no upper bound
with respect to $\preceq$.

We call an element of ${\mathsf{SP}}^0$ \emph{finite} if it is a finite tree once branches
of the form $i \Rightarrow \bot$ have been deleted.
We say an element of ${\mathsf{SF}}$ is \emph{finite} if it is represented by some finite
element of ${\mathsf{SP}}^0$.
We write ${\mathsf{SP}}^{0,{\mbox{\scriptsize \rm fin}}}(\sigma), {\mathsf{SF}}^{\mbox{\scriptsize \rm fin}}(\sigma)$ for the set of finite elements
in ${\mathsf{SP}}^0(\sigma), {\mathsf{SF}}(\sigma)$ respectively.

\begin{theorem}  \label{super-identity-thm}
(i) If $f \in {\mathsf{SF}}^{\mbox{\scriptsize \rm fin}}(k)$ and $f \prec f'$, then there exists $f'' \sharp f'$
with $f \prec f''$.

(ii) If $\Phi \in {\mathsf{SF}}(k \rightarrow k)$ and $\Phi \succeq {\mathit{id}}$,
there can be no $f \in {\mathsf{SF}}(k)$ with $\Phi(f) \succ f$; hence $\Phi = {\mathit{id}}$.
\end{theorem}

\begin{proof}
(i) The cases $k=0,1$ are easy, so let us assume $k \geq 2$.
Suppose $f \prec f'$ where $f$ is finite.
Then for some $g \in {\mathsf{SF}}(k-1)$ we have $f(g)=\bot$ but $f'(g)=n \in \mathbb{N}$, say,
and by continuity in ${\mathsf{SP}}^0$ we may take $g$ here to be finite.
Take $p,q \in {\mathsf{SP}}^{0,{\mbox{\scriptsize \rm fin}}}$ representing $f,g$ respectively;
we may assume that $p,q$ are `pruned' so that every subtree containing no
leaves $m \in \mathbb{N}$ must itself be $\bot$.

\textit{Case 1:} $g(\bot^{k-2}) = a \in \mathbb{N}$.
In this case, we may suppose that $q = \lambda x.a$.
Consider the computation of $p \cdot q$.
Since all calls to $q$ trivially evaluate to $a$, this computation follows the
rightward path through $p$ consisting of branches $a \Rightarrow \cdots$.
But since $p$ is finite and $p \cdot q = \bot$ (because $f(g)=\bot$), 
this path must end in a leaf occurrence of $\bot$ within $p$.
Now extend $p$ to a procedure $p'$ by replacing this leaf occurrence
by some $n' \neq n$; then clearly $p' \cdot q = n'$.
Taking $f''$ to be the function represented by $p'$, we then have
$f \preceq f''$ and $f''(g) = n' \,\sharp\, n = f'(g)$, so $f'' \sharp f'$
(whence also $f'' \neq f$ so $f \prec f''$).

\textit{Case 2:} $g(\bot^{k-2}) = \bot$.
Take $N$ larger than $n$ and all numbers appearing in $p,q$.
Define $p' \sqsupseteq p$ as follows: if $p = \lambda x.\bot$, take $p' = \lambda x.N$,
otherwise obtain $p'$ from $p$ by replacing each case branch $j \Rightarrow \bot$
anywhere within $p$ by $j \Rightarrow N$ whenever $j \leq N$.
Extend $q$ to $q'$ in the same way. 
Note in particular that every ${\mathtt{case}}$ subterm within $p',q'$ will now be equipped
with a branch $N \Rightarrow N$.

Now consider the computation of $p \cdot q$.
Since $p,q$ are finite and $f(g)=\bot$, this evaluates to an occurrence of $\bot$
which originates from $p$ or $q$. Since no numbers $>N$ ever arise in the computation,
this occurrence of $\bot$ cannot be part of a branch $j \Rightarrow \bot$ for $j>N$,
so will have been replaced by $N$ in $p'$ or $q'$.
Now suppose that we head-reduce $pq$ until $\bot$ first appears in head position,
and let $U$ be the resulting meta-term.
Then it is easy to see that $p'q'$ correspondingly reduces to a meta-term $U'$
with $N$ in head position. (Formally, we reason here by induction on the length
of head-reduction sequences not involving the rule for $\caseof{\bot}{\cdots}$.)

We now claim that $p' \cdot q' = N$. Informally, this is because the head
occurrence of $N$ in $U'$ will be propagated to the top level by the case branches 
$N \Rightarrow N$ within both $p'$ and $q'$.
More formally, let us define the set of meta-expressions \emph{led by $N$}
inductively as follows:
\begin{itemize} 
\item $N$ is led by $N$.
\item If $E$ is led by $N$, then so is $\caseof{E}{i \Rightarrow D_i}$.
\end{itemize}
We say that an NSP meta-term $T$ is \emph{saturated at $N$}
if every ${\mathtt{case}}$ subterm within $T$ has a branch $N \Rightarrow E$ 
where $E$ is led by $N$.
Clearly $p'q'$ is saturated at $N$, and it is
easy to check that the terms saturated at $N$ are closed under head reductions;
thus $U'$ is saturated at $N$.
But we have also seen that $U'$ has $N$ in head position, so is led by $N$.
Finally, an easy induction on term size shows that
every finite meta-term that is led by $N$ and saturated at $N$ evaluates to $N$ itself.
This shows that $p' \cdot q' = N$.

To conclude, let $f'',g'$ be the functions represented by $p',q'$ respectively,
so that $f \preceq f''$ and $g \preceq g'$.
Then $f'(g')=n$, but $p'' \cdot q = N$ so $f''(g')=N \neq n$, whence $f'' \sharp f'$
(and also $f'' \neq f$ so $f \prec f''$).
 
(ii) Suppose $\Phi \succeq {\mathit{id}}$ and $\Phi(f) \succ f$ for some $f$.
Again by continuity, we may take $f$ to be finite. 
Then by (i), we may take $f'' \succ f$ such that $f'' \,\sharp\, \Phi(f)$.
But this is impossible because $\Phi(f'') \succeq f''$ and $\Phi(f'') \succeq \Phi(f)$.
Thus $\Phi={\mathit{id}}$.
$\Box$
\end{proof}

\vspace*{1.0ex}
It is easy to see that the above theorem holds with any finite type over $\nat$
in place of $\pure{k}$.
However, it will trivially fail if the unit type $\unit$ is admitted as an additional base type:
e.g.\ the function $(\lambda x.\top) \in {\mathsf{SF}}(\unit\rightarrow\unit)$ strictly dominates the identity.
An interesting question is whether the theorem holds for all finite types over 
the type $\bool$ of booleans: note that the above proof fails here
since it requires the base type to be infinite.
For comparison, we mention that in other models of computation,
improvements on the identity are sometimes possible for such types.
For example, if $\sigma=\bool\rightarrow\bool$, then a functional
of type $\sigma\rightarrow\sigma$ strictly dominating the identity exists in the Scott model.
Indeed, such a function $J$ can be defined in
${\mathrm{PCF}}$ augmented with the parallel conditional ${\mathit{pif}}$, e.g.\ as 
\[ J ~=~ \lambda f^\sigma.\lambda x^\bool.\,\mathit{vote}(f(x),f({\mathit{tt}}),f({\mathit{ff}})) \;. \]
Here $\mathit{vote}$ is Sazonov's voting function, definable by
\[ \mathit{vote}(x,y,z) ~=~ {\mathit{pif}}(x,{\mathit{pif}}(y,{\mathit{tt}},z),{\mathit{pif}}(y,z,{\mathit{ff}})) \;. \]
The point is that $J$ will `improve' the function $\lambda x.\,{\mathit{if}}(x,{\mathit{tt}},{\mathit{tt}})$
to $\lambda x.{\mathit{tt}}$.
We do not know whether phenomena of this kind can arise within the model ${\mathsf{SF}}$.
 
 
