\textbf{Stage 1: Learn the unstable subspace of $A$.}
%Analogous to the setup Stage 1 of LTI\textsubscript{0} in \cite{LTI}, when we apply $A$ recursively, the states get pushed closer to $E_u$. 
We let the system run in open-loop (with control input $u_t \equiv 0$) for $T$ time steps. Per the stable/unstable decomposition, the ratio between the norms of the state components in the unstable and stable subspace increases exponentially, and, very quickly, the state will lie ``almost'' in $E_u$. Consequently, the subspace spanned by the $T$ states, i.e. the column space of $D := [x_{1}, \cdots, x_{T}]$, is very close to $E_u$. %Naturally, as the system runs in the open-loop, the state space would get closer to the unstable subspace $E_u$. 
Thus, we use the top $k$ left singular vectors of $D$ (equivalently, the top $k$ eigenvectors of $DD^*$), denoted $U^{(k)}$, as an estimate $\hat{P}_1$ of an orthonormal basis of the unstable subspace. In other words, we set $\hat{P}_1 = U^{(k)}$ and use it to construct the orthogonal projector $\hat{\Pi}_1 = U^{(k)}(U^{(k)})^*$, which serves as an estimate of the projector $\Pi_1 = P_1 P_1^*$ onto $E_u$. 
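To make the construction concrete, the following sketch spells out the decomposition; the symbols $W$, $\Sigma$, $V$, $w_j$, and $\sigma_j$ are notation introduced here for illustration only. Write a (full) singular value decomposition of the data matrix as
\begin{equation*}
    D = W \Sigma V^*, \qquad W = [w_1, \cdots, w_n], \qquad \sigma_1 \ge \sigma_2 \ge \cdots \ge 0 ,
\end{equation*}
so that $DD^* = W \Sigma \Sigma^* W^*$ and each $w_j$ is an eigenvector of $DD^*$ with eigenvalue $\sigma_j^2$. Then
\begin{equation*}
    U^{(k)} = [w_1, \cdots, w_k], \qquad U^{(k)}(U^{(k)})^* = \sum_{j=1}^{k} w_j w_j^* .
\end{equation*}
Because the columns of $U^{(k)}$ are orthonormal, $(U^{(k)})^* U^{(k)} = I_k$, and hence $U^{(k)}(U^{(k)})^*$ is idempotent and self-adjoint, i.e., it is an orthogonal projector onto the column space of $U^{(k)}$.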

\textbf{Stage 2: Learn $M_1$ on the unstable subspace.} Recall that $M_1$ is the system dynamics matrix for the subspace $E_u$ under the $E_u \oplus E_u^\perp$ decomposition. Therefore, to estimate $M_1$, we first project the states $x_{1:T}$ onto the estimated unstable subspace, i.e., $\hat{y}_{1,t} = \hat{P}_1^* x_{t}$ for $t = 1,\cdots,T$. Then we use least squares to estimate $M_1$, i.e., we find $\hat{M}_1$ that minimizes the squared loss:
\begin{equation}
\label{eqn:est_M_1}
    \begin{split}
        \mathcal{L}(\hat{M}_1) &:= \sum_{t = 1}^{T-1} \norm{\hat{y}_{1,t+1}- \hat{M}_1 \hat{y}_{1,t}}^2 .
    \end{split}
\end{equation}
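As a sketch of what minimizing \eqref{eqn:est_M_1} amounts to, setting the gradient of $\mathcal{L}$ with respect to $\hat{M}_1$ to zero yields the normal equations. Assuming that the Gram matrix $\sum_{t} \hat{y}_{1,t}\hat{y}_{1,t}^*$ is invertible (an assumption made here for illustration), the minimizer is given in closed form by
\begin{equation*}
    \hat{M}_1 = \Big(\sum_{t} \hat{y}_{1,t+1}\hat{y}_{1,t}^*\Big)\Big(\sum_{t} \hat{y}_{1,t}\hat{y}_{1,t}^*\Big)^{-1} ,
\end{equation*}
where both sums run over the same range of $t$ as in \eqref{eqn:est_M_1}.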
%We will show that the unique solution to \eqref{eqn:est_M_1} is $\hat{M}_1 = (U^{(k)})^* A U^{(k)} + \varpi$ in Lemma~\ref{lemm:ls}, where $\varpi$ is error. 

\textbf{Stage 3: Learn $B_\tau$ for $\tau$-hop control.} In this stage, we estimate $B_\tau$, which quantifies the effect of control input on states in the unstable subspace $E_u$ (as discussed in \Cref{section:tau-hop-control}). Note that \eqref{eqn:system_y_tilde} shows
\begin{equation}
\label{eqn:y_for_b}
    \begin{split}
        y_{1,t_i + \tau} &= M_1^\tau y_{1,t_i} + \Delta_\tau y_{2,t_i} + B_\tau u_{t_i}
        \\
        &\quad + \sum_{j = 1}^{\tau-1} \left( M_1^{\tau-j} \eta_{1,t_i + j} + \Delta_{\tau-j} \eta_{2,t_i + j} \right) .
    \end{split}
\end{equation}
We estimate the columns of $B_\tau$ one by one. Specifically, at time $t_i$ we inject a scaled copy of the unit vector $e_i$ as the control input, let the system run in open loop for $\tau$ steps, and estimate $b_i$, the $i$-th column of $B_\tau$, from \eqref{eqn:y_for_b} with the $\Delta_{\tau}$-related terms ignored:
\begin{equation}
\label{eqn:b}
    \hat{b}_i = \frac{1}{\norm{u_{t_i}}}\left(\hat{P}_1^* x_{t_i+\tau} - \hat{M}_1^\tau \hat{P}_1^* x_{t_i} \right) ,
\end{equation}
where $u_{t_i}$ is parallel to $e_i$ with magnitude $\alpha \norm{x_{t_i}}$ for normalization. Here, $\alpha$ is an adjustable constant chosen so that injecting $u_{t_i}$ does not enlarge the $E_s$-component enough to blur our estimation. Because the estimate of $b_i$ ignores the $\Delta_\tau$-related terms, we let the system run in open loop for $\omega_i$ time steps before the estimation of $b_i$ starts, where $\omega_i$ is a stopping time (cf. Line \ref{alg:stopping_time} in \cref{alg:LTS0}). The purpose of this stopping time is to keep the error that those terms contribute to the estimate of $B_{\tau}$ small. For more details, see \Cref{prop:G6} in the proof.
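The estimator \eqref{eqn:b} can be motivated by the following heuristic sketch, which is not a substitute for the error analysis in the proof. It assumes, for illustration only, that $y_{1,t} = P_1^* x_t$, that the estimates from the previous stages are exact ($\hat{P}_1 = P_1$ and $\hat{M}_1 = M_1$), and that $u_{t_i} = \norm{u_{t_i}} e_i$. Dropping the $\Delta$-related and noise terms in \eqref{eqn:y_for_b} then leaves
\begin{equation*}
    \hat{P}_1^* x_{t_i+\tau} - \hat{M}_1^\tau \hat{P}_1^* x_{t_i} \approx B_\tau u_{t_i} = \norm{u_{t_i}} B_\tau e_i = \norm{u_{t_i}}\, b_i ,
\end{equation*}
and dividing both sides by $\norm{u_{t_i}} = \alpha \norm{x_{t_i}}$ gives $\hat{b}_i \approx b_i$.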

\textbf{Stage 4: Construct a $\tau$-hop stabilizing controller $K$.} With the estimates of $M_1^\tau$ and $B_\tau$ from the previous stages, denoted $\hat{M}_1^\tau$ and $\hat{B}_\tau$, the learner can choose any stabilization algorithm to find a gain $\hat{K}_1$ that stabilizes the linear system 
\begin{equation*}
    \hat{\Tilde{y}}_{i+1} = \hat{M}_1^\tau \hat{\Tilde{y}}_i + \hat{B}_\tau \Tilde{u}_{i}, \qquad \Tilde{u}_i = \hat{K}_1 \hat{\Tilde{y}}_i ,
\end{equation*}
where the tilde in $\hat{\Tilde{y}}$ emphasizes the use of $\tau$-hop control and the hat emphasizes the use of the estimated projector $\hat{P}_1$, which introduces an extra estimation error into the final closed-loop dynamics. As $\hat{K}_1$ is chosen by the learner, we let $\mathcal{K}$ denote a constant such that $\norm{\hat{K}_1} < \mathcal{K}$. Furthermore, by \Cref{prop:controllable_Mtau}, there exists a positive definite matrix $\Bar{U}$ such that $\mathcal{U} := \norm{\hat{M}_1^{\tau} + \hat{B}_{\tau} \hat{K}_1}_{\Bar{U}} < 1$, where $\norm{\cdot}_{\Bar{U}}$ denotes the weighted norm induced by $\Bar{U}$. These user-defined constants are used in the proof of \Cref{thm:main}.

To sum up, \Cref{alg:LTS0} terminates in $T + \sum_{i=1}^m(1+\omega_i + \tau)$ time steps, where $\omega_i$ is the stopping time at which the state satisfies both $\frac{\norm{(I - \hat{\Pi}_1)x_{t_i}}}{\norm{x_{t_i}}} < (1-\epsilon)\gamma$ and $\frac{C}{\norm{x_{t_i}}} < \delta$. 
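As a purely illustrative calculation with hypothetical values (not taken from any experiment), suppose $T = 50$, $m = 2$, $\tau = 3$, and the realized stopping times are $\omega_1 = 4$ and $\omega_2 = 6$. Then the total number of time steps is
\begin{equation*}
    T + \sum_{i=1}^{m} (1 + \omega_i + \tau) = 50 + (1 + 4 + 3) + (1 + 6 + 3) = 68 .
\end{equation*}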

\begin{remark}
    Our algorithm differs from the algorithm proposed in \citet{LTI} in three aspects. First, to account for the noise, we do not directly use the span of $k$ consecutive state vectors as the estimator of the unstable subspace. Instead, we use the singular value decomposition to identify the dominant subspace of the trajectory and use a basis of that subspace as an estimate of $P_1$. Such an estimator requires a much more delicate analysis framework, based on the Davis--Kahan theorem, to bound the error; we elaborate on this in \Cref{Appendix:proj_proof}. Second, the above algorithm generalizes the problem to an under-actuated setting, where the control matrix $B \in \mathbb{R}^{n \times m}$ has $m \neq k$. To achieve this, unlike \citet{LTI}, we no longer try to cancel out the unstable matrix $M_1$, but rather allow the learner to choose the stabilizing controller. We show in \Cref{sec:simulation} that our algorithm outperforms that of \citet{LTI} in an under-actuated setting in simulation. Third, we use a stopping time to monitor the state norm when estimating $B_{\tau}$, so that our algorithm always terminates at the earliest possible time.
\end{remark}

%In the next section, we show how the parameters are chosen to guarantee both stability and avoid the exponential blow-up in the state norm. %sub-linear sample complexity. 