We consider an LTI system $x_{t+1} = A x_t + B u_t + \eta_t$, where $x_t, \eta_t \in \mathbb{R}^n$ and $u_t \in \mathbb{R}^m$ are the state, noise, and control input at time step $t$, respectively. The system matrices $A$ and $B$ are \emph{unknown} to the learner. We further assume that $\E[\eta_t] = 0$ and that there exists a constant $C \in \mathbb{R}^+$ such that $\norm{\eta_{t}} < C$ for all $t \in \mathbb{N}$.\footnote{The boundedness assumption on the noise can be relaxed to sub-Gaussian random variables at the cost of a slightly more complicated proof. Indeed, in the simulation in \Cref{sec:simulation}, we show that our algorithm stabilizes an LTI system with additive Gaussian noise.}


The goal of learning is to stabilize the system with a learned controller, defined as follows:
\begin{defn}[Stabilizing controller]
\label{defn:stb_cont}
    A control rule $(u_t)$ is called a \textbf{stabilizing controller} if and only if the closed-loop system $x_{t+1} = A x_t + B u_t + \eta_t$ is ultimately bounded; i.e., whenever $\norm{\eta_t} \leq C$ for all $t$, there exists $C_n \in \mathbb{R}^+$ such that $\limsup_{t \rightarrow \infty} \norm{x_t} < C_n$.
\end{defn}
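
As an illustration of \Cref{defn:stb_cont} (a sketch that is not part of the formal development), consider a linear state-feedback rule $u_t = K x_t$. The gain $K \in \mathbb{R}^{m \times n}$, the spectral radius $\rho(\cdot)$, and the constants $c, \gamma$ below are notation introduced only for this example. Suppose $\rho(A + BK) < 1$, so that there exist $c \geq 1$ and $\gamma \in (0,1)$ with $\norm{(A+BK)^k} \leq c \gamma^k$ for all $k \in \mathbb{N}$. Unrolling the closed-loop recursion gives
\[
    x_t = (A+BK)^t x_0 + \sum_{s=0}^{t-1} (A+BK)^{t-1-s} \eta_s ,
\]
and therefore, whenever $\norm{\eta_s} \leq C$ for all $s$,
\[
    \norm{x_t} \leq c \gamma^t \norm{x_0} + \sum_{s=0}^{t-1} c \gamma^{t-1-s} C \leq c \gamma^t \norm{x_0} + \frac{c\, C}{1-\gamma} .
\]
Taking $t \rightarrow \infty$ yields $\limsup_{t \rightarrow \infty} \norm{x_t} \leq c\,C/(1-\gamma)$, so any $C_n > c\,C/(1-\gamma)$ certifies that this rule is a stabilizing controller in the sense of the definition.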

The learner is allowed to learn the system by interacting with it on a single trajectory. More specifically, at each time step the learner observes $x_t$ and freely chooses $u_t$. In this paper, we make the standard assumption that $(A,B)$ is controllable. We also assume $x_0 = 0$ to simplify the proofs; our arguments can be easily generalized to nonzero initial conditions.

\textbf{Exponential blow-up.} Although there is a large body of work on the learn-to-stabilize problem, including classical adaptive control \citep{Sun01} and more recent learning-based control \citep{Abbasi-Yadkori11,Chen07,Ibrahimi12,Lale20}, it is widely recognized that any generic learn-to-stabilize algorithm inevitably incurs an exponential blow-up in the state norm, as shown by the lower bounds in \citet{Chen07} and \citet{Tsiamis2021}. The reason is that $\Theta(n)$ samples are needed to sufficiently explore the $n$-dimensional state space and estimate the system dynamics before a stabilizing controller can be designed, and the state of an unstable system can grow exponentially during this exploration phase. In contrast to these approaches, which estimate the full system, our approach isolates the smaller unstable subspace from the stable subspace, estimates the system dynamics on the unstable subspace under stochastic coupling, and shows that stabilizing this ``smaller'' subspace suffices to stabilize the entire state space. As such, our approach breaks the exponential blow-up lower bound in the regime where the dimension of the unstable subspace is smaller than $n$.
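
To make the blow-up concrete, the following back-of-the-envelope calculation may help. It is an illustrative sketch under simplifying assumptions, not the lower-bound argument of the cited works. Consider a single unstable scalar mode $z_{t+1} = a z_t + w_t$ with $a > 1$ and $z_0 = 0$, evolving without any stabilizing input, where the noise terms $w_t$ are assumed to be zero-mean, uncorrelated across time, and of common variance $\sigma^2$. The symbols $z_t$, $w_t$, $a$, and $\sigma$ are introduced only for this example. Since $z_t = \sum_{s=0}^{t-1} a^{t-1-s} w_s$, we obtain
\[
    \E[z_t^2] = \sigma^2 \sum_{k=0}^{t-1} a^{2k} = \sigma^2 \, \frac{a^{2t} - 1}{a^2 - 1} .
\]
Hence, after $t = n$ exploration steps, the root-mean-square magnitude of the state is of order $a^{n}$. For instance, with $a = 2$ and $n = 10$ we get $\E[z_{10}^2] = \sigma^2 (4^{10}-1)/3 = 349525\, \sigma^2$, i.e., a root-mean-square magnitude of roughly $591\,\sigma$. The same calculation with a shorter exploration horizon $t < n$ gives growth of order $a^{t}$ only, which suggests why restricting exploration to a lower-dimensional unstable subspace can reduce the blow-up.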

%