Driven by the success of machine learning and the practical engineering needs of control, there has been considerable interest in learning-based control of unknown dynamical systems~\citep{Beard97,Li22,Bradtke94,Krauth19,Dean20}. However, existing methods commonly rely on the strong assumption that a stabilizing controller is known. This motivates the learning-to-stabilize problem, i.e., learning to stabilize an unknown dynamical system, particularly on a single trajectory. This problem has long been challenging both in theory and in applications such as the control of autonomous vehicles and unmanned aerial vehicles (UAVs).


Although many classical adaptive control approaches can solve the learn-to-stabilize problem with asymptotic stability guarantees~\citep{Astrom96,Sun01}, the problem is well known to suffer from \emph{exponential blow-up} during transients. For example, \citet{Abbasi-Yadkori11} and \citet{Chen07} present model-based approaches for learning to stabilize an unknown LTI system $x_{t+1} = A x_t + Bu_t$: they first excite the system in open loop to learn the dynamics matrices $(A,B)$ and then design a stabilizing controller. However, the initial excitation phase must run the system in open loop for at least $n$ steps before $(A,B)$ can be learned, where $n$ is the state dimension, because at least $n$ samples are needed to explore the $n$-dimensional state space. Since the open-loop system may be unstable, the state norm can consequently blow up to the order of $2^{\tilde{O}(n)}$. Such an exponential blow-up can be catastrophic and has been observed in multiple papers~\citep{Abbasi-Yadkori11,Chen07,Lale20,Perdomo21,Tsiamis2021}. Moreover, \citet{Chen07} show that all general-purpose control algorithms suffer a worst-case regret of $2^{\Omega(n)}$.
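To make this scaling concrete, the following minimal sketch rolls out the open-loop system with $u_t = 0$ and records $\|x_t\|$ at every step. It is an illustration under assumed parameters, not code from the cited works: the helper \texttt{open\_loop\_norms} and the diagonal matrix with a single eigenvalue $2$ (all other eigenvalues $0.5$) are hypothetical choices. Even one unstable mode makes the norm grow like $2^t$, so spending $n$ open-loop exploration steps already costs roughly $2^n$.

```python
import numpy as np


def open_loop_norms(A, x0, steps, noise_std=0.0, seed=0):
    """Return ||x_t|| for t = 0..steps under x_{t+1} = A x_t + w_t (u_t = 0).

    Hypothetical helper for illustration; w_t is i.i.d. Gaussian with
    standard deviation noise_std (zero by default, i.e. noiseless).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    norms = [float(np.linalg.norm(x))]
    for _ in range(steps):
        x = A @ x + noise_std * rng.standard_normal(x.shape[0])
        norms.append(float(np.linalg.norm(x)))
    return norms


n = 10
# Hypothetical system: one unstable mode (eigenvalue 2), n - 1 stable modes.
A = np.diag([2.0] + [0.5] * (n - 1))
x0 = np.ones(n)

# Explore for n steps in open loop, as a full identification of (A, B) would.
norms = open_loop_norms(A, x0, steps=n)
print(norms[-1])  # dominated by the unstable mode: 2**n times its initial value
```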

Although \citet{Chen07} establish an exponential blow-up lower bound, it is a worst-case bound and does not rule out better results for specific systems. This motivates the following question: \emph{is it possible to exploit instance-specific properties to learn to stabilize a noisy LTI system without suffering from the worst-case exponential blow-up in $n$?} This question poses two challenges. First, to avoid the exponential blow-up, one can collect only $o(n)$ samples, which provide only partial information about the dynamics, and with only partial information it is difficult to stabilize the system. Second, the noise injected at each step is amplified by the open-loop unstable system, which causes strong statistical dependencies between states that explode exponentially along a single trajectory.
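To see how the noise enters, assume for illustration additive process noise $w_t$ (the model above is written without it), i.e., $x_{t+1} = A x_t + B u_t + w_t$. Unrolling the recursion gives
\begin{equation*}
  x_t = A^t x_0 + \sum_{s=0}^{t-1} A^{t-1-s}\bigl(B u_s + w_s\bigr),
\end{equation*}
so each noise term $w_s$ is multiplied by $A^{t-1-s}$, which grows exponentially in $t-1-s$ along the unstable directions of $A$, and the same $w_s$ appears in every later state $x_t$ with $t > s$. This shared, amplified noise is what couples the states of a single trajectory.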

To address the first challenge, we draw inspiration from the framework of \citet{LTI}, which stabilizes a \emph{deterministic} LTI system using only $\tilde{O}(k)$ state samples along a trajectory, where $k < n$ is the number of unstable eigenvalues of $A$. Consequently, the algorithm of \citet{LTI} keeps the state norm bounded by $2^{\tilde{O}(k)}$ and avoids the exponential blow-up $2^{\tilde{O}(n)}$~\citep{Chen07,Tsiamis2021}. However, \citet{LTI} addresses this challenge only for the much simpler case of \emph{noiseless, deterministic} dynamics, because its methodology cannot easily decouple the amplified noise from the system dynamics. In addition, \citet{LTI} assumes that the control matrix has the same dimension as the instability index $k$ and is invertible, i.e., the system is \emph{fully actuated} when restricted to the unstable subspace. This assumption is also unrealistic in applications, because the control input dimension is problem-specific and need not equal $k$. In particular, many real-world systems are under-actuated, meaning that the control dimension can be much smaller than $k$. Nevertheless, \citet{LTI} hints that a general noisy LTI system might also be stabilized from fewer data points.

To address the second challenge and the limitations of \citet{LTI}, we need a new method that approximates the unstable part of the system dynamics under stochastic noise and stabilizes it with under-actuated control inputs. This is nontrivial: although previous works have designed methods to approximate system dynamics from a noisy, blowing-up trajectory~\citep{near_optimal_LDS, Simchowitz18}, they do not study how to separate the unstable part of the dynamics from the stable part or how to stabilize the system. The goal of this paper is to overcome these technical challenges and \emph{to learn to stabilize an unknown LTI system without exponential blow-up of the state norm in noisy and under-actuated settings.}

\textbf{Contribution.}
In this paper, we develop a novel model-based algorithm, LTS\textsubscript{0}-N, to stabilize an unknown LTI system. We design a new singular value decomposition (SVD)-based subspace estimation technique that estimates the ``unstable'' part of the system dynamics under noise perturbations, and we then stabilize this part. Building on this technique, we develop an analytical framework based on the Davis--Kahan theorem to bound the subspace estimation error, and we use it to show that our approach stabilizes the unknown dynamical system with state norm bounded by $2^{O(k \log k + \log(n-k) + m - \log\gap)}$, where $m$ is the dimension of the control input and $\gap$ is a constant depending on the spectral properties of $A$. This bound avoids the worst-case exponential blow-up $2^{\Omega(n)}$ in the state dimension and improves on the state of the art for stabilizing unknown noisy systems~\citep{Lale20,Chen07}. Moreover, despite the strong stochastic dependencies, the bound is comparable to the norm bound of \citet{LTI} for noiseless systems. Finally, improving on \citet{LTI}, we impose no requirement on the dimensions of the system matrices and retain the same complexity for under-actuated systems.
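The following sketch illustrates the general idea behind SVD-based estimation of an unstable subspace. It is a generic illustration under stated assumptions, not the LTS\textsubscript{0}-N algorithm. It assumes a hypothetical $5$-dimensional system with $k = 2$ unstable eigenvalues ($2.0$ and $1.8$) and small Gaussian process noise, runs it in open loop so that the stable components stay small relative to the growing unstable ones, takes the top-$k$ left singular vectors $\hat U$ of a window of recent states, and compares them with an orthonormal basis $Q$ of the true unstable eigenspace through the $\sin\Theta$ distance $\|(I - QQ^{*})\hat U\|_2$, which is the kind of quantity that Davis--Kahan-type bounds control. All function names, the horizon, the window length, and the noise level are hypothetical, and $k$ is read off the true eigenvalues for simplicity.

```python
import numpy as np


def simulate_open_loop(A, x0, T, noise_std, rng):
    """Roll out x_{t+1} = A x_t + w_t with u_t = 0; column t of the result is x_t."""
    n = A.shape[0]
    X = np.zeros((n, T + 1))
    X[:, 0] = x0
    for t in range(T):
        X[:, t + 1] = A @ X[:, t] + noise_std * rng.standard_normal(n)
    return X


def estimate_unstable_subspace(X_window, k):
    """Orthonormal basis made of the top-k left singular vectors of the state window."""
    U, _, _ = np.linalg.svd(X_window, full_matrices=False)
    return U[:, :k]


def true_unstable_basis(A):
    """Orthonormal basis of the span of eigenvectors with |lambda| > 1, and its dimension."""
    eigvals, eigvecs = np.linalg.eig(A)
    mask = np.abs(eigvals) > 1.0
    Q, _ = np.linalg.qr(eigvecs[:, mask])
    return Q, int(mask.sum())


def sin_theta_distance(U_hat, Q):
    """||(I - Q Q^*) U_hat||_2: sine of the largest principal angle between the subspaces."""
    residual = U_hat - Q @ (Q.conj().T @ U_hat)
    return float(np.linalg.norm(residual, 2))


rng = np.random.default_rng(0)
n = 5
S = rng.standard_normal((n, n))  # hypothetical (non-orthogonal) change of basis
A = S @ np.diag([2.0, 1.8, 0.5, 0.3, 0.1]) @ np.linalg.inv(S)

Q, k = true_unstable_basis(A)  # k = 2 unstable modes
X = simulate_open_loop(A, rng.standard_normal(n), T=20, noise_std=0.01, rng=rng)
U_hat = estimate_unstable_subspace(X[:, -6:], k)  # window of the last 6 states
print(k, sin_theta_distance(U_hat, Q))
```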

\textbf{Related Work.} Our work is most closely related to online learning and adaptation, learning to control with known stabilizing controllers, learning to stabilize on multiple trajectories, and learning to stabilize on a single trajectory. We also briefly cover system identification.

\textit{Online learning and adaptation.} Adaptive control has a long history of study~\citep{Astrom96,Sun01,Chen21}. Most classical adaptive control methods focus on asymptotic stability and do not provide finite-sample analysis, so they do not explicitly study the exponential blow-up issue. More recent work on the non-asymptotic sample complexity of adaptive control has recognized the exponential blow-up issue when no stabilizing controller is known a priori~\citep{Chen07,Faradonbeh17,Lee23,Tsiamis2021,Tu18}. In particular, the most common strategy for stabilizing an unknown dynamical system is to estimate the system dynamics from the past trajectory and then design the controller~\citep{Berberich20,Persis20,Liu23}. These works therefore need to run the system in open loop for $\Omega(n)$ steps before stabilizing it, so the state norm blows up exponentially in the state dimension. In contrast, we stabilize the system with fewer samples by identifying and stabilizing only the unstable subspace, thus avoiding the exponential blow-up.

\textit{Learning to control with known stabilizing controllers.} There is abundant literature on stabilizing LTI systems under stochastic noise~\citep{Bouazza21,converse_lyapunov, Kusii18,Li22}. One line of research uses model-free approaches to learn optimal controllers~\citep{Fazel19,Joao20,Li22, Wang22, Zhang20}; these algorithms typically require a known stabilizing controller as the initialization for policy search. Another line of research uses model-based approaches, which require known stabilizing controllers to collect data for learning the system dynamics~\citep{Cohen19, Mania19, Plevrakis20,Zheng20}. In contrast, we focus on learning to stabilize, and the controller we obtain can serve as the initialization for existing learning-to-control methods that require a known stabilizing controller.

\textit{Learning-to-stabilize on multiple trajectories.} In addition to the aforementioned literature, some works do not assume open-loop stability and learn the full system dynamics from multiple trajectories. Before designing a stabilizing controller, they require a data complexity of $\widetilde{\Theta}(n)$~\citep{Dean20,Tu18,Zheng201}, which exceeds the $\widetilde{O}(k)$ of our work. Recently, a model-free policy gradient approach has offered a novel perspective with the same complexity~\citep{Perdomo21}. These works do not face the same exponential blow-up issue because they allow multiple trajectories, i.e., the state can be ``reset'' to $0$. In contrast, we focus on the more challenging setting of stabilizing on a single trajectory.

\textit{Learning-to-stabilize on a single trajectory.} Learning to stabilize a linear system over an infinite time horizon has long been studied in the traditional control literature~\citep{Lai86, Chen89, Lai91}. Algorithms incurring regret of $2^{O(n)}O(\sqrt{T})$ have been proposed under assumptions of observability and strictly stable transition matrices~\citep{Abbasi-Yadkori11,Ibrahimi12}. Some studies have improved the regret to $2^{\tilde{O}(n)} + \tilde{O}(\text{poly}(n)\sqrt{T})$~\citep{Chen07,Lale20}. Recently, \citet{LTI} proposed an algorithm that requires only $\tilde{O}(k)$ samples but imposes assumptions on the dimensions of $B$ and does not incorporate noise in the system dynamics. In this work, we propose an algorithm that achieves the same state norm bound as \citet{LTI} for noisy and potentially under-actuated LTI systems.

\textit{System identification.} The system identification literature focuses on estimating system parameters~\citep{Oymak18, near_optimal_LDS, Simchowitz18, Xing22}. Our approach also partially identifies the system parameters in order to construct a stabilizing controller. Unlike those works, however, we do not stop at identification but also close the loop by stabilizing the system, which additionally requires characterizing the identification accuracy and its impact on the closed-loop response.