Throughout this section, we assume that the state space is finite ($|\mathcal{S}|<\infty$) and that the transition operator has rank $\tilde{d}$ at every time step, that is, $\textnormal{rank}(\mathcal{P}_{h}^{\star})=\tilde{d}$ for all $h\in[H]$. Furthermore, we denote by \(\mathcal{X}_{h}^{\star} :=\{(s,a)\in\mathcal{S}\times\mathcal{A} \mid d_{\mathcal{P}^{\star}, h}^{\pi^{\star}}(s,a) > 0\}\) the set of state-action pairs reachable by the optimal policy at time step \(h\in[H]\). The following lemma provides a necessary and sufficient condition for the existence of non-redundant UniSOFT representations in low-rank MDPs.

\begin{restatable}[Existence of good representations]{lemma}{unisoftex}\label{lemma:unisoft_existance}
Let $d\geq \tilde{d}$. Then, the following statements are equivalent:
        
        $(1)$ \(\textnormal{span}\{\mathcal{P}_{h}^{\star}(\cdot \mid s,a) \mid (s,a)\in\mathcal{X}_{h}^{\star}\}=\mathbb{R}^{\tilde{d}}\) and $|\mathcal{X}_{h}^{\star}|\geq d$,
        
        $(2)$ there exists a non-redundant UniSOFT representation \(\langle\tilde{\phi}_{h}, \tilde{\mu}_{h}\rangle_{\mathbb{R}^{d}}=\mathcal{P}_{h}^{\star}\).
\end{restatable}
\begin{remark}
    Note that the result is agnostic to the choice of policy. This implies that, in low-rank MDPs, feature-space coverage is equivalent to state-space coverage.
\end{remark}
\begin{remark}
    In Section~\ref{sec:existance_unisoft}, we provide a similar result for the existence of (possibly redundant) UniSOFT features. Since UniSOFT feature maps are necessary for constant expected regret in MDPs with linear rewards \citep{papini2021reinforcement}, this implies that, to achieve constant expected regret in linear MDPs or low-rank MDPs with unknown rewards, the optimal policy must visit with positive probability all states reachable by any policy.
\end{remark}
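
As an illustrative sketch of condition $(1)$ in Lemma~\ref{lemma:unisoft_existance}, consider a hypothetical toy instance that is not part of the analysis. Fix a step $h$, let $\mathcal{S}=\{s_{1},s_{2},s_{3}\}$, and suppose that every row of $\mathcal{P}_{h}^{\star}$ is one of the vectors
\[
    p_{1}=(1,0,0),\qquad p_{2}=(0,1,0),\qquad p_{3}=\left(\tfrac{1}{2},\tfrac{1}{2},0\right),
\]
with $p_{1}$ and $p_{3}$ both occurring. Since $p_{3}=\tfrac{1}{2}p_{1}+\tfrac{1}{2}p_{2}$, we have $\tilde{d}=2$. Take $d=\tilde{d}=2$, and read the span condition as requiring that the transition vectors on $\mathcal{X}_{h}^{\star}$ span a $\tilde{d}$-dimensional space (this reading is our assumption for the example). If $\mathcal{X}_{h}^{\star}$ consists of two pairs with transition vectors $p_{1}$ and $p_{3}$, then
\[
    \dim\textnormal{span}\{p_{1},p_{3}\}=2=\tilde{d}
    \qquad\text{and}\qquad
    |\mathcal{X}_{h}^{\star}|=2\geq d,
\]
so condition $(1)$ holds. If instead $\mathcal{X}_{h}^{\star}$ contains only a single pair with transition vector $p_{1}$, then the span is one-dimensional and $|\mathcal{X}_{h}^{\star}|=1<d$, so condition $(1)$ fails regardless of how the remaining state-action pairs behave.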

Importantly, we see that the existence of good features $\phi$ is fully characterized by the ground-truth transition operator. That is, assuming the existence of non-redundant UniSOFT features implicitly assumes that the optimal policy explores the whole reachable state space (Corollary~\ref{cor:unisoft_negative}).

Nevertheless, if $\mathcal{P}^{\star}$ admits a non-redundant UniSOFT representation, good $\alpha^{\star}$-approximate representations are abundant. The following lemma supports the $\alpha^{\star}$-expressiveness assumption (Assumption \ref{ass:expressivness}).

\begin{restatable}[]{lemma}{unisoftexmiss}\label{lemma:unisoft_existance_misspecification}
    Suppose that Assumption~\ref{ass:min_optimal_occupancy_exists} (minimal optimal occupancy) holds and that \(\mathcal{P}^{\star}\) admits a non-redundant UniSOFT representation. Then, there exists an \(\epsilon > 0\) such that, for any $d\geq\tilde{d}$, the following holds.
    Let $\tilde{\alpha}<\alpha\leq\epsilon$ be arbitrary. Then there exist infinitely more $\alpha^{\star}$-approximate representations than $\tilde{\alpha}^{\star}$-approximate representations \(\langle\phi,\mu\rangle_{\mathbb{R}^{d}}\equiv\hat{\mathcal{P}}\) that are UniSOFT and non-redundant.
\end{restatable}

\begin{remark}
    At a high level, $\epsilon$ is upper-bounded by the degree of linear independence between the (unknown) transition vectors of the optimal actions.
\end{remark}