\section{Overview of Results}
A natural question about Howard's policy iteration is how many iterations it takes to find an optimal policy. In the following, we state a long-standing conjecture on the upper bound for Howard's policy iteration.
\begin{conjecture}[{\cite[Conjecture 6.1.1]{hansen2012worst}}]
    The number of iterations performed by Howard’s algorithm, when applied to a DMDP, is at most the number of edges.
\end{conjecture}
\citet{hansen2010lower} constructed a family of DMDPs with $n$ vertices and $m$ edges on which Howard's algorithm performs $m - n + 1$ iterations. However, the size of these DMDPs is $\calO(mn^2 \log n)$ due to their exponentially large weights. In this work, we present a family of DMDPs with $2n$ vertices and $\calO(n^2)$ edges on which Howard's algorithm performs $\Omega(n^2)$ iterations to find an optimal policy. The weights are bounded by $\calO(n^2)$, so each weight can be encoded with $\calO(\log n)$ bits. Hence, the size of our DMDPs is $\calO(n^2 \log n)$, which improves the dependency on the number of edges from linear to constant. Our main result is stated as follows.

%\begin{tcolorbox}
    \begin{theorem}[Main Result]
    \label{thm:main-result}
        Let $n$ be a positive integer. There exists a DMDP with $2n$ vertices, $\frac{3n^2 + n}{2}$ edges, and size $\calO(n^2 \log n)$ on which Howard's algorithm performs $\frac{n^2 + 7n - 6}{2}$ iterations to find an optimal policy.
    \end{theorem}
%\end{tcolorbox}
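
As a quick sanity check, which is not part of the formal statement, the counts in Theorem~\ref{thm:main-result} can be compared directly with the conjectured upper bound. Here $m_n$ and $T_n$ are shorthand introduced only for this comparison: $m_n = \frac{3n^2 + n}{2}$ denotes the number of edges and $T_n = \frac{n^2 + 7n - 6}{2}$ the number of iterations. Then
\[
    m_n - T_n \;=\; \frac{2n^2 - 6n + 6}{2} \;=\; n^2 - 3n + 3 \;=\; \Bigl(n - \tfrac{3}{2}\Bigr)^2 + \tfrac{3}{4} \;>\; 0,
\]
so the number of iterations stays strictly below the number of edges for every $n$, consistent with the conjecture. Moreover,
\[
    \lim_{n \to \infty} \frac{T_n}{m_n} \;=\; \lim_{n \to \infty} \frac{n^2 + 7n - 6}{3n^2 + n} \;=\; \frac{1}{3},
\]
so the iteration count is linear in the number of edges, with the ratio approaching $\frac{1}{3}$ as $n$ grows.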

