
\begin{abstract}
    Deterministic Markov Decision Processes (DMDPs) are a mathematical framework for decision-making in which the outcome and the future available actions are uniquely determined by the action taken. A DMDP can be viewed as a finite directed weighted graph in which, at each step, the controller chooses an outgoing edge. An objective is a measurable function on runs (i.e., infinite trajectories) of the DMDP, and the value for an objective is the maximal cumulative reward (or weight) that the controller can guarantee. We consider the classical mean-payoff (aka limit-average) objective, which is a fundamental objective.
    
    Howard's policy iteration algorithm is a popular method for solving DMDPs with mean-payoff objectives. Although experimental studies suggest that Howard's algorithm performs well in practice, 
    the best known upper bound on its number of iterations is exponential, and the best known lower bound is as follows: for input size $I$, the algorithm requires $\widetilde{\Omega}(\sqrt{I})$ iterations, where $\widetilde{\Omega}$ hides poly-logarithmic factors; that is, the known lower bound is sub-linear in the input size. Our main result is an improved lower bound for this fundamental algorithm: we show that for input size $I$, the algorithm requires $\widetilde{\Omega}(I)$ iterations.
    
    % \jakob{Are we still mentioning here that our example has similar $m$ and $n$ to the past example but reduces the weights from exponential to polynomial?}
    % \krish{Lets keep it crisp and succinct in abstract. We elaborate in Intro anyway.}
    
    % our understanding of its theoretical complexity is limited. To address this gap, we present a lower bound for Howard's algorithm, when applied on the DMDPs with mean-payoff objectives. We construct a family of DMDPs with \(2n\) vertices and \(\calO(n^2)\) edges, on which Howard's algorithm requires \(\mathcal{O}(n^2)\) iterations to find an optimal policy. It is noteworthy that our construction achieves a DMDP size of \(\mathcal{O}(n^2 \log n)\), significantly improving the dependency on the number of edges from linear to constant compared to existing lower bounds in the literature. 
\end{abstract}
