\begin{figure}
    \centering
    % 
    \subfigure[Relations among tasks, methods and actions in an HTN.]{\label{fig:generic-htn-domain}%
      \includegraphics[width=0.28\textwidth]{Image/generic HTN Domain.jpg}
      }%
    % 
    \subfigure[A specific example of an HTN from the kitchen scenario.]{\label{fig:key-example-htn-domain}%
      \includegraphics[width=0.23\textwidth]{Image/key example partial HTN.jpg}
    }%
    % 
    \subfigure[Illustration of an agent progressing through the plan for task ``Recipe1''.]{\label{fig:generic-htn-plan-execution}%
      \includegraphics[width=0.49\textwidth]{Image/generic HTN plan execution.jpg}
    }%    
    \caption{Planning with HTNs.}
    \label{fig:htn-introduction}
\end{figure}

\section{Representation and Analysis of Hierarchical Tasks}\label{sec:htn intro}

The Hierarchical Task Network (HTN) is a classic representation of hierarchical tasks for planning \citep{Erol1994HTNExpressivity}. An HTN represents the state of the world as a set of atomic predicates. The domain is described by a set of \emph{task}s, \emph{method}s and \emph{action}s (\figureref{fig:htn-introduction}). A task represents an abstract goal, e.g., following a recipe. Methods describe different ways to complete a task by decomposing it into a set of (sub-)tasks and/or actions. Methods thus induce a tree structure that organizes tasks into different levels of abstraction. A method is applicable only when its preconditions are satisfied. The same sub-task or action may appear in several different tasks and in different decompositions of the same task. An action is a primitive task that cannot be decomposed further; it is executed physically through task-related movements, which change the world state. Tasks, actions and method preconditions are represented as predicates over symbolic arguments. Physical entities involved in a task can either be represented explicitly by these symbolic arguments or be bound implicitly to the predicate, e.g., the fridge in ``OpenFridge''. To find a plan for a given task, an HTN planner recursively selects and applies applicable methods to decompose the goal task. A plan is found when this process terminates in a sequence of actions.
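To make the representation concrete, the following Python sketch encodes a small, hypothetical kitchen domain and a depth-first, total-order decomposition in the style of SHOP-like HTN planners. The names \texttt{Action}, \texttt{Method}, \texttt{plan}, \texttt{ACTIONS}, \texttt{METHODS} and the toy tasks (\texttt{ToyRecipe}, \texttt{GetMilk}, etc.) are illustrative assumptions, not the domain or planner used in this work. For brevity, the sketch ignores symbolic arguments and treats each predicate as a ground string.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

# A world state is a set of ground atomic predicates (hypothetical encoding).
State = FrozenSet[str]


@dataclass(frozen=True)
class Action:
    """Primitive task: cannot be decomposed, changes the world state."""
    pre: FrozenSet[str]     # predicates that must hold before execution
    add: FrozenSet[str]     # predicates made true by the action
    delete: FrozenSet[str]  # predicates made false by the action


@dataclass(frozen=True)
class Method:
    """One way to decompose a compound task into ordered sub-tasks/actions."""
    pre: FrozenSet[str]        # applicability preconditions
    subtasks: Tuple[str, ...]  # sub-tasks and/or actions, in execution order


def plan(tasks: List[str], state: State,
         actions: Dict[str, Action],
         methods: Dict[str, List[Method]]) -> Optional[List[str]]:
    """Recursively decompose `tasks`; return a sequence of actions or None."""
    if not tasks:
        return []  # nothing left to decompose: the plan is complete
    head, rest = tasks[0], tasks[1:]
    if head in actions:
        act = actions[head]
        if not act.pre <= state:
            return None  # action not executable in this state
        next_state = (state - act.delete) | act.add
        tail = plan(rest, next_state, actions, methods)
        return None if tail is None else [head] + tail
    for method in methods.get(head, []):
        if method.pre <= state:  # try each applicable method, backtrack on failure
            result = plan(list(method.subtasks) + rest, state, actions, methods)
            if result is not None:
                return result
    return None


# Hypothetical toy domain: the same action (TakeMilk) appears in two methods.
ACTIONS = {
    "OpenFridge": Action(frozenset({"fridge_closed"}), frozenset({"fridge_open"}),
                         frozenset({"fridge_closed"})),
    "TakeMilk": Action(frozenset({"fridge_open"}), frozenset({"has_milk"}), frozenset()),
    "CloseFridge": Action(frozenset({"fridge_open"}), frozenset({"fridge_closed"}),
                          frozenset({"fridge_open"})),
    "PourMilk": Action(frozenset({"has_milk"}), frozenset({"milk_poured"}), frozenset()),
}
METHODS = {
    "ToyRecipe": [Method(frozenset(), ("GetMilk", "PourMilk"))],
    "GetMilk": [
        Method(frozenset({"fridge_closed"}), ("OpenFridge", "TakeMilk", "CloseFridge")),
        Method(frozenset({"fridge_open"}), ("TakeMilk", "CloseFridge")),
    ],
}

print(plan(["ToyRecipe"], frozenset({"fridge_closed"}), ACTIONS, METHODS))
# ['OpenFridge', 'TakeMilk', 'CloseFridge', 'PourMilk']
```

If the fridge is already open, the second \texttt{GetMilk} method applies instead and the planner returns \texttt{['TakeMilk', 'CloseFridge', 'PourMilk']}, illustrating how preconditions select among alternative decompositions of the same task.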

\figureref{fig:generic-htn-plan-execution} shows an agent progressing through a plan for the task ``Recipe 1.'' The plan inherits a tree structure from the methods: the root node represents the overarching goal, lower-level nodes represent intermediate sub-goals, and leaf nodes correspond to actions. Although the tasks in a decomposition may be partially ordered, a plan contains a totally ordered sequence of tasks at each level, as shown by the arrows in the figure. At each step of the plan, an agent is simultaneously executing tasks at different levels (e.g., an action at a leaf and intermediate sub-goals at higher levels of the tree). Thus, the current step along the plan corresponds to one branch of the tree, from the root to a leaf, shown as grey boxes in the figure.
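The mapping from a plan tree to its current branch can be sketched as follows, assuming the plan is stored as a tree whose children are already totally ordered and whose leaves are actions. \texttt{PlanNode}, \texttt{leaves} and \texttt{current\_branch} are hypothetical names, and the toy tree is illustrative rather than the plan shown in \figureref{fig:generic-htn-plan-execution}.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """A node of a plan tree; children are totally ordered, leaves are actions."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


def leaves(node: PlanNode) -> List[str]:
    """Actions below `node`, in execution order (left-to-right leaves)."""
    if not node.children:
        return [node.name]
    return [leaf for child in node.children for leaf in leaves(child)]


def current_branch(root: PlanNode, step: int) -> List[str]:
    """Root-to-leaf path of tasks active while the agent executes action `step` (0-based)."""
    if step < 0:
        raise IndexError("step must be non-negative")
    node, path = root, [root.name]
    while node.children:
        for child in node.children:
            n = len(leaves(child))
            if step < n:  # the step-th action lies under this child
                node = child
                break
            step -= n  # skip over the actions of earlier siblings
        else:
            raise IndexError("step is beyond the end of the plan")
        path.append(node.name)
    return path


plan_tree = PlanNode("ToyRecipe", [
    PlanNode("GetMilk", [PlanNode("OpenFridge"), PlanNode("TakeMilk"), PlanNode("CloseFridge")]),
    PlanNode("PourMilk"),
])
print(current_branch(plan_tree, 1))  # ['ToyRecipe', 'GetMilk', 'TakeMilk']
```

In this toy tree, branches have different lengths (\texttt{PourMilk} sits directly under the root), so the number of levels on the current branch can vary from step to step.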

Under this formulation, an agent carrying out a plan may have difficulty with tasks at any level of the current branch. Guidance should therefore be provided according to the agent's intention, which we define as a set of binary variables, one for each level of the current branch, each indicating whether the agent is aware of, executing, or intending to execute the corresponding task. Ideally, guidance should be based on the agent's actual intention. For example, if the variable for the root node is 0, the agent might need a reminder of which recipe to follow. If the variable for the root node is 1 but the variable for a lower-level node is 0, the agent might need a reminder of the next step in the recipe. In practice, guidance must be based on estimates of the agent's intention, since intention is not directly observable from actions, except at the lowest level of the hierarchy and then only while the agent is actually executing the action. Because the same action can appear in several tasks and in different decompositions of the same task, observed actions alone leave the intention at higher levels ambiguous. These ambiguities can be resolved, at least partially, by exploiting gaze, as we describe below. 
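The guidance rule described above can be sketched as a scan of the current branch from the root downwards. The sketch assumes that the estimated intention is supplied as one probability per level, $p_i = P(x_i = 1)$ for the binary intention variable $x_i$ at level $i$, that a hypothetical threshold of $0.5$ turns these estimates into binary decisions, and that the reminder targets the shallowest level whose variable is judged to be 0. \texttt{select\_guidance} and its parameters are illustrative names, not part of our system.

```python
from typing import Optional, Sequence, Tuple


def select_guidance(branch: Sequence[str], p_intend: Sequence[float],
                    threshold: float = 0.5) -> Optional[Tuple[int, str]]:
    """Return (level, task) for the shallowest task the agent likely does not intend.

    branch[i] is the task at level i of the current branch (0 = root), and
    p_intend[i] is the estimated probability that the binary intention
    variable for that level is 1. Returns None if no guidance is needed.
    """
    if len(branch) != len(p_intend):
        raise ValueError("expected one intention estimate per level")
    for level, (task, p) in enumerate(zip(branch, p_intend)):
        if p < threshold:  # treat the intention variable as 0 at this level
            return level, task
    return None  # every task on the branch is judged to be intended


branch = ["ToyRecipe", "GetMilk", "TakeMilk"]
print(select_guidance(branch, [0.9, 0.2, 0.8]))  # (1, 'GetMilk'): remind of this step
print(select_guidance(branch, [0.1, 0.2, 0.8]))  # (0, 'ToyRecipe'): remind of the recipe
```

Returning the shallowest unintended level mirrors the examples in the text, where a 0 at the root calls for a reminder about the recipe before any lower-level reminder; other policies, such as reminding at every unintended level, are equally plausible.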

Planning techniques have previously been used to estimate human intention, an approach often referred to as \emph{plan/goal/intention recognition as planning} \citep{Sohrabi2016PlanRevisited, Meneguzzi2021APlanning}. Recently, Singh et al. used planning to estimate a player's intention in a turn-based game \citep{Singh2020CombiningRecognition}. They assumed that the likelihood of the past actions given a candidate goal is proportional to the similarity between the globally optimal plan for that goal and the optimal plan for that goal that has the observed action history as a prefix. They also showed that incorporating gaze improves accuracy and reduces computational cost by pruning expensive paths. Whereas they used planning in a non-hierarchical task to assign probabilities to different candidate goals, we use planning to find the optimal current step in executing a single hierarchical task with a fixed goal. Similarly, they used gaze to discriminate between intentions to achieve different goals, whereas we use gaze to estimate intention at different levels of the hierarchy.
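The likelihood assumption of Singh et al.\ described above can be written as follows, where $O$ is the observed action history, $G$ a candidate goal, $\pi^{*}_{G}$ the optimal plan for $G$, $\pi^{*}_{G \mid O}$ the optimal plan for $G$ that has $O$ as a prefix, and $\mathrm{sim}(\cdot,\cdot)$ a similarity measure that we leave abstract rather than reproducing their exact choice. Combining this likelihood with a prior $P(G)$ via Bayes' rule, which we assume here for illustration, gives a posterior over candidate goals:
\begin{equation*}
  P(O \mid G) \propto \mathrm{sim}\left(\pi^{*}_{G}, \pi^{*}_{G \mid O}\right),
  \qquad
  P(G \mid O) \propto P(O \mid G)\, P(G).
\end{equation*}
In our setting the goal is fixed, so the quantity of interest is the current step within a single hierarchical plan rather than $P(G \mid O)$.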

