\section{Conclusion}
\label{sec:conclusion}

We have proposed a statistics-based approach to distributional reinforcement learning that simultaneously estimates quantiles and expectiles of the action-value distribution, whereas previous work estimated either quantiles or expectiles, but not both. Our approach leverages the efficiency of the expectile-based loss for both expectile and quantile estimation while addressing the theoretical shortcomings of expectile-based distributional reinforcement learning, which often lead to a collapse of the expectile function in practice.

We have shown on a toy environment how the dual optimization affects the statistics recovered in distributional RL: the quantile function is estimated more accurately than with vanilla quantile regression, and the expectile function remains consistent after several steps of temporal-difference training. We have also benchmarked our approach at scale on the Atari-5 benchmark. Our model, IEQN, matches the performance of the Huber-based IQN-1 and surpasses that of both expectile-based and quantile-based agents, demonstrating its effectiveness in practical scenarios.

This work opens possibilities for future research that builds on a distributional approach that performs well and does not collapse. In future work, we plan to investigate how the dual approach can be used in risk-aware decision-making problems, and how it performs when the goal is to optimize risk metrics such as (conditional) value-at-risk. We also plan to study what type of behavior is favored by the quantile loss and by the expectile loss, respectively.