Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Procedure planning, uncertainty estimation, visual reasoning, active learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Procedure planning involves generating a sequence of steps that bring a specific start state to the desired goal state. In planning from instructional videos, both states are given as visual observations. This is a challenging task due to ambiguities in the visual representations of states and variations arising from multiple feasible plans. Existing approaches address these challenges by adopting strong visual representation learning methods and sophisticated reasoning mechanisms. However, the decision process is passive in the sense that both the visual observations and the reasoning process are fixed during the planning phase. In this paper, we propose an active procedure planning approach that accounts for uncertainties arising from imperfect visual observations and task plan variations. In particular, we develop quantitative metrics to evaluate task uncertainty and use them to guide the selection of additional visual observations. Empirical results show that visual observations driven by uncertainty awareness lead to significantly higher performance gains than opportunistic visual observations. The findings are useful for developing trusted and explainable AI models for procedure planning.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1915