Abstract: Human teaching effort is a significant bottleneck for the broader applicability of interactive imitation learning. To reduce the number of required queries, existing methods employ active learning to query the human teacher only in uncertain, risky, or novel situations. However, during these queries, the novice's planned actions are not utilized despite containing valuable information, such as the novice's capabilities and corresponding uncertainty levels. To leverage this information, we allow the novice to say: "I plan to do this, but I am uncertain." We introduce the Action Inquiry DAgger (AIDA) framework, which leverages teacher feedback on the novice plan in three key ways: (1) S-Aware Gating (SAG), which adjusts the gating threshold to track sensitivity, specificity, or a minimum success rate; (2) Foresight Interactive Experience Replay (FIER), which recasts valid and relabeled novice action plans into demonstrations; and (3) Prioritized Interactive Experience Replay (PIER), which prioritizes replay based on uncertainty, novice success, and demonstration age. Together, these components balance query frequency with failure incidence, reduce the number of required demonstration annotations, improve generalization, and speed up adaptation to changing domains. We validate the effectiveness of AIDA through language-conditioned manipulation tasks in both simulation and real-world environments. Code, data, and videos are available at https://aida-paper.github.io.
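For intuition only, the Python sketch below illustrates two of the ideas named in the abstract: nudging a gating threshold toward a target rate (as in SAG) and scoring demonstrations for prioritized replay from uncertainty, novice success, and age (as in PIER). The function names, the proportional update rule, and the weights are illustrative assumptions, not the update rules or implementation described in the paper.

```python
def update_gating_threshold(threshold, observed_rate, target_rate,
                            step=0.01, lower=0.0, upper=1.0):
    """Nudge the query-gating threshold so that an observed rate (e.g.,
    sensitivity, specificity, or success rate) tracks a desired target.
    The proportional rule, step size, and sign convention (raising the
    threshold is assumed to raise the observed rate) are hypothetical."""
    threshold = threshold + step * (target_rate - observed_rate)
    return max(lower, min(upper, threshold))


def replay_priority(uncertainty, novice_failed, age, w_u=1.0, w_f=1.0, w_a=0.1):
    """Combine the three signals mentioned in the abstract into a single
    replay score; the linear form and weights are assumptions."""
    return w_u * uncertainty + w_f * float(novice_failed) - w_a * age
```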
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We have addressed the requested changes and comments from the reviewers and highlighted the changes in blue in the revised manuscript.
These changes include:
- Added additional modes for SAG, i.e., specificity- and success-aware gating (throughout the paper).
- Extended the related work section to include work from the affordance learning and RLHF fields (Section 2).
- Added a paragraph to clarify the research gap (Section 2).
- Clarified that we do not target end-to-end learning of visuomotor policies (Sections 1 and 3).
- Clarified the CLIPort model and tasks (Sections 5.2 and A.8).
- Added two more external baselines, i.e., SafeDAgger and ThriftyDAgger (Sections 5.2, A.9, and A.10).
- Added a discussion of demonstration timings and teaching effort (A.6).
- Added an experiment comparing AIDA with Where2Act (A.12).
Assigned Action Editor: ~Tim_Genewein1
Submission Number: 4616