Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog

Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

23 Sept 2020 (modified: 05 May 2023), ICLR 2020, Readers: Everyone