Language Model Self-improvement by Reinforcement Learning Contemplation
Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu
Published: 01 Jan 2024, Last Modified: 09 Oct 2024
ICLR 2024
CC BY-SA 4.0