OpenReview.net
Yuanzhao Zhai
Assistant Professor, National University of Defense Technology
Joined April 2022
Names
Yuanzhao Zhai (Preferred), George Chia
Emails
****@126.com (Confirmed), ****@nudt.edu.cn (Confirmed)
Personal Links
Google Scholar
DBLP
ORCID
Career & Education History
Assistant Professor, National University of Defense Technology (nudt.edu.cn), 2025–Present
PhD student, National University of Defense Technology (nudt.edu.cn), 2022–2025
MS student, National University of Defense Technology (nudt.edu.cn), 2019–2022
Undergrad student, Tianjin University (tju.edu.cn), 2015–2019
Advisors, Relations & Conflicts
No relations added
Expertise
No areas of expertise listed
Publications
CoPE: A Framework for Optimizing Coordination between Planning and Execution in LLM-based Agents
Huanxi Liu, Kun Hu, Qiang Wang, Yuanzhao Zhai, Feng Dawei, Bo Ding, Huaimin Wang
ICML 2026 regular
Readers: Everyone
Uncertainty-penalized reinforcement learning from human feedback with diversified reward LoRA ensembles
Yuanzhao Zhai, Yu Lei, Han Zhang, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
Information Processing & Management
Readers: Everyone
Empowering Large Language Model Agent through Step-Level Self-Critique and Self-Training
Yuanzhao Zhai, Huanxi Liu, Zhuo Zhang, Tong Lin, Kele Xu, Cheng Yang, Dawei Feng, Bo Ding, Huaimin Wang
Crossref
Readers: Everyone
Preference-Strength-Aware Self-Improving Alignment with Generative Preference Models
Yuanzhao Zhai, Zhuo Zhang, Cheng Yang, Kele Xu, Yue Yu, Wei Li, Hui Wang, Zenglin Xu, Dawei Feng, Bo Ding, Huaimin Wang
Crossref
Readers: Everyone
NUCLEAR-NORM MAXIMIZATION FOR LOW-RANK UPDATES
Huanxi Liu, Yuanzhao Zhai, Kele Xu, Feng Dawei, Yiying Li
OpenReview Archive Direct Upload
Readers: Everyone
Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
Yuanzhao Zhai, Yiying Li, Zijian Gao, Xudong Gong, Kele Xu, Feng Dawei, Bo Ding, Huaimin Wang
OpenReview Archive Direct Upload
Readers: Everyone
Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
Yuanzhao Zhai, Tingkai Yang, Kele Xu, Dawei Feng, Cheng Yang, Bo Ding, Huaimin Wang
AAAI 2025
Readers: Everyone
COPR: Continual Human Preference Learning via Optimal Policy Regularization
Han Zhang, Lin Gui, Yu Lei, Yuanzhao Zhai, Yehong Zhang, Zhuo Zhang, Yulan He, Hui Wang, Yue Yu, Kam-Fai Wong, Bin Liang, Ruifeng Xu
ACL (Findings) 2025
Readers: Everyone
Correcting Large Language Model Behavior via Influence Function
Han Zhang, Zhuo Zhang, Yi Zhang, Yuanzhao Zhai, Hanyang Peng, Yu Lei, Yue Yu, Hui Wang, Bin Liang, Lin Gui, Ruifeng Xu
AAAI 2025
Readers: Everyone
View all 44 publications
Co-Authors
Bin Liang
Bo Ding
Chao Chen
Cheng Yang
Chengkang Yao
Dawei Feng
Feng Dawei
Gong Xudong
Gongqian Zhou
Han Zhang
Hanyang Peng
Hengxing Cai
Hongda Jia
Huaimin Wang
Huanxi Liu
Hui Wang
Jie Luo
Jie Xu
Kam-Fai Wong
Kele Xu
Kun Hu
Lin Gui
Paul Honeine
Pengfei Zhang
Qiang Wang
View all 52 co-authors