Keywords: Editable-LLM, Reinforcement learning, Supervisory signal design
Abstract: We propose Editable-LLM, a language model that can reflect on and revise its generated content in real time, much as a human writer does. Specifically, we add a check mechanism to a conventional generative large language model; the mechanism supports adding, deleting, correcting, and checking operations on the generated text. The supervisory signal is the text quality score obtained once a simulated sequence of modifications is complete. The idea is inspired by reinforcement learning systems such as AlphaGo, which chooses its current move by using the simulated final outcome of the game as a supervisory signal. We use this reinforcement learning idea to guide improvements to conventional large language models.
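The abstract does not specify an algorithm, so the following is only a minimal sketch of the general idea it describes: score each candidate edit by simulating further edits and using the quality of the resulting text as the signal. Every name here (`apply_edit`, `candidate_edits`, `rollout_value`, `best_edit`, `toy_score`, `TARGET`, `VOCAB`) is hypothetical, random rollouts stand in for whatever simulation policy the authors use, and the toy scorer stands in for a real text quality model.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical edit representation: (operation, position, token).
# The token is ignored for "delete" and "keep".
Edit = Tuple[str, int, str]


def apply_edit(tokens: List[str], edit: Edit) -> List[str]:
    """Return a new token list with one edit applied."""
    op, pos, tok = edit
    out = list(tokens)
    if op == "insert":
        out.insert(pos, tok)
    elif op == "delete":
        del out[pos]
    elif op == "replace":
        out[pos] = tok
    # "keep" leaves the text unchanged.
    return out


def candidate_edits(tokens: List[str], vocab: List[str]) -> List[Edit]:
    """Enumerate keep, delete, replace, and insert edits for the current text."""
    edits: List[Edit] = [("keep", 0, "")]
    for i in range(len(tokens)):
        edits.append(("delete", i, ""))
        edits.extend(("replace", i, w) for w in vocab if w != tokens[i])
    for i in range(len(tokens) + 1):
        edits.extend(("insert", i, w) for w in vocab)
    return edits


def rollout_value(tokens: List[str], vocab: List[str],
                  score: Callable[[List[str]], float], rng: random.Random,
                  depth: int = 2, n_rollouts: int = 8) -> float:
    """Average quality score after `depth` further random edits (the simulation)."""
    total = 0.0
    for _ in range(n_rollouts):
        sim = list(tokens)
        for _ in range(depth):
            sim = apply_edit(sim, rng.choice(candidate_edits(sim, vocab)))
        total += score(sim)
    return total / n_rollouts


def best_edit(tokens: List[str], vocab: List[str],
              score: Callable[[List[str]], float], rng: random.Random,
              depth: int = 2, n_rollouts: int = 8) -> Edit:
    """Pick the edit whose simulated final texts score highest on average."""
    scored = [
        (rollout_value(apply_edit(tokens, e), vocab, score, rng, depth, n_rollouts), e)
        for e in candidate_edits(tokens, vocab)
    ]
    return max(scored, key=lambda pair: pair[0])[1]


# Toy stand-in for a text quality model: reward agreement with a fixed target.
TARGET = ["the", "cat", "sat"]
VOCAB = ["the", "cat", "sat", "dog"]


def toy_score(tokens: List[str]) -> float:
    matches = sum(a == b for a, b in zip(tokens, TARGET))
    return float(matches - abs(len(tokens) - len(TARGET)))
```

With `depth=0` the rollout reduces to scoring the edited text directly, so `best_edit(["the", "dog", "sat"], VOCAB, toy_score, random.Random(0), depth=0)` selects the replacement of "dog" with "cat". Larger depths make the estimate stochastic, which is where a learned simulation policy and a learned scorer would presumably replace the random choices used here.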
Submission Number: 25