Abstract: Mainstream Large Language Models (LLMs) built on the GPT (Generative Pre-trained Transformer) architecture can only read and generate text in a left-to-right direction.
This limitation prevents these models from processing their training data holistically and from directly deriving suitable prompts from given responses.
Drawing inspiration from the global understanding capability of Bi-LSTM, we introduce Bi-GPT, an enhanced version of the standard GPT architecture that incorporates reverse generation capabilities.
Instead of altering the underlying architecture or adding any extra parameters, Bi-GPT utilizes dual learning with both forward and backward data streams to enable bidirectional generation.
To reduce the training cost, we design a two-stage pretraining strategy that can transform any existing LLM into its bidirectional counterpart.
We train Bi-GPT at different scales and conduct a comprehensive set of experiments, including conventional forward response generation, reverse instruction generation, and token classification tasks, to thoroughly validate its capabilities.
The results show that the incorporation of bidirectional training data improves the forward generation capability (+8% on 5 datasets) and overall performance in token classification tasks.
Furthermore, Bi-GPT effectively bridges the gap between responses and prompts, allowing for the exploration of potential prompt and meta-prompt generation from a single instance.
In summary, Bi-GPT significantly expands the application scenarios and capabilities of GPT without adding any new parameters.
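As a brief illustration of the dual-stream idea described in the abstract (this sketch is not code from the submission), the snippet below shows one way a backward data stream could be formed by reversing a tokenized sequence, so that the same left-to-right decoder also learns right-to-left generation without any added parameters; the function name make_dual_streams and the token ids are hypothetical.

```python
# Minimal sketch (assumed, not the authors' implementation): building forward
# and backward training streams for bidirectional next-token prediction,
# assuming the backward stream is simply the token sequence in reverse order.
from typing import List, Tuple


def make_dual_streams(token_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Return the forward sequence and its reversed counterpart.

    Both streams can be fed to the same decoder-only LM, so left-to-right
    training on the reversed stream amounts to right-to-left generation
    on the original text, without adding new parameters.
    """
    forward = list(token_ids)
    backward = list(reversed(token_ids))
    return forward, backward


# Example usage with hypothetical token ids
fwd, bwd = make_dual_streams([101, 7592, 2088, 102])
print(fwd)  # [101, 7592, 2088, 102]
print(bwd)  # [102, 2088, 7592, 101]
```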
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: pre-training, prompting, fine-tuning
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 6780