Abstract: Mainstream Large Language Models (LLMs) built on the GPT (Generative Pre-trained Transformer) architecture can only read and generate text in a left-to-right direction.
This limitation prevents these models from processing their training data holistically and from directly deriving suitable prompts from given responses.
Drawing inspiration from the global understanding capability of Bi-LSTM, we introduce Bi-GPT, an enhanced version of the standard GPT architecture that incorporates reverse generation capabilities.
Instead of altering the underlying architecture or adding any extra parameters, Bi-GPT utilizes dual learning with both forward and backward data streams to enable bidirectional generation.
To reduce the training cost, we design a two-stage pretraining strategy that can transform any existing LLM into its bidirectional counterpart.
We train Bi-GPT at different scales and conduct a comprehensive set of experiments, including conventional forward response generation, reverse instruction generation, and token classification tasks, to thoroughly validate its capabilities.
The results show that the incorporation of bidirectional training data improves the forward generation capability (+8% on 5 datasets) and overall performance in token classification tasks.
Furthermore, Bi-GPT effectively bridges the gap between responses and prompts, allowing for the exploration of potential prompt and meta-prompt generation from a single instance.
In summary, Bi-GPT significantly expands the application scenarios and capabilities of GPT without adding any new parameters.
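As a brief illustration of the dual-stream idea described in the abstract (this sketch is not code from the submission), the snippet below shows one way a backward data stream could be formed by reversing a tokenized sequence, so that the same left-to-right decoder also learns right-to-left generation without any added parameters; the function name make_dual_streams and the token ids are hypothetical.

```python
# Minimal sketch (assumed, not the authors' implementation): building forward
# and backward training streams for bidirectional next-token prediction,
# assuming the backward stream is simply the token sequence in reverse order.
from typing import List, Tuple


def make_dual_streams(token_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Return the forward sequence and its reversed counterpart.

    Both streams can be fed to the same decoder-only LM, so left-to-right
    training on the reversed stream amounts to right-to-left generation
    on the original text, without adding new parameters.
    """
    forward = list(token_ids)
    backward = list(reversed(token_ids))
    return forward, backward


# Example usage with hypothetical token ids
fwd, bwd = make_dual_streams([101, 7592, 2088, 102])
print(fwd)  # [101, 7592, 2088, 102]
print(bwd)  # [102, 2088, 7592, 101]
```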
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: pre-training, prompting, fine-tuning
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 6780