Keywords: Long Context LLM, Retrieval-augmented generation
Abstract: In this work, we introduce ChatQA 2, a Llama 3.0-based model with a 128K context window, designed to bridge the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo-2024-04-09) in long-context understanding and retrieval-augmented generation (RAG) capabilities. These two capabilities are complementary and essential for LLMs to process large volumes of information that cannot fit into a single prompt. We present a detailed continued-training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens, along with a three-stage instruction-tuning process to enhance the model's instruction-following, RAG performance, and long-context understanding capabilities. Our results demonstrate that the Llama3-ChatQA-2-70B model outperforms most existing state-of-the-art models, including GPT-4-Turbo-2024-04-09, Qwen2-72B-Instruct, and Llama3.1-70B-Instruct, on ultra-long tasks beyond 100K tokens, as well as on the RAG benchmark using only a 4K context window, demonstrating strong long-context capability across varying sequence lengths. We further provide extensive comparisons between direct long-context and RAG solutions using the same state-of-the-art long-context LLMs. Interestingly, we find that the performance of strong long-context LLMs using RAG improves when retrieving a larger number of chunks. With a large set of top-k chunks, RAG consistently outperforms direct long-context solutions using the same state-of-the-art long-context models (e.g., Llama3-ChatQA-2-70B and Qwen2-72B-Instruct) on both 32K and 128K benchmarks. We open-source the model weights, training data, and the evaluation setup for the community: https://chatqa2-project.github.io/
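To make the abstract's comparison concrete, the sketch below illustrates, in simplified form, what "RAG with a large set of top-k chunks" versus a "direct long-context solution" looks like in practice. It is a minimal, hypothetical example, not the evaluation code released with this submission: the fixed-size chunker, the lexical-overlap scorer standing in for a real retriever, and all function names are assumptions made purely for illustration.

```python
# Illustrative sketch only (hypothetical helpers, not the authors' pipeline):
# contrast a direct long-context prompt with a RAG prompt built from the
# top-k retrieved chunks, and vary k as in the abstract's observation that
# retrieving more chunks tends to help strong long-context models.

from typing import List


def split_into_chunks(text: str, chunk_words: int = 300) -> List[str]:
    """Split a long document into fixed-size word chunks (placeholder chunker)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def score_chunk(chunk: str, question: str) -> float:
    """Toy lexical-overlap score standing in for a real retriever or embedding model."""
    q_terms = set(question.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / max(len(q_terms), 1)


def build_rag_prompt(document: str, question: str, top_k: int) -> str:
    """Keep the top-k highest-scoring chunks (restored to document order) as context."""
    chunks = split_into_chunks(document)
    ranked = sorted(range(len(chunks)),
                    key=lambda i: score_chunk(chunks[i], question),
                    reverse=True)
    keep = sorted(ranked[:top_k])  # preserve original order for readability
    context = "\n\n".join(chunks[i] for i in keep)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def build_direct_prompt(document: str, question: str) -> str:
    """Direct long-context baseline: feed the entire document to the model."""
    return f"Context:\n{document}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    doc = "..."  # a long input document (the benchmarks use 32K-128K+ token inputs)
    q = "What does the report conclude about revenue growth?"
    direct_prompt = build_direct_prompt(doc, q)
    for k in (5, 20, 40):  # larger k = more retrieved chunks in the RAG prompt
        rag_prompt = build_rag_prompt(doc, q, top_k=k)
        # Both prompts would then be sent to the same long-context LLM for comparison.
```

In this framing, the direct solution spends the full context window on the raw document, while the RAG variant spends it on the k chunks the retriever ranks highest; the abstract's finding is that, with a sufficiently large k, the latter matches or exceeds the former on the 32K and 128K benchmarks.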
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8995