TL;DR: A hypernetwork that outputs LoRA matrices for any input document, allowing the target LLM to answer related questions without the document in context.
Abstract: Long input sequences are central to in-context learning, document understanding, and multi-step reasoning in Large Language Models (LLMs).
However, the quadratic attention cost of Transformers makes inference memory-intensive and slow.
While context distillation (CD) can transfer information into model parameters, per-prompt distillation is impractical due to training costs and latency.
To address these limitations, we propose Doc-to-LoRA (D2L), a lightweight hypernetwork that meta-learns to perform approximate CD within a single forward pass.
Given an unseen prompt, D2L generates a LoRA adapter for a target LLM, enabling subsequent queries to be answered without re-consuming the original context, reducing latency and KV-cache memory consumption during target LLM inference.
On a long-context needle-in-a-haystack task, D2L successfully learns to map contexts into adapters that store the needle information, achieving near-perfect zero-shot accuracy at sequence lengths exceeding the target LLM’s native context window by more than 4x. On real-world QA datasets with limited compute, D2L outperforms standard CD while significantly reducing peak memory consumption and update latency.
We envision that D2L can facilitate rapid adaptation of LLMs, opening up the possibility of frequent knowledge updates and personalized chat behavior.
Code and checkpoints are available at https://github.com/SakanaAI/doc-to-lora.
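The abstract describes a hypernetwork that turns a document into a LoRA adapter in a single forward pass. The NumPy sketch below illustrates only that general idea and makes several assumptions: a single toy linear layer, a mean-pooled document encoding, and a one-layer linear hypernetwork. The names `generate_lora` and `adapted_forward`, and all dimensions, are hypothetical. They are not taken from the D2L paper or its released code.

```python
import numpy as np

# Hypothetical toy dimensions (not from the paper).
D_MODEL = 64   # hidden size of one target linear layer
RANK = 4       # LoRA rank r
rng = np.random.default_rng(0)

# Frozen base weight of a single target linear layer, shape (d_out, d_in).
W_base = rng.normal(scale=0.02, size=(D_MODEL, D_MODEL))

# Hypothetical hypernetwork: one linear map from a pooled document
# encoding to the flattened LoRA factors A (r x d_in) and B (d_out x r).
H = rng.normal(scale=0.01, size=(D_MODEL, 2 * RANK * D_MODEL))


def generate_lora(doc_tokens: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map a (seq_len, d) document encoding to LoRA factors in one forward pass."""
    pooled = doc_tokens.mean(axis=0)                         # (d,) pool over tokens
    flat = pooled @ H                                        # (2 * r * d,)
    A = flat[: RANK * D_MODEL].reshape(RANK, D_MODEL)        # (r, d_in)
    B = flat[RANK * D_MODEL :].reshape(D_MODEL, RANK)        # (d_out, r)
    return A, B


def adapted_forward(x: np.ndarray, A: np.ndarray, B: np.ndarray,
                    scale: float = 1.0) -> np.ndarray:
    """Apply (W_base + scale * B @ A) to query activations; no document needed."""
    return x @ (W_base + scale * B @ A).T


# Usage: "absorb" a 1000-token document once, then answer queries without it.
doc = rng.normal(size=(1000, D_MODEL))
A, B = generate_lora(doc)
query = rng.normal(size=(3, D_MODEL))
y = adapted_forward(query, A, B)

# The adapter size is fixed (2 * r * d), while a per-layer KV cache for the
# document grows with its length (keys + values: 2 * seq_len * d entries).
adapter_params = A.size + B.size
kv_entries = 2 * doc.shape[0] * D_MODEL
```

The design point the sketch is meant to show: once the adapter is generated, the query path touches only the fixed-size low-rank factors, so memory no longer scales with document length. That scaling is the source of the KV-cache savings claimed in the abstract.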
Lay Summary: Large language models can answer questions about a document if you paste the document into the prompt. The problem is that long documents make the model slower, more expensive, and more memory-hungry, because the model has to keep "looking at" the whole document every time you ask a question. This paper proposes Doc-to-LoRA (D2L), a method that lets a language model "absorb" a document into a small temporary add-on called a LoRA adapter. After that, the model can answer questions about the document without needing the document in the prompt again.
Link To Code: https://github.com/SakanaAI/doc-to-lora
Primary Area: General Machine Learning->Transfer, Multitask and Meta-learning
Keywords: large language models, meta-learning, context distillation, hypernetworks, adaptation, memory, personalization, efficiency
Originally Submitted PDF: pdf
Submission Number: 11583