Keywords: ladder-tuning, lst, peft, memory-efficiency
Abstract: Fine-tuning large language models (LLMs) is often limited by the memory available on commodity GPUs.
Parameter-efficient fine-tuning (PEFT) methods such as QLoRA reduce the number of trainable parameters, yet still incur high memory usage because the backward pass traverses the full model.
We revisit Ladder Side Tuning (LST), a rarely explored PEFT technique that adds a lightweight side network, and show that it matches QLoRA's compute scaling slope while cutting peak memory by 50\%.
Across downstream benchmarks spanning natural language understanding, mathematical reasoning, and LLM-critic tasks, LST attains accuracy close to QLoRA's, with only a small trade-off, while offering much better memory feasibility.
This efficiency enables fine-tuning of 7B-parameter models on a single 12 GB consumer GPU with 2k-token contexts and no gradient checkpointing, conditions under which QLoRA exhausts memory.
Beyond memory efficiency, we establish scaling laws showing that LST exhibits scaling slopes similar to QLoRA's.
We further exploit LST's architectural flexibility by introducing xLadder, a depth-extended variant that increases effective depth via cross-connections and can shorten chain-of-thought (CoT) traces at a fixed parameter count. Upon acceptance, we will open source our implementation~\url{https://anonymous.4open.science/r/arr_repo-64AD}.
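To make the memory argument concrete, below is a minimal PyTorch sketch of the ladder-side-tuning idea as described in the abstract: a frozen backbone whose intermediate activations are detached and fed, through learned gates and down-projections, into a small trainable side network. All layer sizes, module names, and the gating form are illustrative assumptions, not the paper's implementation; the point is only that the backward pass never enters the backbone, which is where LST's peak-memory saving comes from.

```python
import torch
import torch.nn as nn

class LSTSketch(nn.Module):
    """Toy ladder-side-tuning model (illustrative, not the paper's code)."""
    def __init__(self, d_model=64, d_side=16, n_layers=4):
        super().__init__()
        # Frozen backbone: its parameters receive no gradients.
        self.backbone = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_layers))
        for p in self.backbone.parameters():
            p.requires_grad_(False)
        # Lightweight trainable side path: per-layer down-projections,
        # small side layers, and scalar gates mixing backbone activations in.
        self.down = nn.ModuleList(
            nn.Linear(d_model, d_side) for _ in range(n_layers))
        self.side = nn.ModuleList(
            nn.Linear(d_side, d_side) for _ in range(n_layers))
        self.gate = nn.Parameter(torch.zeros(n_layers))
        self.head = nn.Linear(d_side, d_model)

    def forward(self, x):
        h_side = torch.zeros(x.size(0), self.side[0].in_features)
        h = x
        for i, layer in enumerate(self.backbone):
            with torch.no_grad():  # no activation graph kept for the backbone
                h = torch.relu(layer(h))
            g = torch.sigmoid(self.gate[i])
            # detach() cuts the graph: only the side path stores activations.
            h_side = torch.relu(self.side[i](
                g * self.down[i](h.detach()) + (1 - g) * h_side))
        return self.head(h_side)

model = LSTSketch()
loss = model(torch.randn(8, 64)).pow(2).mean()
loss.backward()
# Gradients exist only for the side network; the backbone stays untouched.
assert all(p.grad is None for p in model.backbone.parameters())
assert model.side[0].weight.grad is not None
```

Because the backbone runs under `torch.no_grad()` and its outputs are detached, autograd stores activations only for the small side network, so peak training memory scales with the side width rather than the full model.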
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: peft, llm efficiency, memory efficiency, commodity gpu
Contribution Types: Approaches to low-resource settings, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 2764