Abstract: Large language models (LLMs) have demonstrated an innate ability to solve complex language-based tasks. However, prior findings suggest that they lack the capacity to adjust when their stored information or task-solving skills become outdated, since the knowledge encoded directly in their parameters remains static after pre-training. Tool use offers a way to counter this by offloading part of the work to external systems that the LLM accesses through an interface, allowing the LLM to solve tasks without needing to store specialized knowledge. However, these LLMs must still adapt to nonstationary environments, as new tools can emerge and existing tools can change. We hypothesize that they may nonetheless be inherently better suited for continual learning (CL), as they rely less on parametric memory for directly solving tasks and instead focus on learning when to apply pre-defined tools. To verify this, we develop synthetic arithmetic benchmarks and then aggregate existing NLP tasks to form a more realistic testing scenario. We demonstrate that scaling model size alone does not enable learning and adapting to tasks presented sequentially, regardless of whether the model uses tools, whereas continual learning techniques enable tool-augmented LLMs to adapt faster while forgetting less, highlighting their potential as continual learners.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Greg_Durrett1
Submission Number: 3040