LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models

ACL ARR 2024 June Submission3102 Authors

15 Jun 2024 (modified: 03 Jul 2024) | ACL ARR 2024 June Submission | Everyone | CC BY 4.0

Abstract: Guardrails have emerged as an alternative to safety alignment for content moderation of large language models (LLMs). Existing model-based guardrails were not designed for resource-constrained portable devices, such as mobile phones, which increasingly run LLM-based applications locally. We introduce LoRA-Guard, a parameter-efficient guardrail adaptation method that relies on knowledge sharing between LLMs and guardrail models. LoRA-Guard extracts language features from the LLM and adapts them to the content moderation task using low-rank adapters, while a dual-path design prevents any performance degradation on the generative task. We show that LoRA-Guard outperforms existing approaches with 100-1000x lower parameter overhead while maintaining accuracy, enabling on-device content moderation.
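The dual-path idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `DualPathLayer`, its parameters, and the toy dimensions are hypothetical, and a single linear layer stands in for a full LLM. The sketch assumes that the generative path uses only the frozen base weights, while the guard path adds a low-rank update on top of the same weights, so switching the adapter off leaves the generative output unchanged.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; stands in for real model weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class DualPathLayer:
    """Hypothetical linear layer shared by a generative path and a guard path."""

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(d_out, d_in, rng)  # frozen base weights (shared)
        self.A = rand_matrix(rank, d_in, rng)   # low-rank down-projection (guard only)
        # LoRA usually initializes B to zero; it is random here (an assumption
        # for this demo) so that the two paths visibly differ without training.
        self.B = rand_matrix(d_out, rank, rng)  # low-rank up-projection (guard only)

    def forward(self, x, guard=False):
        base = matvec(self.W, x)
        if not guard:
            # Generative path: adapters are bypassed, so the output is exactly
            # what the unmodified base layer would produce.
            return base
        # Guard path: base output plus the low-rank correction B(Ax).
        delta = matvec(self.B, matvec(self.A, x))
        return [b + d for b, d in zip(base, delta)]

    def adapter_params(self):
        """Number of extra parameters added by the adapter."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def base_params(self):
        """Number of parameters in the frozen base layer."""
        return len(self.W) * len(self.W[0])


layer = DualPathLayer(d_in=64, d_out=64, rank=2)
x = [1.0] * 64
generative_out = layer.forward(x)           # identical to the base layer
guard_out = layer.forward(x, guard=True)    # adapted features for moderation
overhead = layer.adapter_params() / layer.base_params()
```

In this toy setting the adapter adds 256 parameters to a 4096-parameter layer. The overhead reported in the abstract refers to full guard models and is not reproduced by this sketch.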

Paper Type: Short

Research Area: Efficient/Low-Resource Methods for NLP

Research Area Keywords: LLM Guardrails, Parameter-Efficient Fine-Tuning, Resource-Constrained Settings

Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models

Languages Studied: English

Submission Number: 3102
