TeleLoRA: Teleporting Alignment across Large Language Models for Trojan Mitigation

Published: 05 Mar 2025, Last Modified: 05 Mar 2025
Venue: ICLR 2025 Workshop Weight Space Learning, Poster
License: CC BY 4.0
Track: tiny / short paper (up to 4 pages)
Keywords: Trojan detection, Trojan mitigation, backdoor mitigation, permutation symmetry, meta learning, hyper networks, weight space learning
TL;DR: We designed a permutation-symmetric LoRA weight generator that transfers SFT alignment across multiple LLMs to mitigate LLM-specific backdoors.
Submission Number: 6