TeleLoRA: Teleporting Alignment across Large Language Models for Trojan Mitigation
Track: tiny / short paper (up to 4 pages)
Keywords: Trojan detection, Trojan mitigation, backdoor mitigation, permutation symmetry, meta-learning, hypernetworks, weight space learning
TL;DR: We designed a permutation-symmetric LoRA weight generator that transfers SFT alignment across multiple LLMs to mitigate LLM-specific backdoors.
Submission Number: 6