Safe at the Margins: A General Approach to Safety Alignment in Low-Resource English Languages – A Singlish Case Study
Keywords: Responsible AI, LLMs, Safety Alignment, Preference Learning, Singlish, Low-Resource Languages
TL;DR: We provide an effective general framework for LLM safety alignment in low-resource English creoles with Singlish as a case study, obtaining a 99% reduction in Singlish toxicity with generalizable safety gains and consistent benchmark performance.
Abstract: Ensuring the safety of Large Language Models (LLMs) in diverse linguistic settings remains challenging, particularly for low-resource languages. Existing safety alignment methods are largely English-centric, limiting their effectiveness for such languages. We systematically compare Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Kahneman-Tversky Optimization (KTO) for aligning SEA-Lion-v2.1-Instruct, a Llama 3-8B variant, to reduce toxicity in Singlish. Our results show that SFT+KTO achieves superior safety alignment with higher sample efficiency than DPO. Additionally, we introduce KTO-S, a variant that improves training stability through refined KL divergence regularization. Our approach reduces Singlish toxicity by 99%, generalizes to the TOXIGEN benchmark, and maintains strong performance on standard LLM benchmarks, providing a scalable framework for safer AI deployment in multilingual contexts.
Submission Number: 7
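For context on the objective the abstract refers to, the sketch below illustrates a standard KTO loss of the kind the paper's KTO-S variant builds on: the implied reward is the log-probability ratio between the policy and a frozen reference model, and a batch-level KL estimate acts as the reference point around which desirable and undesirable completions are weighted asymmetrically. The function name, argument layout, and hyperparameter defaults here are illustrative assumptions, not the authors' implementation or the KTO-S modification itself.

```python
import torch

def kto_loss(policy_logps, ref_logps, is_desirable,
             kl_policy_logps, kl_ref_logps,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Minimal sketch of a KTO-style objective (illustrative only).

    policy_logps / ref_logps: sequence log-probs of the target completions
        under the policy and the frozen reference model.
    kl_policy_logps / kl_ref_logps: log-probs for mismatched (prompt, completion)
        pairs from the batch, used to estimate the KL reference point.
    is_desirable: boolean tensor, True where the completion is desirable.
    """
    # Implied reward: log-ratio between policy and reference model.
    rewards = policy_logps - ref_logps

    # Reference point: batch estimate of KL(policy || reference),
    # clamped at zero and detached so it only shifts the baseline.
    z_ref = (kl_policy_logps - kl_ref_logps).mean().clamp(min=0).detach()

    # Kahneman-Tversky value function: gains and losses are weighted
    # asymmetrically around the reference point z_ref.
    desirable_value = lambda_d * torch.sigmoid(beta * (rewards - z_ref))
    undesirable_value = lambda_u * torch.sigmoid(beta * (z_ref - rewards))
    values = torch.where(is_desirable, desirable_value, undesirable_value)

    lambdas = torch.where(is_desirable,
                          torch.full_like(values, lambda_d),
                          torch.full_like(values, lambda_u))
    return (lambdas - values).mean()
```

Because each example is scored individually against the KL reference point rather than against a paired alternative, KTO only needs binary desirable/undesirable labels, unlike DPO, which requires matched preferred/dispreferred completion pairs; this is consistent with the sample-efficiency comparison highlighted in the abstract.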