Korean Culture into LLM Alignment: From Refusal to Cultural Coherence

Published: 01 Jun 2026, Last Modified: 01 Jun 2026 · Culture x AI 2026 Poster · License: CC BY 4.0
Keywords: Culture AI, Safety Alignment, Sociolegal Coherence, Korean LLM
TL;DR: A DPO pipeline grounded in Korean sociolegal norms that shifts LLM cultural alignment from what to refuse toward what to say, improving safety without hurting general capability.
Abstract: Work on the cultural aspects of large language models is dominated by a negative target: which outputs to suppress. We argue that a constructive counterpart is also needed, namely a working definition of what a culturally coherent response is, not only of what it must avoid, and we instantiate it for Korean. Building on the culturally adaptive red-teaming benchmark CAGE, we transplant its Korean harm taxonomy into an alignment-data pipeline whose centrepiece is a safe-response policy adapted to Korean culture: a per-category guideline grounded in Korean legal frameworks, social norms, and interpretive conventions, against which three frontier models each produce a candidate response. DPO fine-tuning on the resulting triplets improves the CAGE safe rate across six open-weight LLMs without measurable degradation on Korean general-capability benchmarks, and qualitative outputs show the fine-tuned models citing Korean statutes and institutional procedures rather than issuing flat refusals.
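For readers unfamiliar with DPO, the sketch below shows the standard per-pair DPO objective that a fine-tuning step like the one described above would optimize. It is an illustrative sketch under stated assumptions, not the authors' training code: the function names and the default `beta = 0.1` are hypothetical choices, and how the pipeline maps its three candidate responses onto "chosen" and "rejected" responses is not stated here.

```python
import math


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)), avoiding overflow in exp().
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each log-probability is the summed token log-probability of a full
    response, under either the trainable policy or the frozen reference
    model. beta (assumed default 0.1) scales the implicit reward.
    """
    # Implicit reward of each response: how much the policy has moved
    # away from the reference model on that response.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Loss is -log sigmoid(beta * (reward_chosen - reward_rejected)).
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


if __name__ == "__main__":
    # Before training, policy == reference, so the loss is log 2.
    print(dpo_loss(-11.0, -11.0, -11.0, -11.0))
    # Policy now prefers the chosen response more than the reference does,
    # so the loss drops below log 2.
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

Minimizing this loss raises the policy's relative likelihood of the preferred response over the dispreferred one without training a separate reward model, which is what lets a curated preference dataset be used directly for alignment.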
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 43