FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs

Published: 23 Oct 2023, Last Modified: 28 Nov 2023SoLaR PosterEveryoneRevisionsBibTeX
Keywords: Fairness; Unlearning; NLP
TL;DR: This work examines an interplay between unlearning and fairness for LLMs, focusing on a popular unlearning framework known as SISA.
Abstract: Training large language models (LLMs) is a costly endeavour in terms of time and computational resources. The large amount of training data used during the unsupervised pre-training phase makes it difficult to verify all data and, unfortunately, undesirable data may be ingested during training. Re-training from scratch is impractical and has led to the creation of the \textit{unlearning} discipline where models are modified to ``unlearn" undesirable information without retraining. However, any modification can alter the behaviour of LLMs, especially on key dimensions such as \textit{fairness}. This is the first work that examines this interplay between unlearning and fairness for LLMs. In particular, we focus on a popular unlearning framework known as SISA [Bourtoule et al., 2021], which creates an ensemble of models trained on disjoint shards. We evaluate the performance-fairness trade-off for SISA, and empirically demsontrate that SISA can indeed reduce fairness in LLMs. To remedy this, we propose post-processing bias mitigation techniques for ensemble models produced by SISA. Through experimental results, we demonstrate the efficacy of our post-processing framework called \textit{FairSISA}.
Submission Number: 106