\begin{abstract}
    Text-to-image diffusion models, such as Stable Diffusion, have demonstrated remarkable capabilities in generating high-quality and diverse images from natural language prompts. However, recent studies reveal that these models often replicate and amplify societal biases, particularly along demographic attributes like gender and race. In this paper, we introduce \textbf{FairImagen}, a post-hoc debiasing framework that operates on prompt embeddings to mitigate such biases without retraining or modifying the underlying diffusion model. Our method integrates Fair Principal Component Analysis (FairPCA) to project CLIP-based embeddings into a subspace that minimizes group-specific information while preserving semantic content. We further enhance debiasing effectiveness with empirical noise injection and propose a unified cross-demographic projection that avoids the over-pruning issues of sequential attribute correction. Extensive experiments across gender, race, and intersectional settings demonstrate that FairImagen significantly improves fairness with minimal compromise in image quality or prompt fidelity. Our framework outperforms existing post-hoc methods and offers a simple, scalable, and model-agnostic solution for equitable text-to-image generation.
\end{abstract}
  
\section{Introduction}

Recent advances in  text-to-image generation have led to the widespread adoption of models such as Stable Diffusion~\cite{rombach2022high,esser2024scaling}, DALL·E~\cite{ramesh2021zero,ramesh2022hierarchical}, Imagen~\cite{saharia2022photorealistic}, and Parti~\cite{yu2022scaling}, which are capable of producing photorealistic and diverse images conditioned on natural language prompts. These models typically leverage powerful vision-language encoders such as CLIP~\cite{radford2021learning} to align textual and visual modalities, enabling open-ended image generation from arbitrary user input. Due to their creative flexibility, scalability, and accessibility, these systems are increasingly integrated into applications across design, content creation, and interactive media.

However, studies have shown that these generative models often replicate and even amplify social biases present in the training data~\citep{friedrich2023FairDiffusion,naik2023social,wan2024survey,zhang2023iti,shukla2025biasconnect}. For example, prompts like ``a photo of a CEO'' or ``a nurse'' typically yield images depicting white males and females, respectively, reflecting gender and racial stereotypes. These biases pose serious concerns regarding fairness, representation, and downstream harms, particularly as generative models are integrated into public-facing systems.

To mitigate such biases, researchers have proposed a variety of debiasing techniques. As summarized in \Cref{tab:method_comparison}, these methods fall into three main categories: \textbf{Prompt-based}, \textbf{Fine-tuning-based}, and \textbf{Post-hoc editing}. Prompt-based approaches~\cite{friedrich2023FairDiffusion,sakurai2025fairt2i,bansal2022well} modify the input prompt to influence the model's output, but often require heuristic rewriting or manually curated prompts for each image. Fine-tuning-based methods~\cite{li2023fair,zhang2023iti,shen2023finetuning} retrain or adapt parts of the model to encode fairness objectives, but they are computationally intensive and require access to model internals. Post-hoc editing methods~\cite{li2024self,tanjim2024discovering,zhang2023iti} modify the prompt embedding at inference time without updating model weights, offering a lightweight and deployment-friendly alternative. Nevertheless, each category exhibits trade-offs in terms of fidelity, interpretability, and generalizability.


In this paper, we focus on \textbf{post-hoc editing} methods due to their simplicity and compatibility with off-the-shelf diffusion models. Prior approaches such as SDID~\cite{li2024self} and TBIE~\cite{tanjim2024discovering} demonstrate the feasibility of manipulating prompt embeddings to mitigate demographic bias. SDID constructs a gender direction by subtracting CLIP embeddings of hand-crafted prompt pairs (e.g., ``a photo of a man'' vs.\ ``a photo of a woman''), and then injects or subtracts this vector during generation. However, this approach relies heavily on the assumption that group bias is linearly separable and can be corrected via a single direction. This makes it brittle when applied to more nuanced prompts or to prompts involving multiple demographic dimensions, such as race and gender simultaneously.
TBIE improves over SDID by applying PCA on CLIP embeddings of gender-related words to identify bias directions in a data-driven way. However, it still performs linear subtraction along a few principal components without an explicit optimization criterion. As a result, the debiasing process can be overly aggressive—removing not just demographic cues but also essential semantic information—leading to semantic drift, loss of prompt fidelity, and unnatural image generation. Furthermore, both methods offer limited control over the trade-off between fairness and content alignment, and neither generalizes well to intersectional prompts or unseen groups.
To overcome these limitations, we propose a theoretically principled framework based on FairPCA, which explicitly optimizes for semantic preservation while minimizing group-dependent variance. Our method is capable of handling multi-demographic settings in a unified manner, provides interpretable control over the amount of bias removal, and generalizes robustly across diverse prompt scenarios.


We propose a novel post-hoc debiasing framework called \textbf{FairImagen}, which integrates Fair Principal Component Analysis (FairPCA)~\cite{kleindessner2023efficient} into the Stable Diffusion pipeline. Our method operates in three stages: first, we extract CLIP-based prompt embeddings; second, we apply a fairness-aware projection using FairPCA to remove group-dependent directions from both pooled and token-level embeddings; finally, we synthesize images from the transformed embeddings using a modified Stable Diffusion decoder. To further enhance robustness, we introduce empirical noise injection and a unified cross-demographic debiasing scheme that jointly mitigates intersectional bias. Unlike existing post-hoc approaches, FairImagen requires no manual design of bias directions, is fully compatible with off-the-shelf diffusion models, supports multiple demographic attributes simultaneously, and preserves visual quality while effectively reducing bias.

Our contributions are summarized as follows:
\begin{itemize}
    \item We introduce a post-hoc fairness framework that integrates FairPCA with diffusion-based text-to-image generation, enabling bias mitigation without model retraining.
    \item We propose empirical noise injection to obscure residual demographic signals and improve fairness-performance trade-offs.
    \item We develop a cross-demographic debiasing formulation that handles multiple protected attributes in a unified projection space, avoiding over-pruning from sequential projections.
    \item We conduct extensive quantitative and qualitative evaluations across gender, race, and joint debiasing tasks, demonstrating that our method outperforms existing post-hoc baselines.
\end{itemize}


\begin{table*}[t]
    \centering
    \scriptsize
    \resizebox{\textwidth}{!}{
    \begin{tabular}{l|c|c|c}
    \toprule
    \textbf{Criteria} & \textbf{Prompt-based} & \textbf{Fine-tuning-based} & \textbf{Post-hoc editing} \\
    \hline
    Training-free & \textcolor{green}{\checkmark} & \textcolor{red}{\ding{55}} & \textcolor{green}{\checkmark} \\
    \hline
    Black-box compatible & \textcolor{green}{\checkmark} & \textcolor{red}{\ding{55}} & \textcolor{green}{\checkmark} \\
    \hline
    Low human effort & \textcolor{red}{\ding{55}} & \textcolor{green}{\checkmark} & \textcolor{green}{\checkmark} \\
    \hline
    Low computational cost & \textcolor{green}{\checkmark} & \textcolor{red}{\ding{55}} & \textcolor{green}{\checkmark} \\
    \hline
    Generalizable to new prompts & \textcolor{red}{\ding{55}} & \textcolor{green}{\checkmark} & \textcolor{green}{\checkmark} \\
    \hline
    Strong bias mitigation & \textcolor{red}{\ding{55}} & \textcolor{green}{\checkmark} & \textcolor{green}{\checkmark} \\
    \hline
    Preserves prompt fidelity & \textcolor{green}{\checkmark} & \textcolor{green}{\checkmark} & \textcolor{red}{\ding{55}} \\
    \hline
    Easy deployment & \textcolor{red}{\ding{55}} & \textcolor{red}{\ding{55}} & \textcolor{green}{\checkmark} \\
    \bottomrule
    \end{tabular}
    }
    \caption{Comparison of prompt-based, fine-tuning-based, and post-hoc editing methods for debiasing text-to-image generation.}
    \label{tab:method_comparison}
\end{table*}


\section{Related Works}
Existing debiasing methods for text-to-image generation can be categorized into three types: prompt-based, fine-tuning-based, and post-hoc editing methods.

\textbf{Prompt-based methods} mitigate bias by modifying the input prompts. \citet{friedrich2023FairDiffusion} propose Fair Diffusion using fairness-guided prompts constructed from demographic opposites. \citet{sakurai2025fairt2i} utilize LLMs to automatically detect and revise biased prompts. \citet{bansal2022well} and \citet{chuang2023debiasing} examine ethical interventions and latent direction projection. \citet{kim2023stereotyping} and \citet{al2024equiprompt} develop learned fairness prompts, and \citet{bianchi2023easily} assess the impact of biased prompts at scale. These methods are flexible but often rely on heuristic or external guidance for each individual image, which makes them opaque and laborious \citep{bansal2022well,zhang2023iti}.

\textbf{Fine-tuning-based methods} update model parameters to enforce fairness. \citet{li2023fair} introduce Fair Mapping by training a linear projection layer. \citet{zhang2023iti} align prompt embeddings with fair visual examples. \citet{shen2023finetuning} apply a distributional alignment loss for fairness. \citet{kim2023stereotyping}, \citet{orgad2023editing}, and \citet{gandikota2024unified} propose fine-tuning specific modules or applying concept editing. \citet{parihar2024balancing} incorporate interpretable latent directions and population-level optimization. These methods provide effective bias mitigation but often require costly model access and retraining.

\textbf{Post-hoc editing methods} avoid parameter updates and modify inference behavior. \citet{zhang2023iti} and \citet{li2024self} manipulate prompt embeddings with CLIP-based or interpretable directions. \citet{tanjim2024discovering} use PCA to subtract biased components. \citet{friedrich2023FairDiffusion} employ alterations to classifier-free guidance. \citet{sadat2023cads} explore sampling noise perturbation and conditioning annealing to reveal underrepresented concepts. Post-hoc filtering is also employed in commercial systems \citep{ramesh2021zero}. These methods are deployment-friendly, take advantage of both prompt- and model-based strategies, and avoid the need for extensive retraining or heavy prompt engineering.






\section{Method}

We propose a fairness-aware text-to-image generation framework that integrates FairPCA~\cite{kleindessner2023efficient} with Stable Diffusion~\cite{rombach2022high,esser2024scaling}. Our framework aims to reduce social bias by modifying prompt embeddings before image synthesis, while preserving semantic fidelity. It consists of three main components: (1) Prompt Embedding Extractor, (2) Fair Representation Transformer, and (3) Image Generator. Additionally, we propose an empirical noise injection scheme and a unified multi-demographics debiasing method to address limitations of naive multi-attribute correction.

\subsection{Prompt Embedding Extraction}

Given a prompt \( p \), we first encode it using a pre-trained CLIP model~\cite{radford2021learning}. Let \( \{w_1, \dots, w_T\} \) be the tokenized prompt, where \( T \) is the number of tokens. The encoder outputs a token-level embedding matrix \( E_p \in \mathbb{R}^{T \times D} \), where \( D \) is the embedding dimension, and a pooled embedding \( \bar{E}_p \in \mathbb{R}^D \). The pooled embedding is computed as the mean of the token embeddings: $\bar{E}_p = \frac{1}{T} \sum_{t=1}^{T} E_p[t]$. These representations are extracted from the Stable Diffusion text encoder. 
Let $\mathcal{P} = \{p_1, \dots, p_n\}$ denote a set of prompts, each associated with protected attribute labels $a_i \in \mathcal{A}$. For each attribute $a$, we organize the pooled embeddings by group:

\[
X = \{\bar{E}_{p_i}\}_{i=1}^n \in \mathbb{R}^{n \times D}, \quad Z = \{z_i\}_{i=1}^n \in \{0,1\}^{n \times G}
\]

where $z_i$ is a one-hot group indicator for the attribute $a_i$, and $G = |\mathcal{A}|$. These grouped embeddings are used to estimate the bias direction and define fairness-aware projections.

\subsection{Fair Representation Transformer}

We adopt the FairPCA algorithm~\cite{kleindessner2023efficient} to project the pooled and token-level embeddings into a subspace that minimizes information about protected attributes while preserving semantic variance.

Let $P \in \mathbb{R}^{D \times d}$ be a projection matrix with $d < D$. The FairPCA objective minimizes:
\begin{equation}
\min_{P^\top P = I} -\operatorname{Tr}(P^\top \Sigma_X P) + \lambda \|Z^\top X P\|_F^2
\label{eq:fairpca}
\end{equation}
where $\Sigma_X = \frac{1}{n} X^\top X$ is the covariance matrix of the (mean-centered) embeddings and $\lambda$ is a trade-off parameter.
To eliminate group-dependent components, we compute the bias matrix $B = Z^\top X \in \mathbb{R}^{G \times D}$ and perform PCA within the null space $\mathcal{N}(B)$. The resulting projection $P$ preserves directions uncorrelated with group membership.
Once obtained, we apply this projection to the embeddings:
\[
\bar{E}_p' = P P^\top \bar{E}_p, \quad E_p' = E_p P P^\top
\]

An optional renormalization step rescales the vectors to maintain their original norms.
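
As a brief sketch of why the null-space construction removes group information (our own illustration under the notation above; the symbols $n_g$, $\mu_g$, $N$, $V$, and $r$ are introduced here only for this derivation), note that row $g$ of $B = Z^\top X$ is the sum of the pooled embeddings in group $g$:
\[
B_{g,:} = \sum_{i \,:\, z_{ig} = 1} \bar{E}_{p_i}^\top = n_g \, \mu_g^\top ,
\]
where $n_g$ is the number of prompts in group $g$ and $\mu_g$ is their mean embedding. Let $r = \operatorname{rank}(B) \le G$, and let $N \in \mathbb{R}^{D \times (D-r)}$ have orthonormal columns spanning $\mathcal{N}(B)$. Writing $P = N V$ with $V^\top V = I$ gives
\[
P^\top P = V^\top N^\top N V = I, \qquad Z^\top X P = B N V = 0 ,
\]
so the penalty term in \Cref{eq:fairpca} vanishes for any $\lambda$, and the remaining problem
\[
\max_{V^\top V = I} \operatorname{Tr}\!\left( V^\top N^\top \Sigma_X N V \right)
\]
is ordinary PCA in the reduced space: $V$ collects the top-$d$ eigenvectors of $N^\top \Sigma_X N$, which requires $d \le D - r$.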

\subsection{Empirical Noise Injection}

To further mitigate residual bias in the embedding space, we introduce an empirical noise injection mechanism that perturbs representations along estimated group-dependent directions. Let \( \mathcal{G} \) denote the set of protected groups (e.g., \( \mathcal{G} = \{\text{Male}, \text{Female}\} \)), and let \( g \in \mathcal{G} \) be a particular group. For each group \( g \), we compute its empirical bias direction as
\[
\nu_g = \frac{1}{|X^{(g)}|} \sum_{\bar{E}_p \in X^{(g)}} \bar{E}_p - \bar{E},
\]
where \( X^{(g)} \) is the set of pooled embeddings belonging to group \( g \), and \( \bar{E} \) is the overall mean embedding across all groups.

We define an empirical distribution \( \mathcal{D}_g \) as the set of scalar projections of group-specific embeddings onto the bias direction:
\[
\mathcal{D}_g = \left\{ \nu_g^\top \bar{E}_p : \bar{E}_p \in X^{(g)} \right\}.
\]
Each value \( \delta \in \mathcal{D}_g \) represents the magnitude of projection of an embedding onto the bias direction \( \nu_g \), quantifying how strongly that embedding aligns with group-specific attributes. To inject noise, we sample \( \delta \sim \mathcal{D}_g \) and apply the perturbation:
\[
\bar{E}_p'' = \bar{E}_p' + \epsilon \cdot \delta \cdot \nu_g,
\]
where \( \epsilon \) is a tunable noise scale parameter. This procedure introduces controlled variability along biased directions to obscure protected group information while preserving semantic structure.

We consider several variants of this perturbation: (i) \emph{random noise}, where \( \delta \) is drawn independently per instance; (ii) \emph{mean noise}, where \( \delta \) is set to the average of \( \mathcal{D}_g \); and (iii) \emph{signed noise}, where \( \delta \in \{-1, +1\} \) is sampled uniformly. These variants offer trade-offs between stochasticity and consistency in the noise injection process.
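
To make the perturbation concrete, consider a toy two-dimensional example (all numbers are hypothetical and chosen only for illustration). Suppose the Male group contains the pooled embeddings $(1,1)^\top$ and $(3,1)^\top$, and the Female group contains $(-1,-1)^\top$ and $(-3,-1)^\top$, so that $\bar{E} = (0,0)^\top$. Then
\[
\nu_{\text{Male}} = (2,1)^\top, \qquad \nu_{\text{Female}} = (-2,-1)^\top, \qquad \mathcal{D}_{\text{Male}} = \mathcal{D}_{\text{Female}} = \{3,\, 7\}.
\]
For a projected embedding $\bar{E}_p' = (0.5,-1)^\top$, group $g = \text{Female}$, noise scale $\epsilon = 0.1$, and a sampled value $\delta = 3$, the random-noise variant yields
\[
\bar{E}_p'' = (0.5,-1)^\top + 0.1 \cdot 3 \cdot (-2,-1)^\top = (-0.1,\,-1.3)^\top ,
\]
whereas the mean-noise variant uses $\delta = \tfrac{3+7}{2} = 5$ and yields $\bar{E}_p'' = (-0.5,\,-1.5)^\top$.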

\subsection{Multi-Demographics Debiasing}

The FairPCA framework~\cite{kleindessner2023efficient} supports debiasing with respect to multiple protected attributes by jointly encoding them into a multi-dimensional attribute matrix. Specifically, it minimizes group information from all attributes simultaneously by applying a single projection based on a stacked group indicator matrix. However, when applied to the image generation setting, we observe that a naive extension of this approach—such as fitting separate projection objectives for each attribute or independently estimating and removing group directions—leads to suboptimal results. Such implementations force the resulting features to become orthogonal to each group direction individually, which restricts the embedding space to directions unrelated to any protected group. As a consequence, the model is only able to retain information aligned with one group at a time, leading to degraded contextual fidelity and loss of important visual details in the generated images.

To overcome this, we propose a unified cross-demographic debiasing method that constructs a single attribute space based on the Cartesian product of all group combinations. For example, if the gender attribute has two groups (Male, Female) and the race attribute has three groups (White, Asian, Black), we define a joint attribute space with six composite groups:

\[
\mathcal{A}_{\text{joint}} = \{ \text{White Male}, \text{White Female}, \text{Asian Male}, \text{Asian Female}, \text{Black Male}, \text{Black Female} \}
\]

Each embedding is assigned a single group indicator corresponding to one of these composite classes. We then apply Fair Representation Transformer once over this joint attribute space.

Our cross-demographic debiasing approach addresses this issue by learning a single projection that jointly considers all group combinations. This unified formulation allows the model to preserve richer semantic content while mitigating multiple demographic biases simultaneously, avoiding the over-pruning effects of repeated or isolated group-wise projections.
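
As a small illustration of the joint encoding (the index convention below is our own assumption, chosen to match the ordering of $\mathcal{A}_{\text{joint}}$ above), let $r \in \{1,2,3\}$ index race (White, Asian, Black) and $s \in \{1,2\}$ index gender (Male, Female). The composite group index and the resulting one-hot indicator can be written as
\[
k = 2(r-1) + s \in \{1,\dots,6\}, \qquad z_i = e_k \in \{0,1\}^{6},
\]
so that, for instance, an ``Asian Female'' prompt ($r = 2$, $s = 2$) receives $k = 4$ and $z_i = (0,0,0,1,0,0)$. The joint indicator matrix is then $Z_{\text{joint}} \in \{0,1\}^{n \times 6}$, and the bias matrix $B_{\text{joint}} = Z_{\text{joint}}^\top X \in \mathbb{R}^{6 \times D}$ has rank at most $6$, so a single projection removes at most six directions from the $D$-dimensional embedding space.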

\subsection{Image Generator}

After debiasing, we pass the transformed embeddings into a customized Stable Diffusion pipeline~\cite{rombach2022high,esser2024scaling}, which supports external prompt embeddings. Specifically, we compute:

\[
I_p = \mathcal{G}(\bar{E}_p'', E_p')
\]

where $\mathcal{G}(\cdot)$ here denotes the generation function (not the set of protected groups used in the noise-injection step), and $\bar{E}_p''$ and $E_p'$ are the pooled and token-level debiased embeddings.
We generate images using classifier-free guidance with scale $w = 7.0$ and $28$ diffusion steps. Images are generated in batches (e.g., 12 per prompt), stitched, and evaluated with fairness and perceptual quality metrics.






\section{Experiments}
\subsection{Experimental Settings}
Our model is implemented in PyTorch. During feature extraction, embeddings are cached per demographic group. Projection matrices are estimated offline and reused. Image generation is parallelized across GPUs using our modified pipeline, which extends HuggingFace's \texttt{StableDiffusion3Pipeline} to accept external embeddings and apply FairPCA at inference time.

\subsection{Dataset}
We extend the WinoBias \fzh{cite} dataset to 120 professions.


\subsection{Evaluation}

We report four scalar metrics: accuracy, gender\fzh{we not only contain gender} fairness, MUSIQ, and their arithmetic mean. Gender fairness is computed with DeepFace~\cite{serengil2024lightface}: we count male and female faces in every generated image and score the resulting two-group distribution \(p\) with \(1-\lVert p-\tfrac12\mathbf 1\rVert_1\), following the normalized-deviation formulation of Teo et al.~\cite{teo_measuring_2021} (1 is perfectly balanced, 0 maximally skewed). Accuracy is CLIPScore~\cite{hessel_clipscore_2022}, the cosine similarity between the prompt embedding and the image embedding from CLIP ViT-B/16; we apply the 2.5 scale factor of the original paper, cap the result at 100, and then divide by 100 so that the final value lies in \([0,1]\). MUSIQ~\cite{ke_musiq_2021}, a no-reference perceptual-quality model trained on millions of aesthetic ratings, is likewise divided by 100 to obtain the same range. The simple average of the three metrics (mean) provides a single headline figure that jointly reflects semantic alignment, demographic equity, and visual fidelity.
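
As a worked illustration of the accuracy normalization and the headline mean (a sketch under our reading of the description above; we assume the scaled cosine similarity is expressed on a 0--100 scale before capping, and the similarity value $0.32$ is hypothetical), the accuracy for a prompt embedding $c$ and image embedding $v$ can be written as
\[
\mathrm{Acc}(c, v) = \frac{1}{100} \min\!\bigl( 100,\; 2.5 \cdot 100 \cdot \max(\cos(c, v), 0) \bigr),
\]
so that $\cos(c, v) = 0.32$ gives $\mathrm{Acc} = 0.80$. The mean is then the unweighted average of the three normalized scores; for the Base row of \Cref{tab:gendermain},
\[
\frac{0.076 + 0.818 + 0.616}{3} = \frac{1.510}{3} \approx 0.503 .
\]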



\subsection{Comparison Models}
\textbf{Base} is the vanilla Stable Diffusion model, which directly generates images from the prompt without any fairness intervention.

\textbf{FairPrompt} follows \cite{friedrich2023FairDiffusion,sakurai2025fairt2i} by using human-designed prompts for each image. We evenly apply different prompts corresponding to protected groups for each individual image. This serves as an upper-bound performance baseline, as each prompt is specifically tailored for fairness.

\textbf{ForcePrompt} explicitly includes fairness-related instructions in the prompt, directing the Stable Diffusion model to generate fair representations.

\textbf{CDA} (Counterfactual Data Augmentation) \cite{zmigrod2019counterfactual,webster2020measuring} replaces gendered words with their counterfactual counterparts, such as replacing ``man'' with ``woman.'' We follow the CDA methodology to construct counterfactual samples and augment the dataset.

\textbf{TBIE} (Text-Based Image Editing) \cite{tanjim2024discovering} applies PCA to gender-related words and performs debiasing along the identified principal components.

\textbf{SDID} \cite{li2024self} computes a gender vector using the difference between gender-specific and gender-neutral embeddings, and injects this vector into the prompt embedding.

\textbf{SDID-AVG} extends the SDID \cite{li2024self} model by computing neutral embeddings through averaging the embeddings within each protected group.

\textbf{ITI-GEN} \cite{zhang2023iti} extracts gender-related CLIP embeddings from images and adds them to the prompt embeddings prior to image generation.


\subsection{Experimental Results}

\subsubsection{Main Experiments}
We apply our model and the baselines to generation tasks involving debiasing with respect to gender (\Cref{tab:gendermain}), race (\Cref{tab:racemain}), and both gender and race simultaneously (\Cref{tab:genderracemain}); our method, FairImagen, is denoted FPCA in the tables. The results show the following. (1) FPCA attains the highest fairness scores among all methods other than the FairPrompt upper bound in all three scenarios, demonstrating its effectiveness in mitigating bias in various contexts. (2) FPCA achieves the best average (Avg) score among these methods in the race and joint settings (0.586 and 0.596), and is close to the best in the gender setting (0.606 versus 0.611 for TBIE), indicating a favorable balance among fairness, accuracy, and image quality. (3) When debiasing gender and race simultaneously, FPCA improves both gender fairness (0.439) and race fairness (0.345) over every post-hoc baseline, highlighting its capability for multi-attribute bias mitigation. (4) FPCA slightly lags behind several other models in Accuracy and MUSIQ; given the substantial improvement in fairness, we consider this trade-off worthwhile. (5) FairPrompt achieves the best fairness and average scores in all experiments. It should be noted, however, that this model relies on manually designed prompts tailored to each individual image, which is both time-consuming and labor-intensive. As such, it serves primarily as an upper bound on the performance a model can achieve on this task.

\begin{table*}[t]
    \centering
    \scriptsize
    \begin{minipage}[t]{0.3\textwidth}
        \begin{tabular}{@{~}l@{~}@{~}l@{~}@{~}l@{~}@{~}l@{~}@{~}l@{~}}
        \toprule
         & Fairness & Accuracy & MUSIQ & Avg \\
        \midrule
        Base & 0.076 & 0.818 & 0.616 & 0.503 \\
        FairPrompt & 0.621 & 0.795 & 0.607 & 0.674 \\
        ForcePrompt & 0.167 & 0.797 & 0.622 & 0.529 \\
        CDA & 0.318 & 0.811 & 0.592 & 0.574 \\
        TBIE & 0.424 & 0.803 & 0.604 & 0.611 \\
        SDID & 0.333 & 0.814 & 0.601 & 0.583 \\
        SDID-AVG & 0.394 & 0.804 & 0.582 & 0.593 \\
        ITI & 0.152 & 0.803 & 0.58 & 0.512 \\
        FPCA & 0.455 & 0.798 & 0.567 & 0.606 \\
        \bottomrule
        \end{tabular}
        \caption{Gender debiasing results.}
        \label{tab:gendermain}

    \end{minipage}%
    \quad
    \begin{minipage}[t]{0.3\textwidth}

        \centering
        \scriptsize
        
        \begin{tabular}{@{~}l@{~}@{~}l@{~}@{~}l@{~}@{~}l@{~}@{~}l@{~}}
        \toprule
         & Fairness & Accuracy & MUSIQ & Avg \\
        \midrule
        Base & 0.155 & 0.819 & 0.615 & 0.529 \\
        FairPrompt & 0.491 & 0.793 & 0.598 & 0.627 \\
        ForcePrompt & 0.236 & 0.789 & 0.605 & 0.544 \\
        CDA & 0.264 & 0.819 & 0.60 & 0.561 \\
        TBIE & 0.282 & 0.812 & 0.568 & 0.554 \\
        SDID & 0.336 & 0.817 & 0.571 & 0.575 \\
        SDID-AVG & 0.273 & 0.814 & 0.597 & 0.561 \\
        ITI & 0.227 & 0.799 & 0.534 & 0.52 \\
        FPCA & 0.373 & 0.798 & 0.588 & 0.586 \\
        \bottomrule
        \end{tabular}
        \caption{Race debiasing results.}
        \label{tab:racemain}
    \end{minipage}
    \quad
    \begin{minipage}[t]{0.35\textwidth}
    \begin{tabular}{@{~}l@{~}@{~}L{1cm}@{~}@{~}L{1cm}@{~}@{~}l@{~}@{~}l@{~}@{~}l@{~}}
    \toprule
     & Gender Fairness & Race Fairness & Accuracy & MUSIQ & Avg \\
    \midrule
    Base & 0.0758 & 0.136 & 0.819 & 0.616 & 0.504 \\
    FairPrompt & 0.652 & 0.482 & 0.778 & 0.60 & 0.676 \\
    ForcePrompt & 0.348 & 0.264 & 0.807 & 0.609 & 0.588 \\
    CDA & 0.212 & 0.245 & 0.809 & 0.601 & 0.541 \\
    TBIE & 0.318 & 0.273 & 0.806 & 0.586 & 0.57 \\
    SDID & 0.136 & 0.255 & 0.813 & 0.597 & 0.516 \\
    SDID-AVG & 0.318 & 0.227 & 0.805 & 0.575 & 0.566 \\
    ITI & 0.212 & 0.209 & 0.782 & 0.52 & 0.505 \\
    FPCA & 0.439 & 0.345 & 0.792 & 0.556 & 0.596 \\
    \bottomrule
    \end{tabular}
    
    \caption{Gender \& race debiasing results.}
    \label{tab:genderracemain}
    \end{minipage}
\end{table*}


\subsubsection{Effect of Hidden Dimension on FPCA Performance}

To investigate the impact of dimensionality reduction on the effectiveness of our proposed FPCA method, we conduct experiments varying the number of retained principal components (i.e., hidden dimensions) from 250 to 2000. \Cref{fig:fpca_dim} visualizes the performance trends across three debiasing scenarios: (a) gender debiasing, (b) race debiasing, and (c) joint gender and race debiasing. For each setting, we report the Fairness, Accuracy, MUSIQ (image quality), and overall average score. The results demonstrate a clear trade-off between fairness and the other metrics as dimensionality varies. Notably, reducing the number of components tends to improve fairness scores, particularly in the single-attribute gender and race settings, but at the cost of reduced Accuracy and MUSIQ. In contrast, larger hidden dimensions better preserve visual and semantic fidelity, but may reintroduce bias. The joint debiasing setting (\Cref{fig:fpca_dim}c) further reveals the challenge of balancing fairness across multiple attributes simultaneously, with the Fairness metrics for gender and race sometimes diverging. Overall, these results underscore the importance of selecting an appropriate dimensionality in FPCA to achieve desirable fairness-utility trade-offs.


\begin{figure}[t]
    \centering
    \begin{minipage}[t]{0.32\textwidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/fpca_gender_hdim.pdf}
        \caption*{(a) Gender Debiasing}
    \end{minipage}
    \hfill
    \begin{minipage}[t]{0.32\textwidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/fpca_race_hdim.pdf}
        \caption*{(b) Race Debiasing}
    \end{minipage}
    \hfill
    \begin{minipage}[t]{0.32\textwidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/fpca_genderrace_hdim.pdf}
        \caption*{(c) Gender + Race Debiasing}
    \end{minipage}
    \vspace{-1em}
    \caption{Debiasing results across different protected attributes with respect to the number of hidden dimensions.}
    \label{fig:fpca_dim}
\end{figure}

\subsubsection{Effect of Empirical Noise on FPCA Debiasing}

We further examine the influence of the empirical noise parameter (\texttt{enoise}) in the FPCA framework, which controls the magnitude of perturbation added to simulate empirical distributions aligned with different protected attributes. As \texttt{enoise} increases, the sampled directions more closely follow gender- or race-specific variations, enabling stronger debiasing effects. \Cref{fig:fpca_enoise} presents the performance across gender, race, and joint gender+race debiasing tasks under varying noise levels from 0.0 to 1.0. We observe that higher \texttt{enoise} values significantly improve fairness metrics, particularly in the joint debiasing setting, where both gender and race fairness steadily increase. However, this comes with a modest trade-off in Accuracy and MUSIQ, suggesting a balance must be struck between fairness gains and fidelity preservation. These results validate the effectiveness of controlled empirical noise injection in sampling representative latent directions that better approximate the real distribution of protected attributes.

\begin{figure}[t]
    \centering
    \begin{minipage}[t]{0.32\textwidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/fpca_gender_enoise.pdf}
        \caption*{(a) Gender Debiasing}
    \end{minipage}
    \hfill
    \begin{minipage}[t]{0.32\textwidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/fpca_race_enoise.pdf}
        \caption*{(b) Race Debiasing}
    \end{minipage}
    \hfill
    \begin{minipage}[t]{0.32\textwidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/fpca_genderrace_enoise.pdf}
        \caption*{(c) Gender + Race Debiasing}
    \end{minipage}
    \vspace{-1em}
    \caption{Debiasing results across different protected attributes with respect to the empirical noise level (\texttt{enoise}).}
    \label{fig:fpca_enoise}
\end{figure}


\subsubsection{Qualitative Case Study}

To qualitatively assess the visual impact of our debiasing strategies, we examine the prompt ``Generate a photo of a face of a CEO'' under four FPCA configurations: Base (no debiasing), FPCA-Gender, FPCA-Race, and FPCA-Gender+Race. As shown in \Cref{fig:ceo_fpca}, the Base model outputs an overwhelmingly white male distribution, reflecting common stereotypes. Applying FPCA-Gender (\Cref{fig:ceo_fpca}b) increases the representation of women while preserving overall visual consistency. FPCA-Race (\Cref{fig:ceo_fpca}c) yields a more racially diverse set of outputs, including Black and Asian individuals in executive portrayals. When both gender and race debiasing are applied simultaneously (\Cref{fig:ceo_fpca}d), the resulting outputs display substantially more diversity across both dimensions. However, we also observe that increased diversity comes with side effects: background elements, styles, and image aesthetics become more varied and occasionally inconsistent. We attribute this to the stronger empirical noise added during the FPCA-Gender+Race process, which pushes the latent representation further from the original prompt embedding. While this enhances demographic fairness, it also affects the visual context of the generation, illustrating the trade-off between fairness enforcement and content stability.


\begin{figure}[t]
    \centering
    \begin{minipage}[t]{0.24\linewidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/base_CEO.jpeg}
        \caption*{(a) Base}
    \end{minipage}
    \hfill
    \begin{minipage}[t]{0.24\linewidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/fpca_gender_CEO.jpeg}
        \caption*{(b) FPCA-Gender}
    \end{minipage}
    \hfill
    \begin{minipage}[t]{0.24\linewidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/fpca_race_CEO.jpeg}
        \caption*{(c) FPCA-Race}
    \end{minipage}
    \hfill
    \begin{minipage}[t]{0.24\linewidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/fpca_genderrace_CEO.jpeg}
        \caption*{(d) FPCA-Gender+Race}
    \end{minipage}
    \caption{Generated results for the prompt ``Generate a photo of a face of a CEO'' under different FPCA debiasing settings.}
    \label{fig:ceo_fpca}
\end{figure}

\subsubsection{Evaluation on Occupations with Man/White Dominance}

To further compare debiasing performance across methods, we focus on a subset of occupation prompts for which the baseline model predominantly generates male and white individuals. \Cref{fig:genderproportions} shows the gender proportions generated by each method, ordered by male ratio. Methods such as Base and ITI generate overwhelmingly male-dominated outputs, while FPCA and FairPrompt substantially improve gender balance by increasing the proportion of women. Similarly, \Cref{fig:raceproportions} presents race proportions across methods for occupations biased toward white individuals. The Base model exhibits strong white dominance, whereas FPCA and FairPrompt produce noticeably more racially diverse outputs, including increased representation of Black, Asian, and Latino/Hispanic individuals. These results indicate that FPCA effectively mitigates demographic bias in highly skewed occupational prompts, achieving fairness levels comparable to FairPrompt while remaining model-agnostic and training-free.
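
As a rough formalization of how these proportions can be read (the notation here is an illustrative assumption and is not taken from the evaluation pipeline itself), suppose each generated image $x$ is assigned a demographic label $c(x)$ by some attribute classifier or annotator. For a method $m$ with generated image set $\mathcal{I}_m$ over the selected occupation prompts, the proportion of group $g$ within an attribute's group set $\mathcal{G}$ is
\[
    \hat{p}_m(g) = \frac{1}{|\mathcal{I}_m|} \sum_{x \in \mathcal{I}_m} \mathbf{1}\left[c(x) = g\right], \qquad \sum_{g \in \mathcal{G}} \hat{p}_m(g) = 1 .
\]
Under this reading, ordering methods by male ratio in \Cref{fig:genderproportions} corresponds to sorting by $\hat{p}_m(\mathrm{male})$, and, if one takes a uniform distribution as the reference point, a perfectly balanced method would have $\hat{p}_m(g) = 1/|\mathcal{G}|$ for every $g \in \mathcal{G}$.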

\begin{figure}[t]
    \centering
    \begin{minipage}[t]{0.32\linewidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/gender_proportions.pdf}
        \caption{Gender proportions for male-biased occupations. Methods are ordered by male ratio.}
        \label{fig:genderproportions}
    \end{minipage}
    \hfill
    \begin{minipage}[t]{0.32\linewidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/race_proportions.pdf}
        \caption{Race proportions for white-biased occupations. Methods are ordered by white ratio.}
        \label{fig:raceproportions}
    \end{minipage}
    \hfill
    \begin{minipage}[t]{0.32\linewidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/gender_history.pdf}
        \caption{Gender proportions for historically gender-fixed prompts.}
        \label{fig:genderhistory}
    \end{minipage}
\end{figure}


\begin{figure}[t]
    \centering
    \begin{minipage}[t]{0.32\linewidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/blacksmith_base.jpeg}
        \caption*{(a) Base}
    \end{minipage}
    \hfill
    \begin{minipage}[t]{0.32\linewidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/blacksmith_fairpmpt.jpeg}
        \caption*{(b) FairPrompt}
    \end{minipage}
    \hfill
    \begin{minipage}[t]{0.32\linewidth}
        \centering
        \includegraphics[width=\linewidth]{imgs/blacksmith_fpca.jpeg}
        \caption*{(c) FPCA}
    \end{minipage}
    \caption{Generated results for the prompt ``a middle ages blacksmith'' using three different methods.}
    \label{fig:blacksmith_examples}
\end{figure}

\subsubsection{Robustness to Gender-Determined Prompts}

We further evaluate whether FPCA preserves fidelity when the input prompt has a strongly determined gender association, particularly for historical figures or occupations where gender is typically fixed. Specifically, we examine prompts such as ``a middle ages blacksmith'', ``the Pope'', and ``the King of France'', which conventionally imply male representations. \Cref{fig:genderhistory} shows that across these historically gender-fixed prompts, most models still generate predominantly male outputs, with FairPrompt slightly increasing the proportion of women even in male-dominant contexts. \Cref{fig:blacksmith_examples} provides a qualitative comparison of the images generated for the blacksmith prompt by Base, FairPrompt, and FPCA. While FairPrompt injects female representations regardless of prompt semantics, FPCA respects the gender preference encoded in the prompt embedding and yields mostly male depictions. This behavior highlights a desirable property of FPCA: when the prompt carries a strong and justified gender association, FPCA does not override it unnecessarily. Thus, FPCA adapts to prompt intent while remaining effective for less explicitly gendered contexts.

\section{Conclusion}

In this work, we present \textbf{FairImagen}, a novel post-hoc debiasing framework for text-to-image generation that integrates Fair Principal Component Analysis into the Stable Diffusion pipeline. Our method modifies prompt embeddings to mitigate demographic biases without requiring model retraining or prompt rewriting. Through a fairness-aware projection, empirical noise injection, and a unified cross-demographic formulation, FairImagen achieves strong bias reduction while preserving visual fidelity and prompt alignment. Extensive experiments across gender, race, and intersectional attributes demonstrate that our approach outperforms existing post-hoc baselines on both fairness and utility metrics. By offering a training-free, model-agnostic, and extensible solution, FairImagen paves the way for more equitable and controllable generative systems.