FACA-GEN: Investigating Bias and Generalization in Active Learning for Genomics AI

Published: 05 Mar 2025, Last Modified: 24 Apr 2025
Venue: MLGenX 2025 TinyPapers
License: CC BY 4.0
Track: Tiny paper track (up to 4 pages)
Abstract:

Fairness and generalization are critical challenges in Genomics AI, especially when AI systems rely on Active Learning (AL) to optimize data selection. Traditional AL methods are effective at selecting informative samples but often overlook fairness, which leads to biased models that fail to generalize across diverse populations. This paper introduces Fairness-Aware Causal Active Learning for Genomics AI (FACA-GEN), a novel framework that integrates fairness-aware AL, Causal Representation Learning (CRL), and Reinforcement Learning (RL) to address these issues. FACA-GEN dynamically selects training samples while optimizing for both fairness and causal validity, so that models do not rely on biased proxies such as race or ethnicity. We use multi-objective optimization to balance informativeness, fairness, and causal validity, with RL adaptively adjusting the fairness constraints over time. We also introduce a Causal Consistency Loss that enforces the learning of true genetic markers and mitigates shortcut learning. In experiments on genomics datasets, FACA-GEN significantly improves fairness metrics (Demographic Parity, Equalized Odds), causal validity, and generalization compared with existing methods, offering a more robust and equitable solution for AI-driven biology.
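
The abstract names Demographic Parity and Equalized Odds without defining them. As a minimal sketch under the standard definitions, and assuming binary predictions and a single sensitive attribute, the two gaps could be computed as follows. The function names are illustrative and do not come from the paper, whose exact evaluation protocol is not stated here.

```python
def _rate(values):
    # Mean of a list of 0/1 values; 0.0 for an empty list (a simplifying assumption).
    return sum(values) / len(values) if values else 0.0


def demographic_parity_gap(y_pred, group):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = []
    for g in sorted(set(group)):
        preds = [p for p, gi in zip(y_pred, group) if gi == g]
        rates.append(_rate(preds))
    return max(rates) - min(rates)


def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group gap in TPR or FPR, whichever is larger."""
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        rates = []
        for g in sorted(set(group)):
            preds = [
                p
                for t, p, gi in zip(y_true, y_pred, group)
                if gi == g and t == label
            ]
            rates.append(_rate(preds))
        gaps.append(max(rates) - min(rates))
    return max(gaps)


# Toy data (hypothetical, not from the paper): two groups of four samples each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_gap = demographic_parity_gap(y_pred, group)
eo_gap = equalized_odds_gap(y_true, y_pred, group)
```

A gap of 0 means the groups are treated identically under that metric; larger values indicate greater disparity.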

Submission Number: 91