Reproducing FACTER: Fairness via Conformal Thresholding and Prompt Repair

Published: 21 Jun 2026, Last Modified: 21 Jun 2026. Accepted by TMLR. License: CC BY 4.0
Abstract: Fayyazi et al. (2025) recently proposed FACTER, a model-agnostic framework designed to jointly enforce fairness and statistical coverage in LLM-based recommendation through conformal thresholding and iterative prompt repair. In this work, we conduct a reproducibility study of the FACTER framework across diverse architectures and dataset sparsity levels, evaluating both the original open-ended generation task and a constrained re-ranking extension. Under strict reproduction, we observe a divergence in recommendation utility, which we trace to underspecified target-set evaluation in the original study. We then use the constrained re-ranking setting to evaluate FACTER when the candidate set is fixed, and introduce a static Fair Zero-Shot baseline to isolate the contribution of the iterative prompt repair loop. Our analysis shows that FACTER consistently reduces adaptive-threshold violation counts, but that these reductions are not consistently reflected under the fixed threshold or in global fairness metrics. In the constrained ranking setting, static fairness instructions achieve semantic-parity outcomes comparable to those of FACTER's dynamic repair loop, suggesting that the additional online repair mechanism provides limited benefit in this formulation. All code and reproduction artifacts are available at https://github.com/oscar-omlf/facter-repr.
Submission Type: Regular submission (no more than 12 pages of main content)
Code: https://github.com/oscar-omlf/facter-repr
Assigned Action Editor: ~changjian_shui1
Submission Number: 7666