Causal Feature Learning in the Social Sciences

Jingzhou Huang, Jiuyao Lu, Alexander Williams Tolbert

Published: 2025, Last Modified: 12 May 2025, CoRR 2025, CC BY-SA 4.0

Abstract: Variable selection poses a significant challenge in causal modeling, particularly within the social sciences, where constructs often rely on interrelated factors such as age, socioeconomic status, gender, and race. Indeed, it has been argued that such attributes must be modeled as macro-level abstractions of lower-level manipulable features in order to preserve the modularity assumption essential to causal inference. This paper accordingly extends the theoretical framework of Causal Feature Learning (CFL). Empirically, we apply the CFL algorithm to diverse social science datasets, evaluating how CFL-derived macrostates compare with traditional microstates in downstream modeling tasks.