Extracting Social Determinants of Health with Large Language Models: A Survey of Clinical NLP Methods, Ethics, and Deployment
Abstract: Despite accounting for almost half of the variance in health outcomes, social determinants of health (SDOH), encompassing socioeconomic, environmental, and behavioral factors, remain challenging to extract from clinical text. We present the first comprehensive survey of LLM-driven SDOH extraction, examining how large language models can address this extraction challenge while introducing new ethical considerations. Synthesizing over 80 peer-reviewed studies, we chart the field's evolution from rule-based systems to modern generative models. Our analysis reveals that transformer-based approaches consistently outperform earlier machine learning methods, with parameter-efficient techniques such as prompt tuning and retrieval-augmented generation making these advances feasible under clinical resource constraints. However, we identify critical gaps: most research lacks the bias audits, privacy protections, and hallucination controls required for clinical deployment. While emerging ethical frameworks show promise, their adoption remains limited. We consolidate best practices for reproducible SDOH extraction and highlight open challenges, including multilingual coverage, cross-institutional generalization, and cost-effective deployment. This survey provides both a technical roadmap and an ethical framework for advancing SDOH extraction toward safe, responsible clinical integration.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: clinical NLP, healthcare applications, information extraction, document-level extraction, bias/fairness evaluation, model bias/unfairness mitigation, ethics
Contribution Types: Surveys
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Section 3
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: References section
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: Section 4.1 and Appendix
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 1.2
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: Section 3.2
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Appendix A
B6 Statistics For Data: N/A
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Appendix C
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Appendix C
C3 Descriptive Statistics: N/A
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Section 7
Author Submission Checklist: Yes
Submission Number: 460