Extracting Social Determinants of Health with Large Language Models: A Survey of Clinical NLP Methods, Ethics, and Deployment
Abstract: Despite accounting for almost half of the variance in health outcomes, social determinants of health (SDOH), encompassing socioeconomic, environmental, and behavioral factors, remain challenging to extract from clinical text. We present the first comprehensive survey of LLM-driven SDOH extraction, examining how large language models can address this extraction challenge while introducing new ethical considerations. Synthesizing over 80 peer-reviewed studies, we chart the field's evolution from rule-based systems to modern generative models. Our analysis reveals that transformer-based approaches consistently outperform earlier machine learning methods, with parameter-efficient techniques such as prompt tuning and retrieval-augmented generation making these advances feasible under clinical resource constraints. However, we identify critical gaps: most research lacks the bias audits, privacy protections, and hallucination controls required for clinical deployment. While emerging ethical frameworks show promise, their adoption remains limited. We consolidate best practices for reproducible SDOH extraction and highlight open challenges, including multilingual coverage, cross-institutional generalization, and cost-effective deployment. This survey provides both a technical roadmap and an ethical framework for advancing SDOH extraction toward safe, responsible clinical integration.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: clinical NLP, healthcare applications, information extraction, document-level extraction, bias/fairness evaluation, model bias/unfairness mitigation, ethics
Contribution Types: Surveys
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Section 3
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: References section
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: Section 4.1 and Appendix
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 1.2
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: Section 3.2
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Appendix A
B6 Statistics For Data: N/A
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Appendix C
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Appendix C
C3 Descriptive Statistics: N/A
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Section 7
Author Submission Checklist: Yes
Submission Number: 460