EmplifAI: a Fine-grained Dataset for Japanese Empathetic Medical Dialogues in 28 Emotion Labels

ACL ARR 2025 July Submission628 Authors

28 Jul 2025 (modified: 19 Aug 2025), ACL ARR 2025 July Submission, CC BY 4.0
Abstract: This paper introduces EmplifAI, a Japanese empathetic dialogue dataset designed to support patients coping with chronic medical conditions. Such patients often experience a wide range of positive and negative emotions (e.g., hope and despair) that shift across the different stages of disease management. EmplifAI addresses this complexity by providing situation-based dialogues grounded in 28 fine-grained emotion categories, adapted and validated from the GoEmotions taxonomy. The dataset includes 280 medically contextualized situations and 4,125 two-turn dialogues, collected through crowdsourcing and expert review. To evaluate emotional alignment with the empathetic dialogues, we assessed the predictions of multiple large language models (LLMs) on the situation-dialogue pairs using BERTScore, obtaining F1 scores of at most 0.84. Fine-tuning a baseline Japanese LLM (LLM-jp-3.1-13b-instruct4) on EmplifAI led to notable improvements in fluency, general empathy, and emotion-specific empathy, as measured by LLM-as-a-Judge evaluation. These findings suggest that EmplifAI provides a strong foundation for developing culturally and medically attuned empathetic dialogue systems in Japanese.
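The abstract reports BERTScore F1 but does not state the encoder, whether IDF weighting was used, or whether baseline rescaling was applied. The sketch below shows only the core greedy-matching F1 from the original BERTScore formulation. It is an illustration, not the authors' evaluation code. Toy vectors stand in for the contextual token embeddings a pretrained encoder would produce, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(cand: List[Vector], ref: List[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities: rows are candidate tokens, columns are reference tokens."""
    cand_n = [normalize(c) for c in cand]
    ref_n = [normalize(r) for r in ref]
    return [[sum(a * b for a, b in zip(c, r)) for r in ref_n] for c in cand_n]


def bertscore_f1(cand: List[Vector], ref: List[Vector]) -> float:
    """Greedy-matching BERTScore F1 (hypothetical helper; no IDF weighting, no rescaling)."""
    sim = cosine_matrix(cand, ref)
    # Precision: match each candidate token to its most similar reference token.
    precision = sum(max(row) for row in sim) / len(cand)
    # Recall: match each reference token to its most similar candidate token.
    recall = sum(max(sim[i][j] for i in range(len(cand))) for j in range(len(ref))) / len(ref)
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when nothing matches
    return 2 * precision * recall / (precision + recall)


# A one-token candidate that matches only half of a two-token reference:
# precision = 1.0, recall = 0.5, so F1 = 2/3.
print(bertscore_f1([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

Because every vector is normalized first, the score depends only on embedding directions. In practice, the token vectors would come from a pretrained encoder, for example through the reference `bert-score` package, rather than being hand-written.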
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: language resources, datasets for low resource languages, benchmarking, NLP datasets, metrics
Contribution Types: Data resources, Data analysis
Languages Studied: Japanese
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
Data: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: We have provided the IRB policy number (removed for peer review) in Section 3 and discussed the IRB's comments on the study's low-risk nature.
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: N/A
B2 Discuss The License For Artifacts: N/A
B3 Artifact Use Consistent With Intended Use: N/A
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: As we have explained in Section 3, the data is collected anonymously using crowdsourcing platforms.
B5 Documentation Of Artifacts: N/A
B6 Statistics For Data: Yes
B6 Elaboration: Statistics for the dataset are provided in Section 3.5 (EmplifAI Dataset Statistics).
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Sections 4 and 5
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Sections 4 and 5
C3 Descriptive Statistics: Yes
C3 Elaboration: Sections 4.2, 5.3, and 5.4
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: Section 3.3 and Figure 1
D2 Recruitment And Payment: Yes
D2 Elaboration: Section 3.3
D3 Data Consent: N/A
D4 Ethics Review Board Approval: Yes
D4 Elaboration: Section 3 (policy number removed for peer review)
D5 Characteristics Of Annotators: Yes
D5 Elaboration: Sections 3.3 and 3.4. We mainly describe the data reviewers' backgrounds, as we do not have demographic information on the crowd workers.
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: We describe how AI assistants are used to evaluate and generate the synthesized data in Section 5 and in the optional Section 8.1.
Author Submission Checklist: yes
Submission Number: 628