MultiLifeQA: A Multidimensional Lifestyle Question Answering Benchmark for Comprehensive Health Reasoning with LLMs

ICLR 2026 Conference Submission 21750 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large language models, Personalized Health Analytics, Dataset and Benchmark
Abstract: Recent advances in wearable devices and mobile sensing technologies have enabled the continuous collection of multimodal lifestyle data. However, transforming these heterogeneous signals into coherent and interpretable insights for health management remains a fundamental challenge. The difficulties arise both at the data level, where signals are fragmented and lack a unified structure, and at the modeling level, where existing methods are often limited to single domains and short-term tasks. Large language models (LLMs) have demonstrated strong potential for complex reasoning, yet systematic benchmarks for evaluating their cross-dimensional and long-horizon reasoning abilities in lifestyle health are still lacking. We propose MultiLifeQA, the first large-scale QA dataset and benchmark for multidimensional lifestyle health reasoning. MultiLifeQA spans four lifestyle dimensions (diet, activity, sleep, and emotion) and contains 22,573 questions across single-user and multi-user scenarios. The tasks are grouped into five categories, ranging from simple fact retrieval to complex cross-dimensional temporal reasoning, providing a comprehensive evaluation of model reasoning capabilities. We establish two prompting-based evaluation settings, context-augmented and database-augmented, along with fine-grained metrics that assess query validity, execution quality, and final-answer accuracy. Extensive experiments on eight open-source and three proprietary LLMs highlight both the capabilities and limitations of current models in long-term, multidimensional health reasoning. By addressing this gap, MultiLifeQA establishes a standardized benchmark that advances LLMs toward more integrated health analytics and personalized interventions. The code and datasets are publicly available at https://anonymous.4open.science/r/MultilifeQA-05D2.
Primary Area: datasets and benchmarks
Submission Number: 21750