Keywords: privacy, Solid, large language models, mobile, data
TL;DR: Privacy-driven integration of Solid with large language models hosted on mobile devices for personal data curation.
Abstract: Privacy risks surrounding personal data use are increasingly acute in data-rich environments such as mobile
devices, where large volumes of sensitive data are routinely collected and repurposed for centralized analytics
and AI training. Despite growing awareness of these risks, users lack practical, privacy-first mechanisms for
interacting with their on-device data and selectively sharing it with federated learning systems or data-sharing
platforms. The sheer diversity and scale of personal data, combined with the effort required to manually classify,
curate, and manage it according to individual privacy preferences, often lead users to default to coarse-grained or
bulk consent. This paper presents an alternative approach in which data classification, the capture of user privacy
preferences, and ongoing data curation are handled by a locally deployed large language model acting as a trusted advisor. By
combining on-device perception with natural language interaction, users can express nuanced sharing intentions
while retaining control over what data leaves their device. We integrate this approach with Solid pods as the
data-sharing backend, leveraging their decentralized and user-owned storage model to support fine-grained,
auditable, and revocable access control. Together, these components enable a privacy-first data-sharing workflow
that avoids reliance on centralized, data-extractive cloud infrastructures.
Submission Number: 1