Data Dowsing: Determining Data Collection Priorities

Published: 05 Nov 2025, Last Modified: 05 Nov 2025. Venue: NLDL 2026 Abstracts. License: CC BY 4.0
Keywords: LLM, model saturation, foundation models
TL;DR: Data Dowsing is a method to determine which content is most needed to improve foundation models.
Abstract: This work proposes a novel framework, data dowsing, to determine which data is needed to improve LLMs. The framework estimates influence and imposes simplifications based on concept domains to circumvent computational intractability.
Serve As Reviewer: ~Christian_Salomonsen1
Submission Number: 27