Combatting Dimensional Collapse in LLM Pre-Training Data via Submodular File Selection

Published: 2025, Last Modified: 07 Dec 2025, ICLR 2025, CC BY-SA 4.0