The Information Potential of Books

Frederic Kaplan

Published: 02 Jul 2025, Last Modified: 02 Mar 2026, Zenodo, CC BY-SA 4.0

Abstract: For practical and legal reasons, Large Language Models are primarily trained on contemporary, web-based texts and not on the vast array of content found in published books. As a consequence, their competence does not capture the rich diversity of knowledge that libraries have worked to preserve and make accessible. Because of this epistemic gap, libraries can potentially play a crucial role in the development of future versions of these models. In this presentation, I will discuss a computational strategy designed to effectively quantify and utilize the knowledge contained within books, addressing the opportunities and challenges for libraries in this process.

External IDs: doi:10.5281/zenodo.16098037