The Information Potential of Books

Published: 02 Jul 2025, Last Modified: 02 Mar 2026, Zenodo, CC BY-SA 4.0
Abstract: For practical and legal reasons, Large Language Models are primarily trained on contemporary, web-based texts and not on the vast array of content found in published books. As a consequence, their competence does not capture the rich diversity of knowledge that libraries have worked to preserve and make accessible. Because of this epistemic gap, libraries can potentially play a crucial role in the development of future versions of these models. In this presentation, I will discuss a computational strategy designed to effectively quantify and utilize the knowledge contained within books, addressing the opportunities and challenges for libraries in this process.