Simulating Textual Transmission With Natural Language Processing Techniques

Fernando Aguilar-Canto, Hiram Calvo

Published: 2026, Last Modified: 26 May 2026. Venue: IEEE Access 2026. License: CC BY-SA 4.0
Abstract: Computational Textual Criticism has two main objectives: developing algorithms for tree reconstruction and for text reconstruction under conditions of imperfect copying. Despite recent developments in the field, few comparative studies or benchmarks have been carried out, particularly for text reconstruction. Meanwhile, recent advances in Natural Language Processing (NLP) have begun to affect various areas of the humanities and social sciences. In this paper, we use several NLP techniques, including Large Language Models (LLMs), to simulate text transmission and to benchmark different tree and text reconstruction algorithms. For text reconstruction, we also use LLMs to improve the final result. Our results show that the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and Neighbor Joining (NJ) methods, combined with the Levenshtein metric, achieved the best results in our comparison for tree reconstruction. For text reconstruction, the Simple Majority Rule (SMR), UR, and Roos-Heikkilä-Myllymäki (RHM) methods yielded consistent results, and in most cases incorporating LLMs improved the final output.
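To make the terms in the abstract concrete, the following is a minimal Python sketch of three of the ingredients it names: the Levenshtein metric, UPGMA clustering over pairwise distances, and a simple majority rule for text reconstruction. It is an illustration under stated assumptions, not the authors' implementation. The function names, the `noisy_copy` model (random letter substitution standing in for the NLP- and LLM-based simulation described in the paper), and the requirement that witnesses have equal length for the majority vote are all hypothetical simplifications.

```python
import random
from collections import Counter
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def noisy_copy(text: str, rate: float, rng: random.Random) -> str:
    """Hypothetical copying model: each letter is replaced with probability `rate`."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return "".join(
        rng.choice(alphabet) if c.isalpha() and rng.random() < rate else c
        for c in text
    )


def upgma(labels, d):
    """Build a nested-tuple tree by average linkage (UPGMA).

    `d[x][y]` is the distance between leaves x and y. The distance between
    two clusters is the mean of all leaf-to-leaf distances across them.
    """
    clusters = [(label, [label]) for label in labels]  # (subtree, leaves)
    while len(clusters) > 1:
        def avg(i, j):
            a, b = clusters[i][1], clusters[j][1]
            return sum(d[x][y] for x in a for y in b) / (len(a) * len(b))

        i, j = min(combinations(range(len(clusters)), 2), key=lambda p: avg(*p))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


def simple_majority(witnesses):
    """Per-position majority vote; assumes equal-length, pre-aligned witnesses."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*witnesses))


if __name__ == "__main__":
    rng = random.Random(0)
    original = "in the beginning was the word"
    # Two branches of copies: w1 and w2 descend from one intermediate copy,
    # w3 and w4 from another.
    left, right = noisy_copy(original, 0.1, rng), noisy_copy(original, 0.1, rng)
    witnesses = {
        "w1": noisy_copy(left, 0.05, rng), "w2": noisy_copy(left, 0.05, rng),
        "w3": noisy_copy(right, 0.05, rng), "w4": noisy_copy(right, 0.05, rng),
    }
    d = {x: {y: levenshtein(witnesses[x], witnesses[y]) for y in witnesses}
         for x in witnesses}
    print(upgma(list(witnesses), d))
    print(simple_majority(list(witnesses.values())))
```

Because this copying model only substitutes characters, all witnesses keep the same length and the majority vote can work position by position. Real witnesses with insertions and deletions would need an alignment step first, which this sketch leaves out.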