Detecting Structured Language Alternations in Historical Documents by Combining Language Identification with Fourier Analysis

Published: 25 Jan 2024, Last Modified: 20 Jun 2024, Venue: LaTeCH@EACL2024, Readers: Everyone, License: CC BY 4.0
Abstract: In this study, we present a generalizable workflow to identify documents in a historic language with a nonstandard language and script combination, Armeno-Turkish. We introduce the task of detecting distinct patterns of multilinguality based on the frequency of structured language alternations within a document.
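To make the idea of measuring "the frequency of structured language alternations" concrete, the following is a minimal sketch and not the authors' implementation. It assumes each line of a document has already been assigned a language label by some language identifier (not shown); the labels "hy" and "tr", the function name `dominant_alternation_period`, and the use of a plain discrete Fourier transform over a binary indicator signal are all illustrative assumptions.

```python
import numpy as np


def dominant_alternation_period(labels, target):
    """Estimate the dominant period (in lines) at which `target` recurs.

    `labels` is a hypothetical sequence of per-line language labels.
    Returns (period, share), where `share` is the fraction of non-DC
    spectral power in the strongest frequency bin, or (None, 0.0) when
    the sequence shows no alternation at all.
    """
    # Binary indicator: 1.0 where the line is in the target language.
    signal = np.array([1.0 if lab == target else 0.0 for lab in labels])
    if signal.size < 2:
        return None, 0.0

    # Remove the mean so the zero-frequency bin does not dominate.
    signal = signal - signal.mean()

    # Power spectrum of the real-valued signal.
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0)  # cycles per line

    total = power[1:].sum()
    if total == 0:
        return None, 0.0  # constant signal: a single language throughout

    # Strongest non-DC bin; its inverse frequency is the period in lines.
    k = int(np.argmax(power[1:])) + 1
    return 1.0 / freqs[k], power[k] / total


# Illustrative input: two lines of one language, then two of another, repeated.
labels = ["hy", "hy", "tr", "tr"] * 16
period, share = dominant_alternation_period(labels, target="hy")
```

In this sketch, a document whose languages alternate regularly concentrates its spectral power in one bin, so `share` is close to 1, while a monolingual document returns `(None, 0.0)`. How such a score would be thresholded to flag a document is left open here.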