ArmEpiC – Armenian Epigraphic Corpus (ArtsakhEpiC Sub-Corpus, v1.0)

Hamest Tamrazyan, Gayane Hovhannisyan, Arsen Harutyunyan, Emanuela Boros

Published: 10 Jan 2026, Last Modified: 15 Jan 2026, Zenodo, CC BY-SA 4.0

Abstract: ArmEpiC: Methodology and Data Description

ArmEpiC (Armenian Epigraphic Corpus) is a digital scholarly dataset comprising diplomatically transcribed Armenian lapidary inscriptions encoded in TEI/EpiDoc (v9.7), together with a system of authority files designed to preserve epigraphic evidence while enabling analytical interoperability. The dataset is intended for reuse by epigraphers, historians, linguists, and digital heritage researchers requiring transparent, machine-readable epigraphic data.

Scope of the Dataset
The Zenodo deposit includes ten TEI/EpiDoc inscription files, authority files (ListPlace, ListMonument, ListSubMonument, ListMaterial, ListPreservation, ListScript, ListAbbreviationType, ListChronology, ListBibl), this methodology document, a README, and a licensing statement.

Conceptual Separation of Evidence and Interpretation
ArmEpiC enforces a strict separation between epigraphic evidence, editorial observation, and interpretive layers. The diplomatic transcription constitutes the primary evidentiary layer; all analytical and interpretive interventions are explicitly encoded and remain reversible.

Diplomatic Transcription Policy
Original orthography is preserved, lineation follows the stone, and no silent normalization is introduced. Editorial intervention is restricted to explicit expansion of abbreviations, explicit supply of omitted letters, and explicit marking of damage or loss.

Graphic Phenomena and Linguistic Structure
Ligatures are treated as graphic phenomena and do not determine linguistic segmentation. Ligatures across word boundaries are encoded graphically while preserving separate lexical units.

Abbreviations and Omitted Letters
A strict distinction is maintained between abbreviations (intentional and conventional) and omitted letters (context-driven loss). Ambiguous cases are flagged rather than silently resolved. Honorific and graphic abbreviations are distinguished analytically via a controlled vocabulary.
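
The reversibility of editorial intervention described above can be illustrated with a minimal Python sketch. It assumes the inscription files follow common EpiDoc conventions (`expan`/`abbr`/`ex` for abbreviations, `supplied` for omitted letters, `gap` for loss, `lb` for line breaks); the exact markup used in ArmEpiC is not stated in this abstract, so the element choices are an assumption. The sample fragment is constructed for illustration and is not taken from the corpus, and the helper names `render_text` and `_render` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Constructed sample (NOT from the corpus): an abbreviated word, a word with
# one editorially supplied letter, and a three-character loss on line 2.
SAMPLE = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">'
    '<lb n="1"/><w><expan><abbr>Ա</abbr><ex>ՍՏՈՒԱ</ex><abbr>Ծ</abbr></expan></w> '
    '<w>ՈՂ<supplied reason="omitted">Ո</supplied>ՐՄԻ</w>'
    '<lb n="2"/><gap reason="lost" quantity="3" unit="character"/>'
    '</ab>'
)

# Elements whose text is an editorial addition rather than text on the stone
# (assumed EpiDoc-style encoding).
EDITORIAL = {"ex", "supplied"}


def _render(elem, interpretive):
    """Return the text of elem, with or without editorial additions."""
    name = elem.tag.split("}")[-1]  # strip the XML namespace
    if name == "lb":
        return "\n"  # lineation follows the stone
    if name == "gap":
        return "[" + "." * int(elem.get("quantity", "1")) + "]"
    if name in EDITORIAL and not interpretive:
        return ""  # drop editorial additions in the diplomatic view
    parts = [elem.text or ""]
    for child in elem:
        parts.append(_render(child, interpretive))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


def render_text(xml_text, interpretive=False):
    """Render a diplomatic (default) or interpretive reading of a fragment."""
    return _render(ET.fromstring(xml_text), interpretive).strip()


print(render_text(SAMPLE))                     # diplomatic view
print(render_text(SAMPLE, interpretive=True))  # with expansions and supplied letters
```

Because expansions and supplied letters live in their own elements, the diplomatic reading is recovered simply by skipping them, which is the sense in which such interventions stay reversible.
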
Word Segmentation and Lemmatization
Each lexical unit is encoded as an independent word. Lemmatization is an analytical layer supplied in normalized Classical Armenian and does not imply correction of the original spelling.

Names, Prosopography, and Places
Personal names are encoded structurally without imposing prosopographic identification. Place names are preserved as attested and linked to external authorities via ListPlace.

Dating and Chronology
Dates are recorded as transmitted in the inscription, with...
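
As a rough sketch of how the lemmatization layer could be queried without touching the attested spelling, the Python snippet below groups surface forms by lemma. It assumes each word is a TEI `w` element carrying a `lemma` attribute, which is a standard TEI mechanism but is not confirmed by this abstract as the one ArmEpiC uses. The sample words are constructed for illustration, not drawn from the corpus, and `lemma_index` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_W = "{http://www.tei-c.org/ns/1.0}w"

# Constructed sample (NOT from the corpus): two attested spellings that share
# one lemma, plus a second lemma.
SAMPLE = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">'
    '<w lemma="աստուած">ԱԾ</w> '
    '<w lemma="տէր">ՏՐ</w> '
    '<w lemma="աստուած">ԱՍՏՈՒԱԾ</w>'
    '</ab>'
)


def lemma_index(xml_text):
    """Map each lemma to the surface forms attested for it, in document order."""
    index = defaultdict(list)
    for w in ET.fromstring(xml_text).iter(TEI_W):
        surface = "".join(w.itertext())  # spelling as transcribed, unchanged
        index[w.get("lemma")].append(surface)
    return dict(index)


for lemma, forms in lemma_index(SAMPLE).items():
    print(lemma, forms)
```

The lemma acts only as a lookup key: the surface forms are returned exactly as transcribed, so the analytical layer never overwrites the original orthography.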

External IDs: doi:10.5281/zenodo.18198117