Analyzing Finetuned Vision Models for Mixtec Codex Interpretation

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission
TL;DR: Figures from digitized Mixtec codices can be faithfully classified using finetuned vision models.
Abstract: Throughout history, pictorial record-keeping has been used to document events, stories, and concepts. Examples include the Foggini-Mestikawi Cave, the Bayeux Tapestry, and the Tzolkʼin Maya calendar. The pre-Columbian Mixtec society also recorded many works through graphical media called codices, which depict both stories and real events. Mixtec codices are unique because the depicted scenes are highly structured within and across documents. Because the composition of figures within a codex is essential for understanding its narrative, we created two binary classification tasks, gender and pose, as a first effort toward translation. We labeled a dataset of roughly 1,300 figures drawn from three codices of varying quality. We finetuned the VGG-16 and ViT-16 models, measured their performance, and compared the learned features with expert opinions found in the literature. The results show that, when finetuned, both VGG and ViT perform well, with the transformer-based architecture (ViT) outperforming the CNN-based architecture (VGG) at higher learning rates. We release this work to enable collaboration with the Mixtec community and domain scientists.
Paper Type: short
Research Area: Special Theme (conference specific)
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: Mixtec