LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR

Guang Yang; Victoria Ebert; Nazif Can Tamer; Luiza Amador Pozzobon; Noah A. Smith

Published: 23 Sept 2025, Last Modified: 08 Nov 2025

Venue: AI4Music

License: CC BY 4.0

Keywords: Optical Music Recognition; Computer Vision; Multimodal Learning

Abstract: We propose Legato, a new end-to-end model for optical music recognition (OMR). Legato is the first large-scale pretrained OMR model capable of recognizing full-page or multi-page typeset music scores, and the first to generate documents in ABC notation, a concise, human-readable format for symbolic music. By combining a pretrained vision encoder with an ABC decoder trained on a dataset of more than 214K images, our model generalizes strongly across a wide variety of typeset scores. We conduct comprehensive experiments on a range of datasets and metrics and demonstrate that Legato outperforms the previous state of the art. On our most representative dataset, we observe a 47.6% absolute error reduction on the standard metric OMR-NED.
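
OMR-NED is cited above as the standard metric but is not defined on this page. Its precise definition likely accounts for score-specific structure, so the following is only a rough sketch of the generic normalized-edit-distance idea its name refers to, not OMR-NED itself. It assumes a hypothetical tokenization of ABC output into whitespace-separated tokens and normalizes by the longer sequence length; both choices are illustrative assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two token sequences (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if tokens match)
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred, ref):
    """Edit distance divided by the longer sequence length; 0.0 means identical.

    Normalizing by max(len) is an assumption for illustration, not OMR-NED's definition.
    """
    denom = max(len(pred), len(ref))
    return levenshtein(pred, ref) / denom if denom else 0.0


# Hypothetical ABC fragments, tokenized on whitespace for illustration only.
ref = "C D E F | G2 G2 |".split()
pred = "C D E G | G2 G2 |".split()
print(normalized_edit_distance(pred, ref))  # 0.125 (one substitution over 8 tokens)
```

Under this kind of metric, lower is better, so an "absolute error reduction" is read as a drop in the score measured in percentage points rather than as a relative ratio.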

Track: Paper Track

Submission Number: 48