Abstract: We report the creation of the first high-quality corpus of Aromanian - an endangered Romance language spoken in the Balkans - and the equivalent sentence-aligned translations into Romanian, English, and French. The corpus is released publicly using several orthographic standards and consists in short stories collected in the ‘70s in Romania. Additionally, we provide an corpus-based analysis of Aromanian linguistic particularities and the overall demographic and political context which impacts the contemporary development of the language.
Loading