emLam - a Hungarian Language Modeling baseline

Published: 01 Jan 2017, Last Modified: 22 Feb 2024, CoRR 2017, Readers: Everyone
Abstract: This paper aims to make up for the lack of documented baselines for Hungarian language modeling. Various approaches are evaluated on three publicly available Hungarian corpora. The reported perplexity values are comparable to those of models trained on similar-sized English corpora. A new, freely downloadable Hungarian benchmark corpus is introduced.