Abstract: High-quality automated poetry generation systems are currently available for only a small subset of languages. We introduce a new model for generating poetry in Czech, a heavily inflected Slavic language with fairly regular orthography and prosody. We find that appropriate tokenization is crucial: tokenization based on syllables or individual characters, rather than subwords, proves superior for generating poetic strophes. We also demonstrate that guiding the generation process by explicitly specifying strophe parameters within the poem text can improve the model's effectiveness. We further improve the results by introducing Forced Generation, which adds explicit specifications of meter and verse parameters at inference time based on the already generated text. We evaluate a range of setups and show that our proposed approach achieves high accuracy in several aspects of the formal quality of the generated poems.
Paper Type: long
Research Area: Generation
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Czech
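To show the difference between syllable-level and character-level tokens, here is a minimal sketch using a naive, hypothetical Czech syllabifier. It is not the tokenizer used in this work. It assumes each maximal run of vowel letters is one syllable nucleus, and that consonants attach to the following nucleus, with any trailing consonants attached to the last one. Words with no vowel letter, such as "vlk" with a syllabic l, are kept whole. The split points will sometimes differ from correct Czech syllabification: for example, "slunce" becomes "slu-nce" instead of "slun-ce". The syllable count, however, is usually right, and that count is what matters for meter, since Czech stress falls on the first syllable of a word.

```python
import re

# Hypothetical, simplified inventory of Czech vowel letters (syllable nuclei).
VOWELS = "aeiouyáéíóúůýěAEIOUYÁÉÍÓÚŮÝĚ"
# Naive syllable: optional consonant onset followed by one run of vowels.
_SYLLABLE = re.compile(rf"[^{VOWELS}]*[{VOWELS}]+")


def naive_syllables(word: str) -> list[str]:
    """Split one word into syllable-like tokens (heuristic, not linguistic)."""
    parts = _SYLLABLE.findall(word)
    if not parts:
        return [word]  # no vowel letter, e.g. "vlk": keep the word whole
    # Matches are contiguous from the start, so the leftover is trailing consonants.
    tail = word[len("".join(parts)):]
    parts[-1] += tail
    return parts


def tokenize_line(line: str) -> list[str]:
    """Syllable-level tokens for a whitespace-separated verse line."""
    return [syl for word in line.split() for syl in naive_syllables(word)]


def char_tokens(line: str) -> list[str]:
    """Character-level tokens, for comparison."""
    return list(line)


if __name__ == "__main__":
    verse = "noc je tichá"
    print(tokenize_line(verse))     # syllable tokens
    print(len(tokenize_line(verse)))  # syllable count, relevant to meter
    print(char_tokens("noc"))       # character tokens
```

Syllable-level tokens make the number of syllables per verse directly visible to the model. Subword tokens can merge or split syllables arbitrarily.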