Keywords: NLP, Crossword Solving, Evaluation, Language Models, Italian, Shared Task
Abstract: Cruciverb-IT is the first shared task on Italian crossword solving, held at EVALITA 2026. The task comprises two subtasks: (1) answering individual crossword clues given the expected answer length, and (2) autonomously solving complete crossword grids of varying sizes. We release a dataset of approximately 410,000 Italian clue-answer pairs, along with automatically generated crossword grids ranging in size from 5×5 to 13×13. Five teams participated in the evaluation, submitting a total of 17 system runs. The best-performing system on Subtask 1 achieved 69% accuracy at rank 1 and an MRR of 0.72 using a retrieval-augmented LLM approach. The top system on Subtask 2 reached an average character accuracy of 92% and fully solved 34% of grids, using a fine-tuned encoder-decoder model paired with a constraint-driven depth-first search and ranking heuristics. The results show that modern approaches perform strongly on individual clues and smaller grids, but solving larger crosswords remains an open problem: full-match performance decreases rapidly for grids larger than 5×5.
Source: zip
Ceur: pdf
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 8