Adaptive Originality Filtering: Rejection‑Based Prompting and RiddleScore for Culturally Grounded Multilingual Riddle Generation

Published: 24 Sept 2025, Last Modified: 07 Nov 2025 | NeurIPS 2025 Workshop GenProCC | CC BY 4.0
Track: Short paper
Keywords: Generation, Multilinguality, Creativity and Figurative Language, Prompting and In-Context Learning, Multilingual generation, Cross-lingual evaluation, Cross-lingual alignment, Low-resource language modeling, Figurative language across languages, Typologically diverse generation, Riddle generation, Figurative text generation, Controlled generation, Constraint-based generation, Template rejection prompting, Lexical diversity in generation, Cultural conditioning in generation, Metaphor-rich generation, Automatic evaluation of creativity, Composite evaluation metric, Novelty/diversity/fluency metrics, Human-aligned scoring metrics, Semantic alignment metrics, RiddleScore, Self-BLEU, Distinct-n, BERTScore, Evaluation of figurative language, Evaluation of LLM creativity, Adaptive Originality Filtering (AOF), Semantic rejection sampling, Prompt-based learning, Zero-shot/few-shot, Rejection-based filtering, Chain-of-thought prompting, Self-refine prompting methods, Iterative prompting loops, Prompting for creativity, GPT-4o, LLaMA 3.1, DeepSeek Reasoning, Fine-tuning LLMs, LLM evaluation across cultures, Prompting in LLMs, Poetic generation, Metaphorical abstraction, Symbolism in language generation, Literary device generation, Culturally grounded generation, Ambiguity and misdirection in text, BiRdQA, Multilingual riddle datasets, Figurative QA benchmarks, Creative reasoning benchmarks
TL;DR: We present Adaptive Originality Filtering, a prompting method that enforces novelty and cultural fidelity in multilingual riddle generation, boosting lexical diversity and metaphorical richness across five languages.
Abstract: Language models are increasingly tested on multilingual creativity, which demands culturally grounded, abstract generations. Standard prompting methods often produce repetitive or shallow outputs. We introduce Adaptive Originality Filtering (AOF), a prompting strategy that enforces novelty and cultural fidelity via semantic rejection. To assess quality, we propose RiddleScore, a metric combining novelty, diversity, fluency, and answer alignment. AOF improves Distinct-2 (0.915 in Japanese), reduces Self-BLEU (0.177), and raises RiddleScore (up to +57.1% in Arabic). Human evaluations confirm gains in fluency, creativity, and cultural fit. However, improvements vary by language: Arabic shows greater gains in RiddleScore than in Distinct-2, whereas Japanese sees similar changes. Though focused on riddles, our method may apply to broader creative tasks. Overall, semantic filtering with composite evaluation offers a lightweight path to culturally rich generation without fine-tuning.
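Illustration: the abstract names rejection-based filtering and Distinct-2 without giving implementation details, so the following is only a minimal sketch under stated assumptions, not the authors' method. It computes Distinct-n as the ratio of unique to total n-grams and rejects candidates that are too similar to earlier riddles. The function names (`distinct_n`, `jaccard`, `filter_novel`), the 0.5 threshold, and the token-overlap (Jaccard) similarity are hypothetical stand-ins; AOF itself is described as using semantic rejection, which would more plausibly rely on embeddings. Whitespace tokenization is also an assumption and would not suit unsegmented scripts such as Japanese or Chinese.

```python
from typing import Callable, Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(texts: Iterable[str], n: int = 2) -> float:
    """Distinct-n: unique n-grams divided by total n-grams over all texts."""
    all_ngrams: List[Tuple[str, ...]] = []
    for text in texts:
        # Assumption: lowercase + whitespace split as a toy tokenizer.
        all_ngrams.extend(ngrams(text.lower().split(), n))
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a lexical stand-in for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def filter_novel(
    candidates: Iterable[str],
    references: Iterable[str],
    threshold: float = 0.5,  # hypothetical cutoff, not from the paper
    similarity: Callable[[str, str], float] = jaccard,
) -> List[str]:
    """Keep a candidate only if it is below the similarity threshold
    against every reference and every previously accepted candidate."""
    refs = list(references)
    accepted: List[str] = []
    for cand in candidates:
        if all(similarity(cand, seen) < threshold for seen in refs + accepted):
            accepted.append(cand)
    return accepted


if __name__ == "__main__":
    refs = ["what has keys but no locks"]
    cands = [
        "what has keys but no doors",
        "the more you take the more you leave behind",
        "the more you take the more you leave",
    ]
    kept = filter_novel(cands, refs)
    print(kept)
    print(distinct_n(kept, n=2))
```

In a prompting loop, rejected candidates would typically trigger a regeneration request rather than being silently dropped; passing a different `similarity` callable is one way to substitute an embedding-based measure.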
Submission Number: 10