Abstract: Reasoning over text is a challenging task, especially when reaching the correct answer requires aggregating information across a long context and multiple steps. We introduce the Complex Question Answering dataset (Complex Q&A Corpus) for the Polish language, together with its annotation procedure. The dataset features human-annotated reasoning across extended documents. Its questions are carefully prepared and undergo rigorous cross-examination. Each complex question is accompanied by auxiliary questions that highlight the specific text fragments and information needed to formulate the final answer. From the dataset annotation and human evaluation, we identify the main reasoning patterns. We also propose an automatic evaluation procedure based on the LLM-as-a-Judge paradigm and evaluate the performance of current state-of-the-art models.
External IDs: dblp:conf/iccci/WojtasikDOP25