Hyper-multi-step: The Truth Behind Difficult Long-context Tasks

Published: 01 Jan 2024 · Last Modified: 14 May 2025 · CoRR 2024 · CC BY-SA 4.0
Abstract: Long-context language models (LCLMs), characterized by their extensive context windows, are becoming popular. However, although they are nearly perfect at standard long-context retrieval tasks, our evaluations demonstrate that they perform poorly on two basic cases, "multi-matching retrieval" and "logic-based retrieval," which lie beyond LCLMs' ability boundary. We find that both cases can be addressed well with a sufficient number of reasoning steps, guided by specific CoT prompts, indicating the necessity of combining long-context tasks with CoT methods for more advanced long-context handling. However, current CoT methods are too time-consuming when the context is very long, which means efficient long-context handling still has a long way to go.
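To make the two failure cases concrete, here is a minimal sketch over a synthetic key-value context; the record format and function names are hypothetical illustrations, not the paper's benchmark. The point of contrast is that standard retrieval returns one literally matching item, while multi-matching retrieval must surface every match at once and logic-based retrieval must apply a logical condition rather than literal equality.

```python
import random

random.seed(0)

# Synthetic long context: a list of (key, value) records (hypothetical format).
context = [(f"key-{i}", random.randint(0, 100)) for i in range(1000)]

# Standard retrieval (where LCLMs are nearly perfect): fetch one key's value.
def standard_retrieval(ctx, key):
    return next(v for k, v in ctx if k == key)

# Multi-matching retrieval: collect *every* key whose value equals the target,
# so many matches must be surfaced in a single answer.
def multi_matching_retrieval(ctx, target_value):
    return [k for k, v in ctx if v == target_value]

# Logic-based retrieval: the criterion is a logical condition on the value
# (here, "greater than a threshold") rather than literal string matching.
def logic_based_retrieval(ctx, threshold):
    return [k for k, v in ctx if v > threshold]

print(standard_retrieval(context, "key-42"))
print(multi_matching_retrieval(context, 50))
print(logic_based_retrieval(context, 95))
```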