Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance

Guangxiang Zhao, Saier Hu, Xiaoqi Jian, Jinzhu Wu, Yuhan Wu, Change Jia, Lin Sun, Xiangzheng Zhang

21 Jan 2026 | CoRR 2025 | Everyone | CC BY-SA 4.0