Abstract: Highlights
• Word order is re-examined across 4 tasks, 3 perturbation strategies, 3 languages and 5 models.
• The tested datasets include TruthfulQA, MGSM, XWinoGrande and WiQueen.
• The word order perturbation strategies include Random, Rotate and Adjacent.
• English, Chinese and French datasets are tested on ChatGPT, Claude and LLaMA.
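
The highlights name the three perturbation strategies without defining them. The sketch below shows one plausible reading of each, operating on a list of tokens. All three definitions are assumptions made for illustration: Random is taken to be a full shuffle, Rotate a cyclic shift, and Adjacent a swap of neighbouring pairs. The function names, the shift parameter `k`, and the whitespace tokenization in the usage lines are hypothetical, and whitespace splitting would not suit Chinese text.

```python
import random


def perturb_random(words, rng):
    """Assumed 'Random': return a shuffled copy of the whole token list."""
    shuffled = list(words)
    rng.shuffle(shuffled)
    return shuffled


def perturb_rotate(words, k=1):
    """Assumed 'Rotate': cyclically shift the tokens left by k positions."""
    if not words:
        return []
    k %= len(words)
    return list(words[k:]) + list(words[:k])


def perturb_adjacent(words):
    """Assumed 'Adjacent': swap each non-overlapping pair of neighbours.

    With an odd number of tokens, the last token stays in place.
    """
    out = list(words)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    rng = random.Random(0)  # fixed seed so the shuffle is repeatable
    print(" ".join(perturb_random(tokens, rng)))
    print(" ".join(perturb_rotate(tokens, k=1)))
    print(" ".join(perturb_adjacent(tokens)))
```

Each function returns a new list and leaves its input unchanged, so the same sentence can be passed through all three strategies and the perturbed variants compared against the original.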