D-ProtoCoT: Prototype-Based Path Selection for Chain-of-Thought Reasoning

ACL ARR 2026 January Submission 5767 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Chain-of-Thought Reasoning, Large Language Models, Contrastive Learning, Reasoning Path Selection, Prototype-based Aggregation, Self-Consistency
Abstract: Chain-of-Thought (CoT) prompting enables large language models to solve complex reasoning tasks by generating explicit multi-step reasoning paths. To improve robustness, self-consistency aggregates multiple sampled CoT paths via majority voting over final answers. However, this strategy operates purely at the answer level and implicitly assumes uniform reasoning quality, overlooking substantial variation among reasoning paths that yield the same answer. In this work, we propose D-ProtoCoT, an inference-time framework for selecting high-quality reasoning paths based on representation-level alignment. D-ProtoCoT first constructs a contrastively aligned embedding space using weak supervision from gold answer matching, in which representations of correct reasoning paths are encouraged to cluster while incorrect ones are separated. At inference time, a dynamic reasoning prototype is formed by aggregating multiple sampled paths in this space, and the reasoning path most aligned with the prototype is selected. D-ProtoCoT does not modify the underlying language model and requires no additional annotation beyond gold answers for representation alignment. Experiments on CommonsenseQA, GSM8K, and StrategyQA with LLaMA-3.1-8B-Instruct and Qwen3-8B show that D-ProtoCoT consistently outperforms self-consistency across most settings, achieving up to 23.6% absolute improvement on StrategyQA. Further analysis demonstrates that representation alignment is essential, as naive centroid-based selection with frozen embeddings yields substantially inferior performance, indicating that semantic alignment provides a more reliable signal than answer frequency for reasoning path selection.
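The inference-time selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the sampled reasoning paths have already been embedded by the contrastively aligned encoder (not shown), forms the dynamic prototype as the normalized mean of those embeddings, and selects the path with the highest cosine similarity to the prototype. The function name `select_path` is hypothetical.

```python
import numpy as np

def select_path(embeddings: np.ndarray) -> int:
    """Pick the reasoning path closest to the dynamic prototype.

    embeddings: (n_paths, d) array of path representations, assumed to
    come from the contrastively aligned encoder described in the paper
    (the encoder itself is not part of this sketch).
    """
    # L2-normalize each path embedding so cosine similarity reduces
    # to a dot product.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Dynamic prototype: mean of the sampled path embeddings, renormalized.
    proto = unit.mean(axis=0)
    proto /= np.linalg.norm(proto)
    # Return the index of the path most aligned with the prototype.
    return int(np.argmax(unit @ proto))
```

Unlike majority voting over final answers, this selection operates entirely in representation space, so two paths that reach the same answer by different reasoning can still be ranked differently.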
Paper Type: Long
Research Area: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Research Area Keywords: Reasoning, Interpretability and Analysis of Language Models, Prompting, Efficient Inference, Representation Learning
Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 5767