RePanda: Pandas-powered Tabular Verification and Reasoning

ACL ARR 2025 February Submission 7653 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Fact-checking tabular data is essential for ensuring the accuracy of structured information in domains such as journalism, finance, and scientific research. However, existing methods often rely on black-box models with opaque reasoning. We introduce RePanda, a structured fact verification approach that translates claims into executable pandas queries, enabling interpretable and verifiable reasoning. To train RePanda, we construct PanTabFact, a structured dataset derived from TabFact, where claims are paired with executable queries generated using DeepSeek-Chat and refined through automated error correction. Fine-tuning DeepSeek-coder-7B-instruct-v1.5 on PanTabFact, RePanda achieves 84.09% accuracy on TabFact. To assess out-of-distribution (OOD) generalization, we create a dataset named WikiFact from WikiTableQuestions by transforming question-answer pairs into factual claims. Without additional fine-tuning, RePanda achieves 84.72% accuracy on WikiFact, significantly outperforming all other baselines and demonstrating strong OOD robustness. Beyond fact verification, RePanda extends to tabular question answering by generating executable queries that retrieve precise answers. To support this, we introduce PanWiki, a dataset mapping WikiTableQuestions to pandas queries. Fine-tuning on PanWiki, RePanda achieves 75.1% accuracy in direct answer retrieval. These results highlight the effectiveness of structured execution-based reasoning for tabular verification and question answering.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: fact checking
Languages Studied: English
Submission Number: 7653
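
The abstract's core idea, turning a claim into a pandas expression whose execution yields the verdict, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy table, the claim, the hand-written query string, and the helper name `verify` are hypothetical and are not taken from RePanda, TabFact, or PanTabFact. In the approach described above, the query would be produced by the fine-tuned model rather than written by hand.

```python
import pandas as pd

# Hypothetical toy table (not from TabFact or PanTabFact).
df = pd.DataFrame({
    "team": ["Lions", "Tigers", "Bears"],
    "wins": [10, 7, 12],
    "losses": [4, 7, 2],
})

# Claim: "The Bears have the most wins."
# Hand-written stand-in for a model-generated query (assumption).
query = "df.loc[df['wins'].idxmax(), 'team'] == 'Bears'"


def verify(claim_query: str, table: pd.DataFrame) -> bool:
    """Evaluate a pandas expression against `table` and return its truth value.

    Illustration only: eval() on untrusted model output is unsafe and would
    need sandboxing in any real system.
    """
    result = eval(claim_query, {"__builtins__": {}}, {"df": table, "pd": pd})
    return bool(result)


print(verify(query, df))
```

Because the verdict comes from executing the expression, a reader can inspect the query itself to see why a claim was accepted or rejected. The same pattern would cover question answering if the expression returned a value (for example, `df.loc[df['losses'].idxmin(), 'team']`) instead of a boolean.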