Abstract: Generating code to solve question-answering (QA) problems can help scale up analyses or workflows that are too time-consuming or complex to perform manually. This is especially useful in scientific applications, where large datasets and complex analyses are common. Approaches that convert natural language descriptions of processes into code have shown initial success in reasoning over such textual problems. However, the limitations of existing text-to-code models become evident when solving problems that require knowledge beyond what is presented in the input text. We propose a novel domain-agnostic model that addresses this problem by leveraging domain-specific and open-source code libraries: given a textual QA problem, our model learns to inject library knowledge into the code generation process. Our study demonstrates that the proposed method is competitive with current state-of-the-art models while additionally solving, with high accuracy, problems that lie beyond their capacity. We also release two datasets, one of chemistry problems and one of astronomy problems, for the benefit of the scientific community.
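To make the idea concrete, here is a hypothetical sketch of the kind of library-grounded generated code the abstract describes. It assumes a chemistry QA problem ("What is the molecular weight of ethanol?") answered via the open-source RDKit library; the paper's actual library choices and generation pipeline are not specified on this page, so both are illustrative assumptions.

```python
# Hypothetical example of generated code for a chemistry QA problem,
# assuming the open-source RDKit library supplies the domain knowledge
# that is absent from the question text itself.
# Question: "What is the molecular weight of ethanol?"
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles("CCO")   # ethanol's SMILES: knowledge beyond the input text
answer = Descriptors.MolWt(mol)   # domain computation delegated to the library
print(f"{answer:.2f} g/mol")      # ~46.07 g/mol
```

The point of such generation is that the domain fact (ethanol's structure) and the domain computation (molecular weight) both come from the library call rather than from the question text.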
Paper Type: long
Research Area: Question Answering
Contribution Types: Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources
Languages Studied: Python, English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.