ATHENA++: Natural Language Querying for Complex Nested SQL Queries

Jaydeep Sen, Chuan Lei, Abdul Quamar, Fatma Ozcan, Vasilis Efthymiou, Ayushi Dalmia, Greg Stager, Ashish Mittal, Diptikalyan Saha, Karthik Sankaranarayanan

Published: 21 Aug 2020, Last Modified: 03 Oct 2024, PVLDB volume 13, issue 11, CC BY 4.0

Abstract: Natural Language Interfaces to Databases (NLIDB) systems eliminate the requirement for an end user to use complex query languages like SQL, by translating the input natural language (NL) queries to SQL automatically. Although a significant volume of research has focused on this space, most state-of-the-art systems can at best handle simple select-project-join queries. There has been little to no research on extending the capabilities of NLIDB systems to handle complex business intelligence (BI) queries that often involve nesting as well as aggregation. In this paper, we present ATHENA++, an end-to-end system that can answer such complex queries in natural language by translating them into nested SQL queries. In particular, ATHENA++ combines linguistic patterns from NL queries with deep domain reasoning using ontologies to enable nested query detection and generation. We also introduce a new benchmark data set (FIBEN), which consists of 300 NL queries, corresponding to 237 distinct complex SQL queries on a database with 152 tables, conforming to an ontology derived from standard financial ontologies (FIBO and FRO). We conducted extensive experiments comparing ATHENA++ with two state-of-the-art NLIDB systems, using both FIBEN and the prominent Spider benchmark. ATHENA++ consistently outperforms both systems across all benchmark data sets with a wide variety of complex queries, achieving 88.33% accuracy on the FIBEN benchmark and 78.89% accuracy on the Spider benchmark, beating the best reported accuracy results on the dev set by 8%.