OverpassNL: A Community-Generated Dataset and Real-World Semantic Parser for OpenStreetMap

Anonymous

03 Sept 2022 (modified: 05 May 2023)
ACL ARR 2022 September Blind Submission
Readers: Everyone
Abstract: We present OverpassNL, a complex dataset that pairs queries to the OpenStreetMap (OSM) database with natural language questions. It is based on nearly 10,000 queries that OSM users and developers issued in the Overpass query language, which 15 trained computational linguistics students then translated into suitable natural language forms. The resulting dataset can serve as training data for real-world semantic parsing. The complexity of OverpassNL stems both from the nature of real-world queries and from the expansive underlying OSM database. Existing semantic parsing datasets such as Spider (Yu et al., 2018) use formulaic synthetic queries and achieve complexity by combining multiple simple underlying databases. In contrast, OSM has no natural split into database schemata of the kind Spider relies on (Yu et al., 2018), nor does Overpass provide a clear structure for slot-filling (Yao et al., 2019). The difficulty of the task is reflected in the execution accuracy of only 21% achieved by a generic neural semantic parser. We improve this model by adding different types of additional information and by augmenting the training data, raising execution accuracy to 36%.
Paper Type: long
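The abstract reports results as execution accuracy. As a rough illustration only, the sketch below assumes that a predicted Overpass query counts as correct when executing it returns the same set of OSM elements as executing the gold query, and that a predicted query which fails to run counts as incorrect. The abstract does not specify the exact comparison, so this is not the authors' evaluation code; `execute` is a hypothetical stand-in for sending a query to an Overpass API instance, and the toy query strings and element IDs are made up for the example.

```python
from typing import Callable, Hashable, Sequence


def execution_accuracy(
    predicted: Sequence[str],
    gold: Sequence[str],
    execute: Callable[[str], frozenset[Hashable]],
) -> float:
    """Fraction of predicted queries whose execution result equals the gold result.

    Assumption: results are compared as sets of element IDs (order ignored).
    `execute` is a hypothetical callable that runs one query and returns its result set.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = 0
    for pred_query, gold_query in zip(predicted, gold):
        try:
            pred_result = execute(pred_query)
        except Exception:
            # Assumption: a predicted query that raises (e.g. invalid syntax) counts as wrong.
            continue
        if pred_result == execute(gold_query):
            correct += 1
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical toy executor: maps query strings to sets of OSM element IDs.
    toy_results = {
        "gold_a": frozenset({1, 2}),
        "pred_a": frozenset({2, 1}),  # same elements as gold_a, different order
        "gold_b": frozenset({3}),
        "pred_b": frozenset({4}),     # different elements than gold_b
    }

    def toy_execute(query: str) -> frozenset[Hashable]:
        return toy_results[query]  # raises KeyError for unknown ("broken") queries

    score = execution_accuracy(
        ["pred_a", "pred_b", "broken"], ["gold_a", "gold_b", "gold_b"], toy_execute
    )
    print(score)  # prints 0.3333333333333333
```

Comparing results as sets ignores element order, which seems a reasonable default for lists of OSM elements but is itself an assumption; a real evaluation against Overpass output might also need to normalize output formats or metadata before comparing.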