Configuration 1: xlnet on fasttext with cosine distance
Configuration 2: bert on fasttext with cosine distance
Testing on: easy queries for the flight_delay schema
Entries below have greater num_guesses with a threshold of 3
--------------------------------------------------

Query (#1): Predict the average departure delay for each airline tomorrow
Ground Truth (filter): NONE

Annotation 1: []
Annotation 2: []

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#2): Predict the average airline delay for each airline tomorrow
Ground Truth (filter): NONE

Annotation 1: []
Annotation 2: []

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#3): Predict the average departure delay for each airline where elapsed time is more than five hours for tomorrow
Ground Truth (filter): ELAPSED TIME

Annotation 1: [{'text': 'elapsed time', 'confidence': 1.0}]
Annotation 2: [{'text': '##apsed time', 'confidence': 0.9996646642684937}, {'text': 'five', 'confidence': 0.4217315912246704}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#4): Predict the number of airline that will not have a flight tomorrow
Ground Truth (filter): NONE

Annotation 1: []
Annotation 2: []

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#5): Predict the average arrival delay for each airline tomorrow
Ground Truth (filter): NONE

Annotation 1: []
Annotation 2: []

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#6): Predict the total departure delay for each airline that will be delayed more than five minutes by air system next week
Ground Truth (filter): AIR SYSTEM DELAY

Annotation 1: []
Annotation 2: [{'text': 'delayed', 'confidence': 0.48936280608177185}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 2 attempt(s) to get the correct answer

--------------------------------------------------

Query (#7): Predict the average weather delay for each airline where id of aircraft is 5 for next week
Ground Truth (filter): FLIGHT NUMBER

Annotation 1: [{'text': 'id of aircraft', 'confidence': 1.0}]
Annotation 2: [{'text': 'id of aircraft', 'confidence': 0.9999996622403463}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 2 attempt(s) to get the correct answer

--------------------------------------------------

Query (#8): Predict the average weather delay for each airline where departure delay is less than 5 munites for next week
Ground Truth (filter): DEPARTURE DELAY

Annotation 1: [{'text': 'where departure delay', 'confidence': 0.8563900589942932}]
Annotation 2: [{'text': 'departure delay', 'confidence': 0.9998307228088379}, {'text': 'muni', 'confidence': 0.8181426823139191}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#9): Predict the average security delay for each airline where departure delay is more than ten munites tomorrow
Ground Truth (filter): DEPARTURE DELAY

Annotation 1: [{'text': 'departure delay', 'confidence': 0.9942135810852051}]
Annotation 2: [{'text': 'departure delay', 'confidence': 0.9998649060726166}, {'text': 'muni', 'confidence': 0.6907109916210175}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#10): Predict the average delay caused by late aircraft for each airline where elapsed time is more than seven hours for next week
Ground Truth (filter): ELAPSED TIME

Annotation 1: [{'text': 'elapsed time', 'confidence': 1.0}]
Annotation 2: [{'text': '##apsed time', 'confidence': 0.9999912778536478}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#11): Predict total number of airports with cancelled flight count more than five for tomorrow
Ground Truth (filter): CANCELLED

Annotation 1: []
Annotation 2: []

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 2 attempt(s) to get the correct answer

--------------------------------------------------

Query (#12): Predict total departure delay for each origin airports with security delay more than five minutes for tomorrow
Ground Truth (filter): SECURITY DELAY

Annotation 1: []
Annotation 2: [{'text': 'security delay', 'confidence': 0.9784820973873138}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#13): Predict average departure delay for each origin airports with security delay less than five minutes for tomorrow
Ground Truth (filter): SECURITY DELAY

Annotation 1: []
Annotation 2: [{'text': 'security delay', 'confidence': 0.9792670011520386}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#14): Predict average air system delay for each origin airports for all Emirates airways flights next week
Ground Truth (filter): NONE

Annotation 1: []
Annotation 2: []

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#15): Predict total weather delay for each origin airports with aircraft delay more than ten minutes for tomorrow
Ground Truth (filter): LATE AIRCRAFT DELAY

Annotation 1: []
Annotation 2: [{'text': 'aircraft delay', 'confidence': 0.7504622042179108}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#16): Predict average departure delay for each origin airports with delay caused by aircraft is less than five minutes for next week
Ground Truth (filter): LATE AIRCRAFT DELAY

Annotation 1: [{'text': 'caused by aircraft', 'confidence': 0.9999999205271403}]
Annotation 2: [{'text': 'delay caused', 'confidence': 0.8717659413814545}, {'text': 'aircraft is', 'confidence': 0.9991546273231506}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#17): Predict total aircraft delay for each origin airports where scheduled departure time is after 13:00 for tomorrow
Ground Truth (filter): SCHEDULED DEPARTURE HOUR

Annotation 1: [{'text': 'scheduled departure time', 'confidence': 1.0}]
Annotation 2: [{'text': 'departure time', 'confidence': 0.9999988079071045}, {'text': 'after', 'confidence': 0.9908882975578308}, {'text': ': 00', 'confidence': 0.9128034114837646}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 2 attempt(s) to get the correct answer

--------------------------------------------------

Query (#18): Predict average elapsed time for each origin airports with security delay more than five minutes for next week
Ground Truth (filter): SECURITY DELAY

Annotation 1: []
Annotation 2: [{'text': 'security delay', 'confidence': 0.9990723431110382}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

