Configuration 1: xlnet on fasttext with cosine distance
Configuration 2: bert on fasttext with cosine distance
Testing on: hard queries for the flight_delay schema
Entries below have greater num_guesses with a threshold of 3
--------------------------------------------------

Query (#1): predict the average delay caused by weather for each airline where flight will start in next two days
Ground Truth (filter): DATE

Annotation 1: []
Annotation 2: [{'text': 'will start', 'confidence': 0.8114347755908966}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 2 attempt(s) to get the correct answer

--------------------------------------------------

Query (#2): predict the average delay caused by weather for each flight of Emirates airline where flight will start in next two days
Ground Truth (filter): DATE

Annotation 1: []
Annotation 2: [{'text': 'in', 'confidence': 0.3591545820236206}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 2 attempt(s) to get the correct answer

--------------------------------------------------

Query (#3): predict the average delay caused by the tornedo for each flight of Quatar airline where flight will start in next two days
Ground Truth (filter): DATE

Annotation 1: [{'text': 'flight', 'confidence': 0.9499793648719788}]
Annotation 2: [{'text': 'of Qua', 'confidence': 0.9406005342801412}, {'text': 'will start', 'confidence': 0.7854796648025513}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 2 attempt(s) to get the correct answer

--------------------------------------------------

Query (#4): predict the average delay due to bad weather for each flight of Quatar Airlines where flight will start in next two days
Ground Truth (filter): DATE

Annotation 1: [{'text': 'flight', 'confidence': 0.9999962449073792}]
Annotation 2: [{'text': 'of Qua', 'confidence': 0.9567640821139017}, {'text': 'start', 'confidence': 0.9605323076248169}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 2 attempt(s) to get the correct answer

--------------------------------------------------

Query (#5): I want to know the average delay for aircraft for the flights of Air Emirates which will start next week
Ground Truth (filter): DATE

Annotation 1: []
Annotation 2: [{'text': 'flights', 'confidence': 0.9564255475997925}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 2 attempt(s) to get the correct answer

--------------------------------------------------

Query (#6): can you tell me the average aircraft delay for Quatar Airlines where tail number starts from 4500 and will start within next week
Ground Truth (filter): TAIL NUMBER

Annotation 1: [{'text': 'tail number starts', 'confidence': 0.9999991854031881}, {'text': '4', 'confidence': 0.621338427066803}]
Annotation 2: [{'text': 'number', 'confidence': 0.9999996423721313}, {'text': 'from 4500', 'confidence': 0.9349560340245565}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 2 attempt(s) to get the correct answer

--------------------------------------------------

Query (#7): Predict the average elapsed time for all flights which will start from Dallas Airport and expected elapsed time is less than six hours
Ground Truth (filter): ELAPSED TIME

Annotation 1: [{'text': 'elapsed time', 'confidence': 1.0}]
Annotation 2: [{'text': 'start', 'confidence': 0.9993060827255249}, {'text': 'elapsed time', 'confidence': 0.9010847955942154}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#8): I want to predict the expected elapsed time of all Air Emirates Airlines flights where flight number is in between 3400 and 3500 and they will start tomorrow
Ground Truth (filter): FLIGHT NUMBER

Annotation 1: [{'text': 'flight number', 'confidence': 1.0}]
Annotation 2: [{'text': 'number', 'confidence': 0.9999986290931702}, {'text': 'in between 3400', 'confidence': 0.7628612369298935}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#9): predict how many flights will get cancelled which was supposed to start from New York for next week
Ground Truth (filter): NONE

Annotation 1: [{'text': 'get cancelled which', 'confidence': 0.9994409680366516}]
Annotation 2: []

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#10): predict how many flights of Quatar Airlines will get cancelled for covid-19 for next month
Ground Truth (filter): NONE

Annotation 1: []
Annotation 2: [{'text': '- 19', 'confidence': 0.536309227347374}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 2 attempt(s) to get the correct answer

--------------------------------------------------

Query (#11): I wanna know how many flights will be cancelled for each airline for the tornedo that is coming within tomorrow
Ground Truth (filter): NONE

Annotation 1: [{'text': 'tornedo that', 'confidence': 0.9454457461833954}]
Annotation 2: [{'text': 'tornedo that', 'confidence': 0.9999784976243973}, {'text': 'coming', 'confidence': 0.9651370048522949}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 2 attempt(s) to get the correct answer

--------------------------------------------------

