Configuration 1: xlnet on glove with cosine distance
Configuration 2: bert on glove with cosine distance
Testing on: hard queries for the flight_delay schema
Entries below have greater num_guesses with a threshold of 3
--------------------------------------------------

Query (#1): predict the average delay caused by weather for each airline where flight will start in next two days
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}, {'text': 'caused', 'confidence': 0.971455454826355}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#2): predict the average delay caused by weather for each flight of Emirates airline where flight will start in next two days
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'the', 'confidence': 0.9999996423721313}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}, {'text': 'caused', 'confidence': 0.800511360168457}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#3): predict the average delay caused by the tornedo for each flight of Quatar airline where flight will start in next two days
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'the', 'confidence': 0.9999999403953552}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}, {'text': 'caused', 'confidence': 0.7428648471832275}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#4): predict the average delay due to bad weather for each flight of Quatar Airlines where flight will start in next two days
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'the', 'confidence': 0.9999998807907104}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}, {'text': 'due', 'confidence': 0.9846199154853821}, {'text': 'bad', 'confidence': 0.9999951124191284}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#5): I want to know the average delay for aircraft for the flights of Air Emirates which will start next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'the', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 2 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#6): can you tell me the average aircraft delay for Quatar Airlines where tail number starts from 4500 and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#7): Predict the average elapsed time for all flights which will start from Dallas Airport and expected elapsed time is less than six hours
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}, {'text': 'Dallas', 'confidence': 0.9754207134246826}, {'text': 'expected', 'confidence': 0.993593692779541}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#8): I want to predict the expected elapsed time of all Air Emirates Airlines flights where flight number is in between 3400 and 3500 and they will start tomorrow
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'expected', 'confidence': 1.0}]
Annotation 2: [{'text': 'expected', 'confidence': 0.9999995827674866}]

Configuration 1 took 3 attempt(s) to get the correct answer
Configuration 2 took 3 attempt(s) to get the correct answer

--------------------------------------------------

Query (#9): predict how many flights will get cancelled which was supposed to start from New York for next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'predict how many flights will get cancelled which was supposed to start from New York for next week', 'confidence': 1}]
Annotation 2: [{'text': 'many', 'confidence': 0.6943861842155457}]

Configuration 1 took 4 attempt(s) to get the correct answer
Configuration 2 took 4 attempt(s) to get the correct answer

--------------------------------------------------

Query (#10): predict how many flights of Quatar Airlines will get cancelled for covid-19 for next month
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'predict how many flights of Quatar Airlines will get cancelled for covid-19 for next month', 'confidence': 1}]
Annotation 2: [{'text': 'predict how many flights of Quatar Airlines will get cancelled for covid-19 for next month', 'confidence': 1}]

Configuration 1 took 5 attempt(s) to get the correct answer
Configuration 2 took 5 attempt(s) to get the correct answer

--------------------------------------------------

Query (#11): I wanna know how many flights will be cancelled for each airline for the tornedo that is coming within tomorrow
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'I wanna know how many flights will be cancelled for each airline for the tornedo that is coming within tomorrow', 'confidence': 1}]
Annotation 2: [{'text': 'many', 'confidence': 0.9999741315841675}]

Configuration 1 took 5 attempt(s) to get the correct answer
Configuration 2 took 4 attempt(s) to get the correct answer

--------------------------------------------------

Query (#12): predict the average change is duration of flights for the Quatar Airlines which will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#13): predict the average change is duration of flights for the Emirates Airlines which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#14): predict the average departure delay for the Emirates Airlines which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#15): predict the average arrival delay for the Emirates Airlines which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#16): predict the average air system delay for the Emirates Airlines which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#17): predict the average security delay for the Emirates Airlines which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999984502792358}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#18): predict the average airline delay for the Emirates Airlines which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#19): predict the average late aircraft delay for the Hainan Airlines which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.999999463558197}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#20): predict the average change is duration of flights for the Quatar Airlines which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#21): predict the average departure delay for the Quatar Airlines which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#22): predict the average arrival delay for the British Airways which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}, {'text': 'central', 'confidence': 0.7382172346115112}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#23): predict the average air system delay for the Singapore Airlines which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#24): predict the average security delay for the Emirates Airlines which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999988377094269}, {'text': 'central', 'confidence': 0.7579755187034607}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#25): predict the average airline delay for the Emirates Airlines which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}, {'text': 'central', 'confidence': 0.7697942852973938}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#26): predict the average late aircraft delay for the Emirates Airlines which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999995231628418}, {'text': 'central', 'confidence': 0.37133902311325073}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#27): predict the average change is duration of flights for the Quatar Airlines which will have scheduled time is in between 7 Am and 11 Am in central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#28): predict the average departure delay for the Quatar Airlines which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#29): predict the average arrival delay for the British Airways which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#30): predict the average air system delay for the Singapore Airlines which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#31): predict the average security delay for the Emirates Airlines which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999987185001373}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#32): predict the average airline delay for the Emirates Airlines which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#33): predict the average late aircraft delay for the Emirates Airlines which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999994337558746}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#34): predict the average change is duration of flights for the Quatar Airlines which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#35): predict the average departure delay for the Quatar Airlines which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#36): predict the average arrival delay for the British Airways which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#37): predict the average air system delay for the Singapore Airlines which will have flight duration more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#38): predict the average security delay for the Emirates Airlines which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999980330467224}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#39): predict the average airline delay for the Emirates Airlines which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#40): predict the average late aircraft delay for the Emirates Airlines which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.999999463558197}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#41): predict the average change is duration of flights for the Quatar Airlines with security delay more than five minutes for next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#42): predict the average departure delay for the Quatar Airlines which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#43): predict the average arrival delay for the British Airways which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#44): predict the average air system delay for the Singapore Airlines which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#45): predict the average security delay for the Emirates Airlines which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999980628490448}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#46): predict the average airline delay for the Emirates Airlines which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#47): predict the average late aircraft delay for the Emirates Airlines which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999994337558746}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#48): predict the average length of the flights which will start from Atlanta International Airport with scheduled departure time between 6 AM and 3 PM for next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#49): predict the average change is duration of flights which will start from Atlanta International Airport which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#50): predict the average departure delay for flights which will start from Atlanta International Airport which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#51): predict the average arrival delay for flights which will start from Atlanta International Airport which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#52): predict the average air system delay for flights which will start from Atlanta International Airport which will have flight duration more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#53): predict the average security delay for flights which will start from Atlanta International Airport which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999980032444}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#54): predict the average airline delay for flights which will start from Atlanta International Airport which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#55): predict the average late aircraft delay for flights which will start from Atlanta International Airport which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999994933605194}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#56): predict the average change is duration of flights which will start from Atlanta International Airport with security delay more than five minutes for next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#57): predict the average departure delay for flights which will start from Atlanta International Airport which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#58): predict the average arrival delay for flights which will start from Atlanta International Airport which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#59): predict the average air system delay for flights which will start from Atlanta International Airport which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#60): predict the average security delay for flights which will start from Atlanta International Airport which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999983310699463}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#61): predict the average airline delay for flights which will start from Atlanta International Airport which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#62): predict the average late aircraft delay for flights which will start from Atlanta International Airport which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999995231628418}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#63): predict the average change is duration of flights which will start from Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#64): predict the average departure delay for flights which will start from Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#65): predict the average arrival delay for flights which will start from Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#66): predict the average air system delay for flights which will start from Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#67): predict the average security delay for flights which will start from Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999984204769135}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#68): predict the average airline delay for flights which will start from Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#69): predict the average late aircraft delay for flights which will start from Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999995529651642}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#70): predict the average change is duration of flights which will start from Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#71): predict the average departure delay for flights which will start from Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#72): predict the average arrival delay for flights which will start from Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#73): predict the average air system delay for flights which will start from Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#74): predict the average security delay for flights which will start from Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999987781047821}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#75): predict the average airline delay for flights which will start from Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#76): predict the average late aircraft delay for flights which will start from Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999995231628418}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#77): predict the average change is duration of flights for flights which will start from Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}, {'text': 'central', 'confidence': 0.8418452739715576}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#78): predict the average departure delay for flights which will start from Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#79): predict the average arrival delay for flights which will start from Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#80): predict the average air system delay for flights which will start from Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#81): predict the average security delay for flights which will start from Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999980628490448}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#82): predict the average airline delay for flights which will start from Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#83): predict the average late aircraft delay for flights which will start from Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999993443489075}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#84): predict the average change is duration of flights which will land in Atlanta International Airport which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#85): predict the average departure delay for flights which will land in Atlanta International Airport which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#86): predict the average arrival delay for flights which will land in Atlanta International Airport which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#87): predict the average air system delay for flights which will land in Atlanta International Airport which will have flight duration more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#88): predict the average security delay for flights which will land in Atlanta International Airport which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999981820583344}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#89): predict the average airline delay for flights which will land in Atlanta International Airport which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#90): predict the average late aircraft delay for flights which will land in Atlanta International Airport which will have elapsed time more than four hours and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999994933605194}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#91): predict the average change is duration of flights which will land in Atlanta International Airport with security delay more than five minutes for next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#92): predict the average departure delay for flights which will land in Atlanta International Airport which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#93): predict the average arrival delay for flights which will land in Atlanta International Airport which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#94): predict the average air system delay for flights which will land in Atlanta International Airport which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#95): predict the average security delay for flights which will land in Atlanta International Airport which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999984204769135}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#96): predict the average airline delay for flights which will land in Atlanta International Airport which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#97): predict the average late aircraft delay for flights which will land in Atlanta International Airport which will have security delay more than four minutes and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999995231628418}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#98): predict the average change is duration of flights which will land in Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#99): predict the average departure delay for flights which will land in Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#100): predict the average arrival delay for flights which will land in Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#101): predict the average air system delay for flights which will land in Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#102): predict the average security delay for flights which will land in Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999985098838806}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#103): predict the average airline delay for flights which will land in Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#104): predict the average late aircraft delay for flights which will land in Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999995529651642}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#105): predict the average change is duration of flights which will land in Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#106): predict the average departure delay for flights which will land in Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#107): predict the average arrival delay for flights which will land in Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#108): predict the average air system delay for flights which will land in Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#109): predict the average security delay for flights which will land in Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999989569187164}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#110): predict the average airline delay for flights which will land in Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#111): predict the average late aircraft delay for flights which will land in Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999995231628418}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#112): predict the average change is duration of flights for flights which will land in Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}, {'text': 'International', 'confidence': 0.5886748433113098}, {'text': 'central', 'confidence': 0.8516249656677246}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#113): predict the average departure delay for flights which will land in Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#114): predict the average arrival delay for flights which will land in Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#115): predict the average air system delay for flights which will land in Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#116): predict the average security delay for flights which will land in Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average security', 'confidence': 0.9999982714653015}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#117): predict the average airline delay for flights which will land in Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#118): predict the average late aircraft delay for flights which will land in Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): AVERAGE

Annotation 1: [{'text': 'average', 'confidence': 1.0}]
Annotation 2: [{'text': 'average late', 'confidence': 0.9999993443489075}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#119): predict the total security delay for the Emirates Airlines which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999896287918091}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#120): predict the total air system delay for the Emirates Airlines which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#121): predict the total delay due to bad weather for each flight of Emirates Airlines where flight will start next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'the', 'confidence': 0.9999998807907104}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}, {'text': 'due', 'confidence': 0.9936028718948364}, {'text': 'bad', 'confidence': 0.9999940395355225}]

Configuration 1 took 3 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#122): predict total cancelled flights for each airline for next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total cancelled', 'confidence': 0.9999985098838806}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#123): predict the total departure delay for the Emirates Airlines which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#124): predict the total airline delay for the Emirates Airlines which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#125): predict the total late aircraft delay for the Emirates Airlines which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.999999076128006}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#126): predict the total departure delay for the Emirates Airlines which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}, {'text': 'central', 'confidence': 0.7338559627532959}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#127): predict the total air system delay for the Quatar Airlines which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#128): predict the total security delay for the Cathay Pacific Airways which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999958872795105}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#129): predict the total airline delay for the Cathay Pacific Airways which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#130): predict the total late aircraft delay for the Cathay Pacific Airways which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999991953372955}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#131): predict the total departure delay for the Emirates Airlines which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#132): predict the total air system delay for the Quatar Airlines which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#133): predict the total security delay for the Cathay Pacific Airways which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999950528144836}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#134): predict the total airline delay for the Cathay Pacific Airways which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#135): predict the total late aircraft delay for the Cathay Pacific Airways which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999988675117493}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#136): predict the total departure delay for the Emirates Airlines which will have flight duration more than five hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#137): predict the total air system delay for the Quatar Airlines which will have flight duration more than seven hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#138): predict the total security delay for the Cathay Pacific Airways which will have elapsed time more than eight hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.999986857175827}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#139): predict the total airline delay for the Cathay Pacific Airways which will have flight duration more than five hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#140): predict the total late aircraft delay for the Cathay Pacific Airways which will have flight duration more than eleven hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999991655349731}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#141): predict the total departure delay for the Emirates Airlines security delay more than five minutes for next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#142): predict the total air system delay for the Quatar Airlines which will have security delay more than seven minutes and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#143): predict the total security delay for the Cathay Pacific Airways which will have security delay more than eight minutes and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999935925006866}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#144): predict the total airline delay for the Cathay Pacific Airways with security delay more than five minutes for next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#145): predict the total late aircraft delay for the Cathay Pacific Airways which will have security delay more than eleven minutes and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999991655349731}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#146): predict the total departure delay for flights which will start from Atlanta International Airport which will have flight duration more than five hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#147): predict the total air system delay for flights which will start from Atlanta International Airport will have flight duration more than seven hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#148): predict the total security delay for flights which will start from Atlanta International Airport which will have elapsed time more than eight hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999758899211884}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#149): predict the total airline delay for flights which will start from Atlanta International Airport which will have flight duration more than five hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#150): predict the total late aircraft delay for flights which will start from Atlanta International Airport which will have flight duration more than eleven hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999992251396179}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#151): predict the total departure delay for flights which will start from Atlanta International Airport security delay more than five minutes for next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}, {'text': 'International', 'confidence': 0.4658524990081787}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#152): predict the total air system delay for flights which will start from Atlanta International Airport which will have security delay more than seven minutes and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#153): predict the total security delay for flights which will start from Atlanta International Airport which will have security delay more than eight minutes and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999683499336243}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#154): predict the total airline delay for flights which will start from Atlanta International Airport with security delay more than five minutes for next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#155): predict the total late aircraft delay for flights which will start from Atlanta International Airport which will have security delay more than eleven minutes and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999992251396179}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#156): predict the total departure delay for flights which will start from Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#157): predict the total air system delay for flights which will start from Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#158): predict the total security delay for flights which will start from Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999795854091644}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#159): predict the total airline delay for flights which will start from Atlanta International Airport swhich will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#160): predict the total late aircraft delay for flights which will start from Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999991953372955}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#161): predict the total departure delay for flights which will start from Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#162): predict the total air system delay for flights which will start from Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#163): predict the total security delay for flights which will start from Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999890625476837}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#164): predict the total airline delay for flights which will start from Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#165): predict the total late aircraft delay for flights which will start from Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999991953372955}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#166): predict the total departure delay for flights which will start from Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#167): predict the total air system delay for flights which will start from Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#168): predict the total security delay for flights which will start from Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999693334102631}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#169): predict the total airline delay for flights which will start from Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#170): predict the total late aircraft delay for flights which will start from Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999987185001373}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#171): predict the total departure delay for flights which will land in Atlanta International Airport which will have flight duration more than five hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 0.9999999403953552}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#172): predict the total air system delay for flights which will land in Atlanta International Airport will have flight duration more than seven hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#173): predict the total security delay for flights which will land in Atlanta International Airport which will have elapsed time more than eight hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999819397926331}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#174): predict the total airline delay for flights which will land in Atlanta International Airport which will have flight duration more than five hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#175): predict the total late aircraft delay for flights which will land in Atlanta International Airport which will have flight duration more than eleven hours and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999992251396179}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#176): predict the total departure delay for flights which will land in Atlanta International Airport security delay more than five minutes for next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}, {'text': 'International', 'confidence': 0.5632345080375671}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#177): predict the total air system delay for flights which will land in Atlanta International Airport which will have security delay more than seven minutes and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#178): predict the total security delay for flights which will land in Atlanta International Airport which will have security delay more than eight minutes and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999750852584839}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#179): predict the total airline delay for flights which will land in Atlanta International Airport with security delay more than five minutes for next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#180): predict the total late aircraft delay for flights which will land in Atlanta International Airport which will have security delay more than eleven minutes and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999991953372955}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#181): predict the total departure delay for flights which will land in Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#182): predict the total air system delay for flights which will land in Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#183): predict the total security delay for flights which will land in Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.999983549118042}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#184): predict the total airline delay for flights which will land in Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999998211860657}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#185): predict the total late aircraft delay for flights which will land in Atlanta International Airport which will have tail number greater than 1k and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999991953372955}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#186): predict the total departure delay for flights which will land in Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#187): predict the total air system delay for flights which will land in Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#188): predict the total security delay for flights which will land in Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999912977218628}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#189): predict the total airline delay for flights which will land in Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#190): predict the total late aircraft delay for flights which will land in Atlanta International Airport which will have scheduled departure hour after 4 PM is central time and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999991953372955}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#191): predict the total departure delay for flights which will land in Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#192): predict the total air system delay for flights which will land in Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#193): predict the total security delay for flights which will land in Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total security', 'confidence': 0.9999779164791107}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#194): predict the total airline delay for flights which will land in Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total', 'confidence': 0.9999997615814209}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

Query (#195): predict the total late aircraft delay for flights which will land in Atlanta International Airport which will have scheduled time is in between 7 Am and 11 Am in central time  and will start within next week
Ground Truth (aggregator): TOTAL

Annotation 1: [{'text': 'total', 'confidence': 1.0}]
Annotation 2: [{'text': 'total late', 'confidence': 0.9999987185001373}]

Configuration 1 took 1 attempt(s) to get the correct answer
Configuration 2 took 1 attempt(s) to get the correct answer

--------------------------------------------------

