[
    {
        "question": "For the seasons that the San Antonio Spurs ranked 1st of Game Stats, what percentage of the Spurs' total points per game in a latest one of these seaons did Gervin score? Round your answer to the nearest tenth of a percent.",
        "task": 2,
        "task_id": -7555063192040816978,
        "solution": "Find all seasons in which the San Antonio Spurs ranked 1st. George Gervin's PPG for 1979-80 season (33.1, Basketball-Reference Gervin page); find 1979-80 Spurs team PTS/G (119.4, Spurs 1979-80 team page); divide 33.1 by 119.4 and multiply by 100; round to nearest tenth: 33.1 / 119.4 × 100 ≈ 27.7%",
        "urls": [
            "https://hofbbplayers.com/george-gervin/#:~:text=Gervin%20led%20the%20NBA%20in%20scoring%20four%20times,George%20helped%20them%20to%20win%20five%20division%20titles.",
            "https://www.basketball-reference.com/leagues/NBA_1979.html",
            "https://www.basketball-reference.com/teams/SAS/1980.html"
        ],
        "true_answer": "27.7%",
        "file_name": ""
    },
    {
        "question": "Across the five years from 2020 to 2024, which channel or streaming provider won the most 'Best Drama Series' awards at the British Academy Television Awards according to annual wiki page (22 march 2025 versio, Consider joint wins as counting for each credited provider) For the channel or provider with the most awards, among their latest award-winning work during this period, which features characters who appeared in all seasons, who was the last named character each of these characters saw before their death or series ending?",
        "task": 3,
        "task_id": 2865688078260817009,
        "solution": "For each year (2020–2024), find the 'Best Drama Series' winner at the BAFTA TV Awards:\n2024: Top Boy (Netflix)\n2023: Bad Sisters (Apple TV+)\n2022: In My Skin (BBC Three)\n2021: Save Me Too (Sky Atlantic)\n2020: The End of the F***ing World (Netflix/Channel 4)\nCount wins by provider (count both Netflix and Channel 4 in 2020). Win totals: Netflix (2), Apple TV+ (1), BBC Three (1), Sky Atlantic (1), Channel 4 (1). The provider with the most wins is Netflix. For each year, count the number of unique providers credited: [1 (2024), 1 (2023), 1 (2022), 1 (2021), 2 (2020)]. Average over 5 years: (1+1+1+1+2)/5 = 1.2.",
        "urls": [
            "https://en.wikipedia.org/wiki/2024_British_Academy_Television_Awards",
            "https://en.wikipedia.org/wiki/2023_British_Academy_Television_Awards",
            "https://en.wikipedia.org/wiki/2022_British_Academy_Television_Awards",
            "https://en.wikipedia.org/wiki/2021_British_Academy_Television_Awards",
            "https://en.wikipedia.org/wiki/2020_British_Academy_Television_Awards",
            "https://en.wikipedia.org/wiki/British_Academy_Television_Award_for_Best_Drama_Series",
            "https://topboy.fandom.com/wiki/Dushane_Hill"
        ],
        "true_answer": "Netflix; Dushane Hill: Gerard Sullivan, Gerard Sullivan: Stefan Tovell",
        "file_name": ""
    },
    {
        "question": "Among Swedish film legends (1) died in March 2020 (aged 90), (2)died in April 2019 (aged 83), and (3) aged 38 before july 2025 and married German-Irish actor between 2015-2018 , who had the highest average annual number of film and TV appearances in their single most productive decade (give me the name), and what was that average?",
        "task": 3,
        "task_id": -5277764067157891526,
        "solution": "Find the total number of film/TV credits for each actor in each decade using their full IMDb filmography pages and biography sources. Max von Sydow's most productive decade was the 1960s with about 25 roles, Bibi Andersson's 1960s output was about 20, Alicia Vikander's 2010s output was about 22–25. Divide by 10 (years in decade) for each; the highest average is Max von Sydow with ~2.5 films/year, slightly higher than either Andersson or Vikander's best decade averages.",
        "urls": [
            "https://www.imdb.com/list/ls068343090/",
            "https://www.scandi.co.uk/personalities/max-von-sydow/",
            "https://www.imdb.com/name/nm0000761/",
            "https://www.npr.org/2020/03/09/521704797/remembering-actor-max-von-sydow-from-bergman-to-game-of-thrones",
            "https://www.imdb.com/list/ls049806292/"
        ],
        "true_answer": "Max von Sydow, about 2.5 films per year",
        "file_name": ""
    },
    {
        "question": "Between the first census after the founding of the church in America that came from Czar Alexander III to memorialize his father, who commissioned his favorite architect to design, and closest census after the church's centennial, what was the average annual percent change in Benld's population? (rounded to three decimals, i.e., +/-a.bcd) Official government sources and Wikipedia census data are acceptable.",
        "task": 1,
        "task_id": -594787965302663970,
        "solution": "https://www.illinoistimes.com/news-opinion/shadows-of-the-motherland-11439882 said the church dounded in 1907. so the two census are of 1910 and 2010. 1910 = 1,912; in 2010 = 1,556 (https://en.wikipedia.org/wiki/Benld,_Illinois).",
        "urls": [
            "https://en.wikipedia.org/wiki/Benld,_Illinois",
            "https://www.illinoistimes.com/news-opinion/shadows-of-the-motherland-11439882"
        ],
        "true_answer": "-0.206",
        "file_name": ""
    },
    {
        "question": "Across all NBA seasons where Manu Ginobili's Player Efficiency Rating (PER) exceeded 20 in the regular season, what was the average number of regular season wins by his team (Rounded to two decimal places)?",
        "task": 2,
        "task_id": -7549899259542410117,
        "solution": "Step 1: From Ginobili's season-by-season advanced statistics on stat-nba.com or Basketball Reference, find all NBA seasons where his PER was greater than 20. Those seasons are: 2004-05, 2005-06, 2006-07, 2007-08, 2008-09, 2009-10, 2010-11, and 2011-12. Step 2: For each of these seasons, look up the San Antonio Spurs' regular season win totals from the franchise history page on Basketball Reference or stat-nba.com. The win totals are: 59, 63, 58, 56, 54, 50, 61, 50. Step 3: Calculate the average of those win totals: (59 + 63 + 58 + 56 + 54 + 50 + 61 + 50) / 8 = 451 / 8 = 56.375. Rounded to two decimal places: 56.38.",
        "urls": [
            "http://www.stat-nba.com/player/1316.html",
            "https://www.basketball-reference.com/players/g/ginobma01.html",
            "https://www.nba.com/player/1938/manu-ginobili"
        ],
        "true_answer": 56.38,
        "file_name": ""
    },
    {
        "question": "Consider the Kanye West-produced singles for other artists released between 2001 and 2005 that peaked on the Billboard Hot 100. If you take the following singles: Jay-Z's 'Izzo (H.O.V.A.)', those for Alicia Keys, the song for Twista and featuring with him, the song for Jay-Z with a number in the title, the song for Ludacris and a rapper born in 1978, the solo song for Talib Kweli with the shortest name, those for Monica, and the song with the shortest name for Common, what is the mean of their respective Billboard Hot 100 peak positions? (If a song did not chart, exclude it.)",
        "task": 2,
        "task_id": 4933313387987693049,
        "solution": "Gather the 8 Kanye West-produced singles for other artists released 2001-2005 that charted on the Billboard Hot 100. Record their Hot 100 peak positions: Izzo (H.O.V.A.) = 8, You Don't Know My Name = 3, Overnight Celebrity = 6, '03 Bonnie & Clyde = 4, Stand Up = 1, Get By = 77, Knock Knock = 75, Go! = 79. Calculate the average: (8+3+6+4+1+77+75+79)/8 = 31.625.",
        "urls": [
            "https://www.youtube.com/watch?v=GH3ndhOaxYo",
            "https://en.wikipedia.org/wiki/Kanye_West_production_discography",
            "https://www.billboard.com/lists/jay-z-top-songs-billboard-hot-100/",
            "https://en.wikipedia.org/wiki/You_Don%27t_Know_My_Name",
            "https://en.wikipedia.org/wiki/Stand_Up_(Ludacris_song)",
            "https://en.wikipedia.org/wiki/Get_By",
            "https://en.wikipedia.org/wiki/Go!_(Common_song)"
        ],
        "true_answer": 31.625,
        "file_name": ""
    },
    {
        "question": "Between 2010 and 2019, which company gave more main stage performances per year at the neighborhood in the heart of London's West End features Seven Dials, Neal's Yard, and is renowned for its world-class shopping and dining: The Royal Ballet or The Bolshoi Opera? What was the average difference in annual performance counts between them over this period? (you can use rbo.org.uk)",
        "task": 2,
        "task_id": -7237318473412932387,
        "solution": "1) Query the ROH Performance Database (https://rohcollections.org.uk/PerformanceSearch.aspx) for main stage performances by 'The Royal Ballet' for each year from 2010 to 2019; sum for the total, divide by 10 for the average annual value (1928/10 = 192.8 /year). 2) Repeat for 'The Royal Opera' for the same period (1534/10 = 153.4 /year). 3) Subtract average annual Royal Opera performances from Royal Ballet performances: 120 - 110 = 10. 4) Conclusion: The Royal Ballet gave more main stage performances per year, on average by 10, during 2010–2019.",
        "urls": [
            "https://rohcollections.org.uk/PerformanceSearch.aspx"
        ],
        "true_answer": "The Royal Ballet,39.4.",
        "file_name": ""
    },
    {
        "question": "Among the piano works of three eminent Baroque composers: A) a German-British composer, a native of Halle, Germany, initially pursued his career in Hamburg and Italy before making London his permanent home in 1712, B) a German composer and musician (born in O.S. 21 March 1685, died in 1750), and C) an Italian composer (born in Naples, then part of the Spanish Empire under the Kingdom of Naples, died in 1757);  as catalogued by difficulty on pianolibrary.org, which composer has the highest percentage of pieces with a difficulty rating of 2 or lower, and what is that percentage (rounded to two decimal places)?",
        "task": 1,
        "task_id": -4465151474575729091,
        "solution": "1. Gather from pianolibrary.org the counts of works at each difficulty for Handel, Bach, and Scarlatti:\n\nHandel: 1=102, 1.5=63, 2=84, 2.5=75, 3=23, 3.5=2; total=349; easy=102+63+84=249.\nBach: 1=15, 1.5=58, 2=134, 2.5=211, 3=111, 3.5=76, 4=19, 4.5=3; total=627; easy=15+58+134=207.\nScarlatti: 1=8, 1.5=40, 2=102, 2.5=206, 3=189, 3.5=60, 4=12, 4.5=3; total=620; easy=8+40+102=150.\n\n2. Calculate the percentage for each:\nHandel: 249/349×100 ≈ 71.35%\nBach: 207/627×100 ≈ 33.01%\nScarlatti: 150/620×100 ≈ 24.19%\n3. Conclusion: Handel has the highest such percentage, at 71.35%.",
        "urls": [
            "https://www.pianolibrary.org/difficulty/handel/",
            "https://www.pianolibrary.org/difficulty/bach/",
            "https://www.pianolibrary.org/difficulty/scarlatti/"
        ],
        "true_answer": "George Frideric Handel, 71.35%",
        "file_name": ""
    },
    {
        "question": "For Jane Austen’s six major novels—Sense and Sensibility, Pride and Prejudice, Mansfield Park, Emma, Northanger Abbey, and Persuasion—what is the average number of years between each novel's first publication and its first major film or television adaptation (Not need to consider months and days. A margin of error is allowed, within 5 years)? Additionally, which novel experienced the shortest such interval? (use wikipedia to search).",
        "task": 2,
        "task_id": 2582056263859172614,
        "solution": "Step 1: Find the first publication year for each of the six major novels. Step 2: Find the year of the first major screen adaptation (film or TV) for each. Step 3: Calculate the lag (adaptation year minus publication year) for each. Step 4: The lags are: Sense and Sensibility (1971-1811=160), Pride and Prejudice (1938-1813=125), Mansfield Park (1983-1814=169), Emma (1948-1815=133), Northanger Abbey (1968-1817=151), Persuasion (1960-1817=143). Step 5: Compute the average: (160 + 125 + 169 + 133 + 151 + 143) / 6 = 146.83 years. Step 6: The shortest lag is 125 years for Pride and Prejudice.",
        "urls": [
            "https://en.wikipedia.org/wiki/Sense_and_Sensibility#",
            "https://en.wikipedia.org/wiki/Pride_and_Prejudice#",
            "https://en.wikipedia.org/wiki/Mansfield_Park",
            "https://en.wikipedia.org/wiki/Emma_(novel)",
            "https://en.wikipedia.org/wiki/Northanger_Abbey",
            "https://en.wikipedia.org/wiki/Persuasion_(novel)"
        ],
        "true_answer": "The average interval is 155.5 years; the shortest is Pride and Prejudice with 125 years.",
        "file_name": ""
    },
    {
        "question": "During the regular NBA seasons of 1993-2000, there is a season that Jerry Sloan coached team ranked 1st among all teams. What was the difference, to three decimal places, between the win percentage of the team Jerry Sloan coached that season and the average win percentage of all other NBA teams (excluding the team Jerry Sloan coached)?",
        "task": 2,
        "task_id": -1628355626087393713,
        "solution": "Step 1: Identify Jerry Sloan's peak Utah Jazz season and win percentage: 1996-97, 64-18 (0.780). Step 2: Retrieve every NBA team's win-loss record for the 1996-97 season (from league summary/standings). Step 3: Calculate win percentage for each team. Step 4: Exclude the Utah Jazz; compute the average win percentage of the remaining teams (which comes to 0.489982578397: 0.744(MIA) + 0.695(NYK) + 0.549(ORL) + 0.537(WAS) + 0.317(NJN) + 0.268(PHI) + 0.183(BOS) + 0.841(CHI) + 0.683(ATL) + 0.659(DET) + 0.659(CHA) + 0.512(CLE) + 0.476(IND) + 0.402(MIL) + 0.366(TOR) + 0.695(HOU) + 0.488(MIN) + 0.293(DAL) + 0.256(DEN) + 0.244(SAS) + 0.171(VAN) + 0.695(SEA) + 0.683(LAL) + 0.598(POR) + 0.488(PHO) + 0.439(LAC) + 0.415(SAC) + 0.366(GSW) ). Step 5: Subtract this average from the Jazz's win percentage: 0.780 - 0.489982578397 = 0.291.",
        "urls": [
            "http://www.stat-nba.com/coach/257.html",
            "https://www.basketball-reference.com/leagues/NBA_1997.html",
            "https://www.basketball-reference.com/coaches/sloanje01c.html",
            "https://www.espn.com/nba/standings/_/season/1997"
        ],
        "true_answer": "0.291",
        "file_name": ""
    },
    {
        "topic": "Venomous Snakes and Antivenom Coverage in Afghanistan",
        "question": "A female author published an article about venomous snakes in a Middle Eastern country on shunculture.com before 2025. The article mentioned the number of venomous snake species in that country and provided four examples. Among the Wikipedia pages of these four examples, before 2025 (excluding 2025), the snake that was edited the most had a modification about its distribution area that cited a book. What is the ISBN-13 number of that book?",
        "true_answer": "978-1893777019",
        "task": 2,
        "task_id": 7798475503675200463,
        "file_name": "",
        "solution": "Caspian cobra: Revision history: 280, Gloydius halys: Revision history: 99, Macrovipera lebetinus: Revision history: 223, Echis carinatus: Revision history: 368. The book is https://www.amazon.com/Snake-Species-World-Taxonomic-Geographic/dp/1893777014.",
        "urls": [
            "https://shunculture.com/article/do-they-have-venomous-snakes-in-afghanistan",
            "https://en.wikipedia.org/w/index.php?title=Caspian_cobra&action=history&offset=&limit=500",
            "https://en.wikipedia.org/wiki/Gloydius_halys",
            "https://en.wikipedia.org/wiki/Macrovipera_lebetinus",
            "https://en.wikipedia.org/wiki/Echis_carinatus",
            "https://www.amazon.com/Snake-Species-World-Taxonomic-Geographic/dp/1893777014"
        ]
    },
    {
        "topic": "Marvel actors' debut film box office and ROI calculation",
        "question": "Among Benedict Cumberbatch, Robert Downey Jr., and Anthony Hopkins, whose first appearance in a Marvel Cinematic Universe film corresponded to the highest return on investment (ROI) when computed as (Worldwide Gross - Budget) / Budget for their debut Marvel movie, and what is the ROI (three decimals)? Use their first Marvel movie and respective budgets and worldwide grosses.",
        "true_answer": "Robert Downey Jr., 3.184",
        "task": 2,
        "task_id": -4443511152127229520,
        "file_name": "",
        "solution": "Step 1: Identify each actor's first MCU film (Doctor Strange for Cumberbatch, Iron Man for Downey Jr., Thor for Hopkins) and get the budget and worldwide gross for each: Doctor Strange ($165M budget, $677.718M gross), Iron Man ($140M, $585.796M), Thor ($150M, $449.327M) from Box Office Mojo.\nStep 2: Calculate (Worldwide Gross - Budget) / Budget for each:\n- Doctor Strange: (677,718,395 - 165,000,000) / 165,000,000 ≈ 3.108\n- Iron Man: (585,796,247 - 140,000,000) / 140,000,000 ≈ 3.184\n- Thor: (449,326,618 - 150,000,000) / 150,000,000 ≈ 2.0\nStep 3: Highest ROI is Iron Man (Robert Downey Jr.) with ≈ 3.184.",
        "urls": [
            "https://www.entoin.com/entertainment/marvel-actors",
            "https://www.boxofficemojo.com/release/rl3076752897/",
            "https://www.boxofficemojo.com/title/tt0371746/",
            "https://www.boxofficemojo.com/release/rl3094644225/",
            "https://www.imdb.com/"
        ]
    },
    {
        "topic": "",
        "question": "I've been on a diet recently. I came across an article on eatingwell.com that lists 24 diet recipes for beginnners on 2025. I want to try the avocado dish from the website, and I don't want to eat things like sandwiches or pasta. I'm a beginner at cooking, so I only want to make the quickest dishes. How many Vitamin A (IU) will this dish (per serving) provide? I just weighed my avocado at 160g. Based on the recipe's avocado quantity and the fat content per serving, what percentage of the total fat comes from the avocado? (use data from wiki, and rounded to two decimals)",
        "true_answer": "74IU; 65.16%",
        "task": 2,
        "task_id": -44435111521272222,
        "file_name": "",
        "solution": "1. In the https://www.eatingwell.com/gallery/7902553/weight-loss-recipes-for-beginners/, four recipes satisfy the constraints. And https://www.eatingwell.com/recipe/261611/white-bean-avocado-toast/ cost the least time. Click the Show Full Nutrition Label, the vitemin A privied per serves is 74IU. 2. The total fat provied is 9g. An avocado weighing 160g, take 1/4 according to the recipe, totaling 40g. Wiki shows that 100g of avocado provides 14.66g of fat. 1/4 * (160/100 * 14.66) = 5.864, 5.864/9*100% = 65.16%.",
        "urls": [
            "https://www.eatingwell.com/gallery/7902553/weight-loss-recipes-for-beginners/",
            "https://www.eatingwell.com/recipe/260715/tomato-avocado-cheese-sandwich/",
            "https://www.eatingwell.com/recipe/261611/white-bean-avocado-toast/",
            "https://www.eatingwell.com/recipe/250881/avocado-tomato-chicken-sandwich/",
            "https://www.eatingwell.com/recipe/270549/salmon-stuffed-avocados/"
        ]
    },
    {
        "topic": "",
        "question": "How many new computer science (cs) papers were added to arXiv between July 15, 2025, and July 16, 2025? Exclude cross-listed papers.",
        "true_answer": "598",
        "task": 2,
        "task_id": -4443511151121272222,
        "file_name": "",
        "solution": "Click page 3, there are 598 papers totally.",
        "urls": [
            "https://arxiv.org/search/advanced?advanced=&terms-0-operator=AND&terms-0-term=&terms-0-field=title&classification-computer_science=y&classification-physics_archives=all&classification-include_cross_list=exclude&date-year=2025&date-filter_by=date_range&date-from_date=2025-07-15&date-to_date=2025-07-16&date-date_type=submitted_date&abstracts=show&size=200&order=-announced_date_first&start=400"
        ]
    },
    {
        "question": "In the 2018–19 NBA season, consider the players selected for the All-NBA 1st and 3rd Teams. For the 1st Team, only include players whose teams reached at least the Conference Finals that year. For the 3rd Team, include all selections. What is the absolute difference between (A) the average regular season win total of the 1st Team players (using only those whose teams reached at least the top two of the Conference Finals) and (B) the average regular season win total of the 3rd Team players? Provide your answer rounded to one decimal place.",
        "task": 2,
        "task_id": 2870495276088031403,
        "solution": "1. From https://www.basketball-reference.com/awards/all_league.html and Basketball Reference, identify 2018–19 All-NBA 1st Team: Giannis Antetokounmpo (MIL), Stephen Curry (GSW), Paul George (OKC), James Harden (HOU), Nikola Jokic (DEN). 1st Team Conference Finals teams: Milwaukee Bucks (Giannis, reached ECF, 60 wins), Golden State Warriors (Curry, reached WCF, 57 wins), Denver Nuggets(Nikola Jokic, 54 wins) . 2. 3rd Team: Rudy Gobert (UTA), LeBron James (LAL), Blake Griffin (DET), Kemba Walker (CHO), Russell Westbrook (OKC). 3. Obtain 2018–19 team wins: MIL: 60, GSW: 57, UTA: 50, LAL: 37, DET: 41, CHO: 39, OKC: 49. 4. Compute averages: (A) 1st Team CF average = (60 + 57+ 54)/3 = 57; (B) 3rd Team average = (50 + 37 + 41 + 39 + 49)/5 = 43.2. 5. Absolute difference = |57 - 43.2| = 14.7.",
        "urls": [
            "https://www.basketball-reference.com/awards/all_league.html",
            "https://www.basketball-reference.com/leagues/NBA_2019_standings.html",
            "https://www.basketball-reference.com/playoffs/NBA_2019.html"
        ],
        "true_answer": "14.7",
        "file_name": ""
    },
    {
        "question": "Based on the most recent official numbers for 2025, what percentage of total active-duty personnel across all eight federal uniformed services of the United States is accounted for by the combined Public Health Service Commissioned Corps and NOAA Commissioned Officer Corps? (Note: Use published numbers for Army, Navy, Air Force, Marine Corps, Space Force, Coast Guard, Public Health Service Commissioned Corps, and NOAA Corps. Show calculation and answer to three decimal places.)",
        "task": 3,
        "task_id": 1246752951764405840,
        "solution": "1. Find 2025 active-duty personnel for each uniformed service: Army (445,475), Navy (330,011), Air Force (313,615), Marine Corps (168,298), Space Force (9,671), Coast Guard (40,590), Public Health Service Commissioned Corps (8,000), NOAA Corps (330).\n2. Sum all eight for the total: 445,475 + 330,011 + 313,615 + 168,298 + 9,671 + 40,590 + 8,000 + 330 = 1,315,990.\n3. Sum the two non-military services: 8,000 (USPHS) + 330 (NOAA) = 8,330.\n4. Percentage = (8,330 / 1,315,990) * 100 = 0.633% (rounded to three decimal places).",
        "urls": [
            "https://en.wikipedia.org/wiki/Uniformed_services_of_the_United_States",
            "https://usafacts.org/articles/how-many-people-are-in-the-us-military-a-demographic-overview/",
            "https://www.hhs.gov/about/budget/fy-2025-hhs-contingency-staffing-plan/index.html",
            "https://www.omao.noaa.gov/noaa-corps/about-noaa-corps"
        ],
        "true_answer": "0.633%",
        "file_name": ""
    },
    {
        "question": "Muriel Spark published several books between 1973 and 1996. One of the books had a title 8 years before it was first published, and the Wikipedia page for this book was edited several times by someone in 2021. Who was this person?",
        "task": 2,
        "task_id": 7958232944631551435,
        "solution": "Step 1: Collect publication years of his books.\nStep 2: Find which book was named in before this book published:  The Hothouse by the East River.\nStep 3: Find the editor of this book's editorial history in 2021: GrahamHardy. \n Final answer: GrahamHardy.",
        "urls": [
            "https://en.wikipedia.org/wiki/Muriel_Spark",
            "https://en.wikipedia.org/wiki/The_Hothouse_by_the_East_River",
            "https://en.wikipedia.org/wiki/The_Abbess_of_Crewe",
            "https://en.wikipedia.org/wiki/The_Takeover_(novel)"
        ],
        "true_answer": "GrahamHardy",
        "file_name": ""
    },
    {
        "question": "For the Slavic language whose ISO 639-3 code matches the language with more monolingual training sentences in the PaSeMiLL dataset than 'dsb', what is the best F-score (as a percentage) reported in Okabe et al., 2025? How much higher is it than the model with no post-processing?",
        "task": 2,
        "task_id": -5234249704560143567,
        "solution": "A) Find the ISO code for Upper Sorbian (hsb) and Lower Sorbian (dsb) in Glottolog; B) See in the PaSeMiLL dataset (https://github.com/shuokabe/PaSeMiLL) and Okabe et al. 2025 paper that Upper Sorbian has more monolingual sentences than Lower Sorbian; C) Extract from Okabe et al. Table 2 the best reported F-score for Upper Sorbian sentence mining, which is 51.38 (using SimAlign). The F1 of model with no post-processing is 30.73, is lower than previous 20.65. Final answer: 51.38, 20.65.",
        "urls": [
            "https://alexfraser.github.io/pubs/okabe_computel2025.pdf",
            "https://github.com/shuokabe/PaSeMiLL",
            "https://glottolog.org/resource/languoid/id/uppe1395",
            "https://en.wikipedia.org/wiki/Sorbs"
        ],
        "true_answer": "51.38, 20.65",
        "file_name": ""
    },
    {
        "question": "Among the ten highest-grossing domestic films released in the United States in 1995, what was the average number of Oscar nominations(including winner) each film received at the 68th Academy Awards? How does the number of nominations for 'Toy Story' compare to this average?",
        "task": 2,
        "task_id": 208858598736747372,
        "solution": "1. Identify the top 10 US domestic box office films of 1995: Batman Forever (3 nominations), Apollo 13 (9), Toy Story (4), Pocahontas (2), Ace Ventura: When Nature Calls (0), Casper (0), Die Hard with a Vengeance (0), Crimson Tide (3), GoldenEye (0), Waterworld (1). 2. Add the nominations: 3+9+4+2+0+0+0+3+0+1 = 22.\n3. Divide by 10 films: 22 / 10 = 2.2.\n4. Toy Story had 4 nominations—well above the average.",
        "urls": [
            "https://www.boxofficemojo.com/year/1995/",
            "https://www.boxofficemojo.com/title/tt0114709/",
            "https://www.oscars.org/oscars/ceremonies/1996"
        ],
        "true_answer": "2.2; 4 nominations (above average)",
        "file_name": ""
    },
    {
        "question": "There is a book whose protagonist is Garrick Haardrad. This book belongs to a series of three books. What is the time interval between the founding and acquisition of the publisher of the book with the highest rating on Goodreads?",
        "task": 2,
        "task_id": 882499031645144436,
        "solution": "1. Find the book whose protagonist is Garrick Haardrad: 'Fires of Winter'.\n 2. Find the series: 'Fires of Winter', 'Hearts Aflame', and 'Surrender My Love.\n 3.Find the book with highest rating: Hearts Aflame, the publisher is Avon and the founded year is 1941.\n 4. It was acquisition by  Hearst Corporation in 1959. \n  5. The gap is 18 years.",
        "urls": [
            "https://www.goodreads.com/book/show/891814.Fires_of_Winter",
            "https://www.goodreads.com/book/show/301821.Hearts_Aflame",
            "https://www.goodreads.com/book/show/325558.Surrender_My_Love",
            "https://en.wikipedia.org/wiki/Avon_(publisher)"
        ],
        "true_answer": "18 years",
        "file_name": ""
    },
    {
        "question": "In the series known as 'Predator Cities' by Philip Reeve, consider the main four novels. Taking into account major literary awards (such as the Guardian Children’s Fiction Prize or equivalent) won by any of these books in their publication year, calculate the absolute difference between the average Goodreads rating of all books that won such an award in their publication year versus those that did not. Give your answer rounded to three decimal places.",
        "task": 2,
        "task_id": 295959538196778506,
        "solution": "Step 1: List main books and publication years from Goodreads: 'Mortal Engines' (2001), 'Predator's Gold' (2003), 'Infernal Devices' (2005), 'A Darkling Plain' (2006).\nStep 2: From author website, note which books won a major literary prize in their year of publication:\n- 'A Darkling Plain' won Guardian prize in 2006 (the year of its publication), others did not win in their publication year.\nStep 3: Extract Goodreads ratings: 'Mortal Engines' (3.76), 'Predator's Gold' (3.92), 'Infernal Devices' (3.87), 'A Darkling Plain' (4.16).\nStep 4: Collect ratings:\n- Award-winner in publication year: 4.16\n- Not award-winners then: 3.76, 3.92, 3.87\nStep 5: Average award-winner = 4.16.\nStep 6: Average of others = (3.76+3.92+3.87)/3 = 3.85.\nStep 7: Absolute difference: |4.16 - 3.85| = 0.310, rounded to three decimals.",
        "urls": [
            "https://www.amazon.com/Predator-Cities-Philip-Reeve-ebook/dp/B01E8JUJEK",
            "https://philipreeve.com/",
            "https://www.goodreads.com/series/212609-mortal-engines-quartet",
            "https://en.wikipedia.org/wiki/Philip_Reeve",
            "https://en.wikipedia.org/wiki/Mortal_Engines_Quartet"
        ],
        "true_answer": 0.31,
        "file_name": ""
    },
    {
        "question": "In five landmark U.S. Supreme Court civil rights cases—Plessy v. Ferguson (1896), Brown v. Board of Education (1954), Regents v. Bakke (1978), Shelby County v. Holder (2013), and Obergefell v. Hodges (2015)—which one had the fewest dissenting justices? What was the Chief Justice's seat in 1945?",
        "task": 1,
        "task_id": 5478702955450888514,
        "solution": "1. For each case, look up the number of dissenting justices:\n   - Plessy v. Ferguson (1896): 1 dissent (Justice Harlan)\n   - Brown v. Board of Education (1954): 0 dissents (unanimous)\n   - Regents v. Bakke (1978): 4 dissents\n   - Shelby County v. Holder (2013): 4 dissents\n   - Obergefell v. Hodges (2015): 4 dissents\n2. Brown v. Board of Education (1954) has fewest dissenting justices.\n 3.The Chief Justice is Earl Warren, he was Governor of California in 1945.",
        "urls": [
            "https://www.oyez.org/cases/1850-1900/163us537",
            "https://www.oyez.org/cases/1940-1955/347us483",
            "https://www.oyez.org/cases/1979/76-811",
            "https://www.oyez.org/cases/2012/12-96",
            "https://www.oyez.org/cases/2014/14-556"
        ],
        "true_answer": "Brown v. Board of Education, Governor of California",
        "file_name": ""
    },
    {
        "question": "According to the most recent official Wyoming Secretary of State roster, what is the product of (a) the total number of unique commercial registered agents in Wyoming (as defined by state law as agents serving more than ten entities and formally registered) and (b) the number of registered agents whose names begin with the letter 'A' listed on the Delaware Division of Corporations' official government website? Provide only the resulting integer.",
        "task": 2,
        "task_id": -6663618044781803078,
        "solution": "1. Reference the Wyoming Secretary of State's commercial registered agent roster (as of May 23, 2025), which lists 375 unique commercial registered agents by entity name.\n2. Visit the Delaware Division of Corporations official registered agent's directory (https://corp.delaware.gov/agents/) and count the number of agents whose names begin with 'A'; in the latest listing, there are 16 such agents (manual count confirms this from the public registry sample).\n3. Compute the product: 375 (WY) * 16 (DE) = 6000.\nThe final answer is 6000.",
        "urls": [
            "https://sos.wyo.gov/Business/Docs/CRA-Roster.pdf",
            "https://corp.delaware.gov/agents/",
            "https://corp.delaware.gov/faqs-regarding-registered-agents/",
            "https://corpfiles.delaware.gov/agtwebreq.pdf"
        ],
        "true_answer": "6000",
        "file_name": ""
    },
    {
        "question": "There is a basketball player who was born in 1988 and was selected by the Houston Rockets in 2011. In the 2012-13 NBA season, considering every NBA game he played in April 2013 (including both regular season and playoffs), by how many standard deviations did his April 29 point total exceed his mean April 2013 points per game (rounded to three decimal places)?",
        "task": 1,
        "task_id": 8756001952300480183,
        "solution": "1. Find the player: Chandler Parsons\n 2. Find Chandler Parsons' April 29 point: 27\n 3. Gather Chandler Parsons' April 2013 game logs (regular season and playoffs): points = [29, 13, 24, 23, 9, 17, 21, 27].\n4. Calculate the mean: (29+13+24+23+9+17+21+27)/8 = 20.375\n5. Calculate the population standard deviation: sqrt(mean([(xi-mean)^2 for xi in points])) ≈ 6.460\n5. April 29, 2013: scored 27. Compute z = (27-20.375)/6.460 ≈ 1.026.\n6. Therefore, on April 29 Parsons was 1.026 standard deviations above his April average.",
        "urls": [
            "https://www.basketball-reference.com/players/p/parsoch01/gamelog/2013/",
            "https://www.basketball-reference.com/teams/HOU/2013.html",
            "https://www.espn.com/nba/player/gamelog/_/id/6460/year/2013/chandler-parsons"
        ],
        "true_answer": "1.026",
        "file_name": ""
    },
    {
        "question": "A musical inspired by Don Quixote won the Tony Award for Best Musical, and a new film adaptation of Don Quixote was released in the same decade. The youngest-born actor in the film starred in a film directed by a director born in 1898 and died in 1980. Which of these films is the oldest?",
        "task": 2,
        "task_id": 1109160082285609530,
        "solution": "Obtain the list of years for Don Quixote-inspired music/opera/ballet compositions (from Wikipedia/dissertation), and film adaptations (from the Wikipedia category page); find the decade  a newly released film (e.g., 1968: 'Don Chisciotte and Sancio Panza'). The Starring of this film are Franco Franchi and Ciccio Ingrassia. The Ciccio Ingrassia was born first. The director born in 1898 and died in 1980 is Mario Mattoli. The first film Ciccio Ingrassia starred is Appuntamento a Ischia.",
        "urls": [
            "https://en.wikipedia.org/wiki/List_of_works_influenced_by_Don_Quixote",
            "https://en.wikipedia.org/wiki/Category:Films_based_on_Don_Quixote",
            "https://en.wikipedia.org/wiki/Man_of_La_Mancha",
            "https://en.wikipedia.org/wiki/Tony_Award_for_Best_Musical",
            "https://en.wikipedia.org/wiki/Don_Chisciotte_and_Sancio_Panza"
        ],
        "true_answer": "Appuntamento a Ischia",
        "file_name": ""
    },
    {
        "question": "Among the 'lawyer-turned-preneurs' who founded legal tech startups, Doxly, LawGeex, and Kira Systems have achieved important milestones by being acquired or completing Series C funding. What is the stock price of the COO’s first job company on April 8, 2025, for the company that took the shortest time from founding to reaching these milestones?",
        "task": 2,
        "task_id": 6616970670004709468,
        "solution": "1. Obtain founding and milestone event years for each company from the main article and 3 funding/acquisition databases. Doxly: 2016 (founding) to 2019 (acquisition) = 3 years; LawGeex: 2014 (founding) to 2020 (Series C, $20M) = 6 years; Kira Systems: 2011 (founding) to 2021 (acquisition) = 10 years. \n 2. Find the COO of Doxly: D. Wayne Poole.\n 3.His first job is Ford Motor, the stock price is 8.69 on April 8, 2025.",
        "urls": [
            "https://www.crunchbase.com/organization/doxly",
            "https://www.crunchbase.com/organization/lawgeex",
            "https://www.crunchbase.com/organization/kira",
            "https://www.crunchbase.com/organization/ford/financial_details"
        ],
        "true_answer": "8.69 USD",
        "file_name": ""
    },
    {
        "question": "Among all individuals who achieved EGOT status by winning all awards between the years 2000 and 2023, what is the average number of years it took for these winners to complete their EGOT from their first qualifying award to their last? For which film did the oldest person who completed EGOT-qualifying win an Oscar?",
        "task": 2,
        "task_id": -6278680191618110477,
        "solution": "Step 1: Gather full list of EGOT winners with years of each qualifying award from PEOPLE.com, Wikipedia's EGOT table, Britannica, and Biography.com.\nStep 2: Filter for those who finished their EGOT between 2000 and 2023 (Viola Davis, Jennifer Hudson, John Legend, Robert Lopez.).\nStep 3: For each, calculate the time between first qualifying award and final (EGOT-completion) award.\nSpans (years): Davis 22, Hudson 15, Legend 12, Lopez 10.\nStep 4: Calculate the average: (22+15+12+10)/4 = 14.75 years.\nStep 5: Determine ages at completion: Oldest is Viola Davis, at age 57 (from Wikipedia and PEOPLE.com biographical info).\nStep 5: She won Oscar for Fences.",
        "urls": [
            "https://en.wikipedia.org/wiki/EGOT",
            "https://www.britannica.com/topic/list-of-EGOT-winners-2230527",
            "https://www.biography.com/celebrities/egot-winners-list"
        ],
        "true_answer": "14.75 years, Fences.",
        "file_name": ""
    },
    {
        "question": "In the late 20th and early 21st centuries, astronomers discovered increasingly larger galactic filaments and walls, eventually forming different Great Wall.  Which countries' team discovered the largest structure discovered from 1985 to 2014?",
        "task": 1,
        "task_id": -4267999443463857659,
        "solution": "Step 1: Find the size of each structure and year of discovery: Perseus–Pegasus Filament (1985, 1.0), Sloan Great Wall (2003, 1.38), Hercules–Corona Borealis Great Wall (2014, 10.0) [units: billion light-years]. Step 2: Find the information of Hercules–Corona Borealis Great Wall on wekipedia. 3. Find the team that discovered the structure: American and Hungarian astronomers led by István Horváth, Jon Hakkila and Zsolt Bagoly. The Final answer is: America and Hungary.",
        "urls": [
            "https://en.wikipedia.org/wiki/List_of_largest_cosmic_structures",
            "https://phys.org/news/2014-11-cosmic-wall-billion-light-years.html",
            "https://en.wikipedia.org/wiki/Hercules%E2%80%93Corona_Borealis_Great_Wall"
        ],
        "true_answer": "America and Hungary",
        "file_name": ""
    },
    {
        "question": "Among the world’s top 10 economies by total GDP in 1965, which country had the lowest population density, and what was its per capita GDP for that year? ( Rounded to the nearest integer)",
        "task": 3,
        "task_id": -5531367074557944465,
        "solution": "Step 1: Obtain the global top 10 countries by GDP for 1965 from https://www.kylc.com/stats/global/yearly/g_gdp/1965.html, then get their official 1960 and 1965 populations from Wikipedia (https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_future_population). List their official land area (km²) from https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_area. Step 2: For each, compute population density = Pop1965 (in individuals) divided by area. Step 3: Identify the country with the lowest density (Canada, ~1.97 per km²). Step 4: Calculate its GDP per capita for 1965: 54,649,000,000 USD / 20,072,000 people ≈ 2,723 USD.",
        "urls": [
            "https://www.kylc.com/stats/global/yearly/g_gdp/1964.html",
            "https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_future_population",
            "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_area"
        ],
        "true_answer": "Canada, 2,723",
        "file_name": ""
    },
    {
        "question": "Among the following authors—William Shakespeare, Marcus Aurelius, Emily Dickinson, Henry David Thoreau, and Dante Alighieri—each of whom has at least one quote about 'fame' featured on Bookroo, is listed in the 'Fame quotes by:' sidebar of A-Z Quotes, and has at least one 'fame' quote on Goodreads, which author has the highest average number of 'likes' per fame quote for their 'fame' quotes on Goodreads? Only quotes that contain the word 'fame' or have the tag fame are considered for this calculation",
        "task": 2,
        "task_id": 1197564813551952544,
        "solution": "1. Verify that each author satisfies: (a) has at least one 'fame' quote on Bookroo, (b) is featured in the 'Fame quotes by:' sidebar on A-Z Quotes, and (c) has at least one 'fame' quote on Goodreads. 2. For each remaining author, list each 'fame' quote on Goodreads, enumerate how many 'likes' each quote has, and compute the average likes per 'fame' quote for each author. 3. The author with the highest average is the answer.",
        "urls": [
            "https://bookroo.com/quotes/fame",
            "https://www.azquotes.com/quotes/topics/fame.html",
            "https://www.goodreads.com/quotes/tag/fame",
            "https://www.goodreads.com/author/quotes/17212.Marcus_Aurelius?tag=fame",
            "https://www.goodreads.com/quotes/search?utf8=%E2%9C%93&q=Henry+David+Thoreau&commit=Search",
            "https://www.goodreads.com/quotes/search?utf8=%E2%9C%93&q=Dante+Alighieri&commit=Search"
        ],
        "true_answer": "Henry David Thoreau",
        "file_name": ""
    },
    {
        "question": "Across NBA history, compare the average league FG% for the 1980s (1980-81 through 1989-90) and for the 2010s (2010-11 through 2019-20) based on season statistics. Next, (a) identify among players who attempted at least 16 field goals per game in the 2023-24 regular season, who had the highest FG% and what was it? Finally, what is the difference between (b) the improvement in league average FG% from the 1980s to the 2010s (2010s avg minus 1980s avg)? Provide numerical values for (a) and (b). (Please keep to 3 decimal places)",
        "task": 2,
        "task_id": -9128789803514700232,
        "solution": "- Find NBA league average FG% for 1980-81 through 1989-90 and average them: values used: [0.486, 0.491, 0.485, 0.492, 0.491, 0.487, 0.48, 0.48, 0.477, 0.476]; avg = 0.4845\n- Find NBA league average FG% for 2010-11 through 2019-20 and average: values used: [0.459, 0.448, 0.453, 0.454, 0.449, 0.452, 0.457, 0.46, 0.461, 0.46]; avg = 0.4553\n- Compute improvement = 2010s avg - 1980s avg: {improvement_a}\n- Find 2023-24 regular season player with at least 16 FGA per game and highest FG%: Giannis Antetokounmpo (18.8 FGA/G, .611 FG%)\n- Find difference: leader FG% - 2010s avg: {diff_b}\n- All stats cross-referenced from basketball-reference.com league stats, 2024 per-game leaders, ESPN/NBA.com player stats.",
        "urls": [
            "https://www.basketball-reference.com/leagues/NBA_stats_per_game.html",
            "https://www.basketball-reference.com/leagues/NBA_2024_per_game.html",
            "https://www.nba.com/stats/players/traditional/",
            "https://www.basketball-reference.com/leaders/fg_pct_season.html"
        ],
        "true_answer": "(a) Giannis Antetokounmpo (0.611)\n (b) -0.029\n",
        "file_name": ""
    },
    {
        "question": "Comparing My Morning Jacket's US headline tour schedules for 2024 and 2025 as listed in their official set list archive, by how many days did the average interval between consecutive shows decrease from 2024 to 2025?",
        "task": 2,
        "task_id": 7319947580555772941,
        "solution": "1. Retrieve the full list of My Morning Jacket's headlining US tour dates for 2024 and 2025 from their official set list archive. 2. For each year, sort the show dates chronologically. 3. Calculate the interval (in days) between each consecutive show for each year. 4. Compute the mean interval between shows for 2024 and 2025 separately. 5. Subtract the 2025 mean from the 2024 mean and report the absolute difference (rounded to two decimals): Mean gap 2024 = 6.31 days; Mean gap 2025 = 1.74 days; Absolute difference = 4.57 days.",
        "urls": [
            "https://www.ticketmaster.com/my-morning-jacket-tickets/artist/841284",
            "https://www.mymorningjacket.com/",
            "https://archive.mymorningjacket.net/index.php/shows/2024-shows",
            "https://archive.mymorningjacket.net/index.php/shows/2025-shows"
        ],
        "true_answer": "4.57",
        "file_name": ""
    },
    {
        "question": "Consider two celebrities, the first Asian to win an Academy Award for Best Actress, and an Asian actor born in 1954 who directed The Fearless Hyena. For each, calculate the number of years between their first cinematic film appearance and their first Oscar (Academy Award) or Honorary Oscar. Who achieved this milestone in a shorter time span? How many Best Actress/Actor awards did the fastest person win in 2001?",
        "task": 2,
        "task_id": -7351147083314636257,
        "solution": "1. Find two celebrities: Michelle Yeoh and Jackie Chan.\n 2. Michelle Yeoh: Debut 1984 (The Owl vs Bombo), first Oscar (Best Actress) in 2023; gap = 39 years.\nJackie Chan: Debut 1962 (Big and Little Wong Tin Bar), Honorary Oscar in 2016; gap = 54 years.\n 3.Compare: Michelle Yeoh achieved this milestone faster by 15 years.",
        "urls": [
            "https://www.imdb.com/list/ls520550186/",
            "https://en.wikipedia.org/wiki/Michelle_Yeoh",
            "https://en.wikipedia.org/wiki/List_of_awards_and_nominations_received_by_Michelle_Yeoh",
            "https://en.wikipedia.org/wiki/Jackie_Chan",
            "https://en.wikipedia.org/wiki/The_Owl_vs_Bombo",
            "https://en.wikipedia.org/wiki/Big_and_Little_Wong_Tin_Bar"
        ],
        "true_answer": "Michelle Yeoh, 2",
        "file_name": ""
    },
    {
        "question": "Between the last state election before Citizens United and the first after, by what percentage did independent spending increase? Then, what was the average annual growth rate in total federal election costs across all general election cycles from 2010 through 2022? Finally, how did the percentage of Americans supporting limits on campaign finance spending change between Pew Research polls in 2018 and 2023?",
        "task": 3,
        "task_id": 8731085095048540713,
        "solution": "Find the Find Citizens United (2009). The last state election before Citizens United occurs in 2008, and the first after occurs in 2010. The independent spending  for 2008 ($83M) and 2010 ($185M); percent increase = ((185M-83M)/83M)*100 = 122.8%. Find total election costs (in $B) for 2010 and 2022: $5.2B, $9.5B  , average annual growth rate = (9.5/5.2)^(1/12) - 1 = 5.1%. Pew Research: 77% support in 2018, 72% in 2023; change = 72-77 = -5 percentage points.",
        "urls": [
            "https://dps.followthemoney.org/research/institute-reports/independent-spendings-role-in-state-elections-2006-2010",
            "https://jayapal.house.gov/2023/04/04/jayapal-introduces-constitutional-amendment-to-reverse-citizens-united/",
            "https://www.opensecrets.org/elections-overview/cost-of-election",
            "https://www.pewresearch.org/short-reads/2018/05/08/most-americans-want-to-limit-campaign-spending-say-big-donors-have-greater-political-influence/",
            "https://www.pewresearch.org/short-reads/2023/10/23/7-facts-about-americans-views-of-money-in-politics/"
        ],
        "true_answer": "122.8%, 5.1%, -5 percentage points",
        "file_name": ""
    },
    {
        "question": "For each of the three WNBA seasons from 2021 to 2023,  how many of those three seasons did the Win Shares leader's team also win the championship? Additionally, what was the average regular season Points Per Game (PPG) among these three Win Shares leaders over that period?",
        "task": 2,
        "task_id": -4941701054579673646,
        "solution": "From https://www.basketball-reference.com/wnba/years/2021_leaders.html: 2021 WS leader Jonquel Jones (CT Sun), champion: Chicago Sky. From https://www.basketball-reference.com/wnba/years/2022_leaders.html: 2022 WS leader Breanna Stewart (Seattle Storm), champion: Las Vegas Aces. From https://www.basketball-reference.com/wnba/years/2023_leaders.html: 2023 WS leader A'ja Wilson (Las Vegas Aces), champion: Las Vegas Aces. Only in 2023 did the WS leader's team also win the championship: so 1/3 seasons. Jonquel Jones's 2021 PPG: 19.4; Breanna Stewart's 2022 PPG: 21.8; A'ja Wilson's 2023 PPG: 22.8 Average = (19.4 + 21.8 + 22.8) / 3 = 64.0 / 3 = 21.33. Final answers: 1 season; 21.33 avg PPG.",
        "urls": [
            "https://www.basketball-reference.com/wnba/years/2021_leaders.html",
            "https://www.basketball-reference.com/wnba/years/2022_leaders.html",
            "https://www.basketball-reference.com/wnba/years/2023_leaders.html",
            "https://www.basketball-reference.com/wnba/players/j/jonesjo01w.html",
            "https://www.basketball-reference.com/wnba/players/w/wilsoa01w.html",
            "https://www.basketball-reference.com/wnba/players/s/stewabr01w.html"
        ],
        "true_answer": "1 season;  PPG 21.33",
        "file_name": ""
    },
    {
        "question": "By the end of 2024, what was the average tenure of the three most recent Missouri executive office commissioners (starting from acting)? Which commissioner had the shortest tenure? How long did he serve? (Answer in years)",
        "task": 3,
        "task_id": -6830127146982013180,
        "solution": "1) Find the three most recent Missouri Office of Administration commissioners: Doug Nelson, Sarah Steelman, and Ken Zellers. 2) Confirm Doug Nelson served from December, 2012  to Feb, 2017. 3) Confirm Sarah Steelman served from  February, 2017,  to October, 2021. 4) Ken Zellers started October, 2021 and served until at least December 2024. 5) Calculate each tenure: Nelson: ~4.14 years, Steelman: ~4.67 years, Zellers: ~3.75 years. 5) Average: (4.14 + 4.67 + 3.75) / 3 = 4.19 years. 6) The shortest tenure was Ken Zellers, at 3.75 years.",
        "urls": [
            "https://themissouritimes.com/sarah-steelman-new-oa-director-greitens/",
            "https://themissouritimes.com/cabinet-shakeup-knodell-to-dss-steelman-leaves-oa/",
            "https://ballotpedia.org/Missouri_state_executive_offices",
            "https://oa.mo.gov/commissioners-office",
            "https://en.wikipedia.org/wiki/Sarah_Steelman",
            "https://events.govtech.com/Missouri-Governors-Cybersecurity-Summit.html"
        ],
        "true_answer": "4.19 years, Ken Zellers, 3.75 years.",
        "file_name": ""
    },
    {
        "question": "Considering the decade-by-decade (one decade: 19x0-19x9) progression of Sesame Street, during the decade with the highest number of Emmy awards won, for the person who received the most Outstanding Performer in a Children's Series awards, how many Emmy he/she/they won?",
        "task": 2,
        "task_id": -6005058342794653354,
        "solution": "Count the number of Emmy awards won by Sesame Street each year from the Wikipedia accolades table, sum per decade and compute the average per year. From the official Sesame Workshop Mission and History timeline, count the number of distinct major milestone/events (e.g., character debuts, curriculum innovations, co-productions) per decade. The decade with the highest average Emmy awards/year is 1980s with 2.60 average per year; in that decade, 3 major milestones were introduced. Calculation details: [('1970s', 12=7+5, 1.2), ('1980s', 26=0+26, 2.6), ('1990s', 40=1+39, 4.0), ('2000s', 48=48, 4.8), ('2010s', 65=4+61, 6.5)]; 2010s, kelvin clash, 28 emmy awards (https://en.wikipedia.org/wiki/Kevin_Clash).",
        "urls": [
            "https://en.wikipedia.org/wiki/List_of_accolades_received_by_Sesame_Street",
            "https://en.wikipedia.org/wiki/Kevin_Clash"
        ],
        "true_answer": 28,
        "file_name": ""
    },
    {
        "question": "Between June 2024 and June 2025, for the fighting game released by Capcom on June, 2023, how much higher was the average monthly peak concurrent Steam player count during months when its lowest Steam price dropped below $59.99 compared to months with no such discount, and what was the average positive metacritic reviews of PlayStation 5 platform during those undiscounted months? Give both numbers rounded to nearest integer.",
        "task": 2,
        "task_id": -5614297645027666010,
        "solution": "1. Extracted release date and publisher from the Steam Store page. 2. Used Steambase monthly data for Street Fighter 6 for June 2024–June 2025. 3. Labeled months with lowest price < $59.99 as 'discount' (June 2025, May 2025, ... June 2024, except August/2024). 4. For these months, calculated the average peak concurrent player count (40,878), and for non-discount months (August 2024), average is 30,312. Difference: 10,566. For undiscounted month, calculated positive reviews (from metacritic : 3).",
        "urls": [
            "https://steambase.io/games/street-fighter-6/steam-charts",
            "https://store.steampowered.com/app/1364780/Street_Fighter_6/",
            "https://www.metacritic.com/game/pc/street-fighter-6"
        ],
        "true_answer": "10566 (peak players difference), 3 (average monthly positive reviews)",
        "file_name": ""
    },
    {
        "question": "Which of the two major parties in the U.S. House of Representatives gained more seats during the 43rd President of the United States' tenure as Governor of Texas and President of the United States?",
        "task": 2,
        "task_id": -4719413541655118013,
        "solution": "1. The 43rd President of the United States is George Walker Bush. 2. Bush served as governor of Texas from 1995 to 2000. 3. Bush served as 43rd President of the United States from 2001-2009. 4. The seats of Democrats is 204, 206, 211, 212, 205, 202, 233, 257.  5. The seats of Republicans is 230, 228, 223, 221, 229, 232, 202, 178. 5. The seats gain of Democrats is 53, Republicans is -52. Democrats has more seats gain.",
        "urls": [
            "https://en.wikipedia.org/wiki/George_W._Bush",
            "https://en.wikipedia.org/wiki/2002_United_States_House_of_Representatives_elections",
            "https://en.wikipedia.org/wiki/List_of_United_States_House_of_Representatives_elections"
        ],
        "true_answer": "Democrats",
        "file_name": ""
    },
    {
        "question": "Between its founding in 2017 and the end of 2024, what is the average number of new academic degree programs and major research or industry collaborations that CMKL University has launched each year? Note: Count each newly launched bachelor's, master's, or doctoral program as well as each new major research, industry, or institutional partnership as a single initiative.",
        "task": 1,
        "task_id": -7646456443593272675,
        "solution": "From the official CMKL 'About Us' (www.cmkl.ac.th/about-us), count all new academic programs (ECE graduate, TCI graduate, AICE undergraduate/graduate) and all major milestones involving research/industry partnerships per year (as detailed in the published year-by-year timeline: ThaiBev, CMU, CPEF (2017), ETDA (2018), SEC, ECE (2019), Infrastructure platform, Friends in Need (PA), PMU-C grant, TCI (2020), PTTEP, Newton Sixthform, ETDA, MOA (2021), Bumrungrad, AIEI Institute (2022), Bangchak/KMCH, University of Nevada (2024)). This gives 18 initiatives. Divide by the 7-year interval 2017–2024: 18/7 = 2.57 (rounded to two decimal places). Sources: CMKL University About Us page (timeline/milestones/quick facts), Wikipedia (program launch years, first graduating cohort 2022), CMU partnership page (collaboration scope/structure).",
        "urls": [
            "https://www.cmkl.ac.th/about-us",
            "https://en.wikipedia.org/wiki/CMKL_University",
            "https://engineering.cmu.edu/thailand/about/index.html"
        ],
        "true_answer": 2.57,
        "file_name": ""
    },
    {
        "question": "In the UK general elections between 2010 and 2019, voter turnout in Stratford-upon-Avon was consistently higher than the national average. In which year did Stratford-upon-Avon's voter turnout exceed the national turnout by the most, and by how many percentage points?",
        "task": 2,
        "task_id": -7181745943516339211,
        "solution": "1. Find the UK general elections between 2010 and 2019: 2010, 2015, 2017, 2019. 2.Find the turnout for Stratford-on-Avon in 2010 (72.7%), 2015 (72.2%), 2017 (73.8%), 2019 (74.4%) from local constituency results. 3. gather national turnout for those years: 2010 (65.1%), 2015 (66.2%), 2017 (68.8%), 2019 (67.3%). 4.Subtract the national from the constituency turnout for each year to obtain the margin: 2010: 7.6, 2015: 6.0, 2017: 5.0, 2019: 7.1. The largest margin is in 2010, at 7.6 percentage points.",
        "urls": [
            "https://electionresults.parliament.uk/elections/1854",
            "https://electionresults.parliament.uk/elections/1204",
            "https://electionresults.parliament.uk/elections/554",
            "http://news.bbc.co.uk/1/shared/election2010/results/constituency/e52.stm",
            "http://news.bbc.co.uk/2/shared/election2010/results/"
        ],
        "true_answer": "2010 (7.6 percentage points)",
        "file_name": ""
    },
    {
        "question": "In 1980s, quasicrystals exhibiting new (non-crystallographically allowed) rotational symmetry orders were discovered beginning with icosahedral, then decagonal, dodecagonal, and octagonal. What was the average number of years between the discoveries of new forbidden symmetry orders in quasicrystals during this period? (no need to consider the month, i.e., 1991-1990 = 1) Additionally, what was the shortest such interval and in which years did it occur?",
        "task": 2,
        "task_id": -6769578901304547488,
        "solution": "Extract forbidden symmetry orders and the year of first discovery for quasicrystals: icosahedral (5-fold, 1984), decagonal (10-fold, 1985), dodecagonal (12-fold, 1985), octagonal (8-fold, 1987); compute intervals: 1985-1984=1, 1985-1985=0, 1987-1985=2 years; average=(1+0+2)/3=1.0 years; the shortest interval is 0 years, in 1985.",
        "urls": [
            "https://en.wikipedia.org/wiki/Crystallographic_restriction_theorem",
            "https://www.nature.com/articles/s41597-024-04043-z",
            "https://www.nobelprize.org/prizes/chemistry/2011/shechtman/facts/",
            "https://www.nobelprize.org/uploads/2018/06/advanced-chemistryprize2011.pdf"
        ],
        "true_answer": "1 years, 0 years, in 1985.",
        "file_name": ""
    },
    {
        "question": "Between 1970 and 1999, in how many different decades did Susan Glaspell's play 'Trifles' appear in an edition of 'Images of Women in Literature', and what was the average interval in years between these editions?",
        "task": 2,
        "task_id": -4768511751786355377,
        "solution": "1) Find the editions of 'Images of Women in Literature' between 1970 and 1999, which is 5. 1970. 2) Use academic and bibliographical sources to establish that four editions including either 'Trifles' or its short story adaptation. 3) Span: 1973 –1991. 4) Thus, 'Trifles' appeared in anthologies in the 1970s, 1980s, and 1990s—i.e., three decades. 5) Average interval: (1991–1973)/(5-1) = 18/4 = 4.5 years. Final answers: 3 decades; average interval 4.5 years.",
        "urls": [
            "https://www.unm.edu/~unmvclib/cascade/handouts/conclusionexercisearticle.pdf",
            "https://en.wikipedia.org/wiki/Trifles_(play)",
            "https://www.historymatterscelebratingwomensplaysofthepast.org/plays/view/Trifles",
            "https://worldcat.org/"
        ],
        "true_answer": "3 decades; 4.5 years",
        "file_name": ""
    },
    {
        "question": "In the UK, 20-year-old Oscar Cregan was sentenced for supplying Class A drugs. Based on the sentencing figures reported in the statistics published by the Youth Justice Board for the periods 2020-2021, 2021-2022, and 2022-2023, what is the largest numerical difference between Oscar Cregan's prison sentence and the average custodial prison sentence for young offenders convicted of an indictable offence in which period (e.g., 2020-2021)? Give a positive answer in months.",
        "task": 2,
        "task_id": 978103427342348220,
        "solution": "Oscar Cregan's sentence was 3 years (36 months; Daily Mail, Jan 2023). Recent statistics and news reports from the Ministry of Justice show that the average immediate prison sentence for juvenile offenders in 2020-2021 is about 17.4 months, 2021-2022 is about 24.1 months, and in 2022-2023 it is 20.5 months. So the largest difference occurs in 2020-2021, which is 36-17.4 = 18.6 months.",
        "urls": [
            "https://www.inkl.com/news/nephew-of-cop-killer-dale-cregan-jailed-after-being-unmasked-as-drug-pusher-bad-man",
            "https://www.dailymail.co.uk/news/article-11665833/Cop-killer-Dale-Cregans-cocaine-dealing-nephew-jailed-three-years.html",
            "https://www.gov.uk/government/statistics/youth-justice-statistics-2022-to-2023/youth-justice-statistics-2022-to-2023-accessible-version",
            "https://www.gov.uk/government/statistics/youth-justice-statistics-2021-to-2022/youth-justice-statistics-2021-to-2022-accessible-version",
            "https://www.sentencingcouncil.org.uk/research-and-resources/data-collections/offence-specific-data-collections/drug-offences/",
            "https://www.gov.uk/government/statistics/youth-justice-statistics-2020-to-2021/youth-justice-statistics-2020-to-2021-accessible-version",
            "https://researchbriefings.files.parliament.uk/documents/CBP-9039/CBP-9039.pdf"
        ],
        "true_answer": "18.6",
        "file_name": ""
    },
    {
        "question": "Which Gretsch brand resonator guitar, focused on bluegrass, has a square neck design and active pickups? Based on online price history data, which guitar has a higher average resale value growth rate on the used market in good condition between 2018 and 2023: this Gretsch guitar or the Fender FR-50 (Blues, Round Neck)?",
        "task": 2,
        "task_id": 3797694434056195550,
        "solution": "1. Find the guitar: Gretsch G9230. \n2. From acousticmusic.org: Gretsch G9230 (square neck) bluegrass; Fender FR-50 (round neck) blues.\n3. From Reverb.com Price Guide, average price of Gretsch G9230 in excellent condition: 3171 HKD in 2018, 4012 HKD in 2023; Fender FR-50: 2069 HKD in 2018, 2637 HKD in 2023.\n4. Calculate the growth rate: Gretsch: (4012/3171)-1 ≈ 0.265. Fender: (2637/2069)-1 ≈ 0.275. \n5. Fender appreciated more during this period.",
        "urls": [
            "https://reverb.com/p/gretsch-g9230-bobtail-square-neck-resonator",
            "https://reverb.com/uk/p/fender-fr-50-resonator-acoustic-guitar-sunburst",
            "https://reverb.com/price-guide"
        ],
        "true_answer": "Gretsch G9230, Fender FR-50.",
        "file_name": ""
    },
    {
        "question": "During Cliven Bundy's 21st century standoff with the federal government, a ruling was issued by a judge born in 1930, permanently barring Bundy and his cattle herd from illegally trespassing on new lands. Which president appointed the judge who signed that ruling? How many other district judges were appointed the same year he was appointed by that president?",
        "task": 2,
        "task_id": -8860460174511584349,
        "solution": "Step 1: Find the judge signed this ruling: Lloyd Dee George. Step 2: The president Ronald Reagan appointed Lloyd Dee George, at 1984. Step 3: Find the number of district judges(except Lloyd Dee George) appointed by Ronald Reagan at 1984, 33.",
        "urls": [
            "https://en.wikipedia.org/wiki/Bundy_standoff",
            "https://en.wikipedia.org/wiki/Lloyd_D._George",
            "https://en.wikipedia.org/wiki/List_of_federal_judges_appointed_by_Ronald_Reagan"
        ],
        "true_answer": "33",
        "file_name": ""
    },
    {
        "question": "Who was voted 'the most beautiful woman' by the American edition of L'Officiel (March 2024) and 'the most beautiful woman in the world' by People magazine? Who won the Oscar for Best Actress the year she was nominated for an Oscar?",
        "task": 2,
        "task_id": -5527121688903077351,
        "solution": "Step 1: Extract all names listed in the 2024 L'Officiel USA 'Most Beautiful Women' article. Names identified include Grace Kelly, Bella Hadid, Marilyn Monroe, Audrey Hepburn, Gisele Bündchen, Angelina Jolie, Elizabeth Taylor, Zendaya, Brigitte Bardot, Naomi Campbell, Margot Robbie, Whitney Houston, Candice Swanepoel, Brooke Shields, Charlize Theron.\nStep 2: For each, determine if she has *topped* People's Magazine 'Most Beautiful Woman' list by checking the official People list—IMDB's edition confirms only Angelina Jolie (2006).\nStep 3: She was nominated for an Oscar in 2008. \nStep 4: In this year, Kate Winslet won the Oscar Best Actress.",
        "urls": [
            "https://www.lofficielusa.com/beauty/most-beautiful-women-in-the-world-ranking",
            "https://www.imdb.com/list/ls069593491/",
            "https://en.wikipedia.org/wiki/Academy_Award_for_Best_Actress"
        ],
        "true_answer": "Angelina Jolie, Kate Winslet",
        "file_name": ""
    },
    {
        "question": "During the weeks from episode 6 to episode 11 of the Oshi no Ko anime, what was the average weekly singles overall ranking of YOASOBI’s “Idol” on the Oricon Combined Singles Chart? Consider only the weeks corresponding to these episodes' broadcasts.",
        "task": 3,
        "task_id": 5456913721608482742,
        "solution": "1. Find the period (Episodes 6-11, aired May-June 2023) using Fandom Wiki episode list.\n2. Using Wikipedia's Idol (Yoasobi song) and Oricon chart data, list ‘Idol’ weekly Oricon positions for those dates: [1, 2, 3, 2, 2, 2, 2] for weeks May 17-June 28, average the values from episodes 6–11.\n3. Compute the average: mean([1, 2, 3, 2, 2, 2, 2]) = 2.\n4. Final answer: 2.",
        "urls": [
            "https://oshinoko.fandom.com/wiki/Oshi_no_Ko_Wiki",
            "https://en.wikipedia.org/wiki/Idol_(Yoasobi_song)",
            "https://www.oricon.co.jp/rank/",
            "https://oshinoko.fandom.com/wiki/Episode_11",
            "https://oshinoko.fandom.com/wiki/Episode_6"
        ],
        "true_answer": 2,
        "file_name": ""
    },
    {
        "question": "Game X is the best game of TGA 2023. According to SteamDB, its all-time player peak occurred shortly after release. Using monthly average player percentage change of the first six full months after-release (exclude the release month), and the next six months, calculate the difference in the average monthly percentage gain between these two periods and keep two decimal points (using positive numbers). How many PC critic reviews does Game X have according to Metacritic on the month with the highest peak players before Aug 2024 (exclude the release month)?",
        "task": 3,
        "task_id": 6085618918861309359,
        "solution": "1. Reference SteamDB (https://steamdb.info/app/1086940/graphs/) to confirm Baldur's Gate 3 all-time peak occurred August 2023, so analyze first six full months after launch for average player change.\n2. SteamCharts (https://steamcharts.com/app/1086940):\n   Sep 2023: -39.67%, Oct: -45.57%, Nov: -32.14%, Dec: +31.14%, Jan: +21.56%, Feb: -35.09%\n   Mar 2024: -18.67%, Apr: -20.35%, May: -12.05%, Jun: -0.86%, Jul: +13.53%, Aug: +0.01%\n   -> Average (1st6) = (-39.67-45.57-32.14+31.14+21.56-35.09)/6 = -16.63\n   -> Average (2nd6) = (-18.67-20.35-12.05-0.86+13.53+0.01)/6 = -6.40\n   -> Difference = -6.40 - (-16.63) = 10.23\n 3. Find the peak player month between September 2023 and August 2024 on SteamDB: September 2023. \n 4. Metacritic (https://www.metacritic.com/game/baldurs-gate-3/critic-reviews/?platform=pc) has 22 critic reviews (2024: 2, 2023: sep: 13, oct:4, Nov: 3).",
        "urls": [
            "https://steamcharts.com/app/1086940",
            "https://steamdb.info/app/1086940/graphs/",
            "https://www.metacritic.com/game/pc/baldurs-gate-3",
            "https://store.steampowered.com/app/1086940/Baldurs_Gate_3/"
        ],
        "true_answer": "10.23; 13 reviews",
        "file_name": ""
    },
    {
        "question": "Between 2013 and 2023, which country—Antigua and Barbuda or St. Lucia—had the higher average annual growth rate (CAGR) in GDP per capita, using the data from worldbank, and what was that growth rate (as a percentage, rounded to 6 decimals)?",
        "task": 3,
        "task_id": 8182858355601359072,
        "solution": "Antigua and Barbuda, 2013: 15051.5113587425, 2023: 21494.5474000727. St. Lucia, 2013: 9577.21733754358, 2023: 13554.7684784933. Antigua and Barbuda average annual growth rate: 3.627452%. St. Lucia average annual growth rate: 3.534544%. Antigua and Barbuda had the higher average annual growth rate: 3.627452%",
        "urls": [
            "https://www.worldometers.info/world-population/antigua-and-barbuda-population/"
        ],
        "true_answer": "Antigua and Barbuda, 3.627452%",
        "file_name": ""
    },
    {
        "question": "The song X marked its place as the first-ever number one on the Billboard Juke Box Folk Song Records chart in year Y. Focusing on year Y, calculate the average number of the longest consecutive weeks spent at number one by each song that reached the top of this weekly chart during that year. Compare the chart-topping run of X to this average: was it above or below average, and by how many weeks (rounded to two decimal places)? For the song that reached number one for six nonconsecutive weeks, which country scored the most goals in the year when the FIFA World Cup was won by the country of the producer's mother's father's origin?",
        "task": 2,
        "task_id": -1336386462168380302,
        "solution": "Collate all songs that reached #1 on the Billboard Juke Box Folk Song Records chart in 1944, listing their respective consecutive weeks at #1: [Pistol Packin' Mama: 7, Ration Blues: 2, Rosalita: 1, They Took the Stars Out of Heaven: 1, So Long Pal: 6, Too Late to Worry: 1, Straighten Up and Fly Right: 3, Is You Is or Is You Ain't My Baby: 5, Soldier's Last Letter: 4, Smoke on the Water: 13, I'm Wastin' My Tears on You: 2]. Compute the mean: sum=45, count=11, so average≈4.09 weeks. Pistol Packin' Mama (7 weeks) is above this average by 7 - 4.09 = 2.91 weeks.",
        "urls": [
            "https://en.wikipedia.org/wiki/Timeline_of_Billboard_number-one_country_songs",
            "https://en.wikipedia.org/wiki/List_of_Most_Played_Juke_Box_Folk_Records_number_ones_of_1944#Chart_history",
            "https://en.wikipedia.org/wiki/Straighten_Up_and_Fly_Right",
            "https://en.wikipedia.org/wiki/Pistol_Packin%27_Mama",
            "https://en.wikipedia.org/wiki/Johnny_Mercer",
            "https://en.wikipedia.org/wiki/Croatia_national_football_team",
            "https://en.wikipedia.org/wiki/2018_FIFA_World_Cup#Goalscorers"
        ],
        "true_answer": "7 weeks; above, 2.91; Belgium",
        "file_name": ""
    },
    {
        "question": "At the Kentucky facility that destroyed the last declared U.S. chemical weapon, each agent type was eliminated in its own 'campaign,' with these campaigns: 1. 8-inch GB (sarin) projectiles, 2. 155mm VX projectiles, 3. 155mm mustard projectiles, 4. M55 VX rockets, and 5. M55 GB rockets. Using the public campaign completion dates, what was the shortest interval in days between the completion of any two consecutive full-agent destruction campaigns? For these two campaigns, regarding the munitions corresponding to the larger-indexed campaign in the question, they were divided into multiple batches. What is the 'sharing scores' of the lot number of the batch that accounts for the largest proportion?",
        "task": 3,
        "task_id": 8329839210617716578,
        "solution": "Gathered campaign completion dates:\n(1) 8-inch GB projectiles: 2020-05-15\n(2) 155mm VX projectiles: 2021-05-28\n(3) 155mm mustard projectiles: 2021-09-04\n(4) M55 VX rockets: 2022-04-19\n(5) M55 GB rockets: 2023-07-07\nCalculated consecutive intervals:\n(1->2): (8-inch GB (sarin) projectiles) to (155mm VX projectiles) = 378 days\n(2->3): (155mm VX projectiles) to (155mm mustard projectiles) = 99 days\n(3->4): (155mm mustard projectiles) to (M55 VX rockets) = 227 days\n(4->5): (M55 VX rockets) to (M55 GB rockets) = 444 days\nShortest: 99 days, between (155mm VX projectiles) and (155mm mustard projectiles). In the file https://web.archive.org/web/20130928060115/https://www.peoacwa.army.mil/wp-content/uploads/X-ray_assessment_report_Final.pdf, page 1-2, BGD-655-5, 12 percent; BGD-655-7 58 percent; BGD-655-9 30 percent. Sharing score of batch with largest proportion, BGD-655-7, is 55%",
        "urls": [
            "https://en.wikipedia.org/wiki/Blue_Grass_Chemical_Agent-Destruction_Pilot_Plant",
            "https://web.archive.org/web/20130928060115/https://www.peoacwa.army.mil/wp-content/uploads/X-ray_assessment_report_Final.pdf"
        ],
        "true_answer": "99 days, 55%",
        "file_name": ""
    },
    {
        "question": "Using the official Oregon Health Authority Death with Dignity Act data or the summary provided on Wikipedia, calculate the average annual number of deaths up to 2023 (rely on the numbers as of 2025, rounded to two decimals).",
        "task": 3,
        "task_id": -1159192116164796374,
        "solution": "1. Find the data from https://www.oregon.gov/oha/PH/PROVIDERPARTNERRESOURCES/EVALUATIONRESEARCH/DEATHWITHDIGNITYACT/Documents/year27.pdf . 2. Average the annual deaths [16, 27, 27, 21, 38, 42, 37, 38, 46, 49, 60, 59, 65, 71, 85, 73, 105, 135, 139, 158, 178, 193, 259, 255, 305, 386]: sum = 2867 over 26 years, so 2867 / 26 ≈ 110.27.",
        "urls": [
            "https://www.oregon.gov/oha/PH/PROVIDERPARTNERRESOURCES/EVALUATIONRESEARCH/DEATHWITHDIGNITYACT/Documents/year27.pdf"
        ],
        "true_answer": "110.27",
        "file_name": ""
    },
    {
        "question": "Please collect China's annual GDP growth rate from 2014 to 2023 using World Bank data (rounded to 8 decimal places) and the urbanization rate for each year from Statista. First, calculate and provide the standard deviation of China's GDP growth rate over this period (rounded to 8 decimal places); then, calculate the Pearson correlation coefficient between these two annual series for the same period (rounded to two decimal places).",
        "task": 3,
        "task_id": 3787156049704823915,
        "solution": "Step 1: From the World Bank, extract China's full-year GDP growth rates (%) (the website only displays low-precision data, so download and extract the original data), 2014–2023: 7.46190028, 6.97878003, 6.77555563, 6.89126636, 6.75671800, 6.06850235, 2.3401884, 8.57008513, 3.13418887, 5.41484330. Step 2: From Statista, find China's annual urbanization rates (%) for 2014–2023: 55.75, 57.33, 58.84, 60.24, 59.6, 61.50, 62.71, 63.89, 64.72, 65.22, 66.16. Step 3: The sample standard deviation (n - 1 denominator) of the GDP growth rates is 1.93240758, and the Pearson correlation coefficient between the two series is approximately -0.48.",
        "urls": [
            "https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.KD.ZG?downloadformat=csv",
            "https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG?locations=CN",
            "https://data.worldbank.org/country/china",
            "https://www.statista.com/statistics/270162/urbanization-in-china/"
        ],
        "true_answer": "1.93240758, -0.48",
        "file_name": ""
    },
    {
        "question": "For the four cities that appear in the experimental results of the IEEE Transactions on Intelligent Transportation Systems 2025 CABRA agent paper, A Cost-Aware Adaptive Bike Repositioning Agent Using Deep Reinforcement Learning, use the following information: (1) the CABRA operator cost for each city as reported in the paper; (2) the official 2024 population from World Population Review for each city mentioned in the paper. Compute the CABRA cost divided by ([population]/100,000), and identify which city achieves the lowest such value, rounding to 5 decimal places. What is the city and that value?",
        "task": 3,
        "task_id": 8079205975924119395,
        "solution": "Step 1: Extract CABRA cost per city from the paper Table IV: Dublin 1731.45, London 7249.01, Paris 9408.44, New York 64.57.\nStep 2: Gather the latest (2024/2025) population for each city:\n- Dublin: 1,284,550\n- London: 9,748,030\n- Paris: 11,276,700\n- New York: 8,097,282\nStep 3: For each city, compute the following:\ncost / (population / 100,000)\n- Dublin: 1731.45 / 12.8455 ≈ 134.79039352\n- London: 7249.01 / 97.4803 ≈ 74.36384582\n- Paris: 9408.44 / 112.767 ≈ 83.43256\n- New York: 64.57 / 80.97282 ≈ 0.79742808 (rounded to 5 decimals: 0.79743)\nThus, the answer to five decimals is: New York, 0.79743.",
        "urls": [
            "https://worldpopulationreview.com/cities",
            "https://www.mircomusolesi.org/papers/tist25_cabra.pdf",
            "https://ieeexplore.ieee.org/document/10877698/"
        ],
        "true_answer": "New York, 0.79743",
        "file_name": ""
    },
    {
        "question": "Consider these astronauts: (1) the first foreign national to travel on a NASA mission, (2) the one who first saw the world in Almelo but was shaped by Groningen, the city he proudly called home, and (3) the first astronaut from Switzerland. Considering only missions launched up to the end of 1994, for the astronaut who had the longest average mission duration, how many times was his Wikipedia page edited in 2018?",
        "task": 2,
        "task_id": -6580848266662649313,
        "solution": "Step 1: Identify all missions of Ulf Merbold (STS-9: 10d 07h 47m; STS-42: 8d 01h 14m; Soyuz TM-20/TM-19: 31d 12h 35m).\nStep 2: Identify all missions of Wubbo Ockels (STS-61A: 7d 00h 44m).\nStep 3: Identify all missions of Claude Nicollier up to end of 1994 (STS-46: 7d 23h 15m; STS-61: 10d 19h 58m).\nStep 4: Convert durations to minutes, calculate average for each astronaut up to the cut-off date.\n- Merbold: (10d7h47m + 8d1h14m + 31d12h35m)/3 ≈ 16d 15h 12m average\n- Ockels: one flight = 7d 0h 44m\n- Nicollier: (7d23h15m + 10d19h58m)/2 ≈ 9d 9h 36m average\nStep 5: Compare averages. Ulf Merbold has the longest average mission duration.",
        "urls": [
            "http://www.spacefacts.de/bios/international/english/merbold_ulf.htm",
            "http://www.spacefacts.de/bios/international/english/ockels_wubbo.htm",
            "http://www.spacefacts.de/bios/international/english/nicollier_claude.htm"
        ],
        "true_answer": "5",
        "file_name": ""
    },
    {
        "question": "Between 2014 and 2024, considering the World Snooker Championship only, which of the following players (Mark Selby, Judd Trump, or Ronnie O'Sullivan) had the smallest average finishing position across those years (assigning 1 for champion, 2 for runner-up, 3 for semifinalist, 4 for others, 5 for non-participation)? For this player, what is his average score and how many championships did he win?",
        "task": 2,
        "task_id": -3248372649086417687,
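        "solution": "Mark Selby: 1 + 4 + 1 + 1 + 4 + 4 + 3 + 1 + 4 + 2 + 4 = 29, average 29/11 ≈ 2.636; Judd Trump: 4 + 3 + 4 + 4 + 4 + 1 + 4 + 4 + 2 + 4 + 4 = 38, average 38/11 ≈ 3.455; Ronnie O'Sullivan: 2 + 4 + 4 + 4 + 4 + 4 + 1 + 4 + 1 + 4 + 4 = 36, average 36/11 ≈ 3.273. Mark Selby has the smallest average (2.636) and won 4 championships in this period (the four 1s in his list).",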
        "urls": [
            "https://en.wikipedia.org/wiki/List_of_world_snooker_champions",
            "https://en.wikipedia.org/wiki/2014_World_Snooker_Championship",
            "https://en.wikipedia.org/wiki/2015_World_Snooker_Championship",
            "https://en.wikipedia.org/wiki/2016_World_Snooker_Championship",
            "https://en.wikipedia.org/wiki/2017_World_Snooker_Championship",
            "https://en.wikipedia.org/wiki/2018_World_Snooker_Championship",
            "https://en.wikipedia.org/wiki/2019_World_Snooker_Championship",
            "https://en.wikipedia.org/wiki/2020_World_Snooker_Championship",
            "https://en.wikipedia.org/wiki/2021_World_Snooker_Championship",
            "https://en.wikipedia.org/wiki/2022_World_Snooker_Championship",
            "https://en.wikipedia.org/wiki/2023_World_Snooker_Championship",
            "https://en.wikipedia.org/wiki/2024_World_Snooker_Championship"
        ],
        "true_answer": "Mark Selby, 2.636, 4",
        "file_name": ""
    },
    {
        "question": "After one game's update introducing the Embrion and Old Bird, calculate the compound monthly CAGR of its average players from the update's release month until (but excluding) the next major patch's month, using steambase.io data. For the same period, calculate the CAGR of another game, which is a 4 player online co-op psychological horror game released in 2020. Present both game name and number as 'game: ±XY.ZT%'.",
        "task": 3,
        "task_id": -6955585067003244434,
        "solution": "1. Find when Lethal Company's Hopping Update released: April 13, 2024 (Fandom Wiki). The next major patch is v60, on August 17, 2024.\n2. Get player gains/losses for April through July 2024 for both games on steambase.io:\n   Lethal Company: April (-12.13%), May (-44.44%), June (-9.54%), July (+18.44%); compound growth rate: -47.69%\n   Phasmophobia: April (-13.24%), May (-11.48%), June (+22.58%), July (+34.11%); compound growth rate: +26.25%\n3. Compare which game had a smaller loss/higher retention. Answer: Phasmophobia (+26.25% gain vs. Lethal Company's 47.69% loss).",
        "urls": [
            "https://steambase.io/games/phasmophobia/steam-charts",
            "https://store.steampowered.com/app/1966720/",
            "https://lethal-company.fandom.com/wiki/Version_History",
            "https://steambase.io/games/lethal-company/steam-charts"
        ],
        "true_answer": "Lethal Company: -47.69%, Phasmophobia: +26.25%",
        "file_name": ""
    },
    {
        "question": "Consider the game published by Microsoft Studios that was released in 2018 for Windows and Xbox One after being announced at Xbox's E3 2018 conference, and whose sequel was released in November 2021. Calculate the standard deviation of the average concurrent players for each of the following two periods (according to steamcharts.com): (1) the six months surrounding its Steam delisting (including that month, and 3 previous, 2 after), and (2) the first six full months following its Steam release. Which period showed greater variability in average player numbers, and what are the respective standard deviations (rounded to one decimal place)?",
        "task": 2,
        "task_id": -6672881298210518974,
        "solution": "Step 1: Determine the Steam release date (March 10, 2021) and delisting date (Dec 15, 2024) from Wikipedia/Steam store. Step 2: From https://steamcharts.com/app/1293830, find the average concurrent players for each month. For period 1 (surrounding delisting): Sep 2024–Feb 2025 values are [13,751.0, 14,042.4, 11,415.6, 16,640.5, 8,709.5, 6,398.6]. For period 2 (launch months): Apr 2021–Sep 2021 values are [10623.7, 12,156.2, 7,695.5, 9,072.1, 15,263.8, 8,020.4]. Step 3: Calculate population standard deviation for both sets: SD1 = 3,771.9, SD2 = 2881.0 (rounded to 1 decimal place). Comparison: SD1 > SD2, so the delisting period showed greater variability.",
        "urls": [
            "https://steamcharts.com/app/1293830",
            "https://en.wikipedia.org/wiki/Forza_Horizon_4"
        ],
        "true_answer": "delisting period,3771.9, 2881.0",
        "file_name": ""
    },
    {
        "question": "According to the ICLR 2025 paper accepted as poster with two authors from National University of Singapore, I found that a method improved by 11.22% on ResNet-50 when fully tuned compared to the original local loss. According to the code provided by the authors, how should I set the parameters? (including epochs, learning rate, horizon, stride, in a json format.)",
        "task": 3,
        "task_id": 4609790701292509821,
        "solution": "1. Find original Forward-Forward test accuracy on CIFAR-100/ResNet-50 (external source, e.g., Adnan Masood's Medium article and AAAI 2024 paper): 52.0% \n2. Find BP-modified improvement for Forward-Forward from ICLR 2025 paper (OpenReview): +20.5 percentage points. \n3. Compute BP-modified accuracy: 52.0 + 20.5 = 72.5% \n4. Standard BP (ResNet-50) top-1 accuracy on CIFAR-100 from Papers With Code/arXiv: 86.9%. \n5. Final answer: 72.5 (rounded to one decimal place).",
        "urls": [
            "https://openreview.net/forum?id=MtW30ql5Oj"
        ],
        "true_answer": {
            "horizon": 5,
            "stride": 1,
            "epochs": 30,
            "learning_rate": 0.001
        },
        "file_name": ""
    },
    {
        "question": "According to the World Population Review, how many cities among the top 100 most populous cities in 2025 have experienced a population decrease compared to 2024? Which city has the highest rate of population decrease, and what is its area (km^2)?",
        "task": 3,
        "task_id": 4258858540567988842,
        "solution": "5, Fukuoka: -0.22%, Nagoya: -0.23%, Tokyo: -0.21%, New York: -1.98%, Osaka: -0.24%.",
        "urls": [
            "https://worldpopulationreview.com/cities",
            "https://en.wikipedia.org/wiki/New_York_City"
        ],
        "true_answer": "5, New York City, 778.2 km^2",
        "file_name": ""
    },
    {
        "question": "Based on all airline bombing attacks worldwide in the 1970s and 1980s listed on Wikipedia, for the attack that caused the most casualties in each of the two decades, what is the difference between the total airframe hours of the two planes (rounded to the nearest integer)? According to the accident report of the one in the 1970s, what are the occupations of the experts who analysed the cost to recover the wreckage?",
        "task": 2,
        "task_id": -1540171536404479059,
        "solution": "1970s: 20 incidents; the deadliest was on 8 September 1974, with 88 deaths and 21733 airframe hours. 1980s: 14 incidents; the deadliest was on 23 June 1985, with 331 deaths and 23634 airframe hours. Difference: 23634 - 21733 = 1901. In AAR75-07.pdf, page 11, two British experts showed that the uncertain contribution of any part of the recovered wreckage would not justify the high cost to recover it. Page 15 mentions that the two explosives experts are a Chemist and a Metallurgist.",
        "urls": [
            "https://en.wikipedia.org/wiki/Timeline_of_airliner_bombing_attacks",
            "https://asn.flightsafety.org/wikibase/329831",
            "https://asn.flightsafety.org/wikibase/327181",
            "https://libraryonline.erau.edu/online-full-text/ntsb/aircraft-accident-reports/AAR75-07.pdf"
        ],
        "true_answer": "1901; Chemist, Metallurgist",
        "file_name": ""
    },
    {
        "question": "From the initial Early Access release of a 3D multiplayer third-person shooter video game with rogue-like elements, through the most recent patch closest to the second DLC release date that improves survivors from the second major expansion DLC, what is the average number of new playable survivors introduced per year (the number of years should be rounded to one decimal during the calculation)?",
        "task": 2,
        "task_id": -7706171843466485965,
        "solution": "Step 1: List each new survivor introduced on PC after Early Access launch (using the official version history and patch notes; exclude launch cast, variants, or reworks): 2019: REX (Jun), Loader (Sep), Acrid (Dec); 2020: Captain (Aug full release); 2021: Bandit (Mar, Anniversary Update); 2022: Railgunner, Void Fiend (Mar, Survivors of the Void DLC); 2024: Seeker, False Son, Chef (Aug, Seekers of the Storm DLC). Total = 10 survivors. Step 2: Calculate total years: March 28, 2019 to May 5, 2025 is 6.1 years (rounded to one decimal, as the question requires). Step 3: Average: 10 survivors / 6.1 years ≈ 1.64 (rounded to two decimals). These are verified against the version history wiki and Wikipedia timeline.",
        "urls": [
            "https://riskofrain2.fandom.com/wiki/Version_History",
            "https://en.wikipedia.org/wiki/Risk_of_Rain_2",
            "https://store.steampowered.com/app/632360/Risk_of_Rain_2/"
        ],
        "true_answer": "1.64",
        "file_name": ""
    },
    {
        "question": "On the same day that a landmark house on South Main Street in Coeymans Landing, New York, rich with local history and built in the later 1830s, was officially listed on the National Register of Historic Places, how many places were listed in total?",
        "task": 2,
        "task_id": 7070360778842318980,
        "solution": "The place is the Dr. Wesley Blaisdell House. It entered the National Register of Historic Places in 2012. According to the official document (https://www.nps.gov/subjects/nationalregister/upload/weekly-list-2012-national-register-of-historic-places.pdf), that day (7/17/12) has 12 landmarks listed.",
        "urls": [
            "https://en.wikipedia.org/wiki/Dr._Wesley_Blaisdell_House",
            "https://www.nps.gov/subjects/nationalregister/upload/weekly-list-2012-national-register-of-historic-places.pdf"
        ],
        "true_answer": "12",
        "file_name": ""
    },
    {
        "question": "Among Robert De Niro, Al Pacino, Christopher Walken, and Jessica Lange, who has the longest interval between their first and most recent Academy Award nominations without winning, and what is the length of that span in years? (as of the end of 2024)",
        "task": 2,
        "task_id": 2004279556637086328,
        "solution": "Gathered the years of first and most recent Oscar nominations as follows: Robert De Niro (1977–2024: 47y), Al Pacino (1973–2020: 47y), Christopher Walken (2003: 0y), Jessica Lange (1982–1989: 12y); thus, Robert De Niro and Al Pacino have the longest span, at 47 years.",
        "urls": [
            "https://en.wikipedia.org/wiki/List_of_awards_and_nominations_received_by_Robert_De_Niro",
            "https://en.wikipedia.org/wiki/List_of_awards_and_nominations_received_by_Al_Pacino",
            "https://en.wikipedia.org/wiki/List_of_awards_and_nominations_received_by_Christopher_Walken",
            "https://en.wikipedia.org/wiki/List_of_awards_and_nominations_received_by_Jessica_Lange"
        ],
        "true_answer": "Robert De Niro and Al Pacino, 47 years",
        "file_name": ""
    },
    {
        "question": "In June 2022, researchers from Huddersfield University published a paper on the application of YOLO in agriculture. My research primarily focuses on the detection of leaf diseases. Among the works cited in the paper that detect leaf diseases in crops, which crops were involved (alphabetical order by crop name)? For the second-to-last crop in alphabetical order, the study-area images in the original text of the referenced paper show multiple agricultural parks. What are the latitude and longitude of the agricultural park with the sign lying down?",
        "task": 3,
        "task_id": -6242908282080694640,
        "solution": "",
        "urls": [
            "https://arxiv.org/html/2406.10139v1",
            "https://www.nature.com/articles/s41598-023-33270-4"
        ],
        "true_answer": "apple, bell pepper, mulberry, rice, tea, tomato; 24°56′11.2″ north latitude and 91°52′01.2″ east longitude",
        "mm": 1,
        "file_name": ""
    },
    {
        "question": "The most renowned member of this family codified a deeply influential customary law that is still cited in certain regions of the Balkans to this day. At the same time, another member of the family composed poetry in Ottoman Turkish during the 16th century, becoming a prominent figure in Diwan literature. These two individuals, celebrated for their contributions to law and literature respectively, belonged to a family whose earliest name-known descendants, aside from the founding ancestor who gave the family its name. There is also one unnamed sibling whose existence is confirmed by a certain historical event. On the official wiki page of the year that this event happened, how many deaths of important figure recorded?",
        "task": 1,
        "task_id": -8646987055932764674,
        "solution": "",
        "urls": [
            "https://en.wikipedia.org/wiki/Dukagjini_family",
            "https://www.dukagjini.org/geneaology/from-gjin-tanushi-to-tanush-i-madh",
            "https://en.wikipedia.org/wiki/1387"
        ],
        "true_answer": "8",
        "file_name": ""
    },
    {
        "question": "Based on historical accounts of gang formation and expansion, estimate the average annual number of Chicago street gangs that first established a documented presence in the metropolitan suburbs in each of the following decades: 1960s, 1970s, and 1980s. Which decade saw the greatest average annual expansion of gang settlements into the suburbs?",
        "task": 1,
        "task_id": -5593406584291983807,
        "solution": "Step 1: Use Chicago Gang History decade summaries and 'Death of the disco era 1979-1980' post to count the number of new suburbs that first saw documented presence of major Chicago gangs in each decade. Step 2: For the 1960s, approximately 5 suburbs; for the 1970s, approximately 10; for the 1980s, approximately 23—with the 1980s reporting most new first-time suburban settlements arriving in the single year 1980. Step 3: Calculate the average per year: 1960s: 5/10 = 0.5; 1970s: 10/10 = 1; 1980s: 23/10 = 2.3. Conclusion: The decade with the greatest average annual expansion was the 1980s, at 2.3 suburbs/year.",
        "urls": [
            "https://en.wikipedia.org/wiki/Gangs_in_the_United_States",
            "https://en.wikipedia.org/wiki/Gangs_in_Chicago",
            "https://chicagoganghistory.com/history/1958/",
            "https://chicagoganghistory.com/history/1966/",
            "https://chicagoganghistory.com/history/death-of-the-disco-era-1979-1980-a-new-recession-and-a-new-racial-and-gang-migration-shift/"
        ],
        "true_answer": "1980s",
        "file_name": ""
    },
    {
        "question": "For the earliest listed on the National Register of Historic Places in the Town of Colonie, its wiki homepage had an edit made by a user between approximately 2010 and 2020. The modification in this edit version was the addition of the field 'Category:National Register of Historic Places in Albany County, New York' under 'External links' in 'categories'. If this edit is numbered as 1, with earlier edits numbered incrementally, then what was the modification made in edit number 4? Answer directly using the comment information in the second-to-last parentheses of the edit record.",
        "task": 2,
        "task_id": -3759411657113421672,
        "solution": "the place is Watervliet Shaker Historic District. hamins add the entry: https://en.wikipedia.org/w/index.php?title=Watervliet_Shaker_Historic_District&diff=prev&oldid=721267992; https://en.wikipedia.org/w/index.php?title=Watervliet_Shaker_Historic_District&diff=prev&oldid=676102047 edited the year from 1847 -> 1848",
        "urls": [
            "https://www.albanycountyny.gov/our-county/historic-albany-county",
            "https://en.wikipedia.org/w/index.php?title=Watervliet_Shaker_Historic_District&diff=prev&oldid=676102047",
            "https://en.wikipedia.org/w/index.php?title=Watervliet_Shaker_Historic_District&diff=prev&oldid=721267992",
            "https://en.wikipedia.org/wiki/Watervliet_Shaker_Historic_District",
            "https://en.wikipedia.org/wiki/National_Register_of_Historic_Places_listings_in_Albany_County,_New_York"
        ],
        "true_answer": "->Meeting House: Corrected year of building",
        "file_name": ""
    },
    {
        "question": "The population of Stone, Staffordshire, as shown in successive census records and historical documents. For each decade (1991-2001, 2001-2011, 2011-2021), calculate the average annual population growth rate. For the decade saw the lowest average annual growth rate, what is the third largest ethnic group of the last year of that decade and the number is?",
        "task": 2,
        "task_id": -4032618762750604563,
        "solution": "Step 1: Gather census-based population for Stone: 1991: 12,305; 2001: 14,555; 2011: 16,385; 2021: 17,278. Step 2: For each decade, calculate the average annual growth rate: Use the formula: [(EndPop / StartPop)^(1/10)] - 1. So, 1991-2001: ((14,555 / 12,305)^(1/10) - 1) ≈ 0.01693 (1.693%). 2001-2011: ((16,385 / 14,555)^(1/10) - 1) ≈ 0.01191 (1.191%). 2011-2021: ((17,278 / 16,385)^(1/10) - 1) ≈ 0.00532 (0.532%). Step 3: The highest value is for 1991–2001 at 1.693% per year. White: 16616, asian: 253, black:59.",
        "urls": [
            "https://en.wikipedia.org/wiki/Stone,_Staffordshire",
            "https://citypopulation.de/en/uk/westmidlands/staffordshire/E63002109__stone/"
        ],
        "true_answer": "black: 59",
        "file_name": ""
    },
    {
        "question": "Given the word 'house' in Arabic, compute its adaptive augmentation weight as follows: For each character, obtain its relative frequency in Arabic (in %) from Wikipedia's Arabic letter frequency table. Calculate the mean of the reciprocals of these frequencies. Rounded to three decimal places, what is the final weight?",
        "task": 1,
        "task_id": -3407937027220835275,
        "solution": "Decompose 'بيت' to ['ب', 'ي', 'ت']; frequencies are 4.67, 6.36, 2.61. Reciprocals: 1/4.67=0.21413276231263384, 1/6.36=0.15723270440251572, 1/2.61=0.3831417624521073. Mean weight=(0.214+0.157+0.383)/3 = 0.252 (rounded to 3 decimal places).",
        "urls": [
            "https://en.wikipedia.org/wiki/Arabic_letter_frequency"
        ],
        "true_answer": "0.252",
        "file_name": ""
    },
    {
        "question": "In June 2025, which city—Guangzhou or Shenzhen—had daily maximum temperatures closer on average to the Southern Coastal China June average (defined as the midpoint of 26°C and 33°C), and what was the mean absolute deviation for the closer city? (maybe you can use tianqihoubao.com) For that city, on which day in June does it have the highest precipitation?",
        "task": 3,
        "task_id": 7864276530751981964,
        "solution": "1. Retrieve daily maximum temperatures for all 30 days of June 2025 for Guangzhou and Shenzhen from their respective historical weather tables. 2. Determine the regional June average as (26+33)/2=29.5°C. 3. For each day, compute the absolute difference between the city's daily maximum and 29.5°C, and take the mean over the 30 days (the mean absolute deviation). 4. Guangzhou's MAD: 2.87°C; Shenzhen's MAD: 3.07°C. 5. Shenzhen is closer, with a mean absolute deviation of 2.3°C.",
        "urls": [
            "https://www.tianqihoubao.com/lishi/guangzhou/month/202506.html",
            "https://www.tianqihoubao.com/lishi/shenzhen/month/202506.html",
            "https://www.tianqihoubao.com/lishi/shenzhen/6.html"
        ],
        "true_answer": "Shenzhen, 2.3°C; 9th june.",
        "file_name": ""
    },
    {
        "question": "Between the game's release month and three months afterward, what was the average monthly percentage change in peak concurrent players for a superhero PvP shooter game released in 2024 (from steamcharts.com)? The game lost the highest average number of players in a month before July 2025. At the beginning of that month, there was a Twitch Drops event where watching for 30 minutes rewarded an item featuring two characters. Before July 2025, how many times was the character on the right buffed and nerfed, respectively? Rounded to two decimals.",
        "task": 3,
        "task_id": -7005061943857780571,
        "solution": "1. Get peak concurrent players for Marvel Rivals for Dec 2024, Jan 2025, Feb 2025, and Mar 2025 from statistics (418,832; 642,333; 437,272; 307,057). 2. Calculate % change month-to-month: (Jan-Dec)/Dec*100 = +53.36%, (Feb-Jan)/Jan*100 = -31.92%, (Mar-Feb)/Feb*100 = -29.78%. 3. Average the percentages: (53.36 + -31.92 + -29.78) / 3 = -2.78%. 4. The month this game lost the highest number of average players is March 2025. 5. The first twitch drops shown in marvelrivals.com is https://www.marvelrivals.com/announcements/20250217/40955_1212338.html, Feb 21st 12:00 PM - Mar 6th 23:59, UTC 0. The 30mins reward is a spray, the human torch on the right of the spray. 6. Finally, use the https://marvelrivals.fandom.com/wiki/Human_Torch#Strategy, we can count the red arrow (nerf), 7, and green arrow (buffed), 3.",
        "urls": [
            "https://steamcharts.com/app/2767030",
            "https://www.marvelrivals.com/announcements/20250217/40955_1212338.html",
            "https://marvelrivals.fandom.com/wiki/Human_Torch#Balance_Changes"
        ],
        "true_answer": "-2.78; buffed: 3, nerfed: 7",
        "file_name": ""
    },
    {
        "question": "Between 2010 and 2024 (July 1), which Miami-Dade County census-designated place—The Hammocks or Kendall West—had a greater average annual increase (change/years) in population density (persons per square mile, land area), and by how much (rounded to two decimal places during all of the calculation)?",
        "task": 3,
        "task_id": -1611818379839653984,
        "solution": "Step 1: Gather population and area for The Hammocks and Kendall West from Wikipedia and HomeTownLocator for 2010 and 2024.\nThe Hammocks: 2010 pop=51,003; 2024 pop=59,480; land=7.89 sq mi.\nKendall West: 2010 pop=36,154; 2024 pop=36,536; land=2.75 sq mi. The Hammocks:2010 population density = 6464.26 persons/sq mi.  2024 population density = 7538.66 persons/sq mi. Annual increase in population density = 74.10 persons/sq mi per year. Kendall West: 2010 population density = 13146.91 persons/sq mi 2024 population density = 13285.82 persons/sq mi Annual increase in population density = 9.58 persons/sq mi per year. The Hammocks had a greater average annual increase in population density by 66.82 persons/sq mi per year.",
        "urls": [
            "https://florida.hometownlocator.com/fl/miami-dade/the-hammocks.cfm",
            "https://en.wikipedia.org/wiki/The_Hammocks,_Florida",
            "https://florida.hometownlocator.com/fl/miami-dade/kendall-west.cfm",
            "https://en.wikipedia.org/wiki/Kendall_West,_Florida",
            "https://florida.hometownlocator.com/census/sorted-demographics.cfm"
        ],
        "true_answer": "The Hammocks, 66.82",
        "file_name": ""
    },
    {
        "question": "For each decade (19x0s) since The Beach Boys' debut, what is the average song per album of the decade that had the highest number of albums of The Beach Boys containing songs written or co-written by Mike Love? (Rounded to 2 decimals. Use the data in wikipedia)",
        "task": 3,
        "task_id": 8716535681673033520,
        "solution": "https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beach_Boys. 1960s. 1962: Surfin' Safari: 5; 1963: Christmas Album: 1, Little Deuce Coupe: 3, Surfer Girl: 4, Surfin' U.S.A.: 3; 1964: All Summer Long: 8, Christmas Album: 2, Shut Down Volume 2: 4, Surfer Girl: 0; 1965: Party!: 0,Summer Days (And Summer Nights!!): 7, The Monkey's Uncle Soundtrack: 0, Today!: 10; 1966: Pet Sounds: 3; 1967: Smiley Smile: 3, Wild Honey: 10; 1968: Friends:4; 1969: 20/20:1. 75/18= 4.17",
        "urls": [
            "https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beach_Boys"
        ],
        "true_answer": "4.17",
        "file_name": ""
    },
    {
        "question": "Between 2023 and 2024, whose share changed more as a percentage of global digital ad spending: the US Social Media Ad Spending or the US Digital Ad Spending (use the data source at least in 2024 from oberlo.com.)? Provide both percentage-point changes and state which increased more.",
        "task": 2,
        "task_id": 6020739205795438459,
        "solution": "Step 1: Find Social Media Ad Spend in the US for 2023 and 2024 from Statista ($72.3B and $76.4B). Step 2: Find global digital ad spend for 2023 and 2024 from Oberlo ($679.8B and $740.3B). Step 3: Calculate Social Media Ad Spend in the US for each year: 72.3/679.8 = 10.64%; 76.4/740.3 = 10.32%. Step 4: Find US digital ad spending for 2023 and 2024 from Oberlo ($271.2B, $298.4B). Step 5: Calculate US share for each year: 271.2/679.8 = 39.91%; 298.4/740.3 = 40.31%. Step 6: Compute change in shares (2024 vs 2023): Social Media Ad Spend = -0.32pp; US = 0.40pp. ",
        "urls": [
            "https://www.oberlo.com/statistics/digital-ad-spend",
            "https://www.oberlo.com/statistics/us-digital-ad-spending",
            "https://www.oberlo.com/statistics/social-media-ad-spend"
        ],
        "true_answer": "the US Digital Ad Spending: 0.40%, Social Media Ad Spend = -0.32%;  US Digital Ad Spending",
        "file_name": ""
    },
    {
        "question": "Consider Dev's singles 'Bass Down Low' and 'In the Dark' each of which charted in both the United States (Billboard) and the United Kingdom (officialcharts.com). One of them ranked higher on US that UK. In the week when this song had its lowest UK chart position, the song ranking 15th in that week, what was the username of the user who edited its Wikipedia page on January 11, 2012? Then, using all Dev singles that charted on the US Billboard Hot 100, remove the singles with the highest and lowest peak positions. For the remaining singles, calculate the variance in their total weeks spent on the US Billboard Hot 100. What are the two results?",
        "task": 2,
        "task_id": 4596877900835873869,
        "solution": "Step 1: Find US and UK peak chart positions for 'Bass Down Low' and 'In the Dark'. 'Bass Down Low': US peak #61, UK peak #10. Absolute difference = 51. 'In the Dark': US peak #11, UK peak #37. Absolute difference = 26. Average of differences: (51 + 26) / 2 = 38.5. the chart of the week is https://www.officialcharts.com/charts/singles-chart/20110911/7501/. The 15th soon is DOWN WITH THE TRUMPETS. From the wiki page, https://en.wikipedia.org/wiki/Down_with_the_Trumpets, the editor is Earl CG. Step 2: For 'Backseat' (20 weeks), 'In the Dark' (20 weeks), 'Bass Down Low' (12 week), mean = 17. Variance = ((20-17)² + (20-17)² + (12-17)²)/3 = 14.33.",
        "urls": [
            "https://www.officialcharts.com/songs/dev-ft-the-cataracs-bass-down-low/",
            "https://www.officialcharts.com/songs/dev-in-the-dark/",
            "https://www.officialcharts.com/charts/singles-chart/20110911/7501/",
            "https://www.billboard.com/artist/dev/chart-history/",
            "https://en.wikipedia.org/wiki/Down_with_the_Trumpets"
        ],
        "true_answer": "Earl CG, 14.33",
        "file_name": ""
    },
    {
        "question": "For the 2010 official CD compilation album 'Best Of Nena', what percentage of its total album duration is accounted for by tracks whose original release year was in the 1980s? (Maybe you can use discogs.com. Rounded to interger.)",
        "task": 2,
        "task_id": 9019051685966934048,
        "solution": "1. Liebe Ist (Radio Version) - 4:04 = 4 * 60 + 4 = 244 secs, 2005 2. Leuchtturm (New Version) - 4:16 = 4 * 60 + 16 = 256 secs3. Anyplace, Anywhere, Anytime (New Version) - 4:04 = 4 * 60 + 4 = 244 secs4. Haus Der Drei Sonnen (New Version, Radio Edit) - 3:44 = 3 * 60 + 44 = 224 secs5. Wir Sind Wahr (Radio Edit) - 3:58 = 3 * 60 + 58 = 238 secs6. Mein Weg Ist Mein Weg - 4:38 = 4 * 60 + 38 = 278 secs7. ? (Fragezeichen) - 3:45 = 3 * 60 + 45 = 225 secs8. 99 Luftballons (2009) - 3:56 = 3 * 60 + 56 = 236 secs9. Es Interessiert Mich Nicht - 3:47 = 3 * 60 + 47 = 227 secs10. Nur Getraumt - 3:43 = 3 * 60 + 43 = 223 secs11. Rette Mich - 3:21 = 3 * 60 + 21 = 201 secs12. Old School Baby - 3:05 = 3 * 60 + 5 = 185 secs13. Lass Mich - 3:28 = 3 * 60 + 28 = 208 secs14. Jetzt Bist Du Weg (Live) Featuring – Udo Lindenberg - 5:09 = 5 * 60 + 9 = 309 secs15. Mach Die Augen Auf - 3:26 = 3 * 60 + 26 = 206 secs16. Silbermond - 5:39 = 5 * 60 + 39 = 339 secs17. Willst Du Mit Mir Gehen - 3:48 = 3 * 60 + 48 = 228 secs18. Wunder Geschehen (Live) - 5:58 = 5 * 60 + 58 = 358 secs19. In Meinem Leben (Album Version) - 5:40 = 5 * 60 + 40 = 340 secs 1. Find the official tracklist and durations for 'Best Of Nena' (Discogs, release 6061040):\n   [(track 1, 4:04), (track 2, 4:16), ..., (track 19, 5:40)].\n2. For each track, use Wikipedia/Discogs to look up the original release year.\n   7 of the 19 tracks are originally from 1980-1989: 'Leuchtturm', 'Haus Der Drei Sonnen', '? (Fragezeichen)', '99 Luftballons', 'Nur Getraumt', 'Rette Mich', 'Wunder Geschehen'.\n3. Sum durations for these 7 tracks: 1723 seconds. Total album duration: 4612 seconds.\n4. Compute proportion: 1723 / 4612 = 0.3736; answer: 37.36%.",
        "urls": [
            "https://www.discogs.com/release/6061040-Nena-Best-Of-Nena",
            "https://en.wikipedia.org/wiki/Nena_discography",
            "https://www.allmusic.com/artist/nena-mn0000395187"
        ],
        "true_answer": "37%",
        "file_name": ""
    },
    {
        "question": "Determine the population of the City of Coffs Harbour in 2001 and 2021 using official census records or authoritative population tables. Obtain the annual mean rainfall (in millimeters) recorded for Coffs Harbour. Refer to the City of Coffs Harbour Council’s 2020/21 Annual Report to find the own-source operating revenue ratio for that financial year. Give me the following result: Then, use the above data, calculate the average annual population increase from 2001 to 2021 (rounded to the nearest whole number) and determine the mean annual rainfall per new resident added during this period (rounded to one decimal place). Also, provide the own-source operating revenue ratio.",
        "task": 3,
        "task_id": -4222866287472559719,
        "solution": "1. Find population in 2001 and 2021 from Wikipedia/table: 61,186 and 78,759; 2. Compute average annual increase: (78,759 - 61,186)/20 = 879; 3. Obtain mean annual rainfall from Wikipedia Coffs Harbour climate table: 1,699.0 mm; 4. Divide rainfall by annual population increase: 1,699.0 / 879 = 1.9 mm (rounded to one decimal place).  Own source operating revenue ratio: 2020/21=76.0%. URLs: Wiki Coffs Harbour for population; Wikipedia Coffs Harbour for climate; 2020/21 Council Annual Report for financial metric.",
        "urls": [
            "https://en.wikipedia.org/wiki/City_of_Coffs_Harbour",
            "https://en.wikipedia.org/wiki/Coffs_Harbour",
            "https://www.coffsharbour.nsw.gov.au/files/sharedassets/public/your-council/corporate-planning-and-reporting/2020-21/annual-report-202021/annual-report-section-1-significant-achievements-2020-21.pdf",
            "https://www.abs.gov.au/census/find-census-data/quickstats/2021/LGA11800",
            "https://www.abs.gov.au/census/find-census-data/quickstats/2001/LGA11800"
        ],
        "true_answer": "Average annual population increase: 879; Mean annual rainfall per new resident added: 1.9 mm; the own source operating revenue ratio: 76%",
        "file_name": ""
    },
    {
        "question": "From 1900 to 1950, in the year with the highest number of live births in the UK, the first event recorded in October on the that year of UK's Wikipedia page is about a university. Regarding this event, the university's website has a complete timeline. Could you please specify the number of students involved in the 1920 event mentioned on that timeline and describe what the 1935 events were?",
        "task": 3,
        "task_id": 7762302646050004561,
        "solution": "1920, according to https://en.wikipedia.org/wiki/1920_in_the_United_Kingdom. It's about university of oxford.",
        "urls": [
            "https://en.wikipedia.org/wiki/1920_in_the_United_Kingdom",
            "https://www.statista.com/statistics/281981/live-births-in-the-united-kingdom-uk/"
        ],
        "true_answer": "130;1935 events:  (1)The first black woman graduated from The University of Oxford. (2)Merze Tate: expert on US diplomacy graduated. ",
        "file_name": ""
    },
    {
        "question": "Between 2014 and 2024, which year has the most new Trustees of current Embry-Riddle Board of Trustees? And how many of current Trustees received the Living Legend of Aviation?",
        "task": 2,
        "task_id": -523805511707264742,
        "solution": "2014–2024:{2014: 1+1+1, 2018: 1, 2019: 1+1+1, 2021: 1, 2023: 1, 2024: 1+1}. Living Legend of Aviation: Kenn Ricci, Jean Rosanvallon",
        "urls": [
            "https://news.erau.edu/headlines/embry-riddle-welcomes-two-aviation-business-leaders-to-board-of-trustees"
        ],
        "true_answer": "2014 & 2018; 2",
        "file_name": ""
    },
    {
        "question": "The authors of GenBench, A Taxonomy and Review of Generalization Research, posted their video on YouTube to introduce their work. In the video, they mentioned two rates regarding the motivation situations of the papers they reviewed, which are illustrated in a figure in the original text. Give me the number of the first figure mentions this. For the definition corresponding to the smaller rate, among all the papers from 2019 included in GenBench, what is the last name of the first author? Arrange them in alphabetical order.",
        "task": 3,
        "task_id": 5053399225946356797,
        "solution": "1.Search A Taxonomy and Review of Generalization Research. 2. They have a website to illustrate the references: https://genbench.org/references/ 3. Transcript the video from https://youtubetotranscript.com/transcript?v=ARipp-MslrY&current_language_code=en. 4. The rates mentioned in the video are motivation of papers, 70% practical and 3% Fairness. 5. The motivation rate illustrated in the 'Figure 2'. 6. Filter the paper in the website, https://genbench.org/references/?motivation=fairness. Only one paper satisfied https://dl.acm.org/doi/10.1145/3308560.3317593. The last name of the first author is 'Borkan'.",
        "urls": [
            "https://genbench.org/references/",
            "https://scitube.io/genbench-mapping-out-the-landscape-of-generalization-research/",
            "https://youtubetotranscript.com/transcript?v=ARipp-MslrY&current_language_code=en",
            "https://genbench.org/references/?motivation=fairness",
            "https://dl.acm.org/doi/10.1145/3308560.3317593"
        ],
        "true_answer": "Figure 2; Borkan",
        "file_name": ""
    },
    {
        "question": "KFF published an article on abortion in Women's Health Policy on Feb 27, 2025. In the article, they included a line chart, and I would like to understand some specific data points in the chart. The data can be downloaded from the article, along with the cited data sources. According to the file at Public Funding for Family Planning and Abortion Services (FY 1980-2015) Tables of guttmacher.org, in 1987, the state with the highest public expenditures for family planning client services had historical data from Apr' 22 to Dec' 22 in the data source cited by KFF's article above mentioned. Now, using single exponential smoothing and MSE, search for the optimal alpha (0.01-0.99) based on the historical data, the MSE loss, and use the alpha to estimate the next data point. (all results should be rounded to two decimals)",
        "task": 2,
        "task_id": -477797075155173887,
        "solution": "From https://societyfp.org/wp-content/uploads/2024/10/WeCount-Report-8-June-2024-data.pdf, table 3, the 5th state is Illinois. Search and visit the article https://www.kff.org/womens-health-policy/issue-brief/key-facts-on-abortion-in-the-united-states/. Download the data soure in the line chart, the data points are [5590.0, 5550.0, 6170.0, 6780.0, 7250.0, 6640.0, 6620.0, 6320.0, 7190.0]. Write code to fulfill the procedure, we have 0.99, 24,45, 7181.33",
        "urls": [
            "https://societyfp.org/wp-content/uploads/2024/10/WeCount-Report-8-June-2024-data.pdf",
            "https://www.kff.org/womens-health-policy/issue-brief/key-facts-on-abortion-in-the-united-states/",
            "https://www.guttmacher.org/sites/default/files/report_downloads/public-funding-family-planning-abortion-services-fy-1980-2015-tables.pdf"
        ],
        "true_answer": "0.99, 24.45, 7181.33",
        "file_name": ""
    },
    {
        "question": "Using data from the World Bank for 2014-2023, calculate annual averages for: (1) GDP growth rate, (2) Inflation rate (consumer prices), (3) CO₂ emissions per capita growth (excluding LULUCF), (4) Internet users (% population), (5) Women in parliament (%). Find the intersection between: (a) top 200 countries in (1)/(3)/(4)/(5) and (b) bottom 200 countries in (2). Rank the countries in ascending order based on their CO₂ emissions growth rate, with CO₂ emissions growth rate rounded to 5 decimals. For the growth rate conditions mentioned above, if the World Bank provides growth rate data, prioritize using the World Bank's growth rate data. Otherwise, calculate it using the following formula: Growth Rate (%) = ( (data[n] - data[n-1]) / (data[n-1] + 0.00001) ) × 100  The small constant 0.00001 is added to avoid division by zero in cases where data[n-1] = 0. Give me the answer as 'Country:  CO₂ emissions growth rate'",
        "task": 2,
        "task_id": 1037147421468932279,
        "solution": "1. download 5 csv/xlsx files from provided urls. 2. For CO₂ emissions data, since the World Bank does not provide growth rates, calculate them using CO₂ emission quantities.  3. After calculation, find the intersection across all 5 data files. 4. Result:[{'Country Name': 'Viet Nam', 'Average Growth Rate': 6.088553, 'Average Inflation Rate': 2.870476, 'Average Carbon per Capita Rate': 2.460674, 'Average Women Seat': 27.297918, 'Average internet-use Capita': 63.68}, {'Country Name': 'Israel', 'Average Growth Rate': 3.799108, 'Average Inflation Rate': 1.076182, 'Average Carbon per Capita Rate': 0.581567, 'Average Women Seat': 25.750000, 'Average internet-use Capita': 84.35}, {'Country Name': 'Slovenia', 'Average Growth Rate': 3.036577, 'Average Inflation Rate': 2.255981, 'Average Carbon per Capita Rate': 0.543596, 'Average Women Seat': 33.000000, 'Average internet-use Capita': 81.69}] 5. Rank the country, we have the final answer as: Slovenia, Israel, Viet Nam.",
        "urls": [
            "https://data.worldbank.org/indicator/SG.GEN.PARL.ZS?locations=CN",
            "https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG?locations=CN",
            "https://data.worldbank.org/indicator/FP.CPI.TOTL.ZG?locations=CN",
            "https://data.worldbank.org/indicator/EN.GHG.CO2.PC.CE.AR5?locations=CN",
            "https://data.worldbank.org/indicator/IT.NET.USER.ZS?locations=CN"
        ],
        "true_answer": "Slovenia:  0.54360, Israel: 0.58157, Viet Nam: 2.46067",
        "file_name": ""
    },
    {
        "question": "Which Italian region experienced the largest net decrease in the number of municipalities between 1951 and 2011 (from istat.it), and by how much did its population density change in the same period (in inhabitants per km²)?",
        "task": 2,
        "task_id": 4173863569215186380,
        "solution": "Step 1: For each Italian region, obtain number of municipalities in 1951 and 2011 using ISTAT 'Tavola_2.24.xls'. Step 2: For each region, calculate the difference (2011 - 1951) in number of municipalities. Step 3: Identify the region with the greatest negative difference (largest decrease): Marche (-6). Step 4: For Marche, obtain population density for 1951 (140.74) and 2011 (163.95) from same table. Step 5: Calculate the change: 163.95 - 140.74 = +23.21 inhabitants/km². Result: Marche, +23.21 inhabitants/km².",
        "urls": [
            "https://seriestoriche.istat.it/fileadmin/documenti/Tavola_2.24.xls"
        ],
        "true_answer": "Marche, +23.21 inhabitants/km²",
        "file_name": ""
    },
    {
        "question": "Among Emma Goldman's books published between 1910 and 1930, for the book with the largest page counts according to The Anarchist Library, what is the job position of the second dr. mentioned in that book?",
        "task": 2,
        "task_id": 8314229658655898831,
        "true_answer": "Medical Adviser",
        "file_name": ""
    },
    {
        "topic": "Statistical analysis of time-based player and user sentiment trends for Marvel Rivals (gaming analytics across playerbase and review data)",
        "question": "Between the game's release month and three months afterward, what was the average monthly percentage change in peak concurrent players for a superhero PvP shooter game released in 2024 according to steamcharts? The game lost the highest average number of players in a month before July 2025. At the beginning of that month, there was a Twitch Drops event where watching for 30 minutes rewarded an item featuring two characters. Before July 2025, how many times was the character on the right buffed and nerfed, respectively? Rounded to two decimals.",
        "true_answer": "-2.78; buffed: 3, nerfed: 7",
        "task": 3,
        "task_id": -4884482103103493110,
        "file_name": "",
        "structure": {
            "Domain": "Gaming",
            "Logic_Flow": {
                "structure": "parallel",
                "details": [
                    "A->B",
                    "C->D",
                    "B, D->answer"
                ]
            },
            "Aggregation_Operation": {
                "type": [
                    "Element-wise->Math",
                    "List/Set-wise->Aggregation->average",
                    "List/Set-wise->Aggregation->percentage change",
                    "Element-wise->Direct retrieval"
                ]
            }
        },
        "annotation": {
            "Question_Solvability": {
                "Solvability": 1,
                "Consistency": {
                    "score": 1,
                    "another_ref_answer": ""
                },
                "Reference_Quality": 1,
                "Solution_Quality": {
                    "score": 1,
                    "Solution_Quality_Comment": ""
                }
            }
        },
        "mm": 1,
        "solution": "1. Get peak concurrent players for Marvel Rivals for Dec 2024, Jan 2025, Feb 2025, and Mar 2025 from statistics (418,832; 642,333; 437,272; 307,057). 2. Calculate % change month-to-month: (Jan-Dec)/Dec*100 = +53.36%, (Feb-Jan)/Jan*100 = -31.92%, (Mar-Feb)/Feb*100 = -29.78%. 3. Average the percentages: (53.36 + -31.92 + -29.78) / 3 = -2.78%. 4. The month this game lost the highest number of average players is March 2025. 5. The first twitch drops shown in marvelrivals.com is https://www.marvelrivals.com/announcements/20250217/40955_1212338.html, Feb 21st 12:00 PM - Mar 6th 23:59, UTC 0. The 30mins reward is a spray, the human torch on the right of the spray. 6. Finally, use the https://marvelrivals.fandom.com/wiki/Human_Torch#Strategy, we can count the red arrow (nerf), 7, and green arrow (buffed), 3.",
        "urls": [
            "https://steamcharts.com/app/2767030",
            "https://www.marvelrivals.com/announcements/20250217/40955_1212338.html",
            "https://marvelrivals.fandom.com/wiki/Human_Torch#Balance_Changes"
        ]
    },
    {
        "topic": "",
        "question": "Which 2021 film used the attached image as a filming location? The movie is based on a novel by an author born in Sheffield, 1975, raised in Newark, Nottinghamshire, who writes both fiction and non-fiction—often speculative—for children and adults.",
        "true_answer": "A Boy Called Christmas",
        "task": 1,
        "task_id": 3024338875638427332,
        "file_name": "files/14556897.jpg",
        "mm": 1,
        "solution": "",
        "urls": [
            "https://upload.wikimedia.org/wikipedia/commons/c/c6/Krivoklat_castle_01.jpg",
            "https://en.wikipedia.org/wiki/K%C5%99ivokl%C3%A1t_Castle",
            "https://www.latlong.net/location/a-boy-called-christmas-locations-1455",
            "https://en.wikipedia.org/wiki/Matt_Haig"
        ]
    },
    {
        "topic": "",
        "question": "In the game PUBG: BATTLEGROUNDS, what is the type of vehicle shown in the bottom right corner of the image? What is the maximum capacity of this vehicle type? Were there any balance changes made to this vehicle type in the major update released in February 2025?",
        "true_answer": "Buggy, 2, Yes",
        "task": 2,
        "task_id": -851942301233224366,
        "file_name": "files/EqAxmbSL.jpg",
        "mm": 1,
        "solution": "",
        "urls": [
            "https://pubg.com/en/news/8170",
            "https://pubg.com/en/game-info/vehicles/land"
        ]
    },
    {
        "topic": "",
        "question": "In which patch of League of Legends were the skins shown in the image released? Before this patch, what was the latest patch in which balance changes were made to the champion on the right in the image? Please only provide the patch number, e.g., Patch xx.xx.",
        "true_answer": "Patch 25.06, Patch 14.21",
        "task": 2,
        "task_id": -4895223295901039056,
        "file_name": "files/6fca43aaa72993a9a22349f82-1920x1080.jpg",
        "mm": 1,
        "solution": "",
        "urls": [
            "https://www.leagueoflegends.com/en-us/news/game-updates/patch-25-06-notes/",
            "https://www.leagueoflegends.com/en-us/news/game-updates/patch-14-21-notes/",
            "https://www.leagueoflegends.com/en-us/champions/xayah/"
        ]
    },
    {
        "topic": "",
        "question": "Please specify the game name the screenshot is from, and also indicate the level number in the format 'Level X-Y'. When did the producer release the chef shown in the screenshot?",
        "true_answer": "Overcooked 2, Level 3-2, December 17 2019",
        "task": 2,
        "task_id": 7252457570283213322,
        "file_name": "files/114039_461.png",
        "mm": 1,
        "solution": "",
        "urls": [
            "https://overcooked.fandom.com/wiki/3-2_(Overcooked!_2)",
            "https://en.wikipedia.org/wiki/Overcooked_2",
            "https://steamcommunity.com/sharedfiles/filedetails/?l=french&id=1862773780",
            "https://www.team17.com/news/overcooked-2-winter-wonderland-update-out-now"
        ]
    },
    {
        "topic": "",
        "question": "This is a screenshot from a roguelike deckbuilder game. How many items need to be discovered from the collection to unlock the deck displayed in this screenshot? Which consumable card type is influenced by the voucher card shown in the screenshot: Tarot Cards, Planet Cards, or Spectral Cards?",
        "true_answer": "75, Planet Cards",
        "task": 2,
        "task_id": 1102951346015464311,
        "file_name": "files/960x0.jpg",
        "mm": 1,
        "solution": "",
        "urls": [
            "https://en.wikipedia.org/wiki/Balatro",
            "https://balatrogame.fandom.com/wiki/Decks",
            "https://balatrogame.fandom.com/wiki/Vouchers"
        ]
    },
    {
        "id": 58,
        "user_": 11,
        "question": "Find the movie shown by the poster, look up the Audience Reviews for this movie and find the user who wrote the following review: Pure drama, action, and tragedy.\nThen tell me how many stars this user gave to the movie Final Destination Bloodlines. Your answer should only be a number.",
        "file_name": "files/e9d01e8355382e81c5df385a8f42d763.png",
        "gt_answer_level1": "B",
        "true_answer": "5",
        "urls": [],
        "task": 3,
        "task_id": -9220734779879267551,
        "mm": 1
    },
    {
        "id": 61,
        "user_": 6,
        "question": "Here are three movies: A. A Hard Day's Night, B. City Lights, C. Avengers: Endgame, D. None of them. Find the one that movie poster in rottentomatoes has the following characteristic: The poster features horizontal bands with alternating colors and stylized silhouettes of human figures in motion, depicting lively city activity and urban energy. Then look up the Audience Reviews for this movie and find the user who wrote the following review: i like that part with the cat\nThen tell me how many stars this user gave to the movie Modern Times. Your answer should only be a number.",
        "gt_answer_level1": "B",
        "true_answer": "5",
        "task": 2,
        "task_id": -5246748088079030097,
        "mm": 1,
        "file_name": ""
    },
    {
        "topic": "",
        "question": "How many times was the wiki page of the depicted collection item edited in 2024? What is the name of the museum section where this item is located in its respective museum?",
        "true_answer": "2, Great Court",
        "task": 1,
        "task_id": -8206529700405853894,
        "file_name": "files/12341298126358.png",
        "mm": 1,
        "solution": "",
        "urls": [
            "https://en.wikipedia.org/w/index.php?title=Lion_of_Knidos&action=history",
            "https://www.britishmuseum.org/collection/galleries/great-court"
        ]
    },
    {
        "topic": "",
        "question": "According to the history introduction on the wiki homepage of the medical procedure shown in the picture, the regions where the involved figures are from belong to which countries today? List them in alphabetical order.",
        "true_answer": "China, England, Greece, Itlay, Spain",
        "task": 2,
        "task_id": 6962413058383548261,
        "file_name": "files/aa18972bd40735fa5c275d5a9c510fb30e240879.jpg",
        "mm": 1,
        "solution": "",
        "urls": [
            "https://en.wikipedia.org/wiki/Cupping_therapy",
            "https://en.wikipedia.org/wiki/William_Osler",
            "https://en.wikipedia.org/wiki/Maimonides"
        ]
    },
    {
        "topic": "",
        "question": "The animal shown in the figure has 17,321 first-generation individuals in a certain region, with 53% being male. Assuming all individuals of the less numerous sex participate in reproduction, and according to the population reproduction success rate and breeding territory range indicated on its wiki (using the minimum numbers for range estimation if applicable), and assuming all offspring survive (also using the wiki's data on offspring production), with a male-to-female ratio of 37.1:62.9 in the offspring. After the second-generation individuals mature, when the first- and second-generation individuals in the region reproduce to raise the third generation, what is the total required breeding territory range (allowing any combination within and between the first and second generations)? Assume the territories are circular and do not overlap. All individual counts are estimated by directly truncating decimals.",
        "true_answer": "39.501",
        "task": 3,
        "task_id": -3617963875513628413,
        "file_name": "files/Photograph_of_animal.jpg",
        "mm": 1,
        "solution": "Parent: Male: 17,321 × 53% = ​9,180； Female: 17,321 × 47% = ​8,141. P♂ 9,180 + F1♂ 3,987 = ​13,167; P♀ 8,141 + F1♀ 6,759 = ​14,900. 13167* 0.003 = 39.501",
        "urls": [
            "https://en.wikipedia.org/wiki/Black_drongo"
        ]
    },
    {
        "id": 50,
        "user_": 3,
        "question": "Here are three movies: A. Cabaret, B. It Happened One Night, C. Inglourious Basterds, D. None of them. Find the one that movie poster in rottentomatoes has the following characteristic: The poster features a figure reclining horizontally across stylized text, dressed in vibrant, theatrical attire that includes black tights and black high-heeled boots. Then look up the Audience Reviews for this movie and find the user who wrote the following review: The club scenes are quite good, especially those with Joel Grey. The rest of the film seems like scenes in search of a story.\nThen tell me how many stars this user gave to the movie Justice League: The Flashpoint Paradox. Your answer should only be a number.",
        "gt_answer_level1": "A",
        "urls": [],
        "true_answer": "3",
        "task": 3,
        "task_id": -750343627330825646,
        "mm": 1,
        "file_name": ""
    },
    {
        "topic": "",
        "question": "My friend went to the computer science department of a university, but didn't tell me the name, only sent me a map address. May I ask what is the official address of the computer science department on the university's website?",
        "true_answer": "吉林省长春市朝阳区前进大街2699号吉林大学前卫南区计算机楼",
        "task": 2,
        "task_id": -2450009513575315031,
        "file_name": "files/6e340e234a41dcc798d07061e3cceb1b.png",
        "mm": 1,
        "solution": "",
        "urls": [
            "https://ccst.jlu.edu.cn/index.htm",
            "https://en.wikipedia.org/wiki/Jilin_University"
        ]
    },
    {
        "id": 167,
        "user_": 8,
        "question": "Here are three movies: A. The Elephant Man, B. The Lord of the Rings: The Fellowship of the Ring, C. Blazing Saddles, D. None of them. Find the one that movie poster in rottentomatoes has the following characteristic: The poster prominently features a group of nine riders on horseback surrounded by swirling mist. Then look up the Audience Reviews for this movie and find the user who wrote the following review: Pinnacle of Cinema!!\nThen tell me how many stars this user gave to the movie 2001: A Space Odyssey. Your answer should only be a number.",
        "gt_answer_level1": "B",
        "urls": [],
        "true_answer": "4.5",
        "task": 2,
        "task_id": 5816079011774012686,
        "mm": 1,
        "file_name": ""
    },
    {
        "topic": "",
        "question": "The attached image is a screenshot from a music video. How many albums did the owner of this MV on YouTube release between 2009 and 2025?",
        "true_answer": "8",
        "task": 2,
        "task_id": 6074695049196313014,
        "file_name": "files/dd19f83cae2e9198f7bff0df85cfcea7.png",
        "mm": 1,
        "solution": "",
        "urls": [
            "https://en.wikipedia.org/wiki/See_You_Again",
            "https://en.wikipedia.org/wiki/Wiz_Khalifa"
        ]
    },
    {
        "topic": "",
        "question": "I saw a table in a paper (attached) somewhere about agents. I'm very interested, help me find the arXiv number (yymm.xxxxx) of this paper. Where was this paper published? Please provide the information in the format of Conference/Journal Name + Year (e.g., CVPR 2021).",
        "true_answer": "ICLR 2025",
        "task": 1,
        "task_id": -6465468687375447874,
        "file_name": "files/948a0cb2870f63f93e8316ed04a06fa0.png",
        "mm": 1,
        "solution": "",
        "urls": [
            "https://openreview.net/forum?id=vunPXOFmoi"
        ]
    },
    {
        "topic": "",
        "question": "Which choice is correct for the problem shown in the attached picture?",
        "true_answer": "C",
        "task": 1,
        "task_id": -6085319257783845502,
        "file_name": "files/725f301ae12944404b4e46d3b86ee5f8.png",
        "mm": 1,
        "solution": "",
        "urls": []
    },
    {
        "topic": "",
        "question": "Which choice is correct for the problem shown in the attached picture?",
        "true_answer": "D",
        "task": 2,
        "task_id": -60853192577838455,
        "file_name": "files/f2ae3561fde457bd499534570e577f9a.png",
        "mm": 1,
        "solution": "",
        "urls": []
    },
    {
        "topic": "",
        "question": "All departments of the company are shown in the diagram, but my daughter scribbled on it. Please calculate which department has the highest average salary.",
        "true_answer": "经济部",
        "task": 1,
        "task_id": -2045707909891895924,
        "file_name": "files/1a4e4c6bd88291e66250f2183251e88c.png",
        "mm": 1,
        "solution": "To calculate the average salary for each department, we first need to add up the salaries of the employees in each department, then divide by the number of employees in that department. Here are the average salary calculations for each department: 1. Basic Department: (1830 + 1265 + 1300) / 3 = 4395 / 3 = 1465; 2. Construction Department: (1420 + 1600) / 2 = 3020 / 2 = 1510; 3. Economic Department: (1780 + 1620 + 1690) / 3 = 5090 / 3 = 1696.67; 4. Machinery Department: 1250; 5. Auto Repair Department: 1130; 6. Computer Department: (1280 + 1140) / 2 = 2420 / 2 = 1210. Based on the calculation results, the Economic Department has the highest average salary, at 1696.67 yuan.",
        "urls": []
    },
    {
        "topic": "Film and television statistics – Bella Heathcote",
        "question": "Between 2012 and 2020, Bella Heathcote appeared in several feature films. How many people appeared in all the movie posters?",
        "true_answer": 26,
        "task": 3,
        "task_id": -2064972234470895387,
        "file_name": "",
        "mm": 1,
        "solution": "Gather list of Bella Heathcote's feature films (2012–2020) from IMDb, Wikipedia, Rotten Tomatoes: Relic (3), Professor Marston & the Wonder Women (1), Fifty Shades Darker (2), The Neon Demon (1), Pride and Prejudice and Zombies (2), The Curse of Downers Grove (4), The Rewrite (6), Not Fade Away (2), Dark Shadows (5).",
        "urls": [
            "https://en.wikipedia.org/wiki/Bella_Heathcote",
            "https://www.rottentomatoes.com/m/relic#critics-reviews",
            "https://www.rottentomatoes.com/m/professor_marston_and_the_wonder_women",
            "https://www.rottentomatoes.com/m/fifty_shades_darker",
            "https://www.rottentomatoes.com/m/the_neon_demon",
            "https://www.rottentomatoes.com/m/pride_and_prejudice_and_zombies",
            "https://www.rottentomatoes.com/m/the_curse_of_downers_grove",
            "https://www.rottentomatoes.com/m/the_rewrite",
            "https://www.rottentomatoes.com/m/not_fade_away_2012",
            "https://www.rottentomatoes.com/m/dark-shadows-2010"
        ]
    },
    {
        "topic": "",
        "question": "There is an exhibition about the Ancient Americas on metmuseum.org, in May 2025, in a reimagined Michael C. Rockefeller Wing. Among the exhibits, there is a remarkable one featuring a sculpture of a figure with snow-white teeth. In its audio guide, the third sentence spoken by the first person mentions that the person depicted in the statue took a certain medicine. What is the name of this medicine?",
        "true_answer": "Cohoba",
        "task": 3,
        "task_id": -498881247107920019,
        "file_name": "",
        "mm": 1,
        "solution": "The exhibition is https://www.metmuseum.org/exhibitions/arts-of-the-ancient-americas/exhibition-objects. The Zemí cohoba stand, Taíno artist(s) is the only one with snow-white teeth. Click the audio guide part, find the 1612. part for The Zemí cohoba stand. The first speaker is LAWRENCE WALDRON. The third sentence of LAWRENCE WALDRON is: 'Under this crown is a squatting figure. He's very thin, but he has these broad bulging shoulders. The shaman gnashes his teeth as the Cohoba takes effect and his eyes are streaming tears. So, he's got these keyhole designs on his eyes and then they taper into these teardrops running down the side of his face'. The medicine is Cohoba.",
        "urls": [
            "https://www.metmuseum.org/exhibitions/arts-of-the-ancient-americas/exhibition-objects",
            "https://www.metmuseum.org/exhibitions/arts-of-the-ancient-americas/audio-guide"
        ]
    },
    {
        "topic": "",
        "question": "The album cover of one of Jay Chou's albums features him wearing a red hood. How many times was the English Wikipedia page of this album edited in the second year after its creation?",
        "true_answer": "11",
        "task": 2,
        "task_id": 2196209553240142780,
        "file_name": "",
        "mm": 1,
        "solution": "The album is Fantasy(范特西). The found date of the wiki page is 2006. The page was edited 11 times in 2007.",
        "urls": [
            "https://en.wikipedia.org/w/index.php?title=Fantasy_(Jay_Chou_album)&action=history&offset=&limit=500"
        ]
    },
    {
        "topic": "",
        "question": "An album by a mainland Chinese singer, born between 1985 and 1989, released his/her second album in 2012 and their fifth album in 2017, featuring a cover of a person lying on a white van. What is the title of the 9th song in the album?",
        "true_answer": "茉莉",
        "task": 2,
        "task_id": 568340905157141144,
        "file_name": "",
        "mm": 1,
        "solution": "",
        "urls": [
            "https://en.wikipedia.org/wiki/Silence_Wang",
            "https://music.163.com/#/album?id=3287224"
        ]
    },
    {
        "topic": "",
        "question": "According to World Bank data, among the countries where life expectancy at birth has declined year by year from 2020 to 2023, one country's coat of arms features an extinct creature. On the creature's Wikipedia page, a comparison is made between its size and that of a human. In the diagram, what are the colors of the creature and the human, respectively? (Select from: brown, red, green, white, black, blue.)",
        "true_answer": "brown, blue",
        "task": 3,
        "task_id": 2693163530849303543,
        "file_name": "",
        "solution": "1. visit worldbank and download the life expectancy at birth data. 2. filte the countries, only Germany and Mauritius satisfied. 3. The coat of arms of Mauritius has the extincted bird, dodo. 4. The figure comparing the size of human and dodo, shown that the brown dodo, blue human.",
        "urls": [
            "https://data.worldbank.org/indicator/SP.DYN.LE00.IN",
            "https://en.wikipedia.org/wiki/Dodo"
        ]
    },
    {
        "topic": "",
        "question": "Do the two attached images belong to the same country? Name the countries they belong to.",
        "true_answer": "Yes; Germany",
        "task": 1,
        "task_id": 8165577699281582039,
        "file_name": "files/A_014.jpg",
        "mm": 1,
        "solution": "The two images are from the wikipage of Germany.",
        "urls": [
            "https://en.wikipedia.org/wiki/Germany"
        ]
    },
    {
        "topic": "",
        "question": "In which of Jiang Wen's (姜文) directed movies does the poster feature three people, with the person on the far right wearing a white hat?",
        "true_answer": "Let the Bullets Fly",
        "task": 3,
        "task_id": 2419021775878258797,
        "file_name": "",
        "mm": 1,
        "solution": "",
        "urls": []
    },
    {
        "topic": "",
        "question": "For the Sudoku puzzle in the attached figure, what are the numbers to be filled in the last row from left to right? Do not use any separators.",
        "true_answer": "973418652",
        "task": 2,
        "task_id": 1114866565602152448,
        "file_name": "files/sudoku.png",
        "mm": 1,
        "solution": "files/sudoku_ans.png",
        "urls": []
    },
    {
        "topic": "",
        "question": "Do the two mazes shown in the attached figure still have valid solutions after being marked with pen strokes?",
        "true_answer": "Yes, No",
        "task": 1,
        "task_id": -7531769193940399618,
        "file_name": "files/maze.jpg",
        "mm": 1,
        "solution": "",
        "urls": []
    },
    {
        "id": 96,
        "user_": 9,
        "question": "Here are three movies: A. Fast Five, B. Pretty in Pink, C. Fargo. Which movie poster features the most people?",
        "true_answer": "A",
        "task": 2,
        "task_id": -5541375913113062582,
        "mm": 1,
        "file_name": "",
        "solution": "",
        "urls": [
            "https://www.rottentomatoes.com/m/fast_five",
            "https://www.rottentomatoes.com/tv/fargo",
            "https://www.rottentomatoes.com/m/pretty_in_pink"
        ]
    },
    {
        "question": "For the WNBA drafts of 2022 and 2023: Among all colleges that had at least two players drafted in the 2023 WNBA Draft, which college's rookies had the highest average points per game (PPG) during their 2023 WNBA rookie season, and how does that average compare to that college's average rookie PPG for their draftees from the 2022 draft and season? Return both the college name and the two averages.",
        "task": 2,
        "task_id": 776824039874050533,
        "solution": "1. From the 2023 WNBA Draft page, identify all colleges with at least 2 draftees (Iowa State, LSU, Maryland, South Carolina, South Florida, Stanford, UConn, USC, Virginia Tech).\n2. From the 2023 rookie stats page, collect the rookie season PPG for each draftee from those colleges:\n   - Iowa State:    Joens (2.1), Soares (1.1), Avg: (2.1+1.1)/2 = 1.6\n   - LSU:    Morris (0), Williams (0), Avg: (0+0)/2 = 0\n   - Maryland:    Meyers (1.4), Miller (7.4), Avg: (1.4+7.4)/2 = 4.4\n   - South Carolina:    Saxton (1.1), Beal (0), Cooke (4.1), Amihere (3.1), Boston (14.7), Avg: (1.1+0+4.1+3.1+14.7)/5 = 4.6\n   - South Florida:    Mendjiadeu (3.4), Tsineke (0), Avg: (3.4+0)/2 = 1.7\n   - Stanford:    Prechtel (0), Jones (3.7), Avg: (0+3.7)/2 = 1.9\n   - UConn:    Juhász (5.4), Sénéchal (0.9), Avg: (5.4+0.9)/2 = 3.2\n   - USC:    Adika (0), Sissoko (1.5), Avg: (0+1.5)/2 = 0.75\n   - Virginia Tech:    Soule (0.5), Traylor (1.0), Avg: (0.5+1.0)/2 = 0.75\n3. Among these, South Carolina has the highest average rookie PPG in 2023 (4.6).\n4. For the 2022 South Carolina draftees (if any), check their 2022 rookie PPG. 2022 South Carolina draftees: Henderson (4.8).",
        "urls": [
            "https://www.basketball-reference.com/wnba/draft/2023.html",
            "https://www.basketball-reference.com/wnba/draft/2022.html"
        ],
        "true_answer": "South Carolina: 2023 rookies avg PPG 4.6, 2022 rookies avg PPG 4.8",
        "file_name": ""
    },
    {
        "question": "What was the absolute margin of Donald Trump's popular vote victory over the Democratic candidate in the 2024 U.S. Presidential election, in millions (to two decimal places)? What was the absolute margin in his last win in the U.S. Presidential election (also in millions, to two decimal places)? Which year had the slimmest margin in the popular vote in U.S. Presidential election history up until 2024? Which year had the slimmest margin in the Electoral vote in U.S. Presidential election history up until 2024?",
        "task": 2,
        "task_id": 3408073904297249207,
        "solution": "Get 2024 US presidential votes: Trump: 77,302,580; Harris: 75,017,613. Margin = 2,284,967 votes = 2.28 million votes. In his previous presidential election in 2016, Trump garnered 62,984,828 votes (46.2%), while Hillary Clinton received 65,853,514 votes (48.2%). Clinton's popular vote margin over Trump was approximately 2.87 million votes. The slimmest margin in U.S. presidential election history occurred in 1880, when Republican James A. Garfield narrowly defeated Democrat Winfield Scott Hancock. Garfield secured 4,453,611 votes (48.3%), while Hancock received 4,445,256 votes (48.2%), resulting in a margin of 1,898 votes—approximately 0.09% of the total popular vote. The U.S. presidential election with the slimmest margin in the Electoral College up until 2024 was the 1876 election between Republican Rutherford B. Hayes and Democrat Samuel J. Tilden. Hayes won with 185 electoral votes to Tilden's 184.",
        "urls": [
            "https://www.cnn.com/election/2024/results/president",
            "https://edition.cnn.com/election/2016/results/president",
            "https://en.wikipedia.org/wiki/List_of_United_States_presidential_elections_by_popular_vote_margin"
        ],
        "true_answer": "2.28 million, 2.87 million, 1880, 1876",
        "file_name": ""
    },
    {
        "question": "For each of the following acclaimed science fiction books by female authors—Madeleine L'Engle's 'A Wrinkle in Time', Veronica Roth's 'Divergent', Lois Lowry's 'The Giver', Ursula K. Le Guin's 'The Left Hand of Darkness', and Octavia E. Butler's 'Parable of the Talents'—determine the number of years between the book’s original publication year and the year it won its first major literary or science fiction award. What is the average number of years across all five books?",
        "task": 1,
        "task_id": -1336865014389995605,
        "solution": "Step 1: Find the original publication year for each book:\n- A Wrinkle in Time: 1962\n- Divergent: 2011\n- The Giver: 1993\n- The Left Hand of Darkness: 1969\n- Parable of the Talents: 1998\nStep 2: Find the year it first won a major award:\n- A Wrinkle in Time: 1963 (Newbery Medal)\n- Divergent: 2011 (Goodreads Choice Award)\n- The Giver: 1994 (Newbery/Regina Medal)\n- The Left Hand of Darkness: 1970 (Hugo/Nebula)\n- Parable of the Talents: 1999 (Nebula Award)\nStep 3: Calculate years between publication and first major award:\n- A Wrinkle in Time: 1\n- Divergent: 0\n- The Giver: 1\n- The Left Hand of Darkness: 1\n- Parable of the Talents: 1\nStep 4: Compute the average: (1+0+1+1+1)/5 = 0.8\nFinal answer: 0.8",
        "urls": [
            "https://en.wikipedia.org/wiki/A_Wrinkle_in_Time",
            "https://en.wikipedia.org/wiki/Newbery_Medal",
            "https://en.wikipedia.org/wiki/Sequoyah_Book_Award",
            "https://en.wikipedia.org/wiki/Lewis_Carroll_Shelf_Award",
            "https://en.wikipedia.org/wiki/Divergent_(novel)",
            "https://en.wikipedia.org/wiki/The_Giver",
            "https://en.wikipedia.org/wiki/The_Left_Hand_of_Darkness",
            "https://en.wikipedia.org/wiki/Parable_of_the_Talents_(novel)",
            "https://en.wikipedia.org/wiki/Nebula_Award_for_Best_Novel"
        ],
        "true_answer": "0.8",
        "file_name": ""
    },
    {
        "question": "Between the years 2000 and 2020, for the county named for Alfred Gray, what is the average annualized rate of population change per decade (rounded to three decimal places, answer two numeric values for 2000s and 2010s, respectively), and during which decade did the county experience the greatest absolute average annual rate (growth or decline)? Additionally, compare Gray County's 2000 per capita income to the Kansas state average: what is the percentage difference (rounded to one decimal place, using the format -x.x if Gray County's value is below state average and +x.x for above).",
        "task": 2,
        "task_id": 6158788496616250680,
        "solution": "1. search gray country, kansas. 2. visit the wikipedia. 3. population: 2020: 5904,10.2%, 2010:6006, 2020:5653; 4. 2000–10: ≈ +0.171%/yr; 2010–20: ≈ –0.604%/yr; greatest absolute rate: 2010–20 (decline). 5. $18632 per capita of gray country, $20506 of Kansas.",
        "urls": [
            "https://en.wikipedia.org/wiki/Kansas",
            "https://en.wikipedia.org/wiki/Gray_County,_Kansas",
            "https://en.wikipedia.org/wiki/List_of_Kansas_locations_by_per_capita_income"
        ],
        "true_answer": "0.171, -0.604, 2010-2020 (decline), -9.1",
        "file_name": ""
    },
    {
        "question": "Between the WNBA playoff champions from 2021 to 2023, which team had the largest positive difference between their average playoff points per game and the overall league playoff average for that year, and what was that difference (rounded to one decimal place)?",
        "task": 2,
        "task_id": 1039090028943653921,
        "solution": "1. Find the playoff champion each year (2021: Chicago Sky, 2022: Las Vegas Aces, 2023: Las Vegas Aces). 2. For each, look up their playoff PPG and the overall league playoff average PPG from each season's playoff per-game stats. 3. Calculate the difference: Champion PPG minus league playoff average PPG. 4. Compare the results. 5. The Las Vegas Aces in 2023 had the largest difference at 4.9 PPG (86.3 for Aces, 81.4 league average).",
        "urls": [
            "https://www.basketball-reference.com/wnba/playoffs/2021_per_game.html",
            "https://www.basketball-reference.com/wnba/playoffs/2022_per_game.html",
            "https://www.basketball-reference.com/wnba/playoffs/2023_per_game.html",
            "https://www.basketball-reference.com/wnba/playoffs/2022.html",
            "https://www.basketball-reference.com/wnba/playoffs/2021.html"
        ],
        "true_answer": "Las Vegas Aces, 2023 (4.9)",
        "file_name": ""
    },
    {
        "question": "Among Alaska, Delaware, and California, business corporation law diverges on eligibility to serve as one's own registered agent. According to official state government rules and statutory FAQs, which state explicitly allows a business corporation to act as its own registered agent, if its registered office is identical to its business address? How many states border this state? Your answer should include the state name followed by the number of bordering states.",
        "task": 1,
        "task_id": -6560777825803720480,
        "solution": "1. Alaska (https://www.commerce.alaska.gov/web/cbpl/Corporations/RegisteredAgentsFAQs.aspx): Registered agent cannot be the entity itself—only a separate Alaska individual resident or qualifying corporation may serve. 2. Delaware (https://corp.delaware.gov/agents/, https://corp.delaware.gov/faqs/, https://corp.delaware.gov/faqs-regarding-registered-agents): Delaware law explicitly allows an entity to be its own registered agent if it maintains a physical office in Delaware. 3. California (https://www.sos.ca.gov/business-programs/business-entities/faqs, https://www.sos.ca.gov/business-programs/business-entities/service-of-process): Generally prohibits the business entity from acting as its own agent (except for a sole proprietorship); requires either an individual California resident or a registered Section 1505 corporation. Thus, among the three, only Delaware allows a business corporation to be its own registered agent by rule, provided address/presence requirements are met. Delaware borders Maryland to its south and west, Pennsylvania to its north, New Jersey to its northeast",
        "urls": [
            "https://www.commerce.alaska.gov/web/cbpl/Corporations/RegisteredAgentsFAQs.aspx",
            "https://corp.delaware.gov/agents/",
            "https://corp.delaware.gov/faqs/",
            "https://corp.delaware.gov/faqs-regarding-registered-agents",
            "https://www.sos.ca.gov/business-programs/business-entities/faqs",
            "https://www.sos.ca.gov/business-programs/business-entities/service-of-process",
            "https://en.wikipedia.org/wiki/Delaware"
        ],
        "true_answer": "Delaware, 3",
        "file_name": ""
    },
    {
        "question": "Statista surveys on the share of adults who trust mass media in the United States reveal a noticeable downward trend in recent years (2015-2024). Based on these trends: 1) What is the average annual decrease in the percentage of American adults expressing trust in mass media from 2018 to 2024 (rounded to two decimal places)? 2) If this trend persists, what would be the projected value for 2027 (rounded to the nearest integer)? 3) In the 2024 survey, how many respondents indicated that they placed 'a great deal of trust' or 'a fair amount of trust' in the media (rounded to the nearest integer)?",
        "task": 1,
        "task_id": 8593977713346996478,
        "solution": "Find the Statista chart for the share of adults who trust mass media (2018–2024): 2018=45%, 2024=31%. Compute (31%-45%)/6 years = -2.33 percentage points per year. For 2027, extrapolate: 2024 value 31% - 3*2.33 = 24%. There are 1,007 respondents and the rate is 31%, the result is 312.",
        "urls": [
            "https://www.statista.com/statistics/1347598/share-adult-trust-mass-media-us/"
        ],
        "true_answer": "1) 2.33; 2) 24; (3) 312",
        "file_name": ""
    },
    {
        "question": "Which publisher won the most medals in the 2010s decade for an award established in 1955 that recognizes distinguished illustration in a children's book, and how does this compare to the number of medals they won in the previous decade (2000s)? List the publisher, the counts for both decades, and the difference. When did this publisher first win the Carnegie Medal for Illustration, and what is the nationality of the author of the awarded book?",
        "task": 1,
        "task_id": -5200965022019391564,
        "solution": "Extract the list of annual medal winners and publishers from 2000-2019 from Wikipedia. Tally the number of wins by publisher for each decade: 2000-2009 (2000s) and 2010-2019 (2010s). Walker Books won in 2001, 2002, 2004 (2000s) and 2012, 2014, 2018 (2010s): 3 medals in each decade. The difference is 0. Publishers Weekly confirms the 2012 win for Walker Books, corroborating Wikipedia. See referenced URLs.",
        "urls": [
            "https://en.wikipedia.org/wiki/Carnegie_Medal_for_Illustration",
            "https://en.wikipedia.org/wiki/Selina_Hastings_(writer)"
        ],
        "true_answer": "Walker Books; 3 medals in the 2010s, 3 medals in the 2000s; difference: 0; 1985; British",
        "file_name": ""
    },
    {
        "question": "For Call of Duty: Modern Warfare II, a major update known as 'Season 2' was launched in 2023. How many days did Season 2 last (inclusive of both the start and end dates)? One of my friends played the game on November 3, 2023. Which season was he playing? The sequel, released at the end of 2023, won which award at the 2025 Game Audio Network Guild Awards?",
        "task": 2,
        "task_id": 1979229753146223960,
        "solution": "The number of days in Season 2 (between February 15th, 2023, and April 11th, 2023), is 55 days. Season 6 was between September 27th, 2023 - December 5th, 2023. Call of Duty: Modern Warfare III won Sound Design of the Year at the 2025 Game Audio Network Guild Awards.",
        "urls": [
            "https://callofduty.fandom.com/wiki/Season_Two_(Modern_Warfare_II)",
            "https://callofduty.fandom.com/wiki/Season_Six_(Modern_Warfare_II)",
            "https://en.wikipedia.org/wiki/Game_Audio_Network_Guild_Awards",
            "https://en.wikipedia.org/wiki/Call_of_Duty:_Modern_Warfare_III_(2023_video_game)"
        ],
        "true_answer": "56, Season 6, Sound Design of the Year",
        "file_name": ""
    },
    {
        "question": "Between 2000 and 2023 in Sandy Valley, Nevada, which increased at a higher average annual rate: the population or the median house value? By how many percentage points did the compound annual growth rate of the faster-growing indicator exceed the other? Use the 2010 population (2,051) and the reported +13.7% population change since 2000 to approximate the 2000 population. For median house value, use the data provided by city-data. Whose music video filmed at Sandy Valley won a video of the year at Sunday’s Academy of Country Music Awards?",
        "task": 2,
        "task_id": 3780653728075497369,
        "solution": "1. Get Sandy Valley's 2010 population (2,051) and the population change since 2000 (+13.7%) from City-Data to compute 2000 population: 2,051/1.137 ≈ 1,804.\n2. Get 2023 population (1,752) from Nevada-Demographics.\n3. Compute population CAGR: ((1,752 / 1,804)^(1/23)) - 1 ≈ -0.13% per year.\n4. Median house value: $87,700 (2000), $238,725 (2023).\n5. Compute house value CAGR: ((238,725 / 87,700)^(1/23)) - 1 ≈ 4.45% per year.\n6. Subtract: 4.45% - (-0.13%) = 4.58 percentage points per year.  Country singer Kelsea Ballerini's music video Peter Pan (2015) was filmed in Sandy Valley.",
        "urls": [
            "https://www.city-data.com/city/Sandy-Valley-Nevada.html",
            "https://www.nevada-demographics.com/sandy-valley-demographics",
            "https://en.wikipedia.org/wiki/Kelsea_Ballerini",
            "https://neon.reviewjournal.com/music/country-star-ballerini-filmed-acm-nominated-video-in-nevada-desert-video/"
        ],
        "true_answer": "median house value,4.58, Kelsea Ballerini",
        "file_name": ""
    },
    {
        "question": "Which of Shinhwa's studio albums, released in the early 2010s, achieved the highest physical sales in their home country, and by what percentage does its sales surpass the album with the lowest sales in this period? Additionally, calculate the total duration, in seconds, of Shinhwa's debut studio album released specifically for the Japanese market.",
        "task": 2,
        "task_id": 4574320838967701731,
        "solution": "Find Shinhwa's physical sales: 'The Return' (2012): 88,059; 'The Classic' (2013): 78,727; 'We' (2015): 57,808 (from Wikipedia). 'The Return' has the highest physical sales at 88,059, and 'We' has the lowest at 57,808. The sales of 'The Return' exceed those of 'We' by (88,059 - 57,808) / 57,808 = 52.33%. Shinhwa's first Japanese studio album, Inspiration #1, has a total length of 55 minutes and 18 seconds, 3,318 seconds",
        "urls": [
            "https://en.wikipedia.org/wiki/Shinhwa_discography",
            "https://en.wikipedia.org/wiki/Inspiration_(Shinhwa_album)"
        ],
        "true_answer": "The Return, 52.33%, 3,318",
        "file_name": ""
    },
    {
        "question": "Consider three narrative poems featured in Family Friend Poems' collection: 'The Three Little Pigs', 'Annabel Lee', and 'The Ballad of the Harp-Weaver'. Using reliable sources, determine the age at which each poet was when their respective poem was published. What is the average of these ages, rounded to two decimal places?",
        "task": 2,
        "task_id": -3852580437570435337,
        "solution": "Step 1: Find the publication year for each poem: 'The Three Little Pigs' (1982), 'Annabel Lee' (1849), 'The Ballad of the Harp-Weaver' (1922). Step 2: Find the birth years: Dahl (1916), Poe (1809), Millay (1892). Step 3: Calculate ages at publication: Dahl: 1982-1916=66, Poe: 1849-1809=40, Millay: 1922-1892=30. Step 4: Compute average: (66+40+30)/3 = 136/3 = 45.33 (rounded to two decimal places).",
        "urls": [
            "https://www.familyfriendpoems.com/collection/narrative-poems/",
            "https://en.wikipedia.org/wiki/Roald_Dahl",
            "https://en.wikipedia.org/wiki/Edgar_Allan_Poe",
            "https://en.wikipedia.org/wiki/Edna_St._Vincent_Millay",
            "https://www.pulitzer.org/winners/edna-st-vincent-millay"
        ],
        "true_answer": "45.33",
        "file_name": ""
    },
    {
        "question": "Between 1945 and 2010, considering only those leaders of the Labour Party and the Conservative Party who became Prime Minister after first being appointed party leader, which party's leaders waited fewer days on average from becoming party leader to becoming Prime Minister—and what was the average time in days for each party in that period (rounded to one decimal place)?",
        "task": 2,
        "task_id": -4224272470314945287,
        "solution": "1. From the Timeline Geek's Labour and Conservative leader timelines, collect all leaders 1945–2010 who became PM after being appointed party leader (with their leader and PM start dates):\n   - Labour: Clement Attlee (25 Oct 1935–26 Jul 1945, 3,562 days), Harold Wilson (14 Feb 1963–16 Oct 1964, 610 days), James Callaghan (5 Apr 1976–5 Apr 1976, 0 days), Tony Blair (21 Jul 1994–2 May 1997, 1016 days), Gordon Brown (24 Jun 2007–27 Jun 2007, 3 days).\n   - Conservative: Anthony Eden (21 Apr 1955–6 Apr 1955, -15 days), Harold Macmillan (22 Jan 1957–10 Jan 1957, -12 days), Alec Douglas-Home (11 Nov 1963–18 Oct 1963, -24 days), Edward Heath (27 Jul 1965–19 Jun 1970, 1788 days), Margaret Thatcher (11 Feb 1975–4 May 1979, 1543 days), John Major (27 Nov 1990–28 Nov 1990, 1 day), David Cameron (6 Dec 2005–11 May 2010, 1617 days).\n2. For each, calculate the number of days from party leader start to PM start.\n3. For Labour: [3562, 610, 0, 1016, 3] → mean = 1037.2 days\n   For Conservative: [-15,-12,-24,1788,1543,1,1617] → mean = 699.7 days\n4. Conservative leaders waited less time on average (699.7 < 1037.2 days).\n",
        "urls": [
            "https://en.wikipedia.org/wiki/Leader_of_the_Labour_Party_(UK)",
            "https://en.wikipedia.org/wiki/Leader_of_the_Conservative_Party_(UK)",
            "https://en.wikipedia.org/wiki/List_of_prime_ministers_of_the_United_Kingdom"
        ],
        "true_answer": "Conservative Party leaders, with an average of 699.7 days (vs 1037.2 days for Labour)",
        "file_name": ""
    },
    {
        "question": "Since the introduction of leap seconds in 1972, what is the average number of leap seconds added per full decade (1972–1979, 1980–1989, etc.)? Additionally, how many unique countries or territories currently use UTC+5:30 or UTC+5:45 as their standard time zones (only answer one single number)?",
        "task": 2,
        "task_id": 9175303337609493156,
        "solution": "Leap seconds per decade: [9, 6, 7, 2, 3]. Mean per decade = 5.4. Countries/territories at UTC+5:30/+5:45: ['India', 'Sri Lanka', 'Nepal'].",
        "urls": [
            "https://en.wikipedia.org/wiki/List_of_UTC_offsets",
            "https://en.wikipedia.org/wiki/Leap_second"
        ],
        "true_answer": "5.4, 3",
        "file_name": ""
    },
    {
        "question": "In four countries that were ruled by military dictatorships in the late 20th or early 21st centuries, what is the average duration (in years, integer) of the most recent continuous civilian rule after the end of military dictatorship, counting until the next return to military rule (if one occurred), or to the year 2024 if not? These nations include one from South America, one from Southeast Asia, one from the Sahel, and another from the Southern Cone with the lowest population density.",
        "task": 2,
        "task_id": 417237765250927199,
        "solution": "1. For each country, determine the year in which military rule ended: Brazil (1985), Argentina (1983), Myanmar (2011), Mali (2013). 2. Find out the year in which the next military takeover occurred, if any, or use 2024 if uninterrupted civilian rule. - Brazil: continuous civilian rule as of 2024. - Argentina: continuous civilian rule as of 2024. - Myanmar: military coup in 2021. - Mali: coups in 2012 and 2020, but 2013 marked the end of the post-coup transition; another coup in 2020 marks the next military rule. 3. Calculate the duration for each country: - Brazil: 2024-1985 = 39 years; - Argentina: 2024-1983 = 41 years; - Myanmar: 2021-2011 = 10 years; - Mali: 2020-2013 = 7 years. 4. Calculate the average: (39 + 41 + 10 + 7) / 4 = 24.25 years. Final answer is 24.25.",
        "urls": [
            "https://en.wikipedia.org/wiki/Military_dictatorship_in_Brazil",
            "https://en.wikipedia.org/wiki/History_of_Argentina_(1976–1983)",
            "https://en.wikipedia.org/wiki/Myanmar",
            "https://en.wikipedia.org/wiki/Mali",
            "https://en.wikipedia.org/wiki/2020_Malian_coup_d%27%C3%A9tat",
            "https://en.wikipedia.org/wiki/2021_Myanmar_coup_d%27%C3%A9tat"
        ],
        "true_answer": 24.25,
        "file_name": ""
    },
    {
        "question": "Robert Peterson was awarded Southwest Oklahoma's Artist of the Year, and his portrait of author Ernest J. Gaines was featured on the USPS Black Heritage Forever Stamp. From the year Peterson received his award through the release of the Gaines stamp, what was the average number of stamps issued per year in the USPS Black Heritage Series (rounded to one decimal place)? What is the denomination of the next USPS Black Heritage Forever series stamp?",
        "task": 2,
        "task_id": 650068060656945397,
        "solution": "Step 1: Identify Peterson's award year as 2016 (from gallery bios and press sources). Step 2: Determine the number of Black Heritage Series stamps released from 2016 to 2023 inclusive (consult Mystic Stamp Discovery Center and USPS official listing; one stamp per year is issued in this period: Richard Allen (2016), Dorothy Height (2017), Lena Horne (2018), Gregory Hines (2019), Gwen Ifill (2020), August Wilson (2021), Edmonia Lewis (2022), Ernest J. Gaines (2023)). Step 3: Count the number of years: 2023 - 2016 + 1 = 8. Step 4: Count total stamps: 8. Step 5: Compute average per year = 8 / 8 = 1. ",
        "urls": [
            "https://www.albertzbenda.com/artists/52-robert-peterson/",
            "https://1xrun.com/collections/robert-peterson",
            "https://about.usps.com/who/profile/history/african-american-stamp-subjects.htm"
        ],
        "true_answer": "1.0, 68¢",
        "file_name": ""
    },
    {
        "question": "For each major Chapter expansion of The Elder Scrolls Online released from 2017 to 2024, calculate the relative percentage change in the 'Average Players' statistic on Steam from the month immediately before to the month of the expansion's release (on PC). What is the mean of these percentage changes, rounded to two decimal places?",
        "task": 2,
        "task_id": 8923978276617208036,
        "solution": "Step 1: Gather major Chapter expansion names and their release months from ESO-Hub and Wikipedia:\n  - Morrowind (2017-06), Summerset (2018-05), Elsweyr (2019-06), Greymoor (2020-05), Blackwood (2021-06), High Isle (2022-06), Necrom (2023-06), Gold Road (2024-06).\nStep 2: For each expansion, obtain Steamcharts average player counts for the month before and the month of release. The results:\n  Morrowind: 7.37%\n  Summerset: -7.32%\n  Elsweyr: -1.33%\n  Greymoor: -17.82%\n  Blackwood: 12.14%\n  High Isle: 18.31%\n  Necrom: 4.12%\n  Gold Road: 4.51%\nStep 3: Take the mean of the above 8 percentage changes: (Sum/8) = 2.50%\nFinal Answer: 2.50%",
        "urls": [
            "https://steamcharts.com/app/306130",
            "https://en.wikipedia.org/wiki/The_Elder_Scrolls_Online"
        ],
        "true_answer": "2.50%",
        "file_name": ""
    },
    {
        "question": "Consider the first seven months following the Steam launches of two games. Game A was originally developed as part of a well-known series and intended as a direct sequel to its first installment, but its direction was significantly reworked during development; it was initially announced in 2011, faced multiple delays, and underwent two name changes. Game B is a Korean MMO action RPG that broke Steam records as the #2 most-played game (1.3M+ players) at launch (including the release month). What was the average month-on-month percentage change in average players for each game during this period? Which title retained its players better according to this metric (where a higher average is better, even if both are negative)? Round each result to two decimal places. You may use data from SteamCharts.",
        "task": 2,
        "task_id": -835757129263256028,
        "solution": "Extracted THRONE AND LIBERTY (Oct 2024–Mar 2025) avg player counts (155026.1, 84039.2, 47179.0, 28270.3, 19845.0, 25877.4, 21326.8) and Lost Ark (Feb 2022–Jul 2022) avg player counts (699094.1, 444117.2, 354897.8, 489020.4, 465789.3, 214630.6, 168339.4) from their Steamcharts pages. Calculated each month-on-month percent change between months 1–6: \nTHRONE AND LIBERTY: [-45.79, -43.86, -40.08, -29.80, +30.40, -17.59], mean is -24.45%; Lost Ark: [-36.47, -20.09, +37.79, -4.75, -53.92, -21.57], mean is -16.50%. Lost Ark had a less negative drop and thus retained players better by this metric.",
        "urls": [
            "https://steamcharts.com/app/2429640",
            "https://steamcharts.com/app/1599340"
        ],
        "true_answer": "THRONE AND LIBERTY: -24.45%. Lost Ark: -16.50%. Lost Ark retained players better.",
        "file_name": ""
    },
    {
        "question": "Between 1995 and 2023, the band fronted by Jeff Tweedy released some studio albums. Between 2017 and 2020, Jeff Tweedy released several solo studio albums. Calculate the average number of years between consecutive studio albums of the band and between consecutive Jeff Tweedy solo albums. By how many years is the band average gap longer than the Tweedy solo average gap (rounded to two decimal places)?",
        "task": 1,
        "task_id": -3550707668212260324,
        "solution": "Step 1: List Wilco studio album years: 1995, 1996, 1999, 2001, 2004, 2007, 2009, 2011, 2015, 2016, 2019, 2022, 2023. There are 12 gaps: [1, 3, 2, 3, 3, 2, 2, 4, 1, 3, 3, 1]; average = (1+3+2+3+3+2+2+4+1+3+3+1)/12 = 28/12 ≈ 2.33 years.\nStep 2: List Tweedy solo album years: 2017, 2018, 2019, 2020. Gaps: [1,1,1]; average = (1+1+1)/3 = 1.0 year.\nStep 3: Compute difference = 2.33 - 1.0 = 1.33 years.",
        "urls": [
            "https://en.wikipedia.org/wiki/Jeff_Tweedy_discography",
            "https://en.wikipedia.org/wiki/Wilco_discography"
        ],
        "true_answer": "1.33",
        "file_name": ""
    },
    {
        "question": "For the 2012 action role-playing first-person shooter video game developed by Gearbox Software and published by 2K, during the three months directly before and the three months directly after the release of its last DLC, what is the average monthly gain in average players in each period, and which period experienced the greater average gain? What is the difference between the two average monthly gains? On which date can a Nintendo Switch player play the 2019-released sequel at the earliest?",
        "task": 2,
        "task_id": -8163932803072907743,
        "solution": "Locate SteamCharts monthly data for Borderlands 2 for April 2013–September 2013. DLC released June 25, 2013.\nThree months BEFORE: April (+1359.3), May (+4600.4), June (-116.0). Average = (1359.3+4600.4-116.0)/3 = 1,947.9.\nThree months AFTER: July (+7752.6), August (-6934.2), September (-3382.9). Average = (7752.6-6934.2-3382.9)/3 = -854.8.\nDifference = 1,947.9 - (-854.8) = 2,802.7. Thus, greater gain before DLC, by 2,802.7 players/month. A Nintendo Switch version of Borderlands 3 was released on 6 October 2023",
        "urls": [
            "https://steamcharts.com/app/49520",
            "https://en.wikipedia.org/wiki/Borderlands_2",
            "https://en.wikipedia.org/wiki/Borderlands_2:_Tiny_Tina%27s_Assault_on_Dragon_Keep"
        ],
        "true_answer": "The three months before the DLC saw an average monthly gain of approximately 1,947.9 players; the three months after saw an average monthly loss of approximately 854.8 players. The pre-DLC period had the greater gain, exceeding the post-DLC period by about 2,802.7 average players per month. A Nintendo Switch player can access the game Borderlands 3 by 6 October 2023.",
        "file_name": ""
    },
    {
        "question": "In the first round of a specific NBA playoff series in 2011, a player known for his passing skills posted 20 assists in a game that ended with his team securing a victory over a New York-based franchise. What is the percentage increase of his assists in that game relative to his average assists per game for the rest of the 2011 playoffs, rounded to the nearest 0.1%, and was this the highest single-game assist total of his 2011 playoff run?",
        "task": 2,
        "task_id": -6849101169444832196,
        "solution": "1. Find Rajon Rondo's assist numbers for each playoff game in the 2011 postseason: [9, 7, 20, 12, 7, 12, 11, 5, 3].\n2. Note that on April 22, 2011, he had 20 assists.\n3. Calculate his assists per game for all other 2011 playoff games (excluding the 20-assist game): (9 + 7 + 12 + 7 + 12 + 11 + 5 + 3) / 8 = 8.25.\n4. Compute the percentage increase: (20 - 8.25) / 8.25 × 100 ≈ 142.4%\n5. Confirm whether 20 assists was his highest single-game total in that playoff run by comparing all game logs. It was.",
        "urls": [
            "https://www.basketball-reference.com/players/r/rondora01/gamelog-playoffs/"
        ],
        "true_answer": "142.4, and yes, it was his highest single-game assist total of the 2011 playoffs.",
        "file_name": ""
    },
    {
        "question": "Among the top 10 most played games on Steam as of January 2025, which were released after the release of an action-adventure game developed by Santa Monica Studio loosely based on Norse mythology, what is the average of their average concurrent player counts as reported for that period (rounded to two decimal places)?",
        "task": 2,
        "task_id": 7831138617767094762,
        "solution": "Step 1: Obtain the top 10 most played Steam games by concurrency for January 2025 from the Statista chart. Step 2: Cross-reference each game's release year from Wikipedia or the Steam store. Step 3: Filter for games released after 2018: Counter-Strike 2 (2023), Marvel Rivals (2024), Path of Exile 2 (2024), Palworld (2024), Banana (2024), NARAKA: BLADEPOINT (2021). Step 4: Extract their average concurrent player numbers from the Statista chart: 949,652 (CS2), 285,445 (Marvel Rivals), 251,622 (PoE2), 140,431 (Palworld), 93,920 (Banana), 93,422 (Naraka). Step 5: Compute the arithmetic mean: (949652 + 285445 + 251622 + 140431 + 93920 + 93422) / 6 = 302415.33. Step 6: The answer is 302415.33.",
        "urls": [
            "https://www.statista.com/statistics/1179973/steam-games-peak-concurrent-players/",
            "https://en.wikipedia.org/wiki/Counter-Strike_2",
            "https://en.wikipedia.org/wiki/Marvel_Rivals",
            "https://en.wikipedia.org/wiki/Path_of_Exile_2",
            "https://en.wikipedia.org/wiki/Naraka:_Bladepoint",
            "https://store.steampowered.com/app/1623730/Palworld/",
            "https://store.steampowered.com/app/2882010/Banana/"
        ],
        "true_answer": 302415.33,
        "file_name": ""
    },
    {
        "question": "According to the ROSPaCe dataset (Nature, 2024), how many attack types are specific to the ROS2 protocol? What is the F1 score reported for the DPDGAD anomaly detection model on the SWaT dataset? Furthermore, a 2021 research paper presented at an IEEE conference specializing in software testing and validation achieved even greater anomaly detection performance on the same dataset. What is the reported performance?",
        "task": 2,
        "task_id": 4827680960242105299,
        "solution": "From the ROSPaCe article (Nature, 2024), there are 3 attack types that are specific to ROS2 protocol (see background & summary). DPDGAD (ScienceDirect, 2024) reports an F1 score of 0.9009 on the SWaT dataset.",
        "urls": [
            "https://www.sciencedirect.com/science/article/abs/pii/S1474034624001952",
            "https://www.nature.com/articles/s41597-024-03311-2",
            "https://ieeexplore.ieee.org/document/9438560/"
        ],
        "true_answer": "3, 0.9009, 0.97585",
        "file_name": ""
    },
    {
        "question": "Between 2012 and 2021, was the rate of increase (by the slope of the linear regression line fitted to the data) in China's average annual temperature higher or lower than the global land and ocean average temperature anomalies, and what is the Pearson correlation coefficient (to two decimal places) between China's average annual temperature and its CO₂ emissions per capita over the same period?",
        "task": 2,
        "task_id": -961224475804687018,
        "solution": "China's national average annual temperature (°C, 2012–2021): 7.09, 7.76, 7.72, 8, 7.95, 8.13, 7.79, 8.04, 7.96, 8.24 — from TradingEconomics. China's CO₂ emissions per capita: 7.140667, 7.2199216, 7.1978407, 7.0648346, 7.0648346, 7.084865, 7.2818375, 7.5264006, 7.6471825, 8.025529. Global mean temperature anomaly (°C, 2012–2021): 0.75, 0.66, 0.83, 0.83, 0.96, 0.91, 0.84, 0.91, 1.00, 0.82 — from NOAA/NCEI. Fit a linear regression (least squares) for each temperature series over the 10 years. Slope for China: 0.08 °C/year; for global anomaly: 0.021 °C/year. Thus, China's warming rate for this period is higher than the global rate. Calculate the (Pearson) correlation coefficient between China's temperature & CO₂ emissions: result is 0.42 (rounded to two decimal places).",
        "urls": [
            "https://tradingeconomics.com/china/temperature",
            "https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series",
            "https://ourworldindata.org/co2/country/china"
        ],
        "true_answer": "China's rate of increase was higher than the global average. The Pearson correlation coefficient between China's average annual temperature and its CO₂ emissions for 2012–2021 is 0.42.",
        "file_name": ""
    },
    {
        "question": "Based on the historical archives of a theater located in Pennsylvania (zip code 18017), in which decade from 2000 to 2020 did it produce the highest average number of shows per year? Which play was performed at the Pennsylvania Playhouse in 1944? My elderly brother attended a play on January 30, 1966. What show was he watching?",
        "task": 2,
        "task_id": -136595700085404014,
        "solution": "From the Playhouse Archives (http://archives.paplayhouse.org), count shows for each year of the 2000s and 2010s. Average the yearly totals per decade: 2010s (6.9), 2000s (6.7). The 2010s are higher.",
        "urls": [
            "http://archives.paplayhouse.org",
            "http://archives.paplayhouse.org/displayShow.php?year=1944&show=01PursuitOfHappiness",
            "http://archives.paplayhouse.org/displayShow.php?year=1966&show=01BirthdayOfAPoet"
        ],
        "true_answer": "The 2010s had the highest average. Pursuit Of Happiness. Birthday Of A Poet",
        "file_name": ""
    },
    {
        "question": "Following a significant update to a popular multiplayer survival game, a new game mode was introduced that emphasized low-tech combat and siege warfare. For the month following this update, what was the average daily player count (according to steamcharts), and what was the lowest game price in USD during that month? When did the game add native ARM64 support on Mac for Apple Silicon?",
        "task": 1,
        "task_id": 1947769723475181729,
        "solution": "1. From the official Rust patch notes (Facepunch), identify the Primitive update in January 2025. 2. Find the average daily player count for Rust in February 2025: 121,569.1. 3. From IsThereAnyDeal price chart, observe Rust's price for Feb 2025 was $39.99 and historical low was below this at a later date. Final answer: average daily player count = 121,569.1; price ≈ $19.99 (Feb 6, 2025, according to SteamDB). Rust added native ARM64 support on Mac for Apple Silicon on December 5, 2024.",
        "urls": [
            "https://steamcharts.com/app/252490",
            "https://rust.facepunch.com/changes",
            "https://steamdb.info/app/252490/"
        ],
        "true_answer": "121,569.1; $19.99; December 5 2024",
        "file_name": ""
    },
    {
        "question": "Between 2015 and 2019, what was the average annual percentage change in total passenger numbers at Heathrow airport (rounded to two decimal places)? A recent airline joined the SkyTeam alliance in 2024. How many unique scheduled destinations (ignoring the seasonal ones) does this airline have at London Heathrow Airport? If I am a passenger traveling from Heathrow to Shanghai Pudong Airport, how many airlines can I choose from (consider only non-stop flights)?",
        "task": 2,
        "task_id": 1332270832471094242,
        "solution": "1. Obtain Heathrow annual passenger numbers for 2015–2019 from official sources. Numbers used: 2015: 74,959,058, 2016: 75,676,223, 2017: 77,988,752, 2018: 80,102,017, 2019: 80,884,310. 2. The average annual percentage change in total passenger numbers at Heathrow Airport between 2015 and 2019 was approximately 1.92%. 3. There are 3 unique scheduled destinations (Copenhagen, Oslo, Stockholm–Arlanda) operated by Scandinavian Airlines. 4. Passengers traveling from London Heathrow (LHR) to Shanghai Pudong International Airport (PVG) have two airline options: British Airways and China Eastern Airlines.",
        "urls": [
            "https://en.wikipedia.org/wiki/SkyTeam",
            "https://en.wikipedia.org/wiki/Heathrow_Airport"
        ],
        "true_answer": "Average annual percentage change: 1.92%; destinations: 3; options: 2",
        "file_name": ""
    },
    {
        "question": "According to a local news summary, Cape Cod Church’s weekly attendance was 140 in 2012. What was the average annual percentage increase in weekly attendance at Cape Cod Church between 2012 and 2015 (rounded to )? Where would Cape Cod Church rank in the 2023 Fastest-Growing Churches in America leaderboard based on its weekly attendance growth between 2012 and 2015 (considering only the percentage gain)?",
        "task": 2,
        "task_id": 2688906001956771964,
        "solution": "Cape Cod Church attendance: 140, 1190 (2015, Outreach 100 ranking). Years = 3 (2012 to 2015). Compute CAGR: (1190/140)^(1/3)-1 ≈ 104% average per year. Cape Cod Church would be ranked at the 4th place with 104%, the top three are Kingdom Fellowship AME Church (509%), Grace Church (277%), Maryland Community Church (136%).",
        "urls": [
            "https://outreach100.com/fastest-growing-churches-in-america/2023",
            "https://outreach100.com/churches/cape-cod-church"
        ],
        "true_answer": "104%, 4th",
        "file_name": ""
    },
    {
        "question": "Consider a former British colony that achieved independence in the early 1960s and saw significant political instability in the years following. In which year did this country undergo the highest number of official transitions in its executive leadership, based strictly on start and end dates of office? Additionally, an assassination attempt was made on the life of the inaugural head of government post-independence. In what year did this event occur?",
        "task": 2,
        "task_id": 995907099368093706,
        "solution": "The year with the most transitions is the one with the most handovers (1985). Apollo Milton Obote served as the second (the first after independence) prime minister of Uganda from 1962 to 1966. On 19 December 1969, there was an assassination attempt against Obote.",
        "urls": [
            "https://en.wikipedia.org/wiki/Prime_Minister_of_Uganda",
            "https://en.wikipedia.org/wiki/Uganda",
            "https://en.wikipedia.org/wiki/Milton_Obote"
        ],
        "true_answer": "1985, 1969",
        "file_name": ""
    },
    {
        "question": "For all Broadway shows based on Hans Christian Andersen’s fairy tales that are listed in the Internet Broadway Database—including The Little Mermaid, Once Upon a Mattress (The Princess on the Pea), The Emperor’s New Clothes, The Chinese Nightingale, The Garden of Paradise (The Little Mermaid), and Ib and Little Christina—what is the average number of years that elapsed between the original Danish publication of each tale (as per the official Odense H.C. Andersen Centre registry) and its Broadway debut? Please round your answer to two decimal places. Which specific story by the Danish author, known for his fairy tales, was honored by an award recognizing works of speculative fiction with libertarian themes, and in which year was it inducted?",
        "task": 2,
        "task_id": 5665421765738874623,
        "solution": "Step 1: From IBDB, list Broadway shows based on Andersen fairy tales and determine their debut years: The Little Mermaid (2008), Once Upon a Mattress (1959, from wiki), The Emperor’s New Clothes (1935), The Chinese Nightingale (1934), The Garden of Paradise (1914, IBDB notes based on The Little Mermaid), Ib and Little Christina (1900).\nStep 2: For each show, match the original Andersen tale and find its original Danish publication date from the Odense H.C. Andersen Centre registry: The Little Mermaid (7 April 1837), The Princess on the Pea (8 May 1835), The Emperor’s New Clothes (7 April 1837), The Nightingale (11 November 1843), Ib and Little Christine (30 April 1855). For The Garden of Paradise, per IBDB, use The Little Mermaid (1837).\nStep 3: For each pairing, subtract original publication year from Broadway debut year to get number of years elapsed: The Little Mermaid: 2008-1837=171, Once Upon a Mattress: 1959-1835=124, The Emperor’s New Clothes: 1935-1837=98, The Chinese Nightingale: 1934-1843=91, The Garden of Paradise: 1914-1837=77, Ib and Little Christina: 1900-1855=45. \nStep 4: Compute the average: (171 + 124 + 98 + 91 + 77 + 45) / 6 = 606 / 6 = 101.00 (rounded to two decimal places). Andersen's fable The Emperor's New Clothes was inducted in 2000 into the Prometheus Hall of Fame for Best Classic Fiction.",
        "urls": [
            "https://en.wikisource.org/wiki/Author:Hans_Christian_Andersen",
            "https://www.ibdb.com/broadway-cast-staff/9267",
            "https://en.wikipedia.org/wiki/Ib_and_Little_Christina",
            "https://en.wikipedia.org/wiki/Prometheus_Award#Hall_of_Fame_Award_inductees",
            "https://en.wikipedia.org/wiki/Once_Upon_a_Mattress",
            "https://en.wikipedia.org/wiki/The_Little_Mermaid"
        ],
        "true_answer": "101.00, The Emperor's New Clothes, 2000",
        "file_name": ""
    },
    {
        "question": "How many copies did Clair Obscur: Expedition 33 sell on average per day during its first 33 days after its release day (in millions, rounded to one decimal place)? What is the lowest critic review score on Metacritic for Clair Obscur: Expedition 33 on the PlayStation 5 platform? What review score did this critic give to Persona 5 Royal on the PlayStation 4 platform?",
        "task": 3,
        "task_id": -6536265736445143244,
        "solution": "1. Clair Obscur sold 3,300,000 copies in 33 days, so average daily sales = 3,300,000 / 33 = 100,000 (0.1 million). 2. The lowest critic review score on Metacritic for Clair Obscur: Expedition 33 is 70, from IGN Japan. 3. IGN Japan gave Persona 5 Royal on the PlayStation 4 a review score of 89.",
        "urls": [
            "https://www.metacritic.com/game/clair-obscur-expedition-33/",
            "https://www.metacritic.com/game/persona-5-royal/",
            "https://www.metacritic.com/game/persona-5-royal/critic-reviews/?platform=playstation-4",
            "https://x.com/thegameawards/status/1927366458792686056",
            "https://jp.ign.com/persona-5-the-royal/40086/review/5",
            "https://jp.ign.com/clair-obscur-expedition-33/79204/review/clair-obscur-expedition-33"
        ],
        "true_answer": "0.1 million, 70, 89",
        "file_name": ""
    },
    {
        "question": "Between the 2018 and 2022 Pennsylvania House of Representatives general elections, which district experienced a party change at every election cycle (i.e., the winner's party switched each time), and what is the highest number of consecutive incumbent defeats in the same district during this period?",
        "task": 3,
        "task_id": 9150948315246355098,
        "solution": "Party changes: 2020: District 33, District 55, District 123, District 143, District 152; 2022: District 7, District 9, District 10, District 26, District 29, District 30, District 33, District 49, District 50, District 54, District 82, District 104, District 105, District 119, District 129, District 134, District 144, District 151, District 168, District 189. District 33 experienced two consecutive party changes. The highest number of consecutive incumbent defeats is 1.",
        "urls": [
            "https://en.wikipedia.org/wiki/2018_Pennsylvania_House_of_Representatives_election",
            "https://en.wikipedia.org/wiki/2020_Pennsylvania_House_of_Representatives_election",
            "https://en.wikipedia.org/wiki/2022_Pennsylvania_House_of_Representatives_election"
        ],
        "true_answer": "District 33, 1",
        "file_name": ""
    },
    {
        "question": "According to the Wyoming Secretary of State's official list as of May 2025, and the Delaware Division of Corporations' current public registry, which commercial registered agent appears in both state lists and comes first alphabetically? Which email should I contact that ommercial registered agent?",
        "task": 2,
        "task_id": -8476030360422449783,
        "solution": "1) Download Wyoming's official commercial registered agent list and count unique names: 370 (from https://sos.wyo.gov/Business/Docs/CRA-Roster.pdf). 2) Scrape/parse the Delaware Division of Corporations online registry: (https://corp.delaware.gov/agents/). 3) Extract all agent names. 4) Standardize names (ignore case, spacing, punctuation inconsistencies). 5) Find set intersection for agents listed in both states; sort this set alphabetically. 6) The first name alphabetically is 'Accumera LLC'. The email is info@accumera.com",
        "urls": [
            "https://sos.wyo.gov/Business/Docs/CRA-Roster.pdf",
            "https://corp.delaware.gov/agents/"
        ],
        "true_answer": "Accumera LLC, info@accumera.com",
        "file_name": ""
    },
    {
        "question": "Consider a 2021 action role-playing game, developed and published by a major Japanese video game company, that underwent a significant expansion approximately one year post-launch. Using the official update timeline and aggregating monthly average concurrent player data from Steam (as typically available in SteamCharts summaries), answer the following questions: \n1. What is the average change in monthly concurrent players from the three months before its largests DLC's release to the three months after (excluding the release month in both cases, rounded to one decimal place)?\nIn which month during the first year after its release did this game experience the largest single-month positive jump in average concurrent Steam players, and what was the value of this jump?",
        "task": 2,
        "task_id": -4788624492273227954,
        "solution": "Release/expansion dates confirmed with Capcom update portal and Wikipedia.\nMonthly average players March–May 2022: 9978.6 (Mar), 7661.9 (Apr), 15055.8 (May): average ≈ 10898.8.\nAfter expansion (July–Sep 2022): 86887.4 (Jul), 44483.8 (Aug), 19501.8 (Sep): average ≈ 50291.0.\nAverage change = 39392.2 (rounded to nearest integer).\nLargest positive jump was June–July 2022: 86887.4 - 25878.3 = 61009.1.",
        "urls": [
            "https://steamcharts.com/app/1446780",
            "https://www.monsterhunter.com/rise-sunbreak/update/en-us/",
            "https://en.wikipedia.org/wiki/Monster_Hunter_Rise"
        ],
        "true_answer": "1. The average increase is approximately 39392.2 concurrent players.\n2. July 2022 saw the largest single-month jump, increasing by 61009.1 concurrent players over June 2022.",
        "file_name": ""
    },
    {
        "question": "I am a fan of an artist. Imagine a toddler mastering piano before most kids can tie their shoes, at age 2! By age 8, she wrote her first song—about a stealthy, sneaker-wearing jungle cat; and another music about bat. She released several self-produced solo studio albums between 1988 and 2010. What is the average interval between consecutive album release years (format: x.x years) for these albums, and what percentage of these albums is currently available on her Bandcamp page (rounded to nearest integer)?",
        "task": 3,
        "task_id": -3729973042151219549,
        "solution": "1. Gather solo album titles and release years from the artist’s website: Beth Quist (1988), Whispering Wood (1991), Lucidity (1995), Shall We Dance (1998), Silver (2003), New Moon (2010).\n2. Compute intervals between consecutive release years: [3, 4, 3, 5, 7].\n3. Average interval: (3+4+3+5+7)/5 = 22/5 = 4.4 years.\n4. From Bandcamp, match available solo albums by name: Shall We Dance, Silver.\n5. Compute percentage: 2/6 = 0.333..., or 33% (rounded to nearest integer).\nFinal answer: 4.4 years; 33%.",
        "urls": [
            "https://bethquist.bandcamp.com/",
            "https://bethquist.com/"
        ],
        "true_answer": "4.4 years; 33%",
        "file_name": ""
    },
    {
        "question": "In a notable regular-season game in late January during the 1996-1997 NBA season, two teams from the Midwest Division faced off, with a final score difference of only 4 points. By how many points did each team exceed their own scoring average, and which team exceeded their scoring average by a greater margin?",
        "task": 2,
        "task_id": -7477231534050171269,
        "solution": "1. Game box score: Denver 113, Houston 109. 2. Denver 1996–97 PPG: 97.8; Houston: 100.6. 3. Compute: Denver delta = 113 - 97.8 = 15.2; Houston delta = 109 - 100.6 = 8.4. 4. Denver's margin is greater.",
        "urls": [
            "https://www.espn.com/nba/game/_/gameId/170130010/nuggets-rockets",
            "https://www.basketball-reference.com/teams/DEN/1997.html",
            "https://www.basketball-reference.com/teams/HOU/1997.html"
        ],
        "true_answer": "Denver Nuggets exceeded their season average by 15.2 points; Houston Rockets by 8.4 points; Denver Nuggets exceeded by a greater margin.",
        "file_name": ""
    },
    {
        "question": "Considering the Cameron–Clegg coalition Cabinet formed in 2010, the second Cameron ministry formed in 2015, and the first Johnson ministry formed in 2019, what was the average change in the number of women holding full Cabinet posts per transition across these three governments (include the sign, + for an increase and - for a decrease)?",
        "task": 2,
        "task_id": -6335381038203957873,
        "solution": "Step 1: Find the number of women holding full UK Cabinet posts at the formation of each Cabinet:\n  - In 2010: 4 women (Cameron–Clegg coalition).\n  - In 2015: 7 women (second Cameron ministry).\n  - In 2019: 6 women (first Johnson ministry).\nStep 2: Calculate the difference per reshuffle:\n  - 2010→2015: 7-4 = +3\n  - 2015→2019: 6-7 = -1\nStep 3: Calculate the average change per reshuffle: (+3 + -1) / 2 = +1.\nFinal answer: The average change in number of female full Cabinet members per reshuffle from 2010 to 2019 is +1.",
        "urls": [
            "https://en.wikipedia.org/wiki/Cameron%E2%80%93Clegg_coalition",
            "https://en.wikipedia.org/wiki/Second_Cameron_ministry",
            "https://en.wikipedia.org/wiki/First_Johnson_ministry"
        ],
        "true_answer": "+1",
        "file_name": ""
    },
    {
        "question": "Across these societies: British Origami Society, Italian Origami Society, Belgian-Netherlands Origami Society, OrigamiUSA, Japan Origami Academic Society, Israeli Origami Center:\nWhat is the average number of years between each consecutive founding? OrigamiUSA has sponsored several conventions. Which city was the most recent conference held at the American Museum of Natural History in 2009, and on what specific date was it held?",
        "task": 2,
        "task_id": 4788212851660350261,
        "solution": "Step 1: List society founding years: 1967, 1978, 1979, 1980, 1990, 1993. Step 2: Calculate intervals: 1978-1967=11, 1979-1978=1, 1980-1979=1, 1990-1980=10, 1993-1990=3. Step 3: Compute the average: (11+1+1+10+3)/5 = 5.2 years.",
        "urls": [
            "https://en.wikipedia.org/wiki/British_Origami_Society",
            "https://cfcorigami.com/group/cdo-centro-diffusione-origami",
            "https://www.origami-osn.nl/en/content/origami-society-netherlands",
            "https://en.wikipedia.org/wiki/OrigamiUSA",
            "https://www.britannica.com/art/origami/History-of-origami",
            "https://helenhiebertstudio.com/podcast/miri-golan/",
            "https://origamiusa.org/copyrightmtg2009"
        ],
        "true_answer": "5.2, New York City, June 30 2009",
        "file_name": ""
    },
    {
        "question": "For the extractive summarization of Australian legal case reports with 1000 labeled samples (using the dataset described in the Springer paper published in September 2023), what is the value of the improvement in ROUGE-1 between the proposed method and a previous extractive summarization model that utilized reinforcement learning and multi-step episodic Markov decision processes (as reported in that paper)? Please specify the publication venues for these two papers: for journals, provide the full journal name; for conferences, include the abbreviation and year (e.g., CVPR 2024).",
        "task": 1,
        "task_id": 1753019310182090853,
        "solution": "Step 1: From Table 6 of the Springer 2023 paper, ROUGE-1 for MemSum is 34.94, for the proposed method is 49.51. The improvement is 49.51 - 34.94 = 14.57.\nStep 2: Find their publication venues: Artificial Intelligence and Law, ACL 2022.",
        "urls": [
            "https://link.springer.com/article/10.1007/s10506-023-09373-8",
            "https://aclanthology.org/2022.acl-long.450.pdf"
        ],
        "true_answer": "14.57, Artificial Intelligence and Law, ACL 2022",
        "file_name": ""
    },
    {
        "question": "In the 2017 Eastern Conference Semifinals between the Boston Celtics and Washington Wizards, the Celtics scored 15 points during the overtime period of Game 2. Was this their highest-scoring overtime period in the entire 2017 NBA Playoffs? Additionally, please provide (a) the Celtics' average points per quarter across all quarters of that 7-game series, and (b) the leaguewide average number of points per team in an overtime period during the 2017 NBA Playoffs? Please provide both values as decimals rounded to one decimal place.",
        "task": 3,
        "task_id": 3534136380915005423,
        "solution": "A. Only Game 2 vs. Wizards (May 2, 2017) went to OT for the Celtics; they scored 15 in that OT.\nB. Celtics per-quarter average in the 7-game series: sum of all non-OT quarters is 757 points over 28 quarters = 27.0 per quarter.\nC. Leaguewide: all OT points in playoffs: 5-15, 6-9, 12-14; average points per team=(5+15+6+9+12+14)/6=10.2.\nD. Thus, yes—15 was their highest all playoffs, and comparison numbers are: 27.0 (Celtics per quarter), 10.2 (league OT per team per OT).",
        "urls": [
            "https://www.basketball-reference.com/playoffs/2017-nba-eastern-conference-semifinals-wizards-vs-celtics.html",
            "https://en.wikipedia.org/wiki/2017_NBA_playoffs"
        ],
        "true_answer": "Yes; the Celtics' 15 points in overtime were their highest (and only) in a 2017 playoff OT. Their per-quarter series average was 27.0; the leaguewide playoff OT average was 10.2.",
        "file_name": ""
    },
    {
        "question": "In the 2024 US presidential election, according to Federal Reserve Bank of St. Louis database on state-level college attainment: What was the difference (in percentage points, rounded to one decimal) between the mean percent of adults with a bachelor's degree or higher in states won by Harris and those won by Trump, and which Trump-won state had the highest such percent? ",
        "task": 2,
        "task_id": 1495211656039876338,
        "solution": "1. List all US states and for each, record the percent of adults with a bachelor's degree or higher, per FRED (St. Louis Fed, 2023 ACS). 2. Mark each state by whether Trump or Harris won it in 2024 (BBC results). 3. Compute the average for Trump-won states and for Harris-won states. 4. Take the difference (Harris minus Trump) and round to one decimal. 5. Scan all Trump-won states and find the one with the highest percentage as of 2023. Result: Mean for Trump-won = 32.0%, Harris-won = 41.7%, difference = 9.7; highest among Trump-won states is Utah, 38.4%.",
        "urls": [
            "https://fred.stlouisfed.org/release/tables?eid=391444&rid=330",
            "https://www.bbc.com/news/election/2024/us/results"
        ],
        "true_answer": "9.7 percentage points; Utah (38.4%)",
        "file_name": ""
    },
    {
        "question": "On the neural network generalization and robustness formalization paper poster-presented at NeurIPS 2021 by Yu-Lin Tsai and colleagues, what is the test accuracy (rounded to 2 decimal places) on MNIST under a weight PGD attack with the largest regularization shown for their theory-driven loss, and which coauthor of that paper was also a co-author of a 2025 ICLR publication on inversion-free backdoor defense via model reprogramming? Additionally, what is the title of a 2023 published survey cite the NeurIPS 2021 paper? Provide your answer as three values in the format: '<accuracy>%, <coauthor>, <title>'.",
        "task": 2,
        "task_id": 9011527928881087935,
        "solution": "1) Go to the NeurIPS 2021 paper 'Formalizing Generalization and Adversarial Robustness of Neural Networks to Weight Perturbations' by Tsai et al. (see NeurIPS and arXiv links). 2) Find Table 1 in the paper/appended supplement: under 'theory-driven loss', for the largest weight perturbation (epsilon_test=0.008 on MNIST), test accuracy is 72.47%. 3) Examine Pin-Yu Chen's DBLP author profile. Search for 2025 ICLR publications: 'REFINE: Inversion-Free Backdoor Defense via Model Reprogramming' first author is Yukun Chen, also a coauthor on the NeurIPS 2021 paper. 4) Use MDPI Electronics 2023 'A Survey of Bit-Flip Attacks on Deep Neural Networks' or ACM 'A.I. Robustness' (2025): check references and confirm that they cite the Tsai et al. paper. 5) Compose the final answer as '72.47%, Yukun Chen, yes'.",
        "urls": [
            "https://proceedings.neurips.cc/paper/2021/hash/a3ab4ff8fa4deed2e3bae3a5077675f0-Abstract.html",
            "https://arxiv.org/abs/2103.02200",
            "https://dblp.uni-trier.de/pid/39/8969.html",
            "https://www.mdpi.com/2079-9292/12/4/853",
            "https://openreview.net/forum?id=Conma3qnaT"
        ],
        "true_answer": "72.47%, Pin-Yu Chen, A Survey of Bit-Flip Attacks on Deep Neural Networks",
        "file_name": ""
    },
    {
        "question": "Between 2022 and 2024, which year did the total U.S. domestic box office gross differ the most from the 3-year average for those years, and by how much (in millions of dollars, rounded)?",
        "task": 1,
        "task_id": -3607250523358975667,
        "solution": "Gather the total US domestic box office gross for 2022 ($7,370M), 2023 ($8,910M), 2024 ($8,570M); calculate the 3-year average: (7,370 + 8,910 + 8,570)/3 = $8,283M; calculate each year’s absolute deviation from this average: 2022: |7,370 - 8,283| = 86.67M; 2023: |8,910 - 8,283| = 627M; 2024: |8,570 - 8,283| = 176.67M; the largest deviation is in 2024, with a difference of $263.33 million.",
        "urls": [
            "https://www.boxofficemojo.com/year/?ref_=bo_nb_hm_secondarytab",
            "https://www.boxofficemojo.com/year/2023/?ref_=bo_hm_yrdom",
            "https://www.boxofficemojo.com/year/2024/?ref_=bo_hm_yrdom",
            "https://www.boxofficemojo.com/year/2025/?ref_=bo_hm_yrdom",
            "https://www.forbes.com/sites/bradadgate/2025/01/05/led-by-sequels-domestic-box-office-grossed-856-billion-in-2024/",
            "https://en.wikipedia.org/wiki/List_of_highest-grossing_films"
        ],
        "true_answer": "2022, 913",
        "file_name": ""
    },
    {
        "question": "The first author of 'TrustAgent: Towards Safe and Trustworthy LLM-based Agents.', in his/her/they first paper published in 2023, the citation numbered [14] in the arXiv v5 version, what is the citation number in the officially accepted conference version?",
        "task": 2,
        "task_id": -7054444608607790919,
        "solution": "The google scholar page of the first author show the paper published in 2023 is https://arxiv.org/pdf/2304.04370. The paper is acceptet at https://proceedings.neurips.cc/paper_files/paper/2023/file/1190733f217404edc8a7f4e15a57f301-Paper-Datasets_and_Benchmarks.pdf. Compare the arxiv v5 and published paper, the paper name [14] of the v5 is: Mike Lewis, 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019). The number in the accepted paper is [20].",
        "urls": [
            "https://arxiv.org/pdf/2402.01586",
            "https://scholar.google.com/citations?hl=en&user=Yqw8P-QAAAAJ&view_op=list_works&sortby=pubdate",
            "https://proceedings.neurips.cc/paper_files/paper/2023/file/1190733f217404edc8a7f4e15a57f301-Paper-Datasets_and_Benchmarks.pdf",
            "https://arxiv.org/pdf/2304.04370v5"
        ],
        "true_answer": "20",
        "file_name": ""
    }
]