- Daily Oracle is a continuous evaluation benchmark using automatically generated QA pairs from daily news to assess how the future prediction capabilities of LLMs evolve over time. 
- Although Daily Oracle is updated daily, the dataset released so far covers the period from January 1, 2020 to September 30, 2024 (~17.3 questions per day).


`daily_oracle_tf_20240930.csv` \
This file contains 16,802 True/False QA pairs. Each row represents a generated QA pair alongside the article from which the question is generated. Below is a description of each column included in the dataset:
- `question`
- `answer`
- `date` - the resolution date of the question, which is also the publication date of the corresponding news article
- `category` - the category of the question
- `article_selection` - how the article was selected: "random" means random selection, and "selected" means the hot topic selection method
- `title` - title of the news article
- `text` - the main text of the news article
- `summary` - the summary of the news article, created during the "Article Summary" stage in the QA construction process
- `keypoint` - the keypoint of the news article, also created during the "Article Summary" stage in the QA construction process
- `url` - the URL of the news article
- `source_domain` - the source domain of the news article
- `qa_filter` - the results from the LLM that evaluates each QA pair against seven principles as part of the "QA Filtering" step
- `total_points` - the total score assigned by the LLM during the "QA Filtering" step, reflecting the QA pair’s overall quality
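
The columns above can be read with Python's standard `csv` module. The following is a minimal sketch, not part of the official release: the helper names (`load_rows`, `filter_by_resolution_date`, `count_by`) are hypothetical, and it assumes that `date` values are stored as `YYYY-MM-DD` strings, which should be checked against the actual file.

```python
import csv
from collections import Counter
from datetime import date, datetime


def load_rows(path):
    """Read a Daily Oracle CSV into a list of dicts, one per QA pair."""
    # The `text` column holds full article bodies, so raise the field size limit.
    csv.field_size_limit(10**9)
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def parse_date(value, fmt="%Y-%m-%d"):
    # ASSUMPTION: dates look like 2024-09-30; pass a different fmt if they do not.
    return datetime.strptime(value.strip(), fmt).date()


def filter_by_resolution_date(rows, start, end, fmt="%Y-%m-%d"):
    """Keep rows whose resolution date falls within [start, end], inclusive."""
    return [r for r in rows if start <= parse_date(r["date"], fmt) <= end]


def count_by(rows, column):
    """Count rows per distinct value of a column, e.g. `category`."""
    return Counter(r[column] for r in rows)
```

For example, `rows = load_rows("daily_oracle_tf_20240930.csv")` followed by `filter_by_resolution_date(rows, date(2024, 1, 1), date(2024, 9, 30))` would keep only questions resolving in 2024, and `count_by(rows, "article_selection")` would show how many articles came from each selection method.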



`daily_oracle_mc_20240930.csv` \
This file contains 13,906 Multiple Choice QA pairs. The columns are similar to those in the TF dataset, with the addition of the following columns representing the answer choices:
- `choice_a`
- `choice_b`
- `choice_c`
- `choice_d`
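
As an illustration of how the four choice columns fit together with `question`, the sketch below assembles one row into a plain-text prompt. The function name `format_mc_prompt` and the "A. / B. / C. / D." layout are hypothetical choices for this example and are not necessarily the prompt format used in the Daily Oracle evaluation.

```python
def format_mc_prompt(row):
    """Render one multiple-choice row (a dict keyed by column name) as text."""
    lines = [row["question"].strip()]
    for letter in "abcd":
        # Column names follow the list above: choice_a ... choice_d.
        lines.append(f"{letter.upper()}. {row['choice_' + letter].strip()}")
    return "\n".join(lines)
```

A row read with `csv.DictReader` can be passed in directly, since each row is a dict keyed by column name.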