The scripts must be executed sequentially as follows:

APPROACH 1:
 1. process_data.py: converts raw datasets into json files inside the folder "./data/processed"
 2. sample_generation.py: generates random samples with RAG=False and stores the different files in their own folders in ./data/raw
 3. plot_generation.py (OPTIONAL): create line charts of all time series samples
 4. caption_refinement.py (run once for each refinement type)
 5. extract_facts.py: extract facts that are mentioned in the captions and saves them in the facts folder (apply it to the captions with added facts)
 6. filter_fake_facts.py: checks the extracted facts from step 5 and discards those deemed false or unclear
 7. fact_bank.py: 
            - applies to the filtered facts from step 6.
            - creates and saves an indexed list of facts as strings
            - creates and saves a tensor of their embeddings, shape = [#facts, emb. size]
 8. remove_common_sense.py: checks the fact bank and removes facts that are too obvious
 9. split_facts_by_period.py: goes through the remaining facts and categorizes them accoding to the time period. A fact might span multiple years, so it can end up in multiple periods.
 10. fact_banks_by_period: creates one fact bank per period. Inside ./by period/10, create one folder per period, each folder contans its list of facts and a tensor of their embeddings. Then, when using RAG in sample_generation.py, only retrieve from the facts in the relevant period.
 11. sample_generation.py: again if you want to set RAG=True, because now the fact bank is ready.


 OPTIONAL: iterate from step 2 (add facts) through 11,  this leads to:
  - growing fact bank
  - refining captions




APPROACH 2 (simpler and works better, no bank needed):
1. process_data.py: converts raw datasets into json files inside the folder "./data/processed"
2. sample_generation.py: generates random samples with RAG=False and stores the different files in their own folders in ./data/raw
3. plot_generation.py (OPTIONAL): create line charts of all time series samples
4. caption_refinement.py: run "add fact" option on raw captions
5. caption_refinement.py: run "factual checking" option on "add fact" captions from step 4.


APPROACH 3 (even simpler, no falsity correction needed but you might need to generate more raw captions):
1. process_data.py: converts raw datasets into json files inside the folder "./data/processed"
2. sample_generation.py: generates random samples with RAG=False and stores the different files in their own folders in ./data/raw
3. plot_generation.py (OPTIONAL): create line charts of all time series samples
4. caption_refinement.py: run "add fact" option on raw captions
5. caption_filtering.py: run "factual checking" option on "add fact" captions from step 4. It checks all captions and removes those with falsities.