ESOL (Delaney) dataset – 1,128 compounds with measured aqueous solubility and descriptors. You can download it here: delaney-processed.csv

FreeSolv dataset – hydration free-energy values for 642 small molecules (SMILES strings and energies). Here’s the file: freesolv.csv

BIOSSES (annotation pairs & scores) – 100 pairs of biomedical sentences with similarity scores from five annotators. The TSV file is available here: annotation_pairs_scores.tsv

Lipophilicity (octanol/water distribution coefficient) – 4,200 molecules with experimental logD values at pH 7.4. You can download it here: Lipophilicity.csv
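
If it helps, here is a minimal sketch for checking each file after download. It assumes the four files sit in the current working directory under the names listed above, and `load_table` is a hypothetical helper written for this example, not part of any dataset's tooling. It makes no assumptions about column names; it only reports the header and row count of each file.

```python
import csv
from pathlib import Path


def load_table(path):
    """Read a CSV or TSV file and return (header, rows), with rows as dicts."""
    path = Path(path)
    # Choose the delimiter from the extension: tab for .tsv, comma otherwise.
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        rows = list(reader)
        header = reader.fieldnames or []
    return header, rows


if __name__ == "__main__":
    # File names as given in the list above; the location is an assumption.
    names = [
        "delaney-processed.csv",
        "freesolv.csv",
        "annotation_pairs_scores.tsv",
        "Lipophilicity.csv",
    ]
    for name in names:
        if Path(name).exists():
            header, rows = load_table(name)
            print(f"{name}: {len(rows)} rows, columns = {header}")
        else:
            print(f"{name}: not found in the current directory")
```

Comparing the printed row counts against the sizes quoted above is a quick way to confirm that a download is complete.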