Abstract: Highlights
• We implemented and systematically compared six recent foundation models (FMs) and six multiple instance learning (MIL) methods across seven clinically relevant prediction tasks in four cancer types, using 4044 whole slide images (WSIs) in a multicenter setting.
• FMs such as UNI, trained on more diverse histological images, yield better patch embeddings than generic models trained on smaller datasets, significantly improving downstream MIL classification accuracy and training convergence speed.
• Instance feature fine-tuning, known as online feature re-embedding, captures both fine-grained details and spatial interactions and can often further improve WSI classification performance.
• FMs advance MIL models by enabling promising grade classification and prediction of biomarker status and microsatellite instability (MSI) without requiring pixel- or patch-level annotations.