\section{Conclusion}

Our study demonstrates the effectiveness of \ac{CL} strategies in multimodal medical image object detection. Both the bounding-box-size-based and the teacher-guided curricula improved overall detection accuracy, particularly for small and medium-sized objects. However, the lack of improvement on the MSD\_Liver dataset calls for further investigation into dataset-specific factors that affect the effectiveness of \ac{CL}. Both sorting heuristics can also be applied in an anti-curriculum fashion, although only the teacher heuristic matched the regular sorting approach. The conditions under which anti-\ac{CL} succeeds therefore need further investigation. We also explored pretraining G-DINO on a single modality before fine-tuning on the full pathological dataset. Our results indicate that multimodal pretraining yields slightly better performance than single-modality pretraining.

Despite these promising results, challenges in \ac{CL} implementation remain. Data-level \ac{CL} still requires hand-crafted difficulty categories and predefined scheduling. In addition, the computational overhead of \ac{CL}, especially in the teacher-guided approach, must be weighed against the performance gains, since it increases training time.