\begin{abstract}
Accurately predicting implantation outcomes from blastocyst developmental potential is valuable in in-vitro fertilization (IVF). Clinically, embryologists analyze multiple focal-plane images (FP-images) to comprehensively assess embryo grades, a process that is cumbersome and prone to inconsistency. Automatic computer-aided methods for analyzing embryo images are therefore highly desirable. However, how to effectively fuse multiple FP-images for prediction remains largely under-explored. To this end, we propose a novel Multiple Focal-plane Image Fusion Network, called \model, to predict blastocyst implantation outcomes. Specifically, \model consists of two sub-networks: a Core Image Generation Network (CI-Gen) and a Key Feature Fusion Network (KFFNet). Because different FP-images can have different focus positions, CI-Gen fuses multiple FP-images by pixel-wise weighting to generate a \textit{core image}. To further capture the key features in each FP-image, KFFNet re-extracts key information from the FP-images and fuses it with the core image. Within KFFNet, a Fusion Module is designed to capture the key information of each FP-image, for which we develop Squeeze Multi-Headed Attention to exchange features while mitigating the computational cost of attention. Comprehensive experiments validate the superiority and rationality of \model over state-of-the-art methods on various metrics. Ablation studies also confirm the positive impact of each component of \model. The code will be made publicly available upon acceptance. Model implementation details are available at https://github.com/Ch3ngY1/MFIF-Net.
\end{abstract}

\begin{keywords}
Blastocyst implantation prediction, in-vitro fertilization, multi-modalities
\end{keywords}
