Abstract: To optimize clinical outcomes, many fertility clinics select embryos strategically, based on how quickly they reach certain developmental milestones. This requires manually annotating time-lapse EmbryoScope videos with their corresponding morphokinetics, a time-consuming process that demands experienced embryologists. We propose late-fusion ConvNets with a dynamic programming-based decoder for automatically labeling these videos. We evaluate on data extracted from EmbryoScope incubators at the Cleveland Clinic Foundation Fertility Center. We focus on six developmental stages and achieve 87% per-frame accuracy.
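
To illustrate what a dynamic programming-based decoder over per-frame predictions might look like, the following is a minimal sketch and not necessarily the decoder used in this work. It assumes (this is not stated in the abstract) that developmental stages are numbered in order and that an embryo never reverts to an earlier stage, so the decoder searches for the non-decreasing label sequence with the highest total log-probability. The function name `decode_monotonic` and the input layout (one row of stage log-probabilities per frame) are hypothetical.

```python
import numpy as np


def decode_monotonic(log_probs):
    """Return the non-decreasing stage sequence with the highest total score.

    log_probs: array of shape (T, K), per-frame log-probabilities over K
    ordered stages (assumed here to come from a per-frame classifier).
    """
    log_probs = np.asarray(log_probs, dtype=float)
    T, K = log_probs.shape

    score = np.empty((T, K))           # best score of a valid path ending in stage k at frame t
    back = np.zeros((T, K), dtype=int)  # stage chosen at frame t-1 on that best path
    score[0] = log_probs[0]

    for t in range(1, T):
        best, arg = -np.inf, 0
        for k in range(K):
            # Running maximum over previous stages j <= k (monotonicity assumption).
            if score[t - 1, k] > best:
                best, arg = score[t - 1, k], k
            score[t, k] = best + log_probs[t, k]
            back[t, k] = arg

    # Trace the best path backwards from the best final stage.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score[-1]))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path.tolist()


# Toy example with 5 frames and 3 stages; frame 2 on its own favors stage 0,
# which would be a reversion if each frame were labeled independently.
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
])
print(decode_monotonic(np.log(probs)))  # -> [0, 1, 1, 1, 2]
```

Labeling each frame independently by its most probable stage would give 0, 1, 0, 1, 2 here, while the decoder smooths the isolated reversion at frame 2. The running maximum keeps the cost at O(T·K) rather than O(T·K²).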