\section{Limitations}
One of the main limitations of our approach is that we assume prior knowledge is available, reasonably accurate, and expressible in first-order logic. A second limitation is that we assume the embeddings used in training and testing are generated by models optimizing the same objective function (i.e., trained for the same downstream task). From a practical perspective, as with mixture models in general, our approach is slower than methods that use a single model. Finally, in the current work we assume that our HMLN contains no hard constraints (i.e., formulas with infinite weights).