\subsection{Related Work}
 The growing area of Neuro-Symbolic (NeSy) reasoning~\citep{ijcai2020p688, sarker2021neuro} seeks to integrate DNNs and symbolic models. \citet{Kautz_2022} proposed a taxonomy of NeSy models based on how the DNN and symbolic components interact with each other. Previously, Statistical Relational Models~\citep{getoor&taskar07} were a predominant approach to unifying the representational power of first-order logic with probabilistic models, providing a general framework for uncertainty quantification in the presence of relational structure. Markov Logic Networks (MLNs)~\citep{domingos&lowd09}, Probabilistic Soft Logic (PSL)~\citep{bach_psl17}, and ProbLog~\citep{raedt&al07} are arguably three of the most well-known statistical relational models.
 pLogicNet~\citep{NEURIPS2019_13e5ebb0} uses MLNs to represent domain knowledge with first-order logic and to handle uncertainty, while also incorporating knowledge graph embedding methods for efficient inference.
 Neural Markov Logic Networks~\citep{nmln_marra} replace the symbolic rules of an MLN with neural networks used within a log-linear model. ProbLog was extended to DeepProbLog~\citep{deepproblog18}, which supports both symbolic and sub-symbolic representations and inference by integrating neural networks through neural predicates. \citet{deepstoch} extended this line of work with DeepStochLog, which introduces neural networks into stochastic logic programs based on stochastic definite clause grammars; defining a probability distribution over derivations enables better scalability and handling of longer sequences than DeepProbLog. \citet{NEURIPS2023_bf215fa7} introduced DeepSoftLog, a framework that integrates soft unification with probabilistic logic programming, using the distance between embeddings for unification instead of exact matching of symbolic terms.
More recently, PSL was extended to NeuPSL~\citep{ijcai2023p0461} to augment DNN learning with logical rules. However, the aforementioned approaches do not account for variations in DNN representations during learning and inference, which is the focus of our work. Related to our mixture model approach, a stacking method was developed to scale up learning in MLNs~\citep{islam&al18} by combining multiple MLNs that are compressed using symmetries.
