Abstract: Bayes nets (BNs) for relational databases are a major research topic in machine learning and artificial intelligence. When the database exhibits cyclic probabilistic dependencies, measuring the fit of a BN model to relational data with a likelihood function is a challenge [5, 36, 28, 9]. A common approach to difficulties in defining a likelihood function is to employ a pseudo-likelihood; a prominent example is the pseudo-likelihood defined for Markov Logic Networks (MLNs). This paper proposes a new pseudo-likelihood P∗ for Parametrized Bayes Nets (PBNs) [32] and other relational versions of Bayes nets. The pseudo log-likelihood L∗ = ln(P∗) is similar to the single-table BN log-likelihood, except that row counts in the data table are replaced by frequencies in the database. We introduce a new type of semantics based on the concept of random instantiations (groundings) from classic AI research [12, 1]: the measure L∗ is the expected log-likelihood of a random instantiation of the first-order variables in the PBN. The standard moralization method for converting a PBN to an MLN provides another interpretation of L∗: the measure is closely related to the log-likelihood and to the pseudo log-likelihood of the moralized PBN. For parameter learning, the L∗-maximizing estimates are the empirical conditional frequencies in the database. For structure learning, we show that the state-of-the-art learn-and-join method of Khosravi et al. [18] implicitly maximizes the L∗ measure. The measure provides a theoretical foundation for this algorithm, while the algorithm's empirical success provides experimental validation for its usefulness.
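
To make the frequency-based form of L∗ concrete, the following is a minimal Python sketch under one reading of the abstract: L∗ is taken to be the sum, over every node, child value, and parent state, of the database frequency of that family configuration times the log of the corresponding conditional probability parameter. The function names, the dictionary layout, and the toy frequencies are hypothetical illustrations and are not taken from the paper.

```python
import math
from collections import defaultdict

def pseudo_log_likelihood(family_freqs, cond_probs):
    """Sum of freq * ln(theta) over family configurations.

    family_freqs: {(node, child_value, parent_state): joint database frequency}
    cond_probs:   {(node, child_value, parent_state): P(child_value | parent_state)}
    Both layouts are assumptions made for this sketch.
    """
    total = 0.0
    for key, freq in family_freqs.items():
        if freq > 0:  # configurations that never occur contribute nothing
            total += freq * math.log(cond_probs[key])
    return total

def empirical_conditionals(family_freqs):
    """Conditional frequencies: joint frequency divided by parent-state frequency."""
    parent_totals = defaultdict(float)
    for (node, _child, parents), freq in family_freqs.items():
        parent_totals[(node, parents)] += freq
    return {
        (node, child, parents): freq / parent_totals[(node, parents)]
        for (node, child, parents), freq in family_freqs.items()
        if parent_totals[(node, parents)] > 0
    }

# Toy joint frequencies for a single node with one binary parent (made-up numbers).
freqs = {
    ("Smokes", "T", "friend=T"): 0.3,
    ("Smokes", "F", "friend=T"): 0.2,
    ("Smokes", "T", "friend=F"): 0.1,
    ("Smokes", "F", "friend=F"): 0.4,
}
theta_hat = empirical_conditionals(freqs)
uniform = {key: 0.5 for key in freqs}

print(pseudo_log_likelihood(freqs, theta_hat))
print(pseudo_log_likelihood(freqs, uniform))
```

On this toy input the empirical conditional frequencies score higher than the uniform parameters, which agrees with the abstract's statement that the L∗-maximizing estimates are the empirical conditional frequencies. Computing the joint frequencies from an actual relational database is outside the scope of the sketch.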