Abstract: Sum-product networks (SPNs) are an expressive class of deep probabilistic models in which inference takes time linear in their size, enabling them to be learned effectively. However, for certain challenging problems, such as scene understanding, the corresponding SPN has exponential size and is thus intractable. In this work, we introduce submodular sum-product networks (SSPNs), an extension of SPNs in which sum-node weights are defined by a submodular energy function. SSPNs combine the expressivity and depth of SPNs with the efficient MAP inference over a combinatorial number of labelings that submodular energies afford. An SSPN for scene understanding represents all possible parses of an image, over arbitrary region shapes, with respect to an image grammar. Despite this complexity, we develop an efficient and convergent algorithm based on graph cuts for computing the (approximate) MAP state of an SSPN, greatly increasing the expressivity of the SPN model class. Empirically, we show exponential improvements in parsing time over traditional inference algorithms such as alpha-expansion and belief propagation, while returning minima of comparable quality.
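To make the graph-cut primitive underlying the abstract concrete, the sketch below computes the exact MAP state of a binary pairwise MRF with submodular (attractive) potentials via a single s-t min cut, the building block that alpha-expansion and the SSPN algorithm repeatedly invoke. It is a minimal illustration under stated assumptions, not the paper's implementation: the function name, the networkx-based graph construction, and the toy energies are all illustrative.

```python
import networkx as nx

def map_binary_submodular(unary, edges):
    """Exact MAP (energy minimization) for a binary pairwise MRF with
    attractive potentials E(x) = sum_i theta_i(x_i) + sum_ij w_ij [x_i != x_j],
    w_ij >= 0, computed as one s-t min cut. Illustrative sketch only.

    unary : dict mapping node -> (theta_i(0), theta_i(1))
    edges : dict mapping (i, j) -> w_ij
    """
    G = nx.DiGraph()
    s, t = "source", "sink"  # assumes no MRF node is named "source"/"sink"
    for i, (c0, c1) in unary.items():
        G.add_edge(s, i, capacity=c1)  # cut iff x_i = 1; pays theta_i(1)
        G.add_edge(i, t, capacity=c0)  # cut iff x_i = 0; pays theta_i(0)
    for (i, j), w in edges.items():
        G.add_edge(i, j, capacity=w)   # cut iff x_i = 0 and x_j = 1
        G.add_edge(j, i, capacity=w)   # cut iff x_i = 1 and x_j = 0
    cut_value, (source_side, _) = nx.minimum_cut(G, s, t)
    labels = {i: 0 if i in source_side else 1 for i in unary}
    return labels, cut_value

# Toy usage: two nodes with conflicting unaries, coupled by a smoothing term.
# The minimizing assignment is x = (0, 1) with energy 1.0.
labels, energy = map_binary_submodular(
    unary={"a": (0.0, 2.0), "b": (1.5, 0.0)},
    edges={("a", "b"): 1.0},
)
```

The cut value equals the energy because every term of E(x) corresponds to exactly one edge that crosses the cut; submodularity (here, w_ij >= 0) is what guarantees all capacities are nonnegative, so the min cut is well defined and polynomial-time computable.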
TL;DR: A novel extension of sum-product networks that incorporates submodular Markov random fields into the sum nodes, resulting in a highly expressive class of models in which efficient inference is still possible.
Conflicts: cs.washington.edu, u.washington.edu
Keywords: Computer vision, Structured prediction