Induction of Selective Bayesian Networks from Data

Moninder Singh

AAAI/IAAI, Vol. 2, 1996

Abstract: Bayesian networks (Pearl 1988), which provide a compact graphical way to express complex probabilistic relationships among several random variables, are rapidly becoming the tool of choice for dealing with uncertainty in knowledge-based systems. Among the many advantages Bayesian networks offer over other representations, such as decision trees and neural networks, are their comprehensibility to humans, their effectiveness as complex decision-making models, and the ability to elicit informative prior distributions. However, approaches based on Bayesian networks have often been dismissed as unfit for many real-world applications because the networks are difficult to construct and probabilistic inference is intractable for most problems of realistic size. Given the increasing availability of large amounts of data in most domains, learning Bayesian networks from data can circumvent the first problem. This research deals primarily with the second problem. We address it by learning selective Bayesian networks, a variant of the Bayesian network that uses only a subset of the given attributes to model a domain. Our aim is to learn networks that are smaller, and hence computationally simpler to evaluate, but that display accuracy comparable to that of networks induced using all attributes. We have developed two methods for inducing selective Bayesian networks from data. The first method, K2-AS (Singh & Provan 1995), selects a subset of attributes that maximizes predictive accuracy prior to the network learning phase. The idea behind this approach is that attributes which have little or no influence on the accuracy of learned networks can be discarded without significantly affecting their performance. The second method, Info-AS (Singh & Provan 1996), uses information-theoretic metrics to efficiently select a subset of attributes from which to learn the classifier.
The aim is to discard those attributes that give little or no information about the class variable, given the other attributes in the network. We have shown that, relative to networks learned using all attributes, networks learned by both K2-AS and Info-AS are significantly smaller and computationally simpler to evaluate, yet display comparable predictive accuracy. More-
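The abstract describes K2-AS only as selecting, before network learning, the attribute subset that maximizes predictive accuracy. As a rough illustration, the Python sketch below assumes a greedy forward (wrapper) search: starting from the empty set, it repeatedly adds whichever attribute most improves holdout accuracy and stops when no addition helps. The helper names `greedy_forward_selection` and `naive_bayes_accuracy` are hypothetical, and a naive Bayes classifier stands in for the K2-learned network purely to keep the example self-contained; this is not the paper's procedure.

```python
from collections import Counter, defaultdict


def naive_bayes_accuracy(train, test, attrs):
    """Holdout accuracy of a naive Bayes classifier that uses only `attrs`.

    Hypothetical stand-in for "learn a network on these attributes and
    measure its accuracy". Each row is a tuple of discrete attribute values
    with the class label as its last element.
    """
    class_counts = Counter(r[-1] for r in train)
    cond = defaultdict(Counter)  # (attribute, class) -> counts of attribute values
    values = defaultdict(set)    # attribute -> values seen in training
    for r in train:
        for a in attrs:
            cond[(a, r[-1])][r[a]] += 1
            values[a].add(r[a])

    def score(row, c):
        # Unnormalized posterior p(c) * prod_a p(x_a | c) with add-one smoothing
        # (one extra slot reserved for values never seen in training).
        p = class_counts[c] / len(train)
        for a in attrs:
            p *= (cond[(a, c)][row[a]] + 1) / (class_counts[c] + len(values[a]) + 1)
        return p

    correct = sum(max(class_counts, key=lambda c: score(r, c)) == r[-1] for r in test)
    return correct / len(test)


def greedy_forward_selection(n_attrs, evaluate):
    """Grow an attribute subset while `evaluate(subset)` strictly improves."""
    selected, best = [], evaluate([])
    while len(selected) < n_attrs:
        candidates = [a for a in range(n_attrs) if a not in selected]
        # Highest accuracy wins; ties go to the lower attribute index.
        acc, chosen = max(((evaluate(selected + [cand]), cand) for cand in candidates),
                          key=lambda t: (t[0], -t[1]))
        if acc <= best:
            break  # no remaining attribute improves accuracy
        selected.append(chosen)
        best = acc
    return selected, best


# Toy data (x0, x1, x2, class): x0 equals the class; x1 and x2 alone carry no signal.
base = [(0, 0, 0, 0), (0, 1, 1, 0), (1, 0, 1, 1), (1, 1, 0, 1)]
train, test = base * 3, base
print(greedy_forward_selection(3, lambda s: naive_bayes_accuracy(train, test, s)))
```

On this toy data the search keeps only attribute 0, reaching accuracy 1.0. A practical wrapper would estimate accuracy by cross-validation rather than on rows drawn from the training set.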

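For Info-AS, the abstract says only that information-theoretic metrics are used to discard attributes that give little or no information about the class variable given the other attributes. One natural reading of that criterion, assumed here rather than taken from the paper, is the empirical conditional mutual information I(X; C | S), where S is the set of remaining attributes. The sketch below applies it in a backward-elimination loop that repeatedly drops the attribute with the smallest I(X; C | S) while that value is below a threshold. The names `conditional_mutual_information`, `backward_eliminate`, and the threshold `epsilon` are hypothetical.

```python
import math
from collections import Counter


def conditional_mutual_information(rows, x, given):
    """Empirical I(X; C | S) in bits.

    X is column `x`, S is the tuple of columns listed in `given`, and the
    class C is the last element of each row.
    """
    n = len(rows)

    def config(r):
        return tuple(r[i] for i in given)  # configuration of S in row r

    n_xsc = Counter((r[x], config(r), r[-1]) for r in rows)
    n_xs = Counter((r[x], config(r)) for r in rows)
    n_sc = Counter((config(r), r[-1]) for r in rows)
    n_s = Counter(config(r) for r in rows)
    total = 0.0
    for (xv, s, c), k in n_xsc.items():
        # p(x,s,c) * log2[p(s) p(x,s,c) / (p(x,s) p(s,c))]; the 1/n factors cancel.
        total += (k / n) * math.log2(k * n_s[s] / (n_xs[(xv, s)] * n_sc[(s, c)]))
    return total


def backward_eliminate(rows, n_attrs, epsilon=0.01):
    """Drop the attribute least informative about the class, given the rest,
    until every remaining attribute scores at least `epsilon` bits."""
    kept = list(range(n_attrs))
    while kept:
        scores = {a: conditional_mutual_information(rows, a, [b for b in kept if b != a])
                  for a in kept}
        weakest = min(kept, key=scores.get)  # ties go to the lower index
        if scores[weakest] >= epsilon:
            break
        kept.remove(weakest)
    return kept


# Toy data (x0, x1, x2, class): x0 and x1 are copies of the class; x2 is noise.
rows = [(0, 0, 0, 0), (0, 0, 1, 0), (1, 1, 0, 1), (1, 1, 1, 1)]
print(backward_eliminate(rows, 3))
```

Because x0 and x1 are exact copies, each is uninformative given the other, so the loop removes one of them; x2 is then removed because the remaining copy already determines the class. Conditioning on every remaining attribute requires many samples per configuration, so this exact form scales poorly; it is meant only to make the criterion concrete.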