Learning Diverse Gaussian Graphical Models and Interpreting Edges

19 Jan 2022 | OpenReview Archive Direct Upload | Readers: Everyone
Abstract: Gaussian graphical models are used to discover patterns of variable dependencies in many scientific applications, yet there is no guarantee that the assumption of independent and identically distributed (IID) samples is correct. In many cases, the data are non-IID because some observations are produced by a different underlying process than the rest. It is therefore informative for an analyst who is trying to understand patterns in their data to explore the different graphical models produced by various subsets of the data. However, learning a graphical model for every one of a combinatorial number of data subsets is intractable. We solve the problem with an interactive machine learning approach: we first learn a Gaussian graphical model from the data, then find a subset of the data that would produce the most different Gaussian graphical model, allowing the user to explore the most diverse Gaussian graphical models that fit various subsets of their data. To find the most different Gaussian graphical model, we define an optimization problem that can be solved by gradient-based algorithms. Furthermore, to gain insight into the learned Gaussian graphical model, we explain an edge in the graph by finding the subset of observations in the dataset that are critical for defining the correlation that the edge represents. That is, we interpret an edge by finding a subset of the data whose removal from the dataset eliminates that edge from the graph. This relational information enables analysts to interpret each edge in terms of its robustness and its relationship to observations in the data. Our method finds patterns in data from the Mars rover and from online recipes. By bringing transparency and interpretability, we enable practitioners to create and use models that they can trust and draw insight from.
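
The edge-interpretation idea in the abstract can be illustrated with a small sketch. This is not the authors' method: it swaps the gradient-based optimization for a simple greedy leave-one-out heuristic, and it estimates the graph from a ridge-regularised inverse covariance rather than a sparse estimator. The function names (`partial_corr`, `explain_edge`), the threshold, the ridge value, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def partial_corr(X, ridge=1e-3):
    """Partial correlations from the inverse of a ridge-regularised covariance.

    Assumption: a plain regularised inverse stands in for a sparse
    Gaussian graphical model estimator.
    """
    S = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
    P = np.linalg.inv(S)               # precision matrix
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)            # rho_ij = -P_ij / sqrt(P_ii * P_jj)
    np.fill_diagonal(R, 1.0)
    return R

def explain_edge(X, i, j, threshold=0.1, max_remove=None):
    """Greedily drop observations until |partial corr(i, j)| < threshold.

    Hypothetical heuristic: at each step, remove the row whose absence
    weakens the edge the most. Returns the removed row indices and the
    remaining edge strength.
    """
    keep = list(range(X.shape[0]))
    removed = []
    max_remove = X.shape[0] // 2 if max_remove is None else max_remove
    while len(removed) < max_remove:
        if abs(partial_corr(X[keep])[i, j]) < threshold:
            break
        # Edge strength after leaving out each remaining row in turn.
        scores = [
            abs(partial_corr(X[keep[:k] + keep[k + 1:]])[i, j])
            for k in range(len(keep))
        ]
        removed.append(keep.pop(int(np.argmin(scores))))
    return removed, abs(partial_corr(X[keep])[i, j])

# Synthetic data (an assumption): three independent variables, except that
# the first five rows come from a shifted process linking variables 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
X[:5, 0] += 6.0
X[:5, 1] += 6.0

initial = abs(partial_corr(X)[0, 1])
removed, final = explain_edge(X, 0, 1)
print("edge strength before:", initial)
print("rows removed:", sorted(removed))
print("edge strength after:", final)
```

In this toy setting the removed rows serve as the "explanation" of the edge between variables 0 and 1: an edge that disappears after a handful of removals is fragile, whereas one that survives many removals is robust.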