Abstract: Computing basic network properties and machine learning (ML) model outputs, e.g., reachability, shortest-path distance, triangle count, node classification, etc., is key to understanding large and complex graphs. We study two fundamental problems: (1) Given a graph with uncertain edges and a real-valued network property or an ML model, estimate the uncertainty associated with evaluating the property or the ML model's output over the uncertain graph. (2) Given a limited budget on the number of edges, find the $k$ $\mathbf{best}$ edges whose probability update will maximally reduce the aforementioned uncertainty. We formulate both problems using the information-theoretic notion of entropy and then characterize their hardness. We next devise approximate solutions with theoretical soundness, along with efficient greedy subgraph-selection algorithms. Our empirical evaluation and case study on real-world and synthetic datasets demonstrate that the proposed solutions are more effective and efficient than baselines and are several orders of magnitude faster than exact approaches.
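To make the first problem concrete, the following is a minimal sketch (not the paper's algorithm) of the possible-world semantics it builds on: each uncertain edge exists independently with some probability, a property is evaluated in every possible world, and the uncertainty is the Shannon entropy of the resulting output distribution. The graph, edge probabilities, and the reachability property used here are illustrative assumptions.

```python
import itertools
import math

def reachable(edges, src, dst):
    # Depth-first search over the edges present in one possible world
    # (treated as undirected for this toy example).
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return src == dst

def property_entropy(uncertain_edges, prop):
    # Exhaustively enumerate all 2^m possible worlds, accumulate the
    # probability mass of each property value, and return the Shannon
    # entropy (in bits) of that output distribution. Exponential cost --
    # exactly why the paper needs approximate solutions.
    dist = {}
    edges = list(uncertain_edges)
    for mask in itertools.product([0, 1], repeat=len(edges)):
        p, world = 1.0, []
        for bit, (u, v, pe) in zip(mask, edges):
            p *= pe if bit else (1.0 - pe)
            if bit:
                world.append((u, v))
        val = prop(world)
        dist[val] = dist.get(val, 0.0) + p
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Toy uncertain triangle: each edge exists with probability 0.5.
# Property: is node 0 reachable from node 2?
g = [(0, 1, 0.5), (1, 2, 0.5), (0, 2, 0.5)]
H = property_entropy(g, lambda world: reachable(world, 0, 2))
```

Here P(reachable) = 0.5 + 0.125 = 0.625, so the entropy is the binary entropy H(0.625) ≈ 0.954 bits; raising or lowering one edge's probability changes this value, which is the quantity the $k$-best edge-selection problem seeks to reduce.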