Abstract: Understanding complex systems requires understanding interactions across different domains and scales. Pandemic science serves as an exemplar of such complex systems. During the COVID-19 pandemic, a significant amount of health surveillance infrastructure had to be created on the fly. This infrastructure, while useful in many cases, was unable to provide individual-level data across relevant domains due to limitations and privacy barriers. Finding technical solutions to these barriers requires careful evaluation at scale. Synthetic information (sometimes known as digital twins), coupled with detailed mechanistic models, is a potent but underutilized tool for representing and analyzing complex societal systems. In this paper, we describe how synthetic information can be used to evaluate these technical solutions and thereby support pandemic preparedness and response. As an illustration, we describe a problem and synthetic data sets that were recently used successfully in a joint US-UK challenge on evaluating privacy-enhancing technologies (PETs). We also describe several key decisions we had to make, particularly in navigating the boundary between realism and artificiality. This competition is part of a larger vision for using synthetic data to break the vicious cycle of data stagnation. However, fully realizing this vision for a larger set of domains will require advances in several areas, as we describe in this vision paper.