Readers: Everyone (since 13 Oct 2023)
Real-world data in fields such as economics, finance, and neuroscience are often observed at a lower resolution than the underlying causal process, with temporally aggregated data being a common example. While the impact of temporal aggregation on temporal causal discovery has received attention, the effects of highly aggregated data, which yield independent and identically distributed (i.i.d.) observations, on instantaneous (non-temporal) causal discovery have been largely overlooked. There is substantial evidence that temporally aggregated i.i.d. data are prevalent in practice: the time required for causal interactions is often considerably shorter than the observational interval, which leads to a large aggregation factor and renders the temporally aggregated data i.i.d. This raises a critical question: are causal discovery results obtained from such data consistent with the true causal process? In this paper, we provide theoretical conditions necessary to ensure the consistency of causal discovery results on temporally aggregated i.i.d. data. Through theoretical analysis and experimental validation, we demonstrate that causal discovery on such data often leads to erroneous results. Our primary objective is to draw attention to the risks of performing causal discovery on highly aggregated i.i.d. data and to advocate a cautious approach when interpreting causal discovery outcomes derived from such data.
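To make the aggregation mechanism concrete, the following is a minimal simulation sketch rather than the paper's experimental setup. It assumes a two-variable lagged linear process in which x drives y with a one-step delay, and it assumes aggregation by summing non-overlapping windows of length k. The coefficients `a`, `b`, `c`, the series length, and all function names are illustrative choices, not values taken from the paper. With a large k, the lag-1 autocorrelation of the aggregated series should shrink toward zero, so the observations look approximately i.i.d., while the delayed effect of x on y shows up as same-sample dependence.

```python
import random


def simulate(T, a=0.5, b=0.5, c=0.8, seed=0):
    """Simulate an assumed lagged process: x_t depends on x_{t-1}, y_t on y_{t-1} and x_{t-1}."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    xs, ys = [], []
    for _ in range(T):
        # Both updates use the previous values, so x affects y with a one-step delay.
        x_new = a * x + rng.gauss(0.0, 1.0)
        y_new = b * y + c * x + rng.gauss(0.0, 1.0)
        x, y = x_new, y_new
        xs.append(x)
        ys.append(y)
    return xs, ys


def aggregate(series, k):
    """Sum non-overlapping windows of length k (any incomplete tail is dropped)."""
    n = len(series) // k
    return [sum(series[i * k:(i + 1) * k]) for i in range(n)]


def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    var_u = sum((p - mu) ** 2 for p in u)
    var_v = sum((q - mv) ** 2 for q in v)
    return cov / (var_u * var_v) ** 0.5


def lag1_autocorr(u):
    """Correlation between a series and itself shifted by one sample."""
    return corr(u[:-1], u[1:])


xs, ys = simulate(200_000)
results = {}
for k in (1, 100):
    ax, ay = aggregate(xs, k), aggregate(ys, k)
    results[k] = {
        "autocorr_x": lag1_autocorr(ax),
        "autocorr_y": lag1_autocorr(ay),
        "corr_xy": corr(ax, ay),
    }
    print(k, results[k])
```

Comparing the two printed rows (k = 1 is the raw series, k = 100 the aggregated one) shows how the temporal structure fades while cross-variable dependence remains. Whether a causal discovery method then recovers the correct direction from such data is exactly the question the paper examines; this sketch does not address it.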