Abstract: Inferring causal relationships from observational data is a key challenge in making
Machine Learning models interpretable. Given the ever‑increasing amount of observational
data available in many areas, Machine Learning algorithms used for forecasting have become more
complex, making it harder to follow how the model arrives at a decision. To address
this issue, we propose leveraging ensemble models, e.g., Random Forest, to assess which input
features the trained model prioritizes when making a forecast and, in this way, establish causal
relationships between the variables. The advantage of these algorithms lies in their ability to provide
feature importance, which allows us to build the causal network. We present our methodology to
estimate causality in time series from oil field production. Because causal relations are difficult
to extract from a real field, we also include a synthetic oil production dataset and a synthetic
weather dataset to provide the ground truth. We aim to perform causal discovery, i.e., establish
the existing connections between the variables in each dataset. Through an iterative process of
improving the forecast of a target’s value, we evaluate whether the forecast improves when we add
information from a new potential driver; if so, we state that the driver causally affects the target. On
the oil field‑related datasets, our causal analysis results agree with the interwell connections already
confirmed by tracer information; whenever tracer data are available, we use them as our ground
truth. This consistency between the estimated and confirmed connections gives us confidence
in the effectiveness of our proposed methodology. To our knowledge, this is the first time causal
analysis using solely production data has been employed to discover interwell connections in an oil field
dataset.
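
The iterative procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's RandomForestRegressor as the ensemble forecaster, a single lag of one time step, a chronological train/test split, and a hypothetical 10% relative error reduction as the criterion for "the forecast improves". The function names (forecast_error, discover_drivers), the min_gain threshold, and the toy series x, y, z are all illustrative assumptions; the toy data are not the datasets used in the paper, and the sketch relies on forecast error alone rather than on feature importances.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error


def forecast_error(features, target, split=0.7, seed=0):
    """Fit a Random Forest on the first `split` fraction of the series
    (chronological order is preserved) and return the test MSE."""
    n_train = int(len(target) * split)
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(features[:n_train], target[:n_train])
    pred = model.predict(features[n_train:])
    return mean_squared_error(target[n_train:], pred)


def discover_drivers(data, target, lag=1, min_gain=0.1):
    """Greedily add lagged candidate series as predictors of `target`.

    A candidate is accepted as a driver only if it lowers the forecast
    error by at least `min_gain` (relative). Hypothetical criterion.
    """
    y = data[target][lag:]
    # Baseline: forecast the target from its own past only.
    selected = [data[target][:-lag]]
    best = forecast_error(np.column_stack(selected), y)
    drivers = []
    candidates = [name for name in data if name != target]
    while candidates:
        # Score every remaining candidate when added to the current set.
        errors = {
            name: forecast_error(
                np.column_stack(selected + [data[name][:-lag]]), y
            )
            for name in candidates
        }
        winner = min(errors, key=errors.get)
        if errors[winner] > best * (1 - min_gain):
            break  # no candidate improves the forecast enough
        selected.append(data[winner][:-lag])
        drivers.append(winner)
        candidates.remove(winner)
        best = errors[winner]
    return drivers


# Toy ground truth (assumption): x drives y with a one-step delay,
# while z is independent noise.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = rng.normal(size=n)
y = np.zeros(n)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=n - 1)

data = {"x": x, "y": y, "z": z}
drivers = discover_drivers(data, "y")
print(drivers)
```

Running discover_drivers once per variable and drawing an edge from each accepted driver to its target would give a directed network of the kind the abstract describes. The lag, the threshold, and the error metric are design choices that would need tuning on real production data.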