%%%%%%%% NACI Workshop at ICML 2021 EXAMPLE LATEX SUBMISSION FILE %%%%%%%%%%%%%%%%%
\documentclass[accepted]{uai2022}

\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams
%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example



%\usepackage[round]{natbib}
%\renewcommand{\bibname}{References}
%\renewcommand{\bibsection}{\subsubsection*{\bibname}}

%\usepackage{aistats2022arxiv}
\usepackage{notations}
%
%\documentclass{article}
%\usepackage{iclr2022_conference,times}

% Recommended, but optional, packages for figures and better typesetting:
\usepackage{microtype}
\usepackage{graphicx}
%\usepackage{subfigure}

%\usepackage{booktabs} % for professional tables
%\usepackage{xr-hyper}

\usepackage{hyperref}
\usepackage{url}

\externaldocument{besserve_593-supp}


%\usepackage[accepted]{icml2021}
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{amsthm}
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography
\usepackage{xcolor}         % colors
\usepackage{subcaption}
\usepackage{mathtools}
\usepackage{comment}

\newtheorem{defn}{Definition}
\newtheorem{prop}{Proposition}
\newtheorem{corol}{Corollary}
\newtheorem{expl}{Example}
\newtheorem{model}{Model}
\newtheorem{thm}{Theorem}
\newtheorem{lem}[thm]{Lemma}
\newtheorem{post}{Postulate}


\newcommand{\michel}[1]{{\color{red} \textbf{Michel}:#1}}
\renewcommand{\michel}[1]{}

\newcommand{\bx}{\boldsymbol{x}}
\newcommand{\by}{\boldsymbol{y}}
\newcommand{\bs}{\boldsymbol{s}}
\newcommand{\be}{\boldsymbol{e}}
\newcommand{\bp}{\boldsymbol{p}}

\newcommand{\btheta}{\boldsymbol{\theta}}
\newcommand{\balpha}{\boldsymbol{\alpha}}
\newcommand{\parents}{\textbf{Pa}}
\newcommand{\rf}{{\rm ref}}

\newcommand{\bg}{{\rm \textbf{g}}}

\newcommand{\Bf}{{\rm {\bf f}}}

%sets
\newcommand{\G}{G}
\newcommand{\X}{\mathcal{X}}
\newcommand{\R}{\mathbb{R}}

\newcommand{\bernhard}[1]{\textbf{\color{red}~B:}{~\color{blue}#1}}
\renewcommand{\bernhard}[1]{}

%\icmltitlerunning{Lie interventions}


%\title{Learning soft interventions \\in complex systems.}

% Authors must not appear in the submitted version. They should be hidden
% as long as the \iclrfinalcopy macro remains commented out below.
% Non-anonymous submissions will be rejected without review.


% The \author macro works with any number of authors. There are two commands
% used to separate the names and addresses of multiple authors: \And and \AND.
%
% Using \And between authors leaves it to \LaTeX{} to determine where to break
% the lines. Using \AND forces a linebreak at that point. So, if \LaTeX{}
% puts 3 of 4 authors names on the first line, and the last on the second
% line, try using \AND instead of \And before the third author name.

\newcommand{\fix}{\marginpar{FIX}}
\newcommand{\new}{\marginpar{NEW}}


%\twocolumn[
%\icmltitle{Learning optimal interventions in complex systems.}

%\icmlsetsymbol{equal}{*}

%\begin{icmlauthorlist}
%\icmlauthor{Michel Besserve}{to}
%\icmlauthor{Bernhard Sch\"olkopf}{to}
%\end{icmlauthorlist}

%\icmlaffiliation{to}{Department of Empirical Inference, Max Planck Institute for Intelligent Systems, T\"ubingen, Germany}

%\icmlcorrespondingauthor{Michel Besserve}{mbesserve@gmail.com}

%\icmlkeywords{Machine Learning, ICML}

%\vskip 0.3in
%]

%\twocolumn[

\title{Learning soft interventions in complex equilibrium systems}

\author[1]{\href{mailto:<mbesserve@gmail.com>?Subject=Your UAI 2022 paper}{ Michel Besserve}}
\author[1]{Bernhard Sch\"olkopf}

\affil[1]{Department of Empirical Inference, Max Planck Institute for Intelligent Systems, T\"ubingen, Germany.} 
\begin{document}
\maketitle

\begin{abstract}
  Complex systems often contain feedback loops that can be described as cyclic causal models. Intervening in such systems may lead to counterintuitive effects, which cannot be inferred directly from the graph structure. After establishing a framework for differentiable soft interventions based on Lie groups, we take advantage of modern automatic differentiation techniques and their application to implicit functions in order to optimize interventions in cyclic causal models. We illustrate the use of this framework by investigating scenarios of transition to sustainable economies. 
\end{abstract}

	
\section{Introduction}
Designing optimal interventions in complex systems, composed of many interacting parts, is a key objective in multiple fields. In the context of socio-economic systems, the design of public policies to improve economic and social welfare is a major source of scientific and political debate. 
Moreover, the positive aspects of socio-economic activities need to be traded off against their environmental impacts, as their long-term consequences may considerably affect societies \citep{dearing2014safe,sherwood2010adaptability}. Interestingly, a priori intuitive interventions in such systems may lead to paradoxical outcomes. The rebound effect in energy economics, first reported by \citet{jevons1866coal}, is paradigmatic: while the energy efficiency of devices may considerably increase due to technological improvements, this may trigger an overall increase in energy consumption due to increases in demand \citep{brockway2021energy}. This suggests in particular that efficiency alone may not be the best way to foster a transition towards sustainability, and calls for a quantitative study of optimal interventions in such complex systems \citep{arrobbio2018}. As argued for the case of rebound effects \citep{wallenborn2018}, such unexpected behaviors may reflect balanced causal relationships designed by evolution \citep{andersen2013expect}, and feedback loops \citep{blom2021causality} that maintain a system at an optimal ``equilibrium'' operating point independent of external perturbations, challenging classical causal inference assumptions of faithfulness and acyclicity. 

While interventions have been extensively investigated theoretically in the field of causality \citep{pearl2000causality,imbens2015causal}, the case of systems incorporating feedback loops remains particularly challenging, and has therefore led to only limited applications to real-life complex systems.
A possible first step to study such systems is to approximate them by a model that operates at an equilibrium point, and can thus be described by a cyclic structural causal model \citep{bongers2016foundations}. Such models satisfy a self-consistent set of equations that, under unique solvability assumptions, fully identifies the operating point and allows the study of interventions. For practical and ethical reasons, interventions that do not change the causal structure, called soft interventions, arguably provide a more realistic account of changes that can be performed in real-life systems. While a restricted set of qualitative results exists for such interventions \citep{blom2020conditional}\michel{add more, bareinboim...}, their quantitative assessment and design in complex systems is made difficult by the analytical intractability of the self-consistency relations. 
\begin{comment}
While interventions in such models are typically not directly readable from the causal graph, the latter can be turned into a causal ordering graph, in which the effects of interventions that do not change the causal structure, referred to as soft interventions, are reflected in the directed paths \citep{blom2020conditional}. For practical and ethical reasons, soft interventions also arguably provide a more realistic account of changes that can be performed in real life systems.  While a restricted set of qualitative results exist for such interventions, their quantitative assessment and design in complex systems is made difficult by the analytical intractability of the self-consistency relations. 
\end{comment}


In this paper, we propose a framework for a general class of differentiable parametric soft interventions based on Lie groups, and leverage recent technical and algorithmic developments that allow learning implicit functional relationships \citep{bai2019deep} to optimize such interventions. After defining Lie interventions and analyzing their theoretical properties, we provide a computational framework to optimize them. We illustrate its application to economic models derived from real data, offering a novel approach to computational sustainability. Proofs are provided in Appendix~\ref{app:proofs}. Code is available at \href{https://github.com/mbesserve/lie-inter}{https://github.com/mbesserve/lie-inter}.
% 

\paragraph{Related work.} Various types of economic equilibrium models (EEMs) have been used to investigate the macroeconomic effects of specific interventions \citep{wiebe2018implementing,wood2018}. Experimental design in two-sided marketplaces has also been investigated by \citet{johari2022experimental}. %In particular, interventions to reduce carbon footprint have been investigated in the field of industrial economics, and energy economics investigates various models of rebound effects. 
In contrast to such work, we develop a general optimization framework that allows the optimal design of interventions to achieve specific goals. A restricted set of EEMs has been investigated more extensively from an optimization perspective (see, e.g., \citealt{esteban2004computing}); however, these rely on rather specific assumptions and constraints that allow optimization to be addressed with linear programming approaches. 
Instead, we rely on automatic differentiation and backpropagation algorithms that allow studying mechanisms and interventions with a broad range of non-linearities. In the field of causality, several studies investigate the relationship between the equilibrium of dynamical systems and structural causal models (SCM) \citep{MooJanSch13,peters2020causal} and how the causal structure can be learnt from data. 
In contrast, we focus on designing soft interventions in a known SCM at equilibrium. %While most work in the field of causality has focused on hard interventions, 
While the specificities of soft interventions have started to be investigated theoretically in %general (possibly cyclic) 
structural causal models \citep{rothenhausler2015backshift,kocaoglu2019characterization,jaber2020causal,correa2020general,blom2020conditional}, %Morevover, recent work has investigated theoretically and empirically the design of extrapolations in generative feedforward (thus acyclic) neural network using a soft intervention perspective \citep{Besserve_Sun_Janzing_Scholkopf_2021}. 
the present work is, to the best of our knowledge, the first to investigate theoretically and algorithmically the design of soft interventions in cyclic causal models. The algorithmic approach relies on modeling economic equilibrium with deep equilibrium models \citep{bai2019deep}. This approach belongs to the category of implicit deep learning \citep{el2021implicit}, which has been used in a variety of applications such as model predictive control \citep{amos2018differentiable} and multi-agent trajectory modelling \citep{geiger2020learning}. \michel{reduce?}

\section{Motivation and Background}
\label{sec:backgrnd}
\subsection{Environmental economic models}\label{sec:mrio}
In the face of the increasing severity of climate change and further environmental impacts of human activities, our societies face challenges to transition to more sustainable economies. An overarching difficulty %that makes this problem one of the most difficult in history 
is the complexity of the systems that need to be intervened on, which comprise tightly intertwined components, ranging from economic agents to a broad variety of ecosystems \citep{haberl2019contributions}. 

A classical way to represent the economy and its impacts is through input-output (IO) multi-sector economic equilibrium models \citep{stadler2018exiobase}, in which economic activities are divided into $d$ interdependent \textit{sectors} and described by a positive $d$-dimensional \textit{output} vector $\bx$ (see Appendix~\ref{app:back}).
We take as a guiding example the demand-driven model introduced by \citet{leontief1951structure}, which is the basis of the \textit{Input-Output analysis} approach to environmental impact assessment. In such models, the sectors' outputs at economic equilibrium $\bx^*$ are dependent on the vector $\by$ gathering final demand for each product (consumed by users instead of being used to make another product). Satisfying the demand of all sectors implies the self-consistent equation
\begin{equation}\label{eq:leontief}
	\bx^*=A\bx^* +\by\,,
\end{equation}
where $A$ is the so-called \textit{technical coefficient matrix}, with $A_{ij}$ the amount of product $i$ used as input to produce one unit of product $j$. %\footnote{In this simplified model, each sector is in charge of the production of a single homogeneous product.\michel{remove?}}
An example of technical coefficient matrix estimated from economic data is provided in Fig.~\ref{fig:leontief}.
While such equilibria can be thought of as the asymptotic value of $\bx$ in a dynamic model (see~Appendix~\ref{app:back}),
%\[
%\frac{d\bx}{dt} = A \bx +\by - \bx\,,
%\]
%where the increase or decrease of the sectors' activity is controlled by the imbalance between their demand $A\bx+y$ and their current output $\bx$. However, 
we focus our analysis on the equilibrium equations without consideration for the dynamics that give rise to them. %(see e.g., \cite{peters2020causal} for investigations of the relationship between causal models and dynamical systems). 
In turn, the socio-economic impacts (e.g., employment) and environmental stressors (e.g., GHG emissions, water use) of each sector's activity are gathered in a vector of \textit{impacts} $\bs$ such that
\begin{equation}\label{eq:impact}
	\bs = R\bx\,,
\end{equation}
where $R$ is a \textit{footprint intensity} matrix such that $R_{ij}$ is the amount of impact of type $i$ generated per unit of output of sector $j$. %In addition to stressors, this vector may also contain macro economic variables related to the activity, such as employment or added value.
\begin{comment}
In the long run, impacts of activities on the environment and planetary resources are likely to trigger feedback loops in various forms: shortage of renewable and non-renewable resources, drop in agricultural yields and major environmental migration, to name a few (see, e.g., \citealt{dearing2014safe,sherwood2010adaptability}). To mitigate them, a reorganization of the global economy is required, which, for instance, may consist in intervening on the interactions between sectors reflected in the matrix $A$. However, choosing such interventions faces three challenges. 
\end{comment}
To mitigate major long-term negative consequences of environmental stressors (see, e.g., \citealt{dearing2014safe,sherwood2010adaptability}), a reorganization of the global economy is required, which may consist in intervening on economic sectors, their impacts and their interactions reflected in the matrices $A$ and $R$. However, such a reorganization faces three challenges. 

\begin{figure*}
	\begin{subfigure}{.3\linewidth}
		\includegraphics[width=\linewidth]{figures/leontief.pdf}
		\subcaption{\label{fig:leontief}}
	\end{subfigure}
	\hfill
	\begin{subfigure}{.31\linewidth}
		\includegraphics[width=\linewidth]{figures/priceModel.pdf}
		\subcaption{\label{fig:pricemod}}
	\end{subfigure}
	\hfill 
	\begin{subfigure}{.37\linewidth}
		\includegraphics[width=\linewidth]{figures/deepEq.pdf}
		\subcaption{\label{fig:deepEq}}
	\end{subfigure}
	\caption{(a) Top left: technical coefficient matrix between 200 sectors and 49 world regions for 2011 (source: Exiobase 3, \citealt{stadler2018exiobase}). Top right: magnification of the top left corner of this matrix. Diagonal blocks reflect the stronger sector dependency within a country. Bottom right: putative example of cyclic dependency between different sectors. (b) Illustration of the causal graph for \textit{rebound through prices} in a two-sector economy. (c) Principle of deep equilibrium models.%: the (fixed) equilibrium point of $f_\theta$ is obtained by iterating its application to an initial point, while the gradient of this fixed point with respect to the parameters $\theta$ can be back-propagated (both passes are accelerated by specific procedures, see \cite{bai2019deep}).
	\label{fig:1}}
\end{figure*}

%\bernhard{maybe better 'societal'?}->checked, it should be social
\paragraph{\textit{Challenge 1}: social acceptability.} Reducing a sector's activity may lead to both positive environmental effects (yielding lower footprints) and negative socio-economic impacts (such as reducing economic growth and employment, see Appendix~\ref{sec:supdisc}). Decision makers thus trade off environmental goals with the social acceptability of the chosen policies. 

\paragraph{\textit{Challenge 2}: recurrence between sectors.} The sectors' activities are tightly intertwined by their reciprocal demands, as illustrated %by the graphical model 
at the bottom of Fig.~\ref{fig:leontief}: electricity production through renewable energy requires wind turbines, which require metals, while the metal industry itself requires electricity to extract metals from ores and transform them. Such cycles make it challenging to anticipate the system-wide consequences of interventions on a particular sector. %to reduce the activity of sectors with strong environmental footprints. 

\paragraph{\textit{Challenge 3}: rebound effects.} The complexity of the economic system also manifests itself through balancing mechanisms that reflect the utility maximization behavior of economic agents, such as rebound effects. Consider $\bx^*$ in eq.~(\ref{eq:leontief}), which can be written as a function of final demand
\begin{equation}\label{eq:leoninv}
\bx^* = (I-A)^{-1} \by\,.
\end{equation}
In practice, final demand is influenced by the price of each good and is often modeled by a static demand curve $d_i$ for good $i$ such that
$y_i = d_i(p_i)$. 
A final demand rebound through prices can be simulated in the Leontief model as follows. 
An improvement in the energy efficiency of the production of a particular good $j$ corresponds to a decrease of $A_{ej}$, where $e$ indicates the energy sector, but this modification also affects the unit price through energy costs. For simplicity, we define the price vector $\bp$ of goods such that the price of each good is proportional to the energy required in all sectors involved in the production of one unit of this good. It can thus be modelled by a self-consistent relation involving the technical coefficient matrix:
\[
\bp^* = A^\top \bp^* +\beta \boldsymbol{\delta}_e\,,
\]
where $\boldsymbol{\delta}_e$ is a canonical basis vector which takes value $1$ for the energy sector and value $0$ for all other sectors, and $\beta$ is the unit price of energy. For illustrative purposes, the overall causal model %encompassing economic activity, prices and final demand 
is shown in Fig.~\ref{fig:pricemod} in the case of a two-sector economy, with sector 1 being the energy sector. The price-based rebound mechanism then operates as follows: a decrease of $A_{ej}$ will decrease energy demand on sector $e$, but will also decrease the unit price of goods for sector $j$ (and downstream sectors consuming its goods). 
Because the demand curves $d_j$ are monotonically decreasing, the price drop increases the final demand for these products, which in turn increases economic activity according to eq.~(\ref{eq:leoninv}), and the associated environmental footprint. 
The rebound may thus be avoided by simultaneously intervening on the unit price of energy $\beta$ through a tax policy, so that the price level remains high and prevents increases in final demand (see Fig.~\ref{fig:pricemod}). Importantly, while eq.~(\ref{eq:leoninv}) provides a linear relationship between activity and final demand, once we assume $\by$ is price dependent, the system of equations becomes non-linear and finding an analytic expression of the economic equilibrium is nontrivial.
Our approach to designing interventions in cyclic causal models will be applied to models illustrating the above three challenges. 
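
To make the non-linearity of Challenge 3 explicit, the relations above can be composed as follows. This is only a sketch: it assumes that $I-A$ is invertible, and the componentwise notation $\mathbf{d}(\bp)=(d_1(p_1),\dots,d_d(p_d))^\top$ is introduced here for convenience. Solving the price relation gives
\[
\bp^* = \beta\,(I-A^\top)^{-1}\boldsymbol{\delta}_e\,,
\]
and substituting $y_i=d_i(p_i^*)$ into eq.~(\ref{eq:leoninv}) yields
\[
\bx^* = (I-A)^{-1}\,\mathbf{d}\!\left(\beta\,(I-A^\top)^{-1}\boldsymbol{\delta}_e\right).
\]
The technical coefficients thus enter the equilibrium twice: directly through $(I-A)^{-1}$, and indirectly through the prices that drive final demand. A decrease of $A_{ej}$ reduces the intermediate energy demand through the first factor, while lowering prices and hence raising final demand through the second, so the net effect on $\bx^*$ cannot be read off from either relation alone.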

\subsection{Cyclic causal models}
Interventions and their effects on systems have been investigated using Structural Causal Models (SCM) \citep{pearl2000causality}. In this framework, relationships between observed variables $X_k$ are described by a set of structural assignments 
\[
X_k \coloneqq f_k(\parents_k,\epsilon_k)\,,
\]
where $\parents_k$ indicates the parents of variable $X_k$ in an associated directed causal graph, such as the one illustrated in Fig.~\ref{fig:leontief}. Interventions turn an SCM into a different one, by applying a modification to at least one of its elements. Broadly construed, interventions range from ``hard'' interventions that modify the structure of the graph to ``soft'' interventions that do not \citep{eberhardt2007interventions}. 
%\bernhard{the following 'comment' in latex does not appear. Is that deliberate?}-> yes, space optimizaiton...

While in acyclic graphs, interventions have generic effects on their descendants in the causal graph, and no effects on the parents, \citet{blom2020conditional} have shown that causal effects are less easy to read in graphs containing cycles. %While some qualitative information can be gathered through the use of a causal ordering graph (see Appendix~\ref{app:back}), it is limited to specific graph structures. 
Anticipating the effect of interventions in cyclic graphs requires estimating the changes in the equilibrium point, which is typically non-trivial. While a variety of approaches may be used (e.g., based on root finding), designing optimal interventions for self-consistent equations that cannot be handled analytically is challenging, especially in high-dimensional systems. Recent work on deep neural networks has introduced techniques allowing gradient-descent-based optimization of such equilibrium models \citep{bai2019deep}. 
% SCM definition
% extra assumptions: smooth functions?
% soft interventions
% uniquelysolvabel
% causal ordering graph

%\michel{
%side: Proposition 3.7 in Foundation of cyclic model-> self-cycle are problematic for unique solvability
%side: Section 3.4 in the same: preservation of sovability under intervention...}

\subsection{Deep equilibrium models}

 \Citet{bai2019deep} introduced deep learning architecture elements with input-output functional relationships $\bx^*=\bg(\btheta)$ between variables $\bx^*$ and parameters $\btheta$ that are only defined through %an implicit function of the form $\Bf_{\btheta}(\bx,\by)=0$. Indeed, the relationship between $\bf$ and $\bg$ can be derived through the \textit{implicit function theorem} (see Appendix~\ref{app:back}), such that gradient backpropagation  of parameters $\btheta$ or input $\by$ through the function $\bg_\theta$ can be implemented based only on the knowledge of $\Bf_{\btheta}$. The case of deep equilibrium models specifically addresses self-consistent relations that are encoded in the cyclic causal models discussed above. Consider 
 a self-consistent equation 
\[
\bx^*=\Bf_{\btheta} (\bx^*).
\]
Assuming that for each value $\btheta$ %
%\footnote{The distinction between input and parameters does not pertain to the model itself but to the overall architecture that embeds it. Formally, they can be considered to form a unique extended set of parameter, as inFig.~\ref{fig:deepEq}} 
there is a unique solution $\bx^*$, the gradient with respect to one parameter component $\theta_k$ can be obtained through another self-consistent equation
\begin{equation*}
\frac{\partial \bx^*}{\partial \theta_k} = \frac{\partial \Bf_{\btheta}}{\partial \theta_k}(\bx^*)+\frac{\partial \Bf_{\btheta}}{\partial \bx}(\bx^*)\frac{\partial \bx^*}{\partial \theta_k} \,.
\end{equation*}
Overall, $\bg$ can be integrated as a layer in more complex differentiable models. As depicted in Fig.~\ref{fig:deepEq}, such a layer can be understood as a cascade of multiple layers with identical functions and shared parameters, with specific accelerated fixed-point iteration approaches to compute the forward and backward passes \citep{bai2019deep}. In this paper, we use Anderson acceleration \citep{walker2011anderson}, which essentially generalizes the forward iteration approach (i.e., iterating $\bx_{n+1}=\Bf_{\btheta}(\bx_n)$ until convergence) by leveraging the $m$ previous estimates in order to find a better estimate. These layers thus offer a differentiable framework for investigating the behavior of cyclic graphs that we use to design interventions.
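
As a worked illustration of the gradient relation above (a sketch, assuming that $I-\frac{\partial \Bf_{\btheta}}{\partial \bx}(\bx^*)$ is invertible, which need not hold in general), the self-consistent equation for the gradient can be solved explicitly:
\[
\frac{\partial \bx^*}{\partial \theta_k} = \left(I-\frac{\partial \Bf_{\btheta}}{\partial \bx}(\bx^*)\right)^{-1}\frac{\partial \Bf_{\btheta}}{\partial \theta_k}(\bx^*)\,.
\]
For the Leontief model of eq.~(\ref{eq:leontief}), taking $\Bf_{\btheta}(\bx)=A\bx+\by$ and $\theta_k=y_k$ gives $\frac{\partial \Bf_{\btheta}}{\partial \bx}=A$ and $\frac{\partial \Bf_{\btheta}}{\partial y_k}=\be_k$, where $\be_k$ denotes the $k$-th canonical basis vector (notation introduced for this example). Hence
\[
\frac{\partial \bx^*}{\partial y_k} = (I-A)^{-1}\be_k\,,
\]
which is the $k$-th column of $(I-A)^{-1}$, in agreement with differentiating eq.~(\ref{eq:leoninv}) directly.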

% definition, forward, backward pass
% comments on solvability

\subsection{Lie groups}
%As stated above, the effects of soft interventions are easier to interpret in cyclic causal models than perfect interventions. 
Using deep equilibrium models, we can learn differentiable soft interventions compatible with classical optimization frameworks. We will use the concept of Lie groups, which are smooth manifolds of transformations (see Appendix~\ref{app:back} for more background), 
in order to implement smooth soft interventions. 
In short, a group $\G$ is a set of objects equipped with a group ``multiplication'' operation mapping $(g_1,g_2)\in\G^2$ to $g_1 g_2\in\G$ and an inverse operation $g^{-1}$ with the following properties: 
%\renewcommand{\@listI}{%
%\leftmargin=0pt
%\rightmargin=0pt
%\labelsep=5pt
%\labelwidth=20pt
%\itemindent=0pt
%\listparindent=0pt
%\topsep=0pt plus 2pt minus 4pt
%\partopsep=0pt plus 1pt minus 1pt
%\parsep=0pt plus 1pt
%\itemsep=0em}
%\vspace{-2\topsep}
\begin{itemize}
\itemsep0em
\parskip0em
	\item (associativity) $(g_1 g_2) g_3 = g_1 (g_2 g_3)$,
	\item (identity element) there exists a unique identity element $e$ such that for all $g$, $eg=ge=g$,
	\item (inverse) for all $g\in \G$, there exists a unique element $g^{-1}$ such that $g g^{-1}=g^{-1} g=e$.
\end{itemize}
\vspace{-1\topsep}
%\bernhard{it is slightly odd to define groups, but not to define manifolds. But I guess it is OK.} -> referred to appendix...
Groups perform transformations on objects in a set $\X$ through the definition of a group action operation $\varphi$ mapping $(g,x)\in \G\times \X$ to $\varphi_g(x)= g\cdot x\in \X$, such that 
\vspace{-\topsep}
\begin{itemize}
\itemsep0em
\parskip0em
	\item (identity) for all $x\in \X$, $e\cdot x=x$,
	\item (compatibility) for all $(g,h)\in \G\times\G$, for all $x\in \X$, $g\cdot (h\cdot x)=(gh)\cdot x$.
\end{itemize}
\vspace{-\topsep}
A real Lie group is a group that is also a finite-dimensional real smooth manifold (see Appendix~\ref{app:back}), in which the group operations of multiplication and inversion are smooth maps. % (see Appendix~\ref{app:back} for more details). 
%Groups have been used in various branchs of Machine learning and causality, notably due to their ability to represent complex structures and invariance properties \cite{besserve2018aistats,higgins2018towards,falorsi2019reparameterizing,rao1999learning,kondor2008group,cohen2016group}. 
The differentiability of Lie groups will be leveraged to design smooth interventions.
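
As a simple illustration (an example chosen here for concreteness), the positive reals $\R_{>0}$ equipped with multiplication form a one-dimensional Lie group, with identity $e=1$ and inverse $g^{-1}=1/g$. This group acts on $\X=\R$ by scaling, $g\cdot x = gx$, and both action axioms can be checked directly:
\[
1\cdot x = x\,, \qquad g\cdot(h\cdot x) = g(hx) = (gh)x = (gh)\cdot x\,.
\]
Multiplication, inversion and the action $(g,x)\mapsto gx$ are all smooth. Writing $g=\exp(t)$ with $t\in\R$ provides an unconstrained coordinate on the group, in which the identity corresponds to $t=0$.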


\section{Intervening in smooth models}
\subsection{Smooth causal graphical models}

We define a smooth structural causal model (SSCM)
as a set of variables $\{x_j\}$ related to each other through structural equations and vertices in a directed graph as follows.
\begin{defn}[SSCM]\label{def:SCM}
	A $d$-dimensional smooth structural causal model is a 4-tuple $(\mathcal{X},\mathcal{T},\mathbb{S},\mathcal{G})$ consisting of
	\vspace{-\topsep}
	\begin{itemize}
	\itemsep0em
	\parskip0em
		\item two collections of smooth manifolds $\mathcal{X}=\{\mathcal{X}_i\}_{i=1..d}$ and $\mathcal{T}=\{\mathcal{T}_j\}_{j=1..d}$ ,	
		\item a directed graph $\mathcal{G}=(V,E)$ with set $V$ of $d$ vertices and set $E$ of directed edges between them, each vertex $i$ being associated to one variable $x_i\in\mathcal{X}_i$,
		\item a set $\mathbb{S}$ of structural assignments 
		$
		\{x_j \coloneqq f_j(\parents_j,\theta_j),  \theta_j \in\mathcal{T}_j\}_{j=1,\dots,d}\, ,
		$
		where the $f_j$ are smooth maps, and $\parents_j$ are the variables indexed by the set of parents of vertex $j$ in $\mathcal{G}$.
	\end{itemize} 
	\vspace{-1\topsep}
\end{defn}
Compared to classical definitions of SCMs (see, e.g., \citealt{causality_book}), we have replaced exogenous random variables by deterministic parameters living on a manifold. This general definition does not prevent assigning random variables to some (components of) these parameters. In the cases considered here, the $\mathcal{T}_j$ are subsets of Euclidean spaces. We are particularly interested in cyclic SSCMs, where there exists at least one directed path linking one vertex to itself. As a consequence, the possible values achieved by each variable have to be chosen among the solutions of the $d$ self-consistent structural equation constraints. We assume the unintervened causal model is locally uniquely solvable.
\begin{defn}
	An SSCM is locally uniquely solvable around a reference point $(\bx^{\rf},\btheta^{\rf})$ whenever there exist a neighborhood $U_{\btheta}$ of $\btheta^{\rf}$ and a neighborhood $U_{\bx}$ of $\bx^{\rf}$ such that for all $\btheta\in U_{\btheta}$ there exists a unique (self-consistent) solution $\bx^*(\btheta)\in U_{\bx}$ to the set of structural assignments.
\end{defn}
Note that this definition is adapted to our SSCM definition and differs from the unique solvability definition of \citet{bongers2016foundations}, which was formulated for causal models with random exogenous variables.
This property is guaranteed by a condition on the Jacobian of the structural equations.
\begin{prop}\label{prop:localSolv}
	We say the SSCM is locally diffeomorphic at $(\bx^{\rf},\btheta^{\rf})$ when $(\bx^{\rf},\btheta^{\rf})$ is a solution and the Jacobian of the mapping $\bx\mapsto \bx-\Bf(\bx,\btheta^{\rf})$ is invertible at $\bx^{\rf}$. Such an SSCM is locally uniquely solvable around this reference point and the local mapping	$\btheta \mapsto\bx^*(\btheta)$
%	\begin{align*}
%		\bx^*: \phantom{+}U_{\btheta}&\rightarrow U_{\bx}\\
%		\btheta &\mapsto\bx^*(\btheta)
%	\end{align*}
	is smooth. 
\end{prop}

%One sufficient condition, classically used in the design of invertible neural network, is to have a bound on the spectral norm $\|.\|_s$ of the Jacobian of the structural assignments.
%\begin{corol}\label{corol:spectnorm}
%	If $\|\frac{\partial f}{\partial \bx^*}(\bx^{ref},\btheta^{ref})\|_s<1$ the SCGM is uniquely solvable around this point.
%\end{corol}
%\michel{here potentially provide intuition with picture of diffeomorphism}
In the context of IO analysis presented in Section~\ref{sec:mrio}, the variables can be the sectors' outputs and unit prices. For eq.~(\ref{eq:leontief}), the resulting SSCM thus contains the affine structural assignments associated to each component of $\bx$ 
%\vspace{-\topsep}
\[
\mathbb{S}=\{x_k\coloneqq\sum_j A_{kj} x_j +y_k\}\,,
\]
which are clearly smooth, and the $\{A_{kj},y_k\}$'s may be treated either as fixed or as free parameters ranging within an interval.
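
As a quick check of Proposition~\ref{prop:localSolv} on this example (a sketch under the additional assumption, introduced here, that the spectral radius of $A$ satisfies $\rho(A)<1$), take $\Bf(\bx,\btheta)=A\bx+\by$ with $\btheta=(A,\by)$. The mapping $\bx\mapsto\bx-\Bf(\bx,\btheta)$ then has the constant Jacobian $I-A$. Since $\rho(A)<1$ implies that $1$ is not an eigenvalue of $A$, this Jacobian is invertible and
\[
\bx^*(\btheta) = (I-A)^{-1}\by = \sum_{n\geq 0} A^n\by\,,
\]
which recovers eq.~(\ref{eq:leoninv}) and depends smoothly on $(A,\by)$ on the open set where the assumption holds. The successive terms of the series correspond to the final demand itself, the inputs directly required to produce it, the inputs required to produce those inputs, and so on.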

\subsection{Lie interventions}
We will consider interventions parameterized by an element $g$ that turn the unintervened equilibrium solution $\bx^*(\btheta)$ into the \textit{intervened equilibrium solution} $\bx^{(g)}(\btheta)$ over a range of values of $\btheta$. In particular,
we define Lie interventions implemented through the action of a Lie group. 
\begin{defn}[Lie intervention]\label{def:lieInter}
	A Lie intervention on an SSCM $\mathcal{M}=(\mathcal{X},\mathcal{T},\mathbb{S},\mathcal{G})$ with a set of smooth structural assignments $\mathbb{S}$ is a pair $(L,\varphi)$ where $L$ is a Lie group and $\varphi\,:\,L\times \mathbb{S}\rightarrow \mathbb{S}$ is a smooth group action. The action defines a family of intervened SSCMs $\mathcal{M}^{(g)}=(\mathcal{X},\mathcal{T},\varphi(g,\mathbb{S}),\mathcal{G})$, for $g$ in a neighborhood of the identity within $L$. 
\end{defn}
% a form of "parametric intervention" in Interventions and Causal InferenceFrederick Eberhardt1Department of Philosophy  and Richard Scheines
Note in particular that applying the identity element of the group leads to the original (unintervened) causal model.
Such interventions preserve unique solvability. 

%\bernhard{SSCM? (singular)}-> there is one for each group value
\begin{prop}[Solvability]\label{prop:liesolv}
	For a Lie intervention on a locally diffeomorphic SSCM, there is a neighborhood $U_{L}$ of the identity $e$ in $L$ such that the intervention is soft, the family of intervened SSCMs is locally uniquely solvable and the local mapping to the intervened solution $(g,\btheta) \mapsto \bx^{(g)}(\btheta)$ 
%	\begin{align*}
%		L\times\mathcal{T}&\rightarrow \mathcal{X}\\
%		(g,\btheta)&\mapsto \bx^{(g)}(\btheta)
%	\end{align*}
	is smooth.
\end{prop}

\begin{comment}
\michel{dimensionalities are also preserved under additional assumptions}

Lie interventions are thus well-suited to study interventions on cyclic causal models, as they do not modify the structure of the original sets of structural assignments, as long as the modifications they induce remain close to the identity. This is in line with 
limiting the magnitude of real-life interventions to modify the behavior of the system while avoiding triggering its uncontrollable reorganization.

It is even possible to learn more about the structure of the equilibrium solutions when we assume the Lie group action $\varphi$ associated to the interventions is transitive.\michel{shall we make this definition local?}
\begin{defn}[Transitive action]\label{dfn:transit}
A Lie intervention acts transitively on a SSCM  when for all pairs of parameters $(\btheta,\btheta')$ there is a $g\in L$ such that for all $j\in\{1,\dots,d\}$
$$
g \cdot f_j(\textbf{PA}_j,\theta_j), = f_j(\textbf{PA}_j,\theta_j')\, ,
$$
\end{defn}
Then the following result holds.
\begin{prop}[Constant rank theorem]
Consider the Lie intervention acting transitively on an SSCM that is locally uniquely solvable, then there is a neighborhood $U_{L}$ of the identity $e$ in $L$ such that the mapping
\begin{align*}
\mathcal{T}&\rightarrow \mathcal{X}\\
\btheta &\mapsto \bx^*_g(\btheta)
\end{align*}
has constant rank given by the rank of $\frac{\partial \Bf}{\partial \btheta}$.
\end{prop}
\begin{proof}
This is a simple application of \cite[Theorem 7.25]{lee2013smooth}, using the results of previous theorems: the map from $(g,\btheta)$ to the solution is smooth and it is equivariant with respect to the Lie intervention action, and to the action on the solution, defined for all $g'\in L$ as
\[
g' \cdot \bx^*_{g}(\btheta)=\bx^*_{g'g}(\btheta)\,.
\]	
\michel{check this is a well defined smooth action...}
Moreover, because
\[
\frac{\partial \bx^*}{\partial (.)} = \frac{\partial \Bf}{\partial (.)}(\bx^*,\by)+\frac{\partial \Bf}{\partial \bx^*}\frac{\partial \bx^*}{\partial(.)} \,,
\]
leads to 
\[
\left(Id - \frac{\partial \Bf}{\partial \bx}\right)\frac{\partial \bx^*}{\partial \btheta}=\frac{\partial \Bf}{\partial \btheta}(\bx^*,\by) \,,
\]
As by unique solvability the left hand side matrix is invertible on a neighborhood, then the rank of $\frac{\partial \bx^*}{\partial \btheta}$ is the same as the rank of $\frac{\partial \Bf^*}{\partial \btheta}(\bx^*,\by)$.

\end{proof}
Of course, transitivity requires a careful adjustment of the parameter structure to the Lie intervention. In the simplest case, one can start with a singleton parameter leading to a single set of structural assignement $S$ and define the parametric set of structural equations as its orbit $L\cdot S$.
This result can be used to investigate the dimensionality of the manifolds of solutions, such that appropriate optimization objectives are designed.

\michel{possibly go further by checking when we can have full rank, immersion, submersion..}

\begin{prop}
Given the above constant rank $r$, there exists a diffeomorphism 
\[
h:\mathcal{X}\rightarrow L
\]
such that $h(\bx^*_{1..r})=\bx^*_{1..r}$ in some neighborhood.
\end{prop}

\michel{can we have some results on ``bottlenecks'' of the system that need to be kept constant to limit maximal changes to the equilibrium.}

\end{comment}
%\subsection{Examples}

\paragraph{Multiplicative Lie interventions.}
A simple way of intervening on an arbitrary system is to multiply one selected assignment by a strictly positive scalar coefficient. We can consider $\R^*_+$ equipped with multiplication as a Lie group that acts on a node by rescaling its structural assignment. 
Several such \textit{scalar} Lie interventions can then be combined into a \textit{distributed} intervention on a set of nodes instead of a single one. A group element is then a strictly positive vector $\balpha>0$ acting on assignments indexed by $I$ such that
\[
\balpha \cdot \mathbb{S}_{| I} = \{x_k \coloneqq \alpha_k f_k(x_k,\theta_k), k\in  I\}\,.
\]
%\vspace{-\topsep}
%\[
%\alpha \cdot \{x_k \coloneqq f_k(x_k,\theta_k)\} = \{x_k \coloneqq \alpha f_k(x_k,\theta_k)\}\,.
%\]
In the context of the Input-Output models presented in Section~\ref{sec:mrio}, applying this intervention can be seen as reducing or increasing the demand for products of specific sectors. Reducing the demand for a sector with large GHG emissions is, for example, a relevant objective for the transition to a sustainable economy and may be implemented by public policy in various ways (taxes, norms, \dots). Such interventions are investigated in industrial ecology \citep{wood2018}.
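
As a worked illustration under the affine IO assignments (a sketch: $D_{\balpha}$ and $\mathrm{Id}$ are notation introduced here, and eq.~(\ref{eq:leontief}) is assumed to read $\bx=A\bx+\by$), let $D_{\balpha}$ be the diagonal matrix with diagonal entries $\alpha_k$ for $k\in I$ and $1$ otherwise. The intervened assignments then read $\bx=D_{\balpha}(A\bx+\by)$, so that
\[
\bx^{(\balpha)}=(\mathrm{Id}-D_{\balpha}A)^{-1}D_{\balpha}\,\by\,,
\]
whenever $\mathrm{Id}-D_{\balpha}A$ is invertible, which is the case for $\balpha$ close enough to the identity element $(1,\dots,1)$ if $\mathrm{Id}-A$ is invertible. Setting $\balpha=(1,\dots,1)$ recovers the unintervened solution $(\mathrm{Id}-A)^{-1}\by$.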

In the context of our guiding example, the influence of multiplicative interventions has an intuitive real-world interpretation. However, \textit{shift interventions} (using the additive group, acting by addition on a structural assignment) may also be an easily interpretable choice in some settings, and have been exploited for causal inference \citep{rothenhausler2015backshift}. Moreover, some settings may require other classical, possibly multidimensional, Lie groups (e.g., \citet{besserve2018aistats} exploit the group $SO(n)$ of rotations of the $n$-dimensional Euclidean space). 
Finally, in contexts where the model stems from a mechanistic model, e.g. relying on physics equations, Lie interventions that change meaningful model parameters may act on structural equations in more complex ways. 

%For example, this can model simultaneous interventions on multiple economic sectors that may be optimized to achieve a global objective for the economic system.

%\bernhard{is it deliberate that the comment following in the LaTeX should not appear in the pdf?}-> this is space optimization, maybe will move to supplemental...


%\paragraph{Distributed multiplicative interventions.}
%Several scalar Lie interventions can be combined to act on a set of nodes instead of a single one. A group element is a strictly positive vector $\balpha>0$ acting on assignments index by $I$ such that
%\[
%\balpha \cdot \mathbb{S}_{| I} = \{x_k \coloneqq \alpha_k f_k(x_k,\theta_k), k\in  I\}\,.
%\]
%Such interventions can be seen as a combination of scalar interventions with elements from different groups applied sequentially. As well, this can be seen as an intervention with a single element from a multidimensional group. 
%In the context of IO analysis, we might want to jointly intervene on several sector to have more degrees of freedom to achieve a specific goal. For example, one might want to increase the demand of a product/sector with low CO2 emission to compensate the a decrease in demand imposed to another sector with high emissions. Such \textit{demand substitution} scenario is a classic phenomenon in economy and may be leveraged by public policy.

\begin{figure*}
\begin{subfigure}{.32\linewidth}
	\includegraphics[width=\linewidth]{figures/comparInter.pdf}
	\subcaption{\label{fig:comparInter}}
\end{subfigure}
\hfill
		\begin{subfigure}{.33\linewidth}
	\includegraphics[width=\linewidth]{figures/multOptimv2.pdf}
	\subcaption{\label{fig:multOptim}}
\end{subfigure}
\hfill
	\begin{subfigure}{.32\linewidth}
	\includegraphics[width=\linewidth]{figures/invarInter.pdf}
		\subcaption{\label{fig:invarInter}}
	\end{subfigure}
	\caption{(a) Illustration of a compartmentalized intervention: enforcing invariance of the green nodes allows each compartment to be independently influenced by two (invariant) interventions $u$ and $v$. (b) Architecture for Lie intervention optimization. The equilibrium layer is controlled by intervention parameters and a loss is applied to its output. (c) Schematic representation of the procedure to learn an invariant soft intervention ($y$: intervened node, $z$: invariant and auxiliary node). A multilayer perceptron (MLP) learns the soft intervention enforcing invariance of $z^{(u)}$ over a range of parameter values. \label{fig:design}}
\end{figure*}


\subsection{Invariant soft interventions}\label{sec:invar}
The rebound effect is paradigmatic of interventions that may trigger undesired effects that we wish to prevent. To this end, simultaneous interventions on other parts of a system have been considered in applications. For example, a rebound through prices can be prevented by a simultaneous auxiliary intervention on prices through taxes, such that the prices remain invariant to the overall intervention. Using the SSCM framework, we theoretically investigate the conditions under which some variables of the causal model can be kept invariant to a Lie intervention on others. 


\paragraph{Motivating example.}
Consider the following SSCM with parameters $\btheta=(\tau,\alpha,\beta,\gamma)$ and a distributed multiplicative Lie intervention $\boldsymbol{u}=(u_y,u_z)$:
\vspace{-\topsep}
\begin{equation}\label{eq:motivex}
\setlength{\jot}{0pt}
\left\{
\begin{array}{ccl}
     	x &= &\tau \,,\\
	y &= &u_y (\alpha x +\beta z) \,, \\
	z &=& u_z \gamma y \,.
\end{array}
\right.
\end{equation}
By choosing $u_z=\frac{1}{u_y}$, the intervened equilibrium solution component $z^{(\boldsymbol{u})}$ becomes insensitive to the multiplicative interventions $(u_y,u_z)$, such that $z^{(\boldsymbol{u})}(\btheta)=z^*(\btheta)$ for arbitrary values of the parameters $\btheta$ in a neighborhood of the reference parameter (see Appendix~\ref{app:add}). This result suggests that the influence of soft interventions ($u_y$ in this example) can be restricted to a subset of nodes by choosing a second intervention ($u_z$ in this example) on an auxiliary variable. However, it is unclear whether this result still holds when the functional assignment of $z$ becomes non-linear. 
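
For concreteness, the equilibrium of eq.~(\ref{eq:motivex}) can be written explicitly (a direct computation, assuming $1-u_yu_z\,\beta\gamma\neq 0$). Substituting the first two assignments into the third gives $z=u_yu_z\gamma(\alpha\tau+\beta z)$, hence
\[
z^{(\boldsymbol{u})}=\frac{u_yu_z\,\alpha\gamma\tau}{1-u_yu_z\,\beta\gamma}\,,\qquad
y^{(\boldsymbol{u})}=\frac{u_y\,\alpha\tau}{1-u_yu_z\,\beta\gamma}\,.
\]
The component $z^{(\boldsymbol{u})}$ depends on the intervention only through the product $u_yu_z$, so the choice $u_z=1/u_y$ yields $z^{(\boldsymbol{u})}=\frac{\alpha\gamma\tau}{1-\beta\gamma}=z^*$ for all parameter values with $\beta\gamma\neq 1$, while $y^{(\boldsymbol{u})}=u_y\,y^*$ still responds to the intervention.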

To frame this question in a general setting, we introduce a class of soft interventions under invariance constraint.
\begin{defn}[Invariant soft interventions]
	Consider an SSCM with a Lie intervention from group $L$ on node $i$. The intervention leaves node $j$ \textit{invariant} by \textit{leveraging} node $k$ if, for all group elements $u$ in a neighborhood $\mathcal{N}$ of the identity, there exists a soft intervention on node $k$, $f_k^{(u)}(\parents_k,\btheta)$, replacing the functional assignment $f_k$ such that the intervened node value $x^{(u)}_j$ satisfies $x^{(u)}_j(\btheta) = x^*_j(\btheta)$ in a neighborhood of the reference parameter. Node $i$ is called the intervened node, node $j$ is called the invariant node, and node $k$ is called the auxiliary node.
\end{defn}
\paragraph{Remarks:} The soft intervention property is key, as it entails that the use of an auxiliary variable to enforce the invariance constraint must only exploit the information available to this node as defined by its parents in the unintervened graph (and no parameter values). This constraint makes deployment more realistic in a complex system, as intervening does not require supervision by an external entity monitoring the whole system. Unless otherwise stated, the auxiliary node will be chosen identical to the invariant node.

Let us denote by $\bx_{-j}$ and $\Bf_{-j}$ the vector and mapping with the $j$-th component removed. We also define two quantities that are important for the existence of such interventions. The partial derivative $\frac{\partial x^*_j}{\partial x_k }_{|\btheta=\btheta^{\rf}}$ is obtained by performing a hard intervention $x_k=\lambda$, leading to the equilibrium value $x^{(\lambda)}_j(\btheta^{\rf})$, and computing the derivative $\frac{d x^{(\lambda)}_j}{d \lambda}_{|\lambda = x^*_k(\btheta^{\rf})}$. The Jacobian $J^{\btheta}_{x^*_{\parents_k}}(\btheta^{\rf})$ is the Jacobian of the mapping from the parameters $\btheta$ to the vector consisting of the parent nodes of $k$ at equilibrium. Based on these two quantities, we have the following sufficient condition.
\begin{prop}\label{prop:invar}
	Consider an SSCM locally diffeomorphic at $(\bx^{\rf},\btheta^{\rf})$ with intervened/invariant/auxiliary triplet of nodes $(i,j\neq i,k\neq i)$. If the Jacobian of the mapping $\bx_{-j}\rightarrow \bx_{-j}-\Bf_{-j}(\bx_{-j},\btheta^{\rf})$ is invertible, $J^{\btheta}_{x^*_{\parents_k}}(\btheta^{\rf})$ has full column rank, and $\frac{\partial x^*_j}{\partial x_k }_{|\btheta=\btheta^{\rf}}\neq 0$, then the intervention on $i$ leaves node $j$ invariant by leveraging node $k$. 
\end{prop} 
This result suggests that the motivating example of eq.~(\ref{eq:motivex}) can be extended, in a neighborhood of the identity, beyond the linear case, when the number of free parameters considered remains low relative to the number of parents of the auxiliary node. However, as can be seen in the proof, the soft intervention on the auxiliary variable is given by an implicit function theorem, suggesting that non-parametric models are necessary to learn it (based, e.g., on automatic differentiation methods). This will be described in Sec.~\ref{sec:design}.


\begin{figure*}[h]
	\begin{subfigure}{.33\linewidth}
	\includegraphics[width=\linewidth]{figures//eqerror.pdf}
		\subcaption{\label{fig:conv}}
	\end{subfigure}	
	\hfill
		\begin{subfigure}{.29\linewidth}
	\includegraphics[width=\linewidth]{figures//frUAI.pdf}
		\subcaption{\label{fig:countOpt}}
	\end{subfigure}	
	\hfill
	\begin{subfigure}{.29\linewidth}
	\includegraphics[width=\linewidth]{figures//deUAI.pdf}
		\subcaption{\label{fig:countOptDE}}
	\end{subfigure}	
	%\begin{subfigure}{.33\linewidth}
	%\includegraphics[width=\linewidth]{figures//invaExp.pdf}
	%	\subcaption{\label{fig:invaExp}}
	%\end{subfigure}	
	\caption{(a) Equilibrium relative error for different methods and SCM dimensions (solid: mean, dashed: mean+std). (b-c) Outcome of Lie intervention optimization on country models of GHG emission reduction in France (b) and Germany (c) for varying values of $\lambda$ in eq.~(\ref{eq:joboptim}), based on economic models estimated from different years. For year 2018, the dashed lines indicate a 5\% reduction in employment and the cross marks the corresponding choice of $\lambda$. Tables show the sectors with the largest employment reduction for 2018. \label{fig:exps}}
\end{figure*}


\subsection{Compartmentalized interventions}
Invariant interventions make it possible to restrict the propagation of effects to a subset of nodes. If a complex system can be partitioned into sparsely connected subsets of nodes, we can consider designing such interventions in order to modify the equilibrium values of each compartment independently of the others. 
\begin{defn}
Consider a partition of the SSCM nodes into $K$ compartments $\{C_k\}_{k=1,\dots,K}$, and interventions on each compartment, parameterized by respective parameters
$u_k$, leading to the intervened SSCM equilibrium solution $\bx^{(u_1,\dots,u_k,\dots,u_K)}(\btheta)$. Interventions are compartmentalized when, for all $k$ and all nodes
$j\in C_k$, the component $x_j^{(u_1,\dots,u_k,\dots,u_K)}(\btheta)$
does not depend on $u_m$
for $m\neq k$.
%	Given a partition  of the SSCM nodes into compartments $\{C_k\}$.
%	Interventions are compartmentalized when they affect only values of a single compartment.
\end{defn}
The following result guarantees that if the nodes influencing other compartments are made invariant, interventions on each compartment can be designed and performed independently from each other as their effects remain confined to their own compartment.

\begin{prop}\label{prop:compart}
	Consider a partition $\{C_k\}$ of the SSCM nodes. If for each compartment $k$ there exists one invariant soft intervention performed on structural equations such that the intervened, auxiliary and invariant nodes belong to the compartment, and all nodes of this compartment that have an outgoing arrow pointing to a different compartment are invariant, then those interventions are compartmentalized. 
\end{prop}
A fundamental aspect of this result is that, by definition of invariant interventions, compartmentalization holds over a range of parameters of the causal model (a neighborhood of the reference point) and a range of Lie intervention parameters (a neighborhood of the identity). This can be seen as a way to enforce interpretability of interventions by restricting their influence to a specific subsystem, at least for a range of parameter values. An illustration of a setting compatible with Prop.~\ref{prop:compart} is provided in Fig.~\ref{fig:comparInter}, where the equilibria of two sparsely connected compartments are interdependent (notably, the causal ordering algorithm described in \citet{blom2020conditional} returns a single cluster merging both compartments). Enforcing invariance of the green nodes, each associated with one intervention ($u$ and $v$) within its compartment, makes Prop.~\ref{prop:compart} applicable.


\begin{comment}
\subsection{Causal ordering via interventional clustering} \label{sec:causOrdClus}
One can argue that the most general result regarding predicting outcome of interventions can be found in \cite{blom2020conditional}, where they show that the causal ordering graph, originally introduced by \cite{simon1953causal}, provides a set of clusters connected by directed edges that allows to predict the generic effect of soft interventions. \michel{add illstrative example (or in background...)} By using Lie interventions, we provide an differential criterion for identifying these clusters and their causal ordering. 
\begin{prop}\label{prop:causalOrdering}
The local causal ordering graph that predicts the generic effects of soft interventions corresponds to the perfect clustering of the support of the colons of the absolute Jacobian and their ordering by putting an edge from cluster $k$ to cluster $j$, whenever one element of cluster $2$ lies in the support of the colons of cluster $1$. 
\end{prop}
\michel{be more explicit/clear}
In the context of differentiable computational models, using the Jacobian to perform clustering may be more efficient that evaluating the effect of soft interventions on the graph. We will exploit this result to cluster nodes of the causal model.
\end{comment}

\begin{comment}
\subsection{Constrained Lie interventions*}
High level idea: in order to allow more causality interpretable intervention, we consider ``breaking'' a cycle by designing interventions that won't propagate.

A good target is a node that once removed, creates finer clusters in the causal ordering graph. Then maintaining its value entails a new constraint on the graph, and we design distributed interventions that make sure this constraint still holds....
This entails designing a latent mapping from the reduced degrees of freedom to the original nodes, where we can imagine enforcing different properties for the interventions...

Existence of cluster restricted distributed interventions
\end{comment}
%\subsection{Steerability of Lie interventions} \label{sec:causOrdClus}



\section{%Differentiable
Intervention design}\label{sec:design}
To address \textit{Challenge 2} of Sec.~\ref{sec:mrio}, we design interventions with implicit layers (see Appendix~\ref{app:meth} for additional details). 
\paragraph{Differentiable architecture.}
Base optimization relies on a differentiable architecture comprising one central module representing the cyclic SSCM. Essentially, the cyclic model is represented by an equilibrium layer following \cite{bai2019deep}, schematized in Fig.~\ref{fig:deepEq}: the differentiable module is designed such that forward and backward passes through the equilibrium layer use Anderson acceleration to solve a fixed point equation. This equilibrium layer is cascaded if necessary with parametric layers to achieve specific goals. The architectures are implemented using the PyTorch library.
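
The two passes can be sketched as follows (a generic deep equilibrium formulation given for illustration; the scalar loss $\ell$, the row vector $\boldsymbol{v}$ and the identity matrix $\mathrm{Id}$ are notation introduced here, and the actual implementation may differ). The forward pass solves $\bx^*=\Bf(\bx^*,\btheta)$. For a loss $\ell(\bx^*)$, implicit differentiation of this fixed-point equation gives
\[
\frac{d \ell}{d \btheta}=\boldsymbol{v}\,\frac{\partial \Bf}{\partial \btheta}\,,\qquad
\boldsymbol{v}=\frac{\partial \ell}{\partial \bx^*}+\boldsymbol{v}\,\frac{\partial \Bf}{\partial \bx}\,,
\]
with the derivatives of $\Bf$ evaluated at $(\bx^*,\btheta)$. The second relation is itself a linear fixed-point equation in $\boldsymbol{v}$, equivalent to $\boldsymbol{v}=\frac{\partial \ell}{\partial \bx^*}\left(\mathrm{Id}-\frac{\partial \Bf}{\partial \bx}\right)^{-1}$, so the backward pass can rely on a fixed-point solver without forming or inverting the Jacobian explicitly.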

\paragraph{Lie intervention optimization.}
We design an architecture around the equilibrium module to optimize a multiplicative intervention according to a loss, as represented in Fig.~\ref{fig:multOptim}. The parameters $\boldsymbol{u}$ of the Lie group element are optimized in order to minimize an objective $\mathcal{L}(\bx^{(u)})$ evaluated at the equilibrium solution of the intervened SSCM. %, and possibly controlled by external parameters. 
This objective may include an additional regularization term, $D(\bx^{(u)},\bx^*)$ with regularization parameter $\lambda$, to enforce that some properties of the intervened system remain invariant or close to the original, non-intervened, equilibrium solution $\bx^*$. 

\paragraph{Learning invariant interventions.}
In order to enforce invariance of interventions based on Sec.~\ref{sec:invar}, we follow the procedure exemplified in Fig.~\ref{fig:invarInter}. We design two implicit layers with shared parameters $\btheta$: the first layer is unintervened and gives the corresponding equilibrium values of the nodes, while the second one is invariantly intervened, for a fixed value of the Lie intervention parameter $u$ on the intervened node. In the intervened layer, we replace putative incoming arrows from the invariant node by arrows from the same node in the unintervened graph (as this replacement encodes the invariance assumption) and we replace the functional assignment of the auxiliary node by a Multi-Layer Perceptron (MLP), relying on universal approximation properties to learn a soft intervention that satisfies invariance. We use a least-squares loss between the intervened and unintervened equilibrium values of the invariant node in order to train the MLP.
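
A possible way to write the corresponding training objective is the following (a sketch: the MLP weights $\boldsymbol{\phi}$, the sample size $N$ and the sampled parameters $\btheta_n$ are hypothetical notation not fixed by the text above). For a fixed $u$ and parameters $\btheta_1,\dots,\btheta_N$ drawn in a neighborhood of the reference parameter $\btheta^{\rf}$, one minimizes
\[
\min_{\boldsymbol{\phi}}\ \frac{1}{N}\sum_{n=1}^{N}\left(x_j^{(u)}(\btheta_n;\boldsymbol{\phi})-x_j^*(\btheta_n)\right)^2 ,
\]
where $j$ is the invariant node, $x_j^*$ is the output of the unintervened layer, and $x_j^{(u)}(\cdot\,;\boldsymbol{\phi})$ is the output of the intervened layer whose auxiliary assignment is the MLP. Gradients with respect to $\boldsymbol{\phi}$ flow through the intervened equilibrium layer.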


\begin{comment}
\subsection{Clustering of equilibrium variables}\label{sec:causClust}
As explained in Section~\ref{sec:causOrdClus}, theory suggests that variables of the cyclic models may be clustered to obtain a coarser graph, where soft interventions can be read directly. In order to perform such coarsening, that can help interpretability and can inform intervention design, we may use several approaches:
\begin{itemize}
\item \textbf{Random Intervention Clustering}: perform random scalar Lie interventions on each node (by drawing at random Lie group elements in a neighborhood of the identity), and measure the magnitude of the causal effect by the mean square distance to the unperturbed points,
\item \textbf{Closed loop Lie derivative Clustering}: estimate the Lie derivative at the identity element $e$ of the equilibrium point for the scalar Lie interventions on  each node in a neighborhood of the equilibrium point, and measure the magnitude of the causal effect with the mean squared Jacobian components,
\item \textbf{Open loop Jacobian Clustering}: as justified by Proposition~\ref{prop:causalOrdering}, the Jacobian with respect to the variables of the causal model should reflect Lie interventions, and avoids evaluating backpropagations through the equilibrium equation with iterative techniques. This can be easily evaluate with automatic differentiation tools, and the magnitude of the causal effect can again be measured with the mean square of each Jacobian component.
\end{itemize}
\end{comment}
%\subsection{Learning intervention on clusters}

\begin{comment}
Effect of linear scalar interventions reflect the causal ordering, and are reflected by the block diagonal structure of the original jacobian 
-> we can cluster the lines/columns of the jacobian, but colums make more sense according the taylor expansion of the inverse and that the effects mediated depend on the actual variation of the effect... (that is more for self consistency between interventions and gradients?, actually one good experiment would be to compare those effects, and using either deep equil or the analytical expression)

how to cluster: nmf for example? makes sense because positive

side: we can combine A and its powers, who should have the same block triangular structure

once the number of clusters is fixed, %we can determine the ordering by the summed variation  outside the cluster (or just the max variation outside the cluster?), or use ordering by inclusion...
we can set the support using the ratio of the variation within cluster (approximated using the trace or operator norm, trace appear for lie algebra I think) to the *effect* variation (this makes sense because they should be proportional

effect might vanish downstream, but it is okay because if effects are mediated the ordering will be correct

the graph between nodes is set be the inclusion order between cluster (bidirectional arrow if necessary)

it makes more sense to cluster the A array than the resulting graph because it reflects the dependencies that we can cut. 
Side: We could show that independence of mechanisms entails that the number of coefficients typically increases and does not decrease (just use the continuity of the map and lesgues measure?).


side: we can also iteratively recompute the downstream influence of each block, by computing the jacobian operator norm (largest singular value), under indpendence of mechansism assumptions that should yield a good downstream perturbation...
however this may require normalisation of the self-consistency equation, as we get $(1-A)^-1 x^*$ for the variation...
this mean that converting $x=f(x)$ to $alpha x = f(alpha^{-1} \alpha x)=\tilde{f}(alpha x)$ might make sense...

side: we can think of two-way clustering of the matrice (netflix related stuff?)
\end{comment}


%\subsection{Learning distributed Lie interventions}
\begin{comment}
learn an upstream mapping that minimizes the downstream effect,
we know there should exist one because (under some assumption on the Lie group action), the equilibrium point can be diffeormorphically driven anywhere.
\end{comment}

\section{Experiments}\label{sec:exp}
The following toy and semi-synthetic experiments illustrate how our framework contributes to addressing the sustainability challenges presented in Sec.~\ref{sec:mrio}. %\michel{Supplemental experiments can be found in Appendix~\ref{app:expe}.}
\paragraph{Evaluation of equilibrium estimation.} We first evaluate the performance of equilibrium layers in computing an accurate estimate of the SSCM solution $\bx^*$. To this end, we use the SSCM associated with the economic equilibrium of equation~(\ref{eq:leontief}), where we select a subset of sectors in order to vary the dimension of $A$. The full matrices $A$, as well as the final demands $\by$, are estimated from the Exiobase 3 dataset \citep{stadler2018exiobase} for years 2012-2018, using the \textit{Pymrio} library \citep{stadler2021pymrio} for five countries (France, Germany, Italy, USA, Great Britain). We compare Anderson acceleration \citep[see][]{walker2011anderson} for two different choices of the mixing parameter $\beta$, together with the baseline forward iteration approach, which simply consists in iterating $\bx_{k+1}=\Bf(\bx_k)$. For each choice of dimension and fixed-point iteration algorithm, we compute the relative error
$
\frac{\|\bx^*-\Bf_{\btheta}(\bx^*)\|}{\|\bx^*\|}\,.
$
The results (Fig.~\ref{fig:conv}), averaged across countries and years, show that although forward iteration is the most accurate in lower dimensions, Anderson acceleration with a mixing parameter $\beta=2.0$ performs better for SSCM dimensions larger than 50. Interestingly, Anderson acceleration with $\beta=1.0$ gives the worst performance, suggesting that an appropriate choice of $\beta$ is key. 
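
For reference, one common formulation of Anderson acceleration can be sketched as follows (the memory size $m$ and the weights $w_i$ are notation introduced here; the implementation used in the experiments may differ in its details). Writing $\boldsymbol{g}(\bx)=\Bf(\bx)-\bx$ for the residual and $m_k=\min(m,k)$, iteration $k$ computes
\[
\boldsymbol{w}^{(k)}=\operatorname*{arg\,min}_{\sum_i w_i=1}\Big\|\sum_{i=0}^{m_k} w_i\,\boldsymbol{g}(\bx_{k-m_k+i})\Big\|_2\,,
\]
\[
\bx_{k+1}=(1-\beta)\sum_{i=0}^{m_k} w^{(k)}_i\,\bx_{k-m_k+i}+\beta\sum_{i=0}^{m_k} w^{(k)}_i\,\Bf(\bx_{k-m_k+i})\,.
\]
With $m=0$ and $\beta=1$, this reduces to the forward iteration $\bx_{k+1}=\Bf(\bx_k)$.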

\paragraph{Optimization of multiplicative Lie interventions.} In order to investigate \textit{Challenge 1}, we optimize the IO demand-driven model of eq.~(\ref{eq:leontief}). The matrices $A$ and $R$, as well as the final demands $\by$ and the sector outputs at equilibrium $\bx^*$, are estimated from the yearly activity available in the Exiobase 3 dataset \citep{stadler2018exiobase}, using the \textit{Pymrio} library \citep{stadler2021pymrio}. While the data describes economic interactions across multiple countries, we design an economic equilibrium model of each country by neglecting those interactions and extracting the blocks of matrices $A$ and $R$ relevant to the country under consideration. We design a distributed multiplicative Lie intervention on the activity of all 200 sectors of the database. The coefficient vector $\boldsymbol{\alpha}$ is then optimized in order to reduce the overall greenhouse gas (GHG) emissions cumulated across sectors (estimated by one component of the stressor vector $\bs$), while enforcing that the overall employment distribution over the sectors stays as close as possible to that of the non-intervened economy, in order to mitigate the challenges associated with reorganizing economic activities (e.g., mass unemployment and the need for large-scale professional reorientation programs). Using the $\ell_1$ norm for regularization, this leads to the following loss:
\begin{equation}\label{eq:joboptim}
\mathcal{L}(\boldsymbol{u}) = \boldsymbol{c}^\top \bx^{(u)} + \lambda \|\be^{(u)}-\be^*\|_1
\end{equation}
where $\boldsymbol{c}$ is the GHG emission intensity of each sector, and $\be^{(u)}$ and $\be^*$ are the intervened and unintervened distributions of employment across sectors (estimated by entrywise multiplication of $\bx^{(u)}$ and $\bx^*$, respectively, with one row of matrix $R$). 
%Addiona experiments are provided in Appendix~\ref{app:expe}.
The graphs shown in Figs.~\ref{fig:countOpt}-\ref{fig:countOptDE} (top) illustrate the trade-off between employment preservation and GHG emission reduction achieved by varying $\lambda$ for two different countries. Interestingly, the left tail of these curves reflects differences across countries, with Germany having less room than France for reducing emissions before employment starts to decrease significantly. The sectors yielding the largest employment reduction also differ across countries, likely influenced by the overall structure of each economy. %, and technology-based differences in GHG intensities. %achieved by each country in a given sector. 
%
%\begin{figure}
%	\begin{subfigure}{.45\linewidth}
%		
%	\end{subfigure}
%	\hfill
%	\begin{subfigure}{.45\linewidth}
%		\includegraphics[width=\linewidth]{figures/clustIte.pdf}
%		\subcaption{\label{fig:clustIte}}
%	\end{subfigure}
%	\caption{Toy experiments. (a)   (b) . Shaded areas indicate standard error (N=20).}
%\end{figure}
%
%\paragraph{Estimation of the clustered causal ordering graph}
%We compared the three different approaches proposed in Section~\ref{sec:causClust} in order to estimate the causal ordering graph and its associated clusters. As shown in Fig.~\ref{fig:clustIte}. While random intervention reflect well the true underlying causal structure, closed loop Lie derivative estimation is too noisy to reflect well this structure. The open loop Jacobian, as predicted by theory, is very close to the ground truth (errro comes from few non-zero value below the threshold).


%
%\subsection{Design optimal transition scenarios in a multi-sector economy}
%In Appendix~\ref{app:}, we illustrate how the above methods can be applied to design scenario based on actual macro-economic data. We use the EXOBASE 3 multi-regional input-ouput (MRIO) database collecting flows of goods across a large number of countries in order to design a data-based IO model, from which we can design and compare transition scenarios. 

\paragraph{Control of rebound effects.}
\begin{figure}
	\begin{subfigure}{.49\linewidth}
		\includegraphics[width=\linewidth]{figures/invar_price.png}
		\subcaption{\label{fig:invarprice}}
	\end{subfigure}
	\hfill
	\begin{subfigure}{.49\linewidth}
		\includegraphics[width=\linewidth]{figures/invar_energy.png}
		\subcaption{\label{fig:invarenergy}}
	\end{subfigure}
	\caption{Outcome of (non-invariant) Lie and invariant interventions on energy efficiency, compared to reference (unintervened) values, in the rebound model described in Fig.~\ref{fig:pricemod}: (a) unit price of the target sector, (b) total energy demand.\label{fig:priceExp}}
\end{figure}

\begin{figure*}
	\hspace*{1cm}
	\begin{subfigure}{.4\linewidth}
		\includegraphics[width=\linewidth]{figures/compart_invar.pdf}
		\subcaption{\label{fig:compart_invar}}
	\end{subfigure}
	\hfill
	\begin{subfigure}{.4\linewidth}
		\includegraphics[width=\linewidth]{figures/compart_inter.pdf}
		\subcaption{\label{fig:compart_inter}}
	\end{subfigure}
\hspace*{1cm}
	\caption{(a-b) Outcome of the design of compartmentalized interventions for the model of Fig.~\ref{fig:comparInter}. The unintervened node values (in black) are compared to invariant interventions (II, solid lines) and their corresponding Lie interventions (LI, dashed lines, without enforcing invariance), for multiple values of the Lie interventions' parameters $(u,v)$. (a) Value of the invariant node in compartment 1. (b) Value of the intervened node in compartment 1.
		\label{fig:comparExp}}
\end{figure*}


To illustrate how \textit{Challenge 3} of Sec.~\ref{sec:mrio} can be addressed, we use our invariant intervention framework to prevent price rebound effects. We use a toy 3-sector model, with one energy sector and one target sector whose energy efficiency is increased, modeled by a multiplicative Lie intervention on the energy requirements coefficient of the Leontief matrix. The final demand of this target sector is taken as the invariant node, and is controlled by softly intervening on it through a modification of the unit price of this sector. The invariant intervention is learned using an MLP with two hidden layers (see Appendix~\ref{app:meth} for details). Fig.~\ref{fig:pricemod} describes the two quantities that are intervened on: we make a multiplicative Lie intervention on the parameter represented by the red node (energy efficiency), and ensure there is no rebound by making the final demand node
invariant to the drop in energy costs using an adaptive taxing policy. Figs.~\ref{fig:invarprice}--\ref{fig:invarenergy} compare three models: unintervened (called ``reference'' in the figure), Lie intervened (without enforcing invariance), and invariantly intervened. 
For a range of values of one parameter left free in the Leontief matrix, the results show that the invariant intervention maintains the price close to that of the unintervened model (Fig.~\ref{fig:invarprice}), while this price is much lower for the Lie intervention (due to the rebound effect). The benefit of invariance is demonstrated by the effect on the overall energy demand of the economy (Fig.~\ref{fig:invarenergy}): for the Lie intervention, the rebound through prices leads to the so-called \textit{backfire} scenario, in which the actual energy savings are negative because usage increases beyond the potential savings. In contrast, the invariant intervention leads to a reduction of energy demand (relative to the unintervened system), as the rebound through prices is prevented. 
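
To make the backfire terminology concrete, one common convention for quantifying rebound compares actual and potential energy savings. It is stated here as an illustration and is not necessarily the measure used in the experiments. Let $E^*$ be the unintervened total energy demand, $E^{(u)}$ the demand after intervention, and $E^{\mathrm{pot}} < E^*$ the demand expected from the efficiency gain alone with final demand held fixed (all three symbols are introduced only for this sketch). The rebound effect can then be written as
\begin{equation*}
\mathrm{RE} = 1 - \frac{E^* - E^{(u)}}{E^* - E^{\mathrm{pot}}} ,
\end{equation*}
so that $\mathrm{RE}=0$ corresponds to fully realized potential savings, $0<\mathrm{RE}<1$ to a partial rebound, and $\mathrm{RE}>1$, which is equivalent to $E^{(u)} > E^*$, to backfire.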

\paragraph{Compartmentalized interventions design.}
We further implement compartmentalized interventions and show their benefits for addressing \textit{Challenge~2} of Sec.~\ref{sec:mrio} in multi-sector economic models. We design a two-compartment Leontief model according to Fig.~\ref{fig:comparInter}. We optimize two invariant interventions, $u$ on compartment 1 and $v$ on compartment 2, to satisfy the conditions of Prop.~\ref{prop:compart}. The results in Fig.~\ref{fig:comparExp} show that the invariant node of compartment 1 is unchanged for all values of $u$ and $v$ (Fig.~\ref{fig:compart_invar}), while the intervened node of this compartment changes value only as a function of its corresponding intervention $u$ (Fig.~\ref{fig:compart_inter}), in a way similar to the (non-invariant) Lie intervention. 



\vspace*{-.3cm}
\section{Discussion}
We discuss here some limitations of our approach.
\paragraph{Linearity of the economic model.}
%The linearity of economic models widely varies depending on the considered subfield and mechanism under study. Such modeling choices are subjected to similar tradeoffs as in other fields: possibility of analytical treatment (interpretability), identifiability of the parameters from observational data, etc… 
The linear input-output model that we use should be understood as one way of modeling interactions between economic sectors, commonly used in environmental economics. It was chosen for its interpretability, its illustrative value, its practical relevance, and because there are established approaches to estimate its parameters from economic data. However, this should not be taken to imply that economic models are always linear. Note also that we combine this model with a non-linear demand mechanism to study rebound effects (see Challenge 3 and Section 5). Overall, moving towards non-linear models, as allowed by our setting, is in line with the development of computational models in economics, notably Integrated Assessment Models (IAMs) that investigate the complex interactions between climate change and societies. 
\paragraph{The case of multiple equilibria.}
The equilibrium picked by the equilibrium layer depends on the initialization of the estimate of the equilibrium point in the fixed point iteration algorithm implemented by this layer (this can be described using the notion of ``basin of attraction''). While the theory and algorithms developed in this paper focus on the behavior of the causal model in a neighborhood of a given unintervened equilibrium, a preliminary grid search for all equilibria may be performed in the most general setting. This search can often be avoided, for the following reasons. From a theoretical perspective, conditions for the existence and uniqueness of equilibria are available for many classical models. For example, in our application, the Hawkins--Simon condition guarantees the existence of a non-negative output vector that solves the equilibrium relation \citep{hawkins1949note}. 
%Moreover, a straightforward way to check uniqueness of the equilibrium in the setting of equation (1) is assessing invertibility of the matrix in equation (3). 
From a practical perspective, we are often mainly interested in intervening on the empirically observed equilibrium. For models based on unintervened observed data, we can thus check that the simulated unintervened equilibrium matches the observed data. If, however, there is a mismatch between the equilibrium obtained by the deep equilibrium layer and the one we are interested in, we can initialize the fixed point iteration algorithm in the neighborhood of the expected equilibrium. Our experiments were run with a fixed initialization of the equilibrium point (zero). 
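
As a simple illustration of why a zero initialization is unproblematic in the purely linear case, consider the following sketch under assumed notation, which may differ from that of the earlier sections. Write the input-output equilibrium as $\bx = \boldsymbol{A}\bx + \boldsymbol{y}$, with entrywise non-negative $\boldsymbol{A}$ and $\boldsymbol{y}$, and assume that the spectral radius satisfies $\rho(\boldsymbol{A})<1$, which for non-negative $\boldsymbol{A}$ is equivalent to the Hawkins--Simon condition. The fixed point iteration $\bx_{k+1} = \boldsymbol{A}\bx_k + \boldsymbol{y}$ started at $\bx_0 = \boldsymbol{0}$ then gives
\begin{equation*}
\bx_k = \sum_{j=0}^{k-1} \boldsymbol{A}^j \boldsymbol{y} \;\xrightarrow[k\to\infty]{}\; (\boldsymbol{I}-\boldsymbol{A})^{-1}\boldsymbol{y} ,
\end{equation*}
which is non-negative and is the unique equilibrium, reached from any initialization. In this sketch, a dependence on the initialization could therefore only arise from non-linear components of the model, such as the demand mechanism.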

\section{Conclusion}
\vspace*{-.2cm}
We introduced a differentiable soft intervention design framework for general equilibrium systems. We argue that such soft interventions are more likely to approximate deployable interventions in real-world complex systems, e.g., to address key challenges of the transition to sustainable economies. Theoretical results and algorithmic tools are provided to design interventions with desirable invariance properties, under the assumption that the considered system is in equilibrium and that the model parameters are known. Further work in this direction will need to address the identifiability of the considered models from observational or experimental data. %, and extend the result to non-equilibrium settings. 


\begin{acknowledgements} 
MB is grateful to Philipp Geiger for insightful discussions. This work was supported by the German Federal Ministry of Education and Research (BMBF): T\"ubingen AI Center, FKZ: 01IS18039B; and by the Machine Learning Cluster of Excellence, EXC number 2064/1 - Project number 390727645. 
\end{acknowledgements}


%\bibliographystyle{abbrvnat}
\bibliography{cyclic}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\end{document}

