Abstract: High sample complexity hampers the successful application of reinforcement learning methods, especially in real-world problems where simulating complex dynamics is computationally demanding. Influence-based abstraction (IBA) was proposed to mitigate this issue by breaking down the global model of large-scale distributed systems, such as traffic control problems, into small local sub-models. Each local model includes only a few state variables and a representation of the influence exerted by the external portion of the system. This approach allows converting a complex simulator into local lightweight simulators, enabling more effective applications of planning and reinforcement learning methods. However, the effectiveness of IBA critically depends on the ability to accurately approximate the influence of each local model. While there are a few examples showing promising results in benchmark problems, the question of whether this approach is feasible in more practical scenarios remains open. In this work, we take steps towards addressing this question by conducting an extensive empirical study of learning models for influence approximations in various realistic domains, and evaluating how these models generalize over long horizons. We find that learning the influence is often a manageable learning task, even for complex and large systems. Additionally, we demonstrate the efficacy of the approximation models for long-horizon problems. By using short trajectories, we can learn accurate influence approximations for much longer horizons.
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=Jn3hvY5FZv
Changes Since Last Submission: We would like to recommend Pascal Poupart as Action Editor, as he acted in this role for our last submission of this paper.
We addressed the concerns of the reviewers as follows:
**1. Formal description of the IBA concepts, influence learning problem and local model.**
“...no formal definition of what constitutes a local model nor how the local, source and external variables are obtained/defined. The notion of influence learning is not made precise either”; “the main problem with the paper is a lack of clarity regarding the notion of local models and influence learning”; “Could the authors provide a more precise definition of influence learning, preferably in a mathematical form?”
- We include the formal definitions for the IBA concepts (state decomposition, influence, influence augmented local model) in Section 2.3 Influence-based abstraction.
- We formally introduce the influence learning task in Section 2.4 Approximate influence-based abstraction.
- We add Section 2.5 State decomposition and a paragraph in Section 4.3 Key observations and limitations to discuss the limitations related to the choice of the local model and to give an intuition on potential trade-offs.
- We restructure Section 2. Background into subsections and include the running example in it to enhance the clarity of the notions introduced.
**2. Conciseness and readability of the paper.** “the paper is long and repetitive in some sections”; “Some of the writing could be sharpened and condensed… move some of the discussions to the appendix and keep the main paper shorter and more focused.”; “the main part of the paper is 23 pages long (references excluded) and the same messages could be delivered in half this size”; “one could describe the evaluation environments in a few lines and put the long description in appendices… Most images are too large, etc”
- We critically revise the entire manuscript to improve conciseness, avoid repetition and improve the language (the main part of the paper is reduced from 23 to 17 pages).
- We rearrange the figures more compactly and change colors to make them more distinguishable.
- We move the details on the empirical domains to the appendix and replace them with shorter descriptions.
- We apply significant changes to the paper structure: we align the subsections of the section explaining the experiments (Section 3. Empirical study of influence learning) with those of the section presenting the results (Section 4. Results and discussion), and we restructure the background into subsections.
**3. Clarity on the contribution of the paper.** “There is no proposed/new algorithm. Hence, the contribution is the empirical evaluation of the claims.”; “Could the authors elaborate on the challenges associated with influence learning for long-horizon tasks?”
- We explicitly list the contributions of the paper as bullet points in Section 1. Introduction and in the introduction of Section 3. Empirical study of influence learning.
- We summarize the main observations/contributions in Section 4.3 Key observations and limitations.
- We sharpen the motivation for studying long-horizon problems and the related difficulties in Section 1. Introduction and in the introduction of Section 3. Empirical study of influence learning.
**4. Limitations of the work.** “limitations of the empirical evaluation should be stated”; “Discussion on Limitations: Provide a more detailed discussion on the limitations of the proposed method”
- We add a paragraph discussing limitations in Section 4.3 Key observations and limitations.
**5. Improve explanation on specific technical details.**
- “The reviewer is not convinced that modeling an environment with multiple interacting agents as a POMDP is appropriate when other agents' policies are fixed.” We clarify this approach in Section 2.1 Factored POMDPs and best response problems, and refer to previous works that adopt the same perspective.
- “The reviewer is confused about the d-set, how did the authors determine the formula of dset”. We clarify in Section 2.3 Influence-based abstraction that, once a local model decomposition has been chosen, the d-set is uniquely defined by d-separation in the causal graph of the 2DBN.
- “what if the variables are not binary? Can you extend the 2TBN formalism to deal with environments showing scalar or continuous variables?” We specify in Section 4.3 Key observations and limitations that, even though the IBA formalism can be naturally extended to continuous state variables (and distributions over continuous variables), we limit our study to discrete variables.
- “the choice of highlighted results in Tables 1, 2 and 3 is only explained late, it should be explained in the caption of these tables, as it does not correspond to the most standard choice (highlighting the best performances).” In the introduction of Section 4.1 Comparison of learning models, we elaborate on how we choose the network size as a Pareto-optimal solution with respect to performance and training time.
Assigned Action Editor: ~Pascal_Poupart2
Submission Number: 3337