"index","Title","Authors","Affiliations","pdf_url","abstract"
"13030","Ranking Sets of Objects: The Complexity of Avoiding Impossibility Results","Jan Maly","TU Wien","https://www.jair.org/index.php/jair/article/download/13030/26753","The problem of lifting a preference order on a set of objects to a preference order on a family of subsets of this set is a fundamental problem with a wide variety of applications in AI. The process is often guided by axioms postulating properties the lifted order should have. Well-known impossibility results by Kannai and Peleg and by Barbera and Pattanaik tell us that some desirable axioms – namely dominance and (strict) independence – are not jointly satisfiable for any linear order on the objects if all non-empty sets of objects are to be ordered. On the other hand, if not all non-empty sets of objects are to be ordered, the axioms are jointly satisfiable for all linear orders on the objects for some families of sets. Such families are very important for applications as they allow for the use of lifted orders, for example, in combinatorial voting. In this paper, we determine the computational complexity of recognizing such families. We show that it is \Pi_2^p-complete to decide for a given family of subsets whether dominance and independence or dominance and strict independence are jointly satisfiable for all linear orders on the objects if the lifted order needs to be total. Furthermore, we show that the problem remains coNP-complete if the lifted order can be incomplete. Additionally, we show that the complexity of these problems can increase exponentially if the family of sets is not given explicitly but via a succinct domain restriction. Finally, we show that it is NP-complete to decide for a family of subsets whether dominance and independence or dominance and strict independence are jointly satisfiable for at least one linear order on the objects."
"13153","Online Relaxation Refinement for Satisficing Planning: On Partial Delete Relaxation, Complete Hill-Climbing, and Novelty Pruning","Maximilian Fickert, Jörg Hoffmann","Saarland University, Saarland University","https://www.jair.org/index.php/jair/article/download/13153/26754","In classical AI planning, heuristic functions typically base their estimates on a relaxation of the input task. Such relaxations can be more or less precise, and many heuristic functions have a refinement procedure that can be iteratively applied until the desired degree of precision is reached. Traditionally, such refinement is performed offline to instantiate the heuristic for the search. However, a natural idea is to perform such refinement online instead, in situations where the heuristic is not sufficiently accurate. We introduce several online-refinement search algorithms, based on hill-climbing and greedy best-first search. Our hill-climbing algorithms perform a bounded lookahead, proceeding to a state with lower heuristic value than the root state of the lookahead if such a state exists, or refining the heuristic otherwise to remove such a local minimum from the search space surface. These algorithms are complete if the refinement procedure satisfies a suitable convergence property. We transfer the idea of bounded lookaheads to greedy best-first search with a lightweight lookahead after each expansion, serving both as a method to boost search progress and to detect when the heuristic is inaccurate, identifying an opportunity for online refinement. We evaluate our algorithms with the partial delete relaxation heuristic hCFF, which can be refined by treating additional conjunctions of facts as atomic, and whose refinement operation satisfies the convergence property required for completeness. 
On both the IPC domains as well as on the recently published Autoscale benchmarks, our online-refinement search algorithms significantly beat state-of-the-art satisficing planners, and are competitive even with complex portfolios."
"13350","Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent","Adrien Bolland, Ioannis Boukas, Mathias Berger, Damien Ernst","University of Liège, , , ","https://www.jair.org/index.php/jair/article/download/13350/26757","We consider the joint design and control of discrete-time stochastic dynamical systems over a finite time horizon. We formulate the problem as a multi-step optimization problem under uncertainty seeking to identify a system design and a control policy that jointly maximize the expected sum of rewards collected over the time horizon considered. The transition function, the reward function and the policy are all parametrized, assumed known and differentiable with respect to their parameters. We then introduce a deep reinforcement learning algorithm combining policy gradient methods with model-based optimization techniques to solve this problem. In essence, our algorithm iteratively approximates the gradient of the expected return via Monte-Carlo sampling and automatic differentiation and takes projected gradient ascent steps in the space of environment and policy parameters. This algorithm is referred to as Direct Environment and Policy Search (DEPS). We assess the performance of our algorithm in three environments concerned with the design and control of a mass-spring-damper system, a small-scale off-grid power system and a drone, respectively. In addition, our algorithm is benchmarked against a state-of-the-art deep reinforcement learning algorithm used to tackle joint design and control problems. We show that DEPS performs at least as well or better in all three environments, consistently yielding solutions with higher returns in fewer iterations. 
Finally, solutions produced by our algorithm are also compared with solutions produced by an algorithm that does not jointly optimize environment and policy parameters, highlighting the fact that higher returns can be achieved when joint optimization is performed."
"12440","Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning","Rodrigo  Toro Icarte, Toryn Q.  Klassen, Richard Valenzano, Sheila A. McIlraith","University of Toronto and Vector Institute, University of Toronto and Vector Institute, Element AI, University of Toronto and Vector Institute","https://www.jair.org/index.php/jair/article/download/12440/26759","Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible – to show the reward function’s code to the RL agent so it can exploit the function’s internal structure to learn optimal policies in a more sample efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of resultant policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language and as such support loops, sequences and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification."
"13304","Doubly Robust Crowdsourcing","Chong Liu, Yu-Xiang Wang","University of California, Santa Barbara, ","https://www.jair.org/index.php/jair/article/download/13304/26760","Large-scale labeled dataset is the indispensable fuel that ignites the AI revolution as we see today. Most such datasets are constructed using crowdsourcing services such as Amazon Mechanical Turk which provides noisy labels from non-experts at a fair price. The sheer size of such datasets mandates that it is only feasible to collect a few labels per data point. We formulate the problem of test-time label aggregation as a statistical estimation problem of inferring the expected voting score. By imitating workers with supervised learners and using them in a doubly robust estimation framework, we prove that the variance of estimation can be substantially reduced, even if the learner is a poor approximation. Synthetic and real-world experiments show that by combining the doubly robust approach with adaptive worker/item selection rules, we often need much lower label cost to achieve nearly the same accuracy as in the ideal world where all workers label all data points."
"12332","Preferences Single-Peaked on a Tree: Multiwinner Elections and Structural Results","Dominik Peters, Lan Yu, Hau Chan, Edith Elkind","University of Oxford, , University of Nebraska–Lincoln, University of Oxford","https://www.jair.org/index.php/jair/article/download/12332/26761","A preference profile is single-peaked on a tree if the candidate set can be equipped with a tree structure so that the preferences of each voter are decreasing from their top candidate along all paths in the tree. This notion was introduced by Demange (1982), and subsequently Trick (1989b) described an efficient algorithm for deciding if a given profile is single-peaked on a tree. We study the complexity of multiwinner elections under several variants of the Chamberlin–Courant rule for preferences single-peaked on trees. We show that in this setting the egalitarian version of this rule admits a polynomial-time winner determination algorithm. For the utilitarian version, we prove that winner determination remains NP-hard for the Borda scoring function; indeed, this hardness results extends to a large family of scoring functions. However, a winning committee can be found in polynomial time if either the number of leaves or the number of internal vertices of the underlying tree is bounded by a constant. To benefit from these positive results, we need a procedure that can determine whether a given profile is single-peaked on a tree that has additional desirable properties (such as, e.g., a small number of leaves). To address this challenge, we develop a structural approach that enables us to compactly represent all trees with respect to which a given profile is single-peaked. 
We show how to use this representation to efficiently find the best tree for a given profile for use with our winner determination algorithms: Given a profile, we can efficiently find a tree with the minimum number of leaves, or a tree with the minimum number of internal vertices among trees on which the profile is single-peaked. We then explore the power and limitations of this framework: we develop polynomial-time algorithms to find trees with the smallest maximum degree, diameter, or pathwidth, but show that it is NP-hard to check whether a given profile is single-peaked on a tree that is isomorphic to a given tree, or on a regular tree."
"12889","A Survey of Opponent Modeling in Adversarial Domains","Samer Nashed, Shlomo Zilberstein","UMass Amherst, UMass Amherst","https://www.jair.org/index.php/jair/article/download/12889/26762","Opponent modeling is the ability to use prior knowledge and observations in order to predict the behavior of an opponent. This survey presents a comprehensive overview of existing opponent modeling techniques for adversarial domains, many of which must address stochastic, continuous, or concurrent actions, and sparse, partially observable payoff structures. We discuss all the components of opponent modeling systems, including feature extraction, learning algorithms, and strategy abstractions. These discussions lead us to propose a new form of analysis for describing and predicting the evolution of game states over time. We then introduce a new framework that facilitates method comparison, analyze a representative selection of techniques using the proposed framework, and highlight common trends among recently proposed methods. Finally, we list several open problems and discuss future research directions inspired by AI research on opponent modeling and related research in other disciplines."
"13200","Explainable Deep Learning: A Field Guide for the Uninitiated","Gabrielle Ras, Ning Xie, Marcel van Gerven, Derek Doran","Radboud University, , , ","https://www.jair.org/index.php/jair/article/download/13200/26763","Deep neural networks (DNNs) are an indispensable machine learning tool despite the difficulty of diagnosing what aspects of a model’s input drive its decisions. In countless real-world domains, from legislation and law enforcement to healthcare, such diagnosis is essential to ensure that DNN decisions are driven by aspects appropriate in the context of its use. The development of methods and studies enabling the explanation of a DNN’s decisions has thus blossomed into an active and broad area of research. The field’s complexity is exacerbated by competing definitions of what it means “to explain” the actions of a DNN and to evaluate an approach’s “ability to explain”. This article offers a field guide to explore the space of explainable deep learning for those in the AI/ML field who are uninitiated. The field guide: i) Introduces three simple dimensions defining the space of foundational methods that contribute to explainable deep learning, ii) discusses the evaluations for model explanations, iii) places explainability in the context of other related deep learning research areas, and iv) discusses user-oriented explanation design and future directions. We hope the guide is seen as a starting point for those embarking on this research field."
"13280","Automatic Recognition of the General-Purpose Communicative Functions Defined by the ISO 24617-2 Standard for Dialog Act Annotation","Eugénio Ribeiro, Ricardo Ribeiro, David Martins de Matos","INESC-ID / Instituto Superior Técnico, INESC-ID / Instituto Universitário de Lisboa (ISCTE-IUL), INESC-ID / Instituto Superior T´écnico","https://www.jair.org/index.php/jair/article/download/13280/26764","From the perspective of a dialog system, it is important to identify the intention behind the segments in a dialog, since it provides an important cue regarding the information that is present in the segments and how they should be interpreted. ISO 24617-2, the standard for dialog act annotation, defines a hierarchically organized set of general-purpose communicative functions which correspond to different intentions that are relevant in the context of a dialog. We explore the automatic recognition of these communicative functions in the DialogBank, which is a reference set of dialogs annotated according to this standard. To do so, we propose adaptations of existing approaches to flat dialog act recognition that allow them to deal with the hierarchical classification problem. More specifically, we propose the use of an end-to-end hierarchical network with cascading outputs and maximum a posteriori path estimation to predict the communicative function at each level of the hierarchy, preserve the dependencies between the functions in the path, and decide at which level to stop. Furthermore, since the amount of dialogs in the DialogBank is small, we rely on transfer learning processes to reduce overfitting and improve performance. The results of our experiments show that our approach outperforms both a flat one and hierarchical approaches based on multiple classifiers and that each of its components plays an important role towards the recognition of general-purpose communicative functions."
"13113","Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge","Pierre  Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia  Rigotti, Jarret  Ross, Yair  Schiff, Richard A. Young, Brian  Belgodere",", , , , , , , , ","https://www.jair.org/index.php/jair/article/download/13113/26765","Image captioning has recently demonstrated impressive progress largely owing to the introduction of neural network algorithms trained on curated dataset like MS-COCO. Often work in this field is motivated by the promise of deployment of captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets renders the utility of systems trained on these datasets limited as an assistive technology in real-world settings, such as helping visually impaired people navigate and accomplish everyday tasks. This gap motivated the introduction of the novel VizWiz dataset, which consists of images taken by the visually impaired and captions that have useful, task-oriented information. In an attempt to help the machine learning computer vision field realize its promise of producing technologies that have positive social impact, the curators of the VizWiz dataset host several competitions, including one for image captioning. This work details the theory and engineering from our winning submission to the 2020 captioning competition. Our work provides a step towards improved assistive image captioning systems. This article appears in the special track on AI & Society."
"13052","Neural Character-Level Syntactic Parsing for Chinese","Zuchao Li, Junru Zhou, Hai Zhao, Zhisong Zhang, Haonan Li, Yuqi Ju","Shanghai Jiao Tong University, , , , , ","https://www.jair.org/index.php/jair/article/download/13052/26766","In this work, we explore character-level neural syntactic parsing for Chinese with two typical syntactic formalisms: the constituent formalism and a dependency formalism based on a newly released character-level dependency treebank. Prior works in Chinese parsing have struggled with whether to de ne words when modeling character interactions. We choose to integrate full character-level syntactic dependency relationships using neural representations from character embeddings and richer linguistic syntactic information from human-annotated character-level Parts-Of-Speech and dependency labels. This has the potential to better understand the deeper structure of Chinese sentences and provides a better structural formalism for avoiding unnecessary structural ambiguities. Specifically, we  first compare two different character-level syntax annotation styles: constituency and dependency. Then, we discuss two key problems for character-level parsing: (1) how to combine constituent and dependency syntactic structure in full character-level trees and (2) how to convert from character-level to word-level for both constituent and dependency trees. In addition, we also explore several other key parsing aspects, including di erent character-level dependency annotations and joint learning of Parts-Of-Speech and syntactic parsing. Finally, we evaluate our models on the Chinese Penn Treebank (CTB) and our published Shanghai Jiao Tong University Chinese Character Dependency Treebank (SCDT). The results show the e effectiveness of our model on both constituent and dependency parsing. We further provide empirical analysis and suggest several directions for future study."
"12802","CASA: Conversational Aspect Sentiment Analysis for Dialogue Understanding","Linfeng Song, Chunlei Xin, Shaopeng Lai, Ante Wang, Jinsong Su, Kun Xu","Tencent AI Lab, , , , , ","https://www.jair.org/index.php/jair/article/download/12802/26767","Dialogue understanding has always been a bottleneck for many conversational tasks, such as dialogue response generation and conversational question answering. To expedite the progress in this area, we introduce the task of conversational aspect sentiment analysis (CASA) that can provide useful fine-grained sentiment information for dialogue understanding and planning. Overall, this task extends the standard aspect-based sentiment analysis to the conversational scenario with several major adaptations. To aid the training and evaluation of data-driven methods, we annotate 3,000 chit-chat dialogues (27,198 sentences) with fine-grained sentiment information, including all sentiment expressions, their polarities and the corresponding target mentions. We also annotate an out-of-domain test set of 200 dialogues for robustness evaluation. Besides, we develop multiple baselines based on either pretrained BERT or self-attention for preliminary study. Experimental results show that our BERT-based model has strong performances for both in-domain and out-of-domain datasets, and thorough analysis indicates several potential directions for further improvements."
"12370","Sum-of-Products with Default Values: Algorithms and Complexity Results","Robert Ganian, Eun Jung Kim, Friedrich Slivovsky, Stefan Szeider","Algorithms and Complexity Group, TU Wien, LAMSADE/CNRS, Université Paris-Dauphin, TU Wien, Algorithms and Complexity Group, TU Wien","https://www.jair.org/index.php/jair/article/download/12370/26768","Weighted Counting for Constraint Satisfaction with Default Values (#CSPD) is a powerful special case of the sum-of-products problem that admits succinct encodings of #CSP, #SAT, and inference in probabilistic graphical models. We investigate #CSPD under the fundamental parameter of incidence treewidth (i.e., the treewidth of the incidence graph of the constraint hypergraph). We show that if the incidence treewidth is bounded, #CSPD can be solved in polynomial time. More specifically, we show that the problem is fixed-parameter tractable for the combined parameter incidence treewidth, domain size, and support size (the maximum number of non-default tuples in a constraint). This generalizes known results on the fixed-parameter tractability of #CSPD under the combined parameter primal treewidth and domain size. We further prove that the problem is not fixed-parameter tractable if any of the three components is dropped from the parameterization."
"13318","Migrating Techniques from Search-based Multi-Agent Path Finding Solvers to SAT-based Approach","Pavel Surynek, Roni Stern, Eli Boyarski, Ariel Felner",", , , ","https://www.jair.org/index.php/jair/article/download/13318/26769","In the multi-agent path finding problem (MAPF) we are given a set of agents each with respective start and goal positions. The task is to find paths for all agents while avoiding collisions, aiming to minimize a given objective function. Many MAPF solvers were introduced in the past decade for optimizing two specific objective functions: sum-of-costs and makespan. Two prominent categories of solvers can be distinguished: search-based solvers and compilation-based solvers. Search-based solvers were developed and tested for the sum-of-costs objective, while the most prominent compilation-based solvers that are built around Boolean satisfiability (SAT) were designed for the makespan objective. Very little is known on the performance and relevance of solvers from the compilation-based approach on the sum-of-costs objective. In this paper, we start to close the gap between these cost functions in the compilation-based approach. Our main contribution is a new SAT-based MAPF solver called MDD-SAT, that is directly aimed to optimally solve the MAPF problem under the sum-of-costs objective function. Using both a lower bound on the sum-of-costs and an upper bound on the makespan, MDD-SAT is able to generate a reasonable number of Boolean variables in our SAT encoding. We then further improve the encoding by borrowing ideas from ICTS, a search-based solver. In addition, we show that concepts applicable in search-based solvers like ICTS and ICBS are applicable in the SAT-based approach as well. 
Specifically, we integrate independence detection, a generic technique for decomposing an MAPF instance into independent subproblems, into our SAT-based approach, and we design a relaxation of our optimal SAT-based solver that results in a bounded suboptimal SAT-based solver. Experimental evaluation on several domains shows that there are many scenarios where our SAT-based methods outperform state-of-the-art sum-of-costs search-based solvers, such as variants of the ICTS and ICBS algorithms."
"13135","Viewpoint: Ethical By Designer - How to Grow Ethical Designers of Artificial Intelligence","Loïs Vanhée, Melania Borit",", ","https://www.jair.org/index.php/jair/article/download/13135/26770","Ethical concerns regarding Artificial Intelligence (AI) technology have fueled discussions around the ethics training received by AI designers. We claim that training designers for ethical behaviour, understood as habitual application of ethical principles in any situation, can make a significant difference in the practice of research, development, and application of AI systems. Building on interdisciplinary knowledge and practical experience from computer science, moral psychology and development, and pedagogy, we propose a functional way to provide this training.    This article appears in the special track on AI & Society."
"13112","Fine-grained Prediction of Political Leaning on Social Media with Unsupervised Deep Learning","Tiziano Fagni, Stefano Cresci",", IIT-CNR","https://www.jair.org/index.php/jair/article/download/13112/26771","Predicting the political leaning of social media users is an increasingly popular task, given its usefulness for electoral forecasts, opinion dynamics models and for studying the political dimension of polarization and disinformation. Here, we propose a novel unsupervised technique for learning fine-grained political leaning from the textual content of social media posts. Our technique leverages a deep neural network for learning latent political ideologies in a representation learning task. Then, users are projected in a low-dimensional ideology space where they are subsequently clustered. The political leaning of a user is automatically derived from the cluster to which the user is assigned. We evaluated our technique in two challenging classification tasks and we compared it to baselines and other state-of-the-art approaches. Our technique obtains the best results among all unsupervised techniques, with micro F1 = 0.426 in the 8-class task and micro F1 = 0.772 in the 3-class task. Other than being interesting on their own, our results also pave the way for the development of new and better unsupervised approaches for the detection of fine-grained political leaning."
"12967","Visually Grounded Models of Spoken Language: A Survey of Datasets, Architectures and Evaluation Techniques","Grzegorz Chrupała","Tilburg University","https://www.jair.org/index.php/jair/article/download/12967/26772","This survey provides an overview of the evolution of visually grounded models of spoken language over the last 20 years. Such models are inspired by the observation that when children pick up a language, they rely on a wide range of indirect and noisy clues, crucially including signals from the visual modality co-occurring with spoken utterances. Several fields have made important contributions to this approach to modeling or mimicking the process of learning language: Machine Learning, Natural Language and Speech Processing, Computer Vision and Cognitive Science. The current paper brings together these contributions in order to provide a useful introduction and overview for practitioners in all these areas. We discuss the central research questions addressed, the timeline of developments, and the datasets which enabled much of this work. We then summarize the main modeling architectures and offer an exhaustive overview of the evaluation metrics and analysis techniques."
"13288","Some Inapproximability Results of MAP Inference and Exponentiated Determinantal Point Processes","Naoto Ohsaka","","https://www.jair.org/index.php/jair/article/download/13288/26773","We study the computational complexity of two hard problems on determinantal point processes (DPPs). One is maximum a posteriori (MAP) inference, i.e., to find a principal submatrix having the maximum determinant. The other is probabilistic inference on exponentiated DPPs (E-DPPs), which can sharpen or weaken the diversity preference of DPPs with an exponent parameter p. We present several complexity-theoretic hardness results that explain the difficulty in approximating MAP inference and the normalizing constant for E-DPPs. We first prove that unconstrained MAP inference for an n × n matrix is NP-hard to approximate within a factor of 2βn, where β = 10−1013 . This result improves upon the best-known inapproximability factor of (9/8 − ϵ), and rules out the existence of any polynomial-factor approximation algorithm assuming P ≠ NP. We then show that log-determinant maximization is NP-hard to approximate within a factor of 5/4 for the unconstrained case and within a factor of 1 + 10−1013 for the size-constrained monotone case. In particular, log-determinant maximization does not admit a polynomial-time approximation scheme unless P = NP. As a corollary of the first result, we demonstrate that the normalizing constant for E-DPPs of any (fixed) constant exponent p ≥ β-1 = 101013 is NP-hard to approximate within a factor of 2βpn, which is in contrast to the case of p ≤ 1 admitting a fully polynomial-time randomized approximation scheme."
"13163","SAMBA: A Generic Framework for Secure Federated Multi-Armed Bandits","Radu Ciucanu, Pascal Lafourcade, Gael Marcadet, Marta Soare","INSA Centre Val de Loire & LIFO, Univ. Clermont Auvergne & LIMOS, Univ. Orléans, LIFO/LIMOS, Univ. Orléans, LIFO","https://www.jair.org/index.php/jair/article/download/13163/26774","The multi-armed bandit is a reinforcement learning model where a learning agent repeatedly chooses an action (pull a bandit arm) and the environment responds with a stochastic outcome (reward) coming from an unknown distribution associated with the chosen arm. Bandits have a wide-range of application such as Web recommendation systems. We address the cumulative reward maximization problem in a secure federated learning setting, where multiple data owners keep their data stored locally and collaborate under the coordination of a central orchestration server. We rely on cryptographic schemes and propose Samba, a generic framework for Secure federAted Multi-armed BAndits. Each data owner has data associated to a bandit arm and the bandit algorithm has to sequentially select which data owner is solicited at each time step. We instantiate Samba for five bandit algorithms. We show that Samba returns the same cumulative reward as the nonsecure versions of bandit algorithms, while satisfying formally proven security properties. We also show that the overhead due to cryptographic primitives is linear in the size of the input, which is confirmed by our proof-of-concept implementation."
"13428","Survey and Evaluation of Causal Discovery Methods for Time Series","Charles K. Assaad, Emilie Devijver, Eric Gaussier","Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, EasyVista, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG","https://www.jair.org/index.php/jair/article/download/13428/26917","We introduce in this survey the major concepts, models, and algorithms proposed so far to infer causal relations from observational time series, a task usually referred to as causal discovery in time series. To do so, after a description of the underlying concepts and modelling assumptions, we present different methods according to the family of approaches they belong to: Granger causality, constraint-based approaches, noise-based approaches, score-based approaches, logic-based approaches, topology-based approaches, and difference-based approaches. We then evaluate several representative methods to illustrate the behaviour of different families of approaches. This illustration is conducted on both artificial and real datasets, with different characteristics. The main conclusions one can draw from this survey is that causal discovery in times series is an active research field in which new methods (in every family of approaches) are regularly proposed, and that no family or method stands out in all situations. Indeed, they all rely on assumptions that may or may not be appropriate for a particular dataset."
"13261","Scalable Online Planning for Multi-Agent MDPs","Shushman Choudhury, Jayesh K. Gupta, Peter Morales, Mykel J. Kochenderfer","Lacuna, Microsoft, Microsoft, Stanford University","https://www.jair.org/index.php/jair/article/download/13261/26776","We present a scalable tree search planning algorithm for large multi-agent sequential decision problems that require dynamic collaboration. Teams of agents need to coordinate decisions in many domains, but naive approaches fail due to the exponential growth of the joint action space with the number of agents. We circumvent this complexity through an approach that allows us to trade computation for approximation quality and dynamically coordinate actions. Our algorithm comprises three elements: online planning with Monte Carlo Tree Search (MCTS), factored representations of local agent interactions with coordination graphs, and the iterative Max-Plus method for joint action selection. We evaluate our approach on the benchmark SysAdmin domain with static coordination graphs and achieve comparable performance with much lower computation cost than our MCTS baselines. We also introduce a multi-drone delivery domain with dynamic coordination graphs, and demonstrate how our approach scales to large problems on this domain that are intractable for other MCTS methods. We provide an open-source implementation of our algorithm at https://github.com/JuliaPOMDP/FactoredValueMCTS.jl."
"13326","Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning","Yuexiang Zhai, Christina Baek, Zhengyuan Zhou, Jiantao Jiao, Yi Ma","UC Berkeley, , , , ","https://www.jair.org/index.php/jair/article/download/13326/26777","Many goal-reaching reinforcement learning (RL) tasks have empirically verified that rewarding the agent on subgoals improves convergence speed and practical performance. We attempt to provide a theoretical framework to quantify the computational benefits of rewarding the completion of subgoals, in terms of the number of synchronous value iterations. In particular, we consider subgoals as one-way intermediate states, which can only be visited once per episode and propose two settings that consider these one-way intermediate states: the one-way single-path (OWSP) and the one-way multi-path (OWMP) settings. In both OWSP and OWMP settings, we demonstrate that adding intermediate rewards to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state. We also reveal a trade-off between computational complexity and the pursuit of the shortest path in the OWMP setting: adding intermediate rewards significantly reduces the computational complexity of reaching the goal but the agent may not find the shortest path, whereas with sparse terminal rewards, the agent finds the shortest path at a significantly higher computational cost. We also corroborate our theoretical results with extensive experiments on the MiniGrid environments using Q-learning and some popular deep RL algorithms."
"12539","Approximating Perfect Recall when Model Checking Strategic Abilities: Theory and Applications","Francesco Belardinelli, Alessio Lomuscio, Vadim Malvone, Emily Yu","Imperial College London, United Kingdom and Laboratoire IBISC, Université d'Evry, France, Imperial College London United Kingdom, Télécom Paris, Johannes Kepler University Linz, Austria","https://www.jair.org/index.php/jair/article/download/12539/26778","The model checking problem for multi-agent systems against specifications in the alternating-time temporal logic ATL, hence ATL∗, under perfect recall and imperfect information is known to be undecidable. To tackle this problem, in this paper we investigate a notion of bounded recall under incomplete information. We present a novel three-valued semantics for ATL∗ in this setting and analyse the corresponding model checking problem. We show that the three-valued semantics here introduced is an approximation of the classic two-valued semantics, then give a sound, albeit partial, algorithm for model checking two-valued perfect recall via its approximation as three-valued bounded recall. Finally, we extend MCMAS, an open-source model checker for ATL and other agent specifications, to incorporate bounded recall; we illustrate its use and present experimental results."
"13550","Get out of the BAG! Silos in AI Ethics Education: Unsupervised Topic Modeling Analysis of Global AI Curricula","Rana Tallal Javed, Osama Nasir, Melania Borit, Loïs Vanhée, Elias  Zea, Shivam Gupta, Ricardo Vinuesa, Junaid Qadir",", , , , , , , ","https://www.jair.org/index.php/jair/article/download/13550/26779","The domain of Artificial Intelligence (AI) ethics is not new, with discussions going back at least 40 years. Teaching the principles and requirements of ethical AI to students is considered an essential part of this domain, with an increasing number of technical AI courses taught at several higher-education institutions around the globe including content related to ethics. By using Latent Dirichlet Allocation (LDA), a generative probabilistic topic model, this study uncovers topics in teaching ethics in AI courses and their trends related to where the courses are taught, by whom, and at what level of cognitive complexity and specificity according to Bloom’s taxonomy. In this exploratory study based on unsupervised machine learning, we analyzed a total of 166 courses: 116 from North American universities, 11 from Asia, 36 from Europe, and 10 from other regions. Based on this analysis, we were able to synthesize a model of teaching approaches, which we call BAG (Build, Assess, and Govern), that combines specific cognitive levels, course content topics, and disciplines affiliated with the department(s) in charge of the course. We critically assess the implications of this teaching paradigm and provide suggestions about how to move away from these practices. We challenge teaching practitioners and program coordinators to reflect on their usual procedures so that they may expand their methodology beyond the confines of stereotypical thought and traditional biases regarding what disciplines should teach and how. This article appears in the AI & Society track."
"12695","Incremental Event Calculus for Run-Time Reasoning","Efthimis Tsilionis, Alexander Artikis, Georgios Paliouras","NCSR Demokritos, Department of Maritime Studies, University of Piraeus, Greece Institute of Informatics & Telecommunications, NCSR “Demokritos”, Greece, Institute of Informatics & Telecommunications, NCSR “Demokritos”, Greece","https://www.jair.org/index.php/jair/article/download/12695/26780","We present a system for online, incremental composite event recognition. In streaming environments, the usual case is for data to arrive with a (variable) delay from, and to be revised by, the underlying sources. We propose RTECinc, an incremental version of RTEC, a composite event recognition engine with formal, declarative semantics, that has been shown to scale to several real-world data streams. RTEC deals with delayed arrival and revision of events by computing all queries from scratch. This is often inefficient since it results in redundant computations. Instead, RTECinc deals with delays and revisions in a more efficient way, by updating only the affected queries. We examine RTECinc theoretically, presenting a complexity analysis, and show the conditions in which it outperforms RTEC. Moreover, we compare RTECinc and RTEC experimentally using real-world and synthetic datasets. The results are compatible with our theoretical analysis and show that RTECinc outperforms RTEC in many practical cases."
"13510","Predicting Decisions in Language Based Persuasion Games","Reut Apel, Ido Erev, Roi Reichart, Moshe Tennenholtz","Technion, Technion, Technion, Israel Institute of Technology, ","https://www.jair.org/index.php/jair/article/download/13510/26781","Sender-receiver interactions, and specifically persuasion games, are widely researched in economic modeling and artificial intelligence, and serve as a solid foundation for powerful applications. However, in the classic persuasion games setting, the messages sent from the expert to the decision-maker are abstract or well-structured application-specific signals rather than natural (human) language messages, although natural language is a very common communication signal in real-world persuasion setups. This paper addresses the use of natural language in persuasion games, exploring its impact on the decisions made by the players and aiming to construct effective models for the prediction of these decisions. For this purpose, we conduct an online repeated interaction experiment. At each trial of the interaction, an informed expert aims to sell an uninformed decision-maker a vacation in a hotel, by sending her a review that describes the hotel. While the expert is exposed to several scored reviews, the decision-maker observes only the single review sent by the expert, and her payoff in case she chooses to take the hotel is a random draw from the review score distribution available to the expert only. The expert’s payoff, in turn, depends on the number of times the decision-maker chooses the hotel. We also compare the behavioral patterns in this experiment to the equivalent patterns in similar experiments where the communication is based on the numerical values of the reviews rather than the reviews’ text, and observe substantial differences which can be explained through an equilibrium analysis of the game. 
We consider a number of modeling approaches for our verbal communication setup, differing from each other in the model type (deep neural network (DNN) vs. linear classifier), the type of features used by the model (textual, behavioral or both) and the source of the textual features (DNN-based vs. hand-crafted). Our results demonstrate that given a prefix of the interaction sequence, our models can predict the future decisions of the decision-maker, particularly when a sequential modeling approach and hand-crafted textual features are applied. Further analysis of the hand-crafted textual features allows us to make initial observations about the aspects of text that drive decision making in our setup."
"13449","On the Indecisiveness of Kelly-Strategyproof Social Choice Functions","Felix Brandt, Martin Bullinger, Patrick Lederer",", Technical University of Munich, ","https://www.jair.org/index.php/jair/article/download/13449/26782","Social choice functions (SCFs) map the preferences of a group of agents over some set of alternatives to a non-empty subset of alternatives. The Gibbard-Satterthwaite theorem has shown that only extremely restrictive SCFs are strategyproof when there are more than two alternatives. For set-valued SCFs, or so-called social choice correspondences, the situation is less clear. There are miscellaneous -- mostly negative -- results using a variety of strategyproofness notions and additional requirements. The simple and intuitive notion of Kelly-strategyproofness has turned out to be particularly compelling because it is weak enough to still allow for positive results. For example, the Pareto rule is strategyproof even when preferences are weak, and a number of attractive SCFs (such as the top cycle, the uncovered set, and the essential set) are strategyproof for strict preferences. In this paper, we show that, for weak preferences, only indecisive SCFs can satisfy strategyproofness. In particular, (i) every strategyproof rank-based SCF violates Pareto-optimality, (ii) every strategyproof support-based SCF (which generalize Fishburn's C2 SCFs) that satisfies Pareto-optimality returns at least one most preferred alternative of every voter, and (iii) every strategyproof non-imposing SCF returns the Condorcet loser in at least one profile. We also discuss the consequences of these results for randomized social choice."
"12918","Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning","Erkut Erdem, Menekse Kuyu, Semih Yagcioglu, Anette Frank, Letitia Parcalabescu, Barbara Plank, Andrii Babii, Oleksii Turuta, Aykut Erdem, Iacer Calixto, Elena Lloret, Elena-Simona Apostol, Ciprian-Octavian  Truică, Branislava  Šandrih, Sanda  Martinčić-Ipšić, Gábor Berend, Albert Gatt, Grăzina  Korvel","Hacettepe University, Ankara, Turkey, Hacettepe University, Ankara, Turkey, Hacettepe University, Ankara, Turkey, Heidelberg University, Heidelberg, Germany, Heidelberg University, Heidelberg, Germany, IT University of Copenhagen, Copenhagen, Denmark, Kharkiv National University of Radio Electronics, Ukraine, Kharkiv National University of Radio Electronics, Ukraine, Koç University, New York University, U.S.A. / University of Amsterdam, Netherlands, University of Alicante, Alicante, Spain, University Politehnica of Bucharest, Bucharest, Romania, University Politehnica of Bucharest, Bucharest, Romania, University of Belgrade, Belgrade, Serbia, University of Rijeka, Rijeka, Croatia, University of Szeged, Szeged, Hungary, University of Malta, Malta, Vilnius University, Vilnius, Lithuania","https://www.jair.org/index.php/jair/article/download/12918/26783","Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed an impressive progress on both of these problems, giving rise to a new family of approaches. Especially, the advances in deep learning over the past couple of years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural-networks based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast growing field of research. 
In this state-of-the-art report, we investigate the recent developments and applications of NNLG in its full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these aspects and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on the seminal applications of these NNLG models such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks by pointing out some open research directions."
"13188","Multiobjective Tree-Structured Parzen Estimator","Yoshihiko Ozaki, Yuki Tanigaki, Shuhei Watanabe, Masahiro Nomura, Masaki Onishi","National Institute of Advanced Industrial Science and Technology, National Institute of Advanced Industrial Science and Technology, University of Freiburg, CyberAgent, Inc., National Institute of Advanced Industrial Science and Technology","https://www.jair.org/index.php/jair/article/download/13188/26784","Practitioners often encounter challenging real-world problems that involve a simultaneous optimization of multiple objectives in a complex search space. To address these problems, we propose a practical multiobjective Bayesian optimization algorithm. It is an extension of the widely used Tree-structured Parzen Estimator (TPE) algorithm, called Multiobjective Tree-structured Parzen Estimator (MOTPE). We demonstrate that MOTPE approximates the Pareto fronts of a variety of benchmark problems and a convolutional neural network design problem better than existing methods through the numerical results. We also investigate how the configuration of MOTPE affects the behavior and the performance of the method and the effectiveness of asynchronous parallelization of the method based on the empirical results."
"13367","Fairness in Influence Maximization through Randomization","Ruben Becker, Gianlorenzo D’Angelo, Sajjad Ghobadi, Hugo Gilbert",", , , ","https://www.jair.org/index.php/jair/article/download/13367/26785","The influence maximization paradigm has been used by researchers in various fields in order to study how information spreads in social networks. While previously the attention was mostly on efficiency, more recently fairness issues have been taken into account in this scope. In the present paper, we propose to use randomization as a mean for achieving fairness. While this general idea is not new, it has not been applied in this area. Similar to previous works like Fish et al. (WWW ’19) and Tsang et al. (IJCAI ’19), we study the maximin criterion for (group) fairness. In contrast to their work however, we model the problem in such a way that, when choosing the seed sets, probabilistic strategies are possible rather than only deterministic ones. We introduce two different variants of this probabilistic problem, one that entails probabilistic strategies over nodes (node-based problem) and a second one that entails probabilistic strategies over sets of nodes (set-based problem). After analyzing the relation between the two probabilistic problems, we show that, while the original deterministic maximin problem was inapproximable, both probabilistic variants permit approximation algorithms that achieve a constant multiplicative factor of 1 − 1/e minus an additive arbitrarily small error that is due to the simulation of the information spread. For the node-based problem, the approximation is achieved by observing that a polynomial-sized linear program approximates the problem well. For the set-based problem, we show that a multiplicative-weight routine can yield the approximation result. 
For an experimental study, we provide implementations of multiplicative-weight routines for both the set-based and the node-based problems and compare the achieved fairness values to existing methods. Perhaps unsurprisingly, we show that the ex-ante values, i.e., minimum expected value of an individual (or group) to obtain the information, of the computed probabilistic strategies are significantly larger than the (ex-post) fairness values of previous methods. This indicates that studying fairness via randomization is a worthwhile path to follow. Interestingly, and perhaps more surprisingly, we observe that even the ex-post fairness values, i.e., fairness values of sets sampled according to the probabilistic strategies computed by our routines, dominate the fairness achieved by previous methods on many of the instances tested."
"13509","The Application of Machine Learning Techniques for Predicting Match Results in Team Sport: A Review","Rory Bunker, Teo Susnjak","Nagoya University, ","https://www.jair.org/index.php/jair/article/download/13509/26786","Predicting the results of matches in sport is a challenging and interesting task. In this paper, we review a selection of studies from 1996 to 2019 that used machine learning for predicting match results in team sport. Considering both invasion sports and striking/fielding sports, we discuss commonly applied machine learning algorithms, as well as common approaches related to data and evaluation. Our study considers accuracies that have been achieved across different sports, and explores whether evidence exists to support the notion that outcomes of some sports may be inherently more difficult to predict. We also uncover common themes of future research directions and propose recommendations for future researchers. Although there remains a lack of benchmark datasets (apart from in soccer), and the differences between sports, datasets and features makes between-study comparisons difficult, as we discuss, it is possible to evaluate accuracy performance in other ways. Artificial Neural Networks were commonly applied in early studies, however, our findings suggest that a range of models should instead be compared. Selecting and engineering an appropriate feature set appears to be more important than having a large number of instances. For feature selection, we see potential for greater inter-disciplinary collaboration between sport performance analysis, a sub-discipline of sport science, and machine learning."
"13610","A Metric Space for Point Process Excitations","Myrl G. Marmarelis, Greg Ver Steeg, Aram Galstyan",", , ","https://www.jair.org/index.php/jair/article/download/13610/26787","A multivariate Hawkes process enables self- and cross-excitations through a triggering matrix that behaves like an asymmetrical covariance structure, characterizing pairwise interactions between the event types. Full-rank estimation of all interactions is often infeasible in empirical settings. Models that specialize on a spatiotemporal application alleviate this obstacle by exploiting spatial locality, allowing the dyadic relationships between events to depend only on separation in time and relative distances in real Euclidean space. Here we generalize this framework to any multivariate Hawkes process, and harness it as a vessel for embedding arbitrary event types in a hidden metric space. Specifically, we propose a Hidden Hawkes Geometry (HHG) model to uncover the hidden geometry between event excitations in a multivariate point process. The low dimensionality of the embedding regularizes the structure of the inferred interactions. We develop a number of estimators and validate the model by conducting several experiments. In particular, we investigate regional infectivity dynamics of COVID-19 in an early South Korean record and recent Los Angeles confirmed cases. By additionally performing synthetic experiments on short records as well as explorations into options markets and the Ebola epidemic, we demonstrate that learning the embedding alongside a point process uncovers salient interactions in a broad range of applications."
"13233","Marginal Distance and Hilbert-Schmidt Covariances-Based Independence Tests for Multivariate Functional Data","Mirosław Krzyśko, Łukasz Smaga, Piotr Kokoszka",", Faculty of Mathematics and Computer Science, Adam Mickiewicz University, ","https://www.jair.org/index.php/jair/article/download/13233/26788","We study the pairwise and mutual independence testing problem for multivariate functional data. Using a basis representation of functional data, we reduce this problem to testing the independence of multivariate data, which may be high-dimensional. For pairwise independence, we apply tests based on distance and Hilbert-Schmidt covariances as well as their marginal versions, which aggregate these covariances for coordinates of random processes. In the case of mutual independence, we study asymmetric and symmetric aggregating measures of pairwise dependence. A theoretical justification of the test procedures is established. In extensive simulation studies and examples based on a real economic data set, we investigate and compare the performance of the tests in terms of size control and power. An important finding is that tests based on distance and Hilbert-Schmidt covariances are usually more powerful than their marginal versions under linear dependence, while the reverse is true under non-linear dependence."
"13425","Agent-Based Modeling for Predicting Pedestrian Trajectories Around an Autonomous Vehicle","Manon Prédhumeau, Lyuba Mancheva, Julie Dugdale, Anne Spalanzani","Université Grenoble Alpes, , , ","https://www.jair.org/index.php/jair/article/download/13425/26789","This paper addresses modeling and simulating pedestrian trajectories when interacting with an autonomous vehicle in a shared space. Most pedestrian–vehicle interaction models are not suitable for predicting individual trajectories. Data-driven models yield accurate predictions but lack generalizability to new scenarios, usually do not run in real time and produce results that are poorly explainable. Current expert models do not deal with the diversity of possible pedestrian interactions with the vehicle in a shared space and lack microscopic validation. We propose an expert pedestrian model that combines the social force model and a new decision model for anticipating pedestrian–vehicle interactions. The proposed model integrates different observed pedestrian behaviors, as well as the behaviors of the social groups of pedestrians, in diverse interaction scenarios with a car. We calibrate the model by fitting the parameters values on a training set. We validate the model and evaluate its predictive potential through qualitative and quantitative comparisons with ground truth trajectories. The proposed model reproduces observed behaviors that have not been replicated by the social force model and outperforms the social force model at predicting pedestrian behavior around the vehicle on the used dataset. The model generates explainable and real-time trajectory predictions. Additional evaluation on a new dataset shows that the model generalizes well to new scenarios and can be applied to an autonomous vehicle embedded prediction."
"13543","Adversarial Framework with Certified Robustness for Time-Series Domain via Statistical Features","Taha Belkhouja, Janardhan Rao Doppa","Washington State University, ","https://www.jair.org/index.php/jair/article/download/13543/26790","Time-series data arises in many real-world applications (e.g., mobile health) and deep neural networks (DNNs) have shown great success in solving them. Despite their success, little is known about their robustness to adversarial attacks. In this paper, we propose a novel adversarial framework referred to as Time-Series Attacks via STATistical Features (TSA-STAT). To address the unique challenges of time-series domain, TSA-STAT employs constraints on statistical features of the time-series data to construct adversarial examples. Optimized polynomial transformations are used to create attacks that are more effective (in terms of successfully fooling DNNs) than those based on additive perturbations. We also provide certified bounds on the norm of the statistical features for constructing adversarial examples. Our experiments on diverse real-world benchmark datasets show the effectiveness of TSA-STAT in fooling DNNs for time-series domain and in improving their robustness."
"13431","A Logic-Based Explanation Generation Framework for Classical and Hybrid Planning Problems","Stylianos Loukas Vasileiou, William Yeoh, Tran Cao Son, Ashwin  Kumar, Michael Cashmore, Dianele Magazzeni","Washington University in St. Louis, , , , , ","https://www.jair.org/index.php/jair/article/download/13431/26791","In human-aware planning systems, a planning agent might need to explain its plan to a human user when that plan appears to be non-feasible or sub-optimal. A popular approach, called model reconciliation, has been proposed as a way to bring the model of the human user closer to the agent’s model. To do so, the agent provides an explanation that can be used to update the model of human such that the agent’s plan is feasible or optimal to the human user. Existing approaches to solve this problem have been based on automated planning methods and have been limited to classical planning problems only. In this paper, we approach the model reconciliation problem from a different perspective, that of knowledge representation and reasoning, and demonstrate that our approach can be applied not only to classical planning problems but also hybrid systems planning problems with durative actions and events/processes. In particular, we propose a logic-based framework for explanation generation, where given a knowledge base KBa (of an agent) and a knowledge base KBh (of a human user), each encoding their knowledge of a planning problem, and that KBa entails a query q (e.g., that a proposed plan of the agent is valid), the goal is to identify an explanation ε ⊆ KBa such that when it is used to update KBh, then the updated KBh also entails q. 
More specifically, we make the following contributions in this paper: (1) We formally define the notion of logic-based explanations in the context of model reconciliation problems; (2) We introduce a number of cost functions that can be used to reflect preferences between explanations; (3) We present algorithms to compute explanations for both classical planning and hybrid systems planning problems; and (4) We empirically evaluate their performance on such problems. Our empirical results demonstrate that, on classical planning problems, our approach is faster than the state of the art when the explanations are long or when the size of the knowledge base is small (e.g., the plans to be explained are short). They also demonstrate that our approach is efficient for hybrid systems planning problems. Finally, we evaluate the real-world efficacy of explanations generated by our algorithms through a controlled human user study, where we develop a proof-of-concept visualization system and use it as a medium for explanation communication."
"12699","Multilingual Machine Translation: Deep Analysis of Language-Specific Encoder-Decoders","Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa","Universitat Politècnica de Catalunya, , Universitat Politècnica de Catalunya","https://www.jair.org/index.php/jair/article/download/12699/26792","State-of-the-art multilingual machine translation relies on a shared encoder-decoder. In this paper, we propose an alternative approach based on language-specific encoder-decoders, which can be easily extended to new languages by learning their corresponding modules. To establish a common interlingua representation, we simultaneously train N initial languages. Our experiments show that the proposed approach improves over the shared encoder-decoder for the initial languages and when adding new languages, without the need to retrain the remaining modules. All in all, our work closes the gap between shared and language-specific encoder-decoders, advancing toward modular multilingual machine translation systems that can be flexibly extended in lifelong learning settings."
"13167","FFCI: A Framework for Interpretable Automatic Evaluation of Summarization","Fajri Koto, Timothy Baldwin, Jey Han Lau","University of Melbourne, University of Melbourne, University of Melbourne","https://www.jair.org/index.php/jair/article/download/13167/26793","In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences). We construct a novel dataset for focus, coverage, and inter-sentential coherence, and develop automatic methods for evaluating each of the four dimensions of FFCI based on cross-comparison of evaluation metrics and model-based evaluation methods, including question answering (QA) approaches, semantic textual similarity (STS), next-sentence prediction (NSP), and scores derived from 19 pre-trained language models. We then apply the developed metrics in evaluating a broad range of summarization models across two datasets, with some surprising findings."
"13445","Multi-Agent Advisor Q-Learning","Sriram Ganapathi Subramanian, Matthew E. Taylor, Kate Larson, Mark Crowley","U Waterloo, University of Alberta, University of Waterloo, University of Waterloo","https://www.jair.org/index.php/jair/article/download/13445/26794","In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL) but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before wide-spread deployment is possible. However, many real-world environments already, in practice, deploy sub-optimal or heuristic approaches for generating policies. An interesting question that arises is how to best use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online suboptimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM) and ADMIRAL - Advisor Evaluation (ADMIRAL-AE), which allow us to improve learning by appropriately incorporating advice from an advisor (ADMIRAL-DM), and evaluate the effectiveness of an advisor (ADMIRAL-AE). We analyze the algorithms theoretically and provide fixed point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms: can be used in a variety of environments, have performances that compare favourably to other related baselines, can scale to large state-action spaces, and are robust to poor advice from advisors."
"13196","Rethinking Fairness: An Interdisciplinary Survey of Critiques of Hegemonic ML Fairness Approaches","Lindsay Weinberg","Purdue University","https://www.jair.org/index.php/jair/article/download/13196/26797","This survey article assesses and compares existing critiques of current fairness-enhancing technical interventions in machine learning (ML) that draw from a range of non-computing disciplines, including philosophy, feminist studies, critical race and ethnic studies, legal studies, anthropology, and science and technology studies. It bridges epistemic divides in order to offer an interdisciplinary understanding of the possibilities and limits of hegemonic computational approaches to ML fairness for producing just outcomes for society’s most marginalized. The article is organized according to nine major themes of critique wherein these different fields intersect: 1) how ""fairness"" in AI fairness research gets defined; 2) how problems for AI systems to address get formulated; 3) the impacts of abstraction on how AI tools function and its propensity to lead to technological solutionism; 4) how racial classification operates within AI fairness research; 5) the use of AI fairness measures to avoid regulation and engage in ethics washing; 6) an absence of participatory design and democratic deliberation in AI fairness considerations; 7) data collection practices that entrench “bias,” are non-consensual, and lack transparency; 8) the predatory inclusion of marginalized groups into AI systems; and 9) a lack of engagement with AI’s long-term social and ethical outcomes. Drawing from these critiques, the article concludes by imagining future ML fairness research directions that actively disrupt entrenched power dynamics and structural injustices in society."
"12911","Fair Division of Indivisible Goods for a Class of Concave Valuations","Bhaskar Ray Chaudhury, Yun Kuen Cheung, Jugal Garg, Naveen Garg, Martin Hoefer, Kurt Mehlhorn","MPI for Informatics, Saarland Informatics Campus, Royal Holloway University of London, University of Illinois at Urbana-Champaign, IIT Delhi, Goethe-Universität Frankfurt am Main, MPI for Informatics, Saarland Informatics Campus","https://www.jair.org/index.php/jair/article/download/12911/26798","We study the fair and efficient allocation of a set of indivisible goods among agents, where each good has several copies, and each agent has an additively separable concave valuation function with a threshold. These valuations capture the property of diminishing marginal returns, and they are more general than the well-studied case of additive valuations. We present a polynomial-time algorithm that approximates the optimal Nash social welfare (NSW) up to a factor of e1/e ≈ 1.445. This matches with the state-of-the-art approximation factor for additive valuations. The computed allocation also satisfies the popular fairness guarantee of envy-freeness up to one good (EF1) up to a factor of 2 + ε. For instances without thresholds, it is also approximately Pareto-optimal. For instances satisfying a large market property, we show an improved approximation factor. Lastly, we show that the upper bounds on the optimal NSW introduced in Cole and Gkatzelis (2018) and Barman et al. (2018) have the same value."
"13581","Avoiding Negative Side Effects of Autonomous Systems in the Open World","Sandhya Saisubramanian, Ece Kamar, Shlomo Zilberstein","Oregon State University, Microsoft Research, University of Massachusetts Amherst","https://www.jair.org/index.php/jair/article/download/13581/26799","Autonomous systems that operate in the open world often use incomplete models of their environment. Model incompleteness is inevitable due to the practical limitations in precise model specification and data collection about open-world environments. Due to the limited fidelity of the model, agent actions may produce negative side effects (NSEs) when deployed. Negative side effects are undesirable, unmodeled effects of agent actions on the environment. NSEs are inherently challenging to identify at design time and may affect the reliability, usability and safety of the system. We present two complementary approaches to mitigate the NSE via: (1) learning from feedback, and (2) environment shaping. The solution approaches target settings with different assumptions and agent responsibilities. In learning from feedback, the agent learns a penalty function associated with a NSE. We investigate the efficiency of different feedback mechanisms, including human feedback and autonomous exploration. The problem is formulated as a multi-objective Markov decision process such that optimizing the agent’s assigned task is prioritized over mitigating NSE. A slack parameter denotes the maximum allowed deviation from the optimal expected reward for the agent’s task in order to mitigate NSE. In environment shaping, we examine how a human can assist an agent, beyond providing feedback, and utilize their broader scope of knowledge to mitigate the impacts of NSE. We formulate the problem as a human-agent collaboration with decoupled objectives. The agent optimizes its assigned task and may produce NSE during its operation. 
The human assists the agent by performing modest reconfigurations of the environment so as to mitigate the impacts of NSE, without affecting the agent’s ability to complete its assigned task. We present an algorithm for shaping and analyze its properties. Empirical evaluations demonstrate the trade-offs in the performance of different approaches in mitigating NSE in different settings."
"13499","Proactive Dynamic Distributed Constraint Optimization Problems","Khoi D. Hoang, Ferdinando Fioretto, Ping Hou, William Yeoh, Makoto Yokoo, Roie Zivan",", Syracuse University, Uber Advanced Technologies Group, Washington University in St. Louis, Kyushu University, Ben-Gurion University of the Negev","https://www.jair.org/index.php/jair/article/download/13499/26800","The Distributed Constraint Optimization Problem (DCOP) formulation is a powerful tool for modeling multi-agent coordination problems. To solve DCOPs in a dynamic environment, Dynamic DCOPs (D-DCOPs) have been proposed to model the inherent dynamism present in many coordination problems. D-DCOPs solve a sequence of static problems by reacting to changes in the environment as the agents observe them. Such reactive approaches ignore knowledge about future changes of the problem. To overcome this limitation, we introduce Proactive Dynamic DCOPs (PD-DCOPs), a novel formalism to model D-DCOPs in the presence of exogenous uncertainty. In contrast to reactive approaches, PD-DCOPs are able to explicitly model possible changes of the problem and take such information into account when solving the dynamically changing problem in a proactive manner. The additional expressivity of this formalism allows it to model a wider variety of distributed optimization problems. Our work presents both theoretical and practical contributions that advance current dynamic DCOP models: (i) We introduce Proactive Dynamic DCOPs (PD-DCOPs), which explicitly model how the DCOP will change over time; (ii) We develop exact and heuristic algorithms to solve PD-DCOPs in a proactive manner; (iii) We provide theoretical results about the complexity of this new class of DCOPs; and (iv) We empirically evaluate both proactive and reactive algorithms to determine the trade-offs between the two classes. 
The final contribution is important as our results are the first that identify the characteristics of the problems that the two classes of algorithms excel in."
"12690","A Few Queries Go a Long Way: Information-Distortion Tradeoffs in Matching","Georgios Amanatidis, Georgios Birmpas, Aris Filos-Ratsikas, Alexandros A. Voudouris","University of Essex, Sapienza University of Rome, University of Liverpool, University of Essex","https://www.jair.org/index.php/jair/article/download/12690/26801","We consider the One-Sided Matching problem, where n agents have preferences over n items, and these preferences are induced by underlying cardinal valuation functions. The goal is to match every agent to a single item so as to maximize the social welfare. Most of the related literature, however, assumes that the values of the agents are not a priori known, and only access to the ordinal preferences of the agents over the items is provided. Consequently, this incomplete information leads to loss of efficiency, which is measured by the notion of distortion. In this paper, we further assume that the agents can answer a small number of queries, allowing us partial access to their values. We study the interplay between elicited cardinal information (measured by the number of queries per agent) and distortion for One-Sided Matching, as well as a wide range of well-studied related problems. Qualitatively, our results show that with a limited number of queries, it is possible to obtain significant improvements over the classic setting, where only access to ordinal information is given."
"12670","Constraint Solving Approaches to the Business-to-Business Meeting Scheduling Problem","Miquel Bofill, Jordi Coll, Marc Garcia, Jesús Giráldez-Cru, Gilles Pesant, Josep Suy, Mateu Villaret",", , , , , , University of Girona","https://www.jair.org/index.php/jair/article/download/12670/26802","The Business-to-Business Meeting Scheduling problem consists of scheduling a set of meetings between given pairs of participants to an event, while taking into account participants’ availability and accommodation capacity. A crucial aspect of this problem is that breaks in participants’ schedules should be avoided as much as possible. It constitutes a challenging combinatorial problem that needs to be solved for many real world brokerage events. In this paper we present a comparative study of Constraint Programming (CP), MixedInteger Programming (MIP) and Maximum Satisfiability (MaxSAT) approaches to this problem. The CP approach relies on using global constraints and has been implemented in MiniZinc to be able to compare CP, Lazy Clause Generation and MIP as solving technologies in this setting. We also present a pure MIP encoding. Finally, an alternative viewpoint is considered under MaxSAT, showing best performance when considering some implied constraints. Experiments conducted on real world instances, as well as on crafted ones, show that the MaxSAT approach is the one with the best performance for this problem, exhibiting better solving times, sometimes even orders of magnitude smaller than CP and MIP."
"12997","Adaptive Greedy versus Non-adaptive Greedy for Influence Maximization","Wei Chen, Binghui Peng, Grant Schoenebeck, Biaoshuai Tao","Microsoft Research, Columbia University, University of Michigan, Shanghai Jiao Tong University","https://www.jair.org/index.php/jair/article/download/12997/26803","We consider the adaptive influence maximization problem: given a network and a budget k, iteratively select k seeds in the network to maximize the expected number of adopters. In the full-adoption feedback model, after selecting each seed, the seed-picker observes all the resulting adoptions. In the myopic feedback model, the seed-picker only observes whether each neighbor of the chosen seed adopts. Motivated by the extreme success of greedy-based algorithms/heuristics for influence maximization, we propose the concept of greedy adaptivity gap, which compares the performance of the adaptive greedy algorithm to its non-adaptive counterpart. Our first result shows that, for submodular influence maximization, the adaptive greedy algorithm can perform up to a (1 − 1/e)-fraction worse than the non-adaptive greedy algorithm, and that this ratio is tight. More specifically, on one side we provide examples where the performance of the adaptive greedy algorithm is only a (1−1/e) fraction of the performance of the non-adaptive greedy algorithm in four settings: for both feedback models and both the independent cascade model and the linear threshold model. On the other side, we prove that in any submodular cascade, the adaptive greedy algorithm always outputs a (1 − 1/e)-approximation to the expected number of adoptions in the optimal non-adaptive seed choice. Our second result shows that, for the general submodular diffusion model with full-adoption feedback, the adaptive greedy algorithm can outperform the non-adaptive greedy algorithm by an unbounded factor. 
Finally, we propose a risk-free variant of the adaptive greedy algorithm that always performs no worse than the non-adaptive greedy algorithm."
"13317","Ordinal Maximin Share Approximation for Goods","Hadi Hosseini, Andrew Searns, Erel Segal-Halevi","Pennsylvania State University, Pennsylvania, Haley Marketing, Williamsville, New York, ","https://www.jair.org/index.php/jair/article/download/13317/26804","In fair division of indivisible goods,  ℓ-out-of-d maximin share (MMS) is the value that an agent can guarantee by partitioning the goods into d bundles and choosing the ℓ least preferred bundles. Most existing works aim to guarantee to all agents a constant fraction of their 1-out-of-n MMS. But this guarantee is sensitive to small perturbation in agents' cardinal valuations. We consider a more robust approximation notion, which depends only on the agents' ordinal rankings of bundles. We prove the existence of ℓ-out-of-⌊(ℓ + 1/2)n⌋ MMS allocations of goods for any integer ℓ ≥ 1, and present a polynomial-time algorithm that finds a 1-out-of-⌈3n/2⌉ MMS allocation when ℓ=1. We further develop an algorithm that provides a weaker ordinal approximation to MMS for any ℓ > 1."
"13363","Objective Bayesian Nets for Integrating Consistent Datasets","Juergen Landes, Jon Williamson","LMU Munich, University of Kent","https://www.jair.org/index.php/jair/article/download/13363/26805","This paper addresses a data integration problem: given several mutually consistent datasets each of which measures a subset of the variables of interest, how can one construct a probabilistic model that fits the data and gives reasonable answers to questions which are under-determined by the data? Here we show how to obtain a Bayesian network model which represents the unique probability function that agrees with the probability distributions measured by the datasets and otherwise has maximum entropy. We provide a general algorithm, OBN-cDS, which offers substantial efficiency savings over the standard brute-force approach to determining the maximum entropy probability function. Furthermore, we develop modifications to the general algorithm which enable further efficiency savings but which are only applicable in particular situations. We show that there are circumstances in which one can obtain the model (i) directly from the data; (ii) by solving algebraic problems; and (iii) by solving relatively simple independent optimisation problems."
"13646","Core Challenges in Embodied Vision-Language Planning","Jonathan Francis, Nariaki Kitamura, Felix Labelle, Xiaopeng Lu, Ingrid Navarro, Jean Oh","Carnegie Mellon University, Carnegie Mellon University, Carnegie Mellon University, Carnegie Mellon University, Carnegie Mellon University, ","https://www.jair.org/index.php/jair/article/download/13646/26807","Recent advances in the areas of multimodal machine learning and artificial intelligence (AI) have led to the development of challenging tasks at the intersection of Computer Vision, Natural Language Processing, and Embodied AI. Whereas many approaches and previous survey pursuits have characterised one or two of these dimensions, there has not been a holistic analysis at the center of all three. Moreover, even when combinations of these topics are considered, more focus is placed on describing, e.g., current architectural methods, as opposed to also illustrating high-level challenges and opportunities for the field. In this survey paper, we discuss Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language. We propose a taxonomy to unify these tasks and provide an in-depth analysis and comparison of the new and current algorithmic approaches, metrics, simulated environments, as well as the datasets used for EVLP tasks. Finally, we present the core challenges that we believe new EVLP works should seek to address, and we advocate for task construction that enables model generalizability and furthers real-world deployment."
"13596","Automated Reinforcement Learning (AutoRL): A Survey and Open Problems","Jack Parker-Holder, Raghu Rajan, Xingyou Song, André Biedenkapp, Yingjie Miao, Theresa Eimer, Baohe Zhang, Vu Nguyen, Roberto Calandra, Aleksandra Faust, Frank Hutter, Marius Lindauer",", , , , , , , , , , , ","https://www.jair.org/index.php/jair/article/download/13596/26808","The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems and also limits its full potential. In many other areas of machine learning, AutoML has shown that it is possible to automate such design choices, and AutoML has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also includes additional challenges unique to RL, that naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, providing promise in a variety of applications from RNA design to playing games, such as Go. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey, we seek to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail and pose open problems of interest to researchers going forward."
"13470","Finding and Recognizing Popular Coalition Structures","Felix Brandt, Martin Bullinger",", Technical University of Munich","https://www.jair.org/index.php/jair/article/download/13470/26809","An important aspect of multi-agent systems concerns the formation of coalitions that are stable or optimal in some well-defined way. The notion of popularity has recently received a lot of attention in this context. A partition is popular if there is no other partition in which more agents are better off than worse off. In this paper, we study popularity, strong popularity, and mixed popularity (which is particularly attractive because existence is guaranteed by the Minimax Theorem) in a variety of coalition formation settings. Extending previous work on marriage games, we show that mixed popular partitions in roommate games can be found efficiently via linear programming and a separation oracle. This approach is quite universal, leading to efficient algorithms for verifying whether a given partition is popular and for finding strongly popular partitions (resolving an open problem). By contrast, we prove that both problems become computationally intractable when moving from coalitions of size 2 to coalitions of size 3, even when preferences are strict and globally ranked. Moreover, we show that finding popular, strongly popular, and mixed popular partitions in symmetric additively separable hedonic games and symmetric fractional hedonic games is NP-hard. Together, these results indicate strong boundaries to the tractability of popularity in both ordinal and cardinal models of hedonic games."
"13410","Out of Context: A New Clue for Context Modeling of Aspect-based Sentiment Analysis","Bowen Xing, Ivor W. Tsang","University of Technology Sydney, University of Technology Sydney","https://www.jair.org/index.php/jair/article/download/13410/26810","Aspect-based sentiment analysis (ABSA) aims to predict the sentiment expressed in a review with respect to a given aspect. The core of ABSA is to model the interaction between the context and given aspect to extract aspect-related information. In prior work, attention mechanisms and dependency graph networks are commonly adopted to capture the relations between the context and given aspect. And the weighted sum of context hidden states is used as the final representation fed to the classifier. However, the information related to the given aspect may be already discarded and adverse information may be retained in the context modeling processes of existing models. Such a problem cannot be solved by subsequent modules due to two reasons. First, their operations are conducted on the encoder-generated context hidden states, whose value cannot be changed after the encoder. Second, existing encoders only consider the context while not the given aspect. To address this problem, we argue the given aspect should be considered as a new clue out of context in the context modeling process. As for solutions, we design three streams of aspect-aware context encoders: an aspect-aware LSTM, an aspect-aware GCN, and three aspect-aware BERTs. They are dedicated to generating aspect-aware hidden states which are tailored for the ABSA task. In these aspect-aware context encoders, the semantics of the given aspect is used to regulate the information flow. Consequently, the aspect-related information can be retained and aspect-irrelevant information can be excluded in the generated hidden states. 
We conduct extensive experiments on several benchmark datasets with empirical analysis, demonstrating the efficacy and advantages of our proposed aspect-aware context encoders."
"13472","Fast Adaptive Non-Monotone Submodular Maximization Subject to a Knapsack Constraint","Georgios Amanatidis, Federico Fusco, Philip Lazos, Stefano Leonardi, Rebecca Reiffenhäuser",", Sapienza University of Rome, , , ","https://www.jair.org/index.php/jair/article/download/13472/26811","Constrained submodular maximization problems encompass a wide variety of applications, including personalized recommendation, team formation, and revenue maximization via viral marketing. The massive instances occurring in modern-day applications can render existing algorithms prohibitively slow. Moreover, frequently those instances are also inherently stochastic. Focusing on these challenges, we revisit the classic problem of maximizing a (possibly non-monotone) submodular function subject to a knapsack constraint. We present a simple randomized greedy algorithm that achieves a 5.83-approximation and runs in O(n log n) time, i.e., at least a factor n faster than other state-of-the-art algorithms. The versatility of our approach allows us to further transfer it to a stochastic version of the problem. There, we obtain a (9 + ε)-approximation to the best adaptive policy, which is the first constant approximation for non-monotone objectives. Experimental evaluation of our algorithms showcases their improved performance on real and synthetic data."
"13269","Planning with Critical Section Macros: Theory and Practice","Lukas Chrpa, Mauro Vallati","Czech Technical Univeristy in Prague, University of Huddersfield","https://www.jair.org/index.php/jair/article/download/13269/26812","Macro-operators (macros) are a well-known technique for enhancing performance of planning engines by providing “short-cuts” in the state space. Existing macro learning systems usually generate macros by considering most frequent action sequences in training plans. Unfortunately, frequent action sequences might not capture meaningful activities as a whole, leading to a limited beneficial impact for the planning process. In this paper, inspired by resource locking in critical sections in parallel computing, we propose a technique that generates macros able to capture whole activities in which limited resources (e.g., a robotic hand, or a truck) are used. Specifically, such a Critical Section macro starts by locking the resource (e.g., grabbing an object), continues by using the resource (e.g., manipulating the object) and finishes by releasing the resource (e.g., dropping the object). Hence, such a macro bridges states in which the resource is locked and cannot be used. We also introduce versions of Critical Section macros dealing with multiple resources and phased locks. Usefulness of macros is evaluated using a range of state-of-the-art planners, and a large number of benchmarks from the deterministic and learning tracks of recent editions of the International Planning Competition."
"13519","Cooperation and Learning Dynamics under Wealth Inequality and Diversity in Individual Risk","Ramona Merhej, Fernando P. Santos, Francisco S. Melo, Francisco C. Santos",", Informatics Institute, University of Amsterdam, INESC-ID and Instituto Superior Tecnico, Universidade de Lisboa, INESC-ID and Instituto Superior Tecnico, Universidade de Lisboa","https://www.jair.org/index.php/jair/article/download/13519/26813","We examine how wealth inequality and diversity in the perception of risk of a collective disaster impact cooperation levels in the context of a public goods game with uncertain and non-linear returns. In this game, individuals face a collective-risk dilemma where they may contribute or not to a common pool to reduce their chances of future losses. We draw our conclusions based on social simulations with populations of independent reinforcement learners with diverse levels of risk and wealth. We find that both wealth inequality and diversity in risk assessment can hinder cooperation and augment collective losses. Additionally, wealth inequality further exacerbates long term inequality, causing rich agents to become richer and poor agents to become poorer. On the other hand, diversity in risk only amplifies inequality when combined with bias in group assortment—i.e., high probability that agents from the same risk class play together. Our results also suggest that taking wealth inequality into account can help to design effective policies aiming at leveraging cooperation in large group sizes, a configuration where collective action is harder to achieve. Finally, we characterize the circumstances under which risk perception alignment is crucial and those under which reducing wealth inequality constitutes a deciding factor for collective welfare."
"13507","Inductive Logic Programming At 30: A New Introduction","Andrew Cropper, Sebastijan Dumančić","University of Oxford, TU Delft","https://www.jair.org/index.php/jair/article/download/13507/26814","Inductive logic programming (ILP) is a form of machine learning. The goal of ILP is to induce a hypothesis (a set of logical rules) that generalises training examples. As ILP turns 30, we provide a new introduction to the field. We introduce the necessary logical notation and the main learning settings; describe the building blocks of an ILP system; compare several systems on several dimensions; describe four systems (Aleph, TILDE, ASPAL, and Metagol); highlight key application areas; and, finally, summarise current limitations and directions for future research."
"13283","On the Tractability of SHAP Explanations","Guy Van den Broeck, Anton Lykov, Maximilian Schleich, Dan Suciu",", , , ","https://www.jair.org/index.php/jair/article/download/13283/26815","SHAP explanations are a popular feature-attribution mechanism for explainable AI. They use game-theoretic notions to measure the influence of individual features on the prediction of a machine learning model. Despite a lot of recent interest from both academia and industry, it is not known whether SHAP explanations of common machine learning models can be computed efficiently. In this paper, we establish the complexity of computing the SHAP explanation in three important settings. First, we consider fully-factorized data distributions, and show that the complexity of computing the SHAP explanation is the same as the complexity of computing the expected value of the model. This fully-factorized setting is often used to simplify the SHAP computation, yet our results show that the computation can be intractable for commonly used models such as logistic regression. Going beyond fully-factorized distributions, we show that computing SHAP explanations is already intractable for a very simple setting: computing SHAP explanations of trivial classifiers over naive Bayes distributions. Finally, we show that even computing SHAP over the empirical distribution is #P-hard."
"13599","FOND Planning with Explicit Fairness Assumptions","Ivan D. Rodriguez, Blai Bonet, Sebastian Sardina, Hector Geffner",", , , ","https://www.jair.org/index.php/jair/article/download/13599/26816","We consider the problem of reaching a propositional goal condition in fully-observable nondeterministic (FOND) planning under a general class of fairness assumptions that are given explicitly. The fairness assumptions are of the form A/B and say that state trajectories that contain infinite occurrences of an action a from A in a state s and finite occurrence of actions from B, must also contain infinite occurrences of action a in s followed by each one of its possible outcomes. The infinite trajectories that violate this condition are deemed as unfair, and the solutions are policies for which all the fair trajectories reach a goal state. We show that strong and strong-cyclic FOND planning, as well as QNP planning, a planning model introduced recently for generalized planning, are all special cases of FOND planning with fairness assumptions of this form which can also be combined. FOND+ planning, as this form of planning is called, combines the syntax of FOND planning with some of the versatility of LTL for expressing fairness constraints. A sound and complete FOND+ planner is implemented by reducing FOND+ planning to answer set programs, and its performance is evaluated in comparison with FOND and QNP planners, and LTL synthesis tools. Two other FOND+ planners are introduced as well which are more scalable but are not complete."
"13544","Path Counting for Grid-Based Navigation","Rhys Goldstein, Kean Walmsley, Jacobo Bibliowicz, Alexander Tessier, Simon Breslav, Azam Khan","Autodesk Research, Autodesk Research, Autodesk Research, , Trax.GD, Trax.GD","https://www.jair.org/index.php/jair/article/download/13544/26817","Counting the number of shortest paths on a grid is a simple procedure with close ties to Pascal’s triangle. We show how path counting can be used to select relatively direct grid paths for AI-related applications involving navigation through spatial environments. Typical implementations of Dijkstra’s algorithm and A* prioritize grid moves in an arbitrary manner, producing paths which stray conspicuously far from line-of-sight trajectories. We find that by counting the number of paths which traverse each vertex, then selecting the vertices with the highest counts, one obtains a path that is reasonably direct in practice and can be improved by refining the grid resolution. Central Dijkstra and Central A* are introduced as the basic methods for computing these central grid paths. Theoretical analysis reveals that the proposed grid-based navigation approach is related to an existing grid-based visibility approach, and establishes that central grid paths converge on clear sightlines as the grid spacing approaches zero. A more general property, that central paths converge on direct paths, is formulated as a conjecture."
"13530","Admissibility in Probabilistic Argumentation","Nikolai Käfer, Christel Baier, Martin Diller, Clemens Dubslaff, Sarah Alice Gaggl, Holger Hermanns","Technische Universit¨at Dresden, Faculty of Computer Science, Dresden, Germany, Technische Universit¨at Dresden, Faculty of Computer Science, Dresden, Germany, Technische Universit¨at Dresden, Faculty of Computer Science, Dresden, Germany, Technische Universit¨at Dresden, Faculty of Computer Science, Dresden, Germany, Technische Universit¨at Dresden, Faculty of Computer Science, Dresden, Germany, Saarland University, Saarland Informatics Campus, Saarbr¨ucken, Germany","https://www.jair.org/index.php/jair/article/download/13530/26818","Abstract argumentation is a prominent reasoning framework. It comes with a variety of semantics and has lately been enhanced by probabilities to enable a quantitative treatment of argumentation. While admissibility is a fundamental notion for classical reasoning in abstract argumentation frameworks, it has barely been reflected so far in the probabilistic setting. In this paper, we address the quantitative treatment of abstract argumentation based on probabilistic notions of admissibility. Our approach follows the natural idea of defining probabilistic semantics for abstract argumentation by systematically imposing constraints on the joint probability distribution on the sets of arguments, rather than on probabilities of single arguments. As a result, there might be either a uniquely defined distribution satisfying the constraints, but also none, many, or even an infinite number of satisfying distributions are possible. We provide probabilistic semantics corresponding to the classical complete and stable semantics and show how labeling schemes provide a bridge from distributions back to argument labelings. In relation to existing work on probabilistic argumentation, we present a taxonomy of semantic notions. 
Enabled by the constraint-based approach, standard reasoning problems for probabilistic semantics can be tackled by SMT solvers, as we demonstrate by a proof-of-concept implementation."
"13197","Impact of Imputation Strategies on Fairness in Machine Learning","Simon Caton, Saiteja Malisetty, Christian Haas","School of Computer Science, University College Dublin, University of Nebraska at Omaha, Department of Strategy and Innovation, Vienna University of Economics and Business (WU)","https://www.jair.org/index.php/jair/article/download/13197/26819","Research on Fairness and Bias Mitigation in Machine Learning often uses a set of reference datasets for the design and evaluation of novel approaches or definitions. While these datasets are well structured and useful for the comparison of various approaches, they do not reflect that datasets commonly used in real-world applications can have missing values. When such missing values are encountered, the use of imputation strategies is commonplace. However, as imputation strategies potentially alter the distribution of data they can also affect the performance, and potentially the fairness, of the resulting predictions, a topic not yet well understood in the fairness literature. In this article, we investigate the impact of different imputation strategies on classical performance and fairness in classification settings. We find that the selected imputation strategy, along with other factors including the type of classification algorithm, can significantly affect performance and fairness outcomes. The results of our experiments indicate that the choice of imputation strategy is an important factor when considering fairness in Machine Learning. We also provide some insights and guidance for researchers to help navigate imputation approaches for fairness."
"13267","Two-phase Multi-document Event Summarization on Core Event Graphs","Zengjian Chen, Jin Xu, Meng Liao, Tong Xue, Kun He",", , , , ","https://www.jair.org/index.php/jair/article/download/13267/26820","Succinct event description based on multiple documents is critical to news systems as well as search engines. Different from existing summarization or event tasks, Multi-document Event Summarization (MES) aims at the query-level event sequence generation, which has extra constraints on event expression and conciseness. Identifying and summarizing the key event from a set of related articles is a challenging task that has not been sufficiently studied, mainly because online articles exhibit characteristics of redundancy and sparsity, and a perfect event summarization needs high level information fusion among diverse sentences and articles. To address these challenges, we propose a two-phase framework for the MES task, that first performs event semantic graph construction and dominant event detection via graph-sequence matching, then summarizes the extracted key event by an event-aware pointer generator. For experiments in the new task, we construct two large-scale real-world datasets for training and assessment. Extensive evaluations show that the proposed framework significantly outperforms the related baseline methods, with the most dominant event of the articles effectively identified and correctly summarized."
"13546","Supervised Visual Attention for Simultaneous Multimodal Machine Translation","Veneta Haralampieva, Ozan Caglayan, Lucia Specia","Imperial College London, Imperial College London, Imperial College London","https://www.jair.org/index.php/jair/article/download/13546/26821","There has been a surge in research in multimodal machine translation (MMT), where additional modalities such as images are used to improve translation quality of textual systems. A particular use for such multimodal systems is the task of simultaneous machine translation, where visual context has been shown to complement the partial information provided by the source sentence, especially in the early phases of translation. In this paper, we propose the first Transformer-based simultaneous MMT architecture, which has not been previously explored in simultaneous translation. Additionally, we extend this model with an auxiliary supervision signal that guides the visual attention mechanism using labelled phrase-region alignments. We perform comprehensive experiments on three language directions and conduct thorough quantitative and qualitative analyses using both automatic metrics and manual inspection. Our results show that (i) supervised visual attention consistently improves the translation quality of the simultaneous MMT models, and (ii) fine-tuning the MMT with supervision loss enabled leads to better performance than training the MMT from scratch. Compared to the state-of-the-art, our proposed model achieves improvements of up to 2.3 BLEU and 3.5 METEOR points."
"13073","A Comprehensive Framework for Learning Declarative Action Models","Diego Aineto, Sergio Jiménez, Eva Onaindia","Universitat Politècnica de Valencia, Universitat Politècnica de València, Universitat Politècnica de València","https://www.jair.org/index.php/jair/article/download/13073/26822","A declarative action model is a compact representation of the state transitions of dynamic systems that generalizes over world objects. The specification of declarative action models is often a complex hand-crafted task. In this paper we formulate declarative action models via state constraints, and present the learning of such models as a combinatorial search. The comprehensive framework presented here allows us to connect the learning of declarative action models to well-known problem solving tasks. In addition, our framework allows us to characterize the existing work in the literature according to four dimensions: (1) the target action models, in terms of the state transitions they define; (2) the available learning examples; (3) the functions used to guide the learning process, and to evaluate the quality of the learned action models; (4) the learning algorithm. Last, the paper lists relevant successful applications of the learning of declarative actions models and discusses some open challenges with the aim of encouraging future research work."
"13187","Evolutionary Dynamics and Phi-Regret Minimization in Games","Georgios Piliouras, Mark Rowland, Shayegan Omidshafiei, Romuald Elie, Daniel Hennes, Jerome Connor, Karl Tuyls",", DeepMind, DeepMind, DeepMind, DeepMind, DeepMind, DeepMind","https://www.jair.org/index.php/jair/article/download/13187/26823","Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games. Regret quantifies the difference between a learner’s performance against a baseline in hindsight. It is well known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit deviations to deterministic actions or strategies. In this paper, we revisit our understanding of regret from the perspective of deviations over partitions of the full mixed strategy space (i.e., probability distributions over pure strategies), under the lens of the previously-established Φ-regret framework, which provides a continuum of stronger regret measures. Importantly, Φ-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms. We prove here that the well-studied evolutionary learning algorithm of replicator dynamics (RD) seamlessly minimizes the strongest possible form of Φ-regret in generic 2 × 2 games, without any modification of the underlying algorithm itself. We subsequently conduct experiments validating our theoretical results in a suite of 144 2 × 2 games wherein RD exhibits a diverse set of behaviors. 
We conclude by providing empirical evidence of Φ-regret minimization by RD in some larger games, hinting at further opportunity for Φ-regret based study of such algorithms from both a theoretical and empirical perspective."
"13554","Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: A Short Survey","Cédric Colas, Tristan Karch, Olivier Sigaud, Pierre-Yves Oudeyer","INRIA, , , ","https://www.jair.org/index.php/jair/article/download/13554/26824","Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autotelic agents: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent years, the convergence of developmental approaches with deep reinforcement learning (RL) methods has been leading to the emergence of a new field: developmental reinforcement learning. Developmental RL is concerned with the use of deep RL algorithms to tackle a developmental problem— the intrinsically motivated acquisition of open-ended repertoires of skills. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions. This raises new challenges compared to standard RL algorithms originally designed to tackle pre-defined sets of goals using external reward signals. The present paper introduces developmental RL and proposes a computational framework based on goal-conditioned RL to tackle the intrinsically motivated skills acquisition problem. It proceeds to present a typology of the various goal representations used in the literature, before reviewing existing methods to learn to represent and prioritize goals in autonomous systems. We finally close the paper by discussing some open challenges in the quest of intrinsically motivated skills acquisition."
"13689","CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings","Gabriel Skantze, Bram Willemsen","KTH, KTH Royal Institute of Technology","https://www.jair.org/index.php/jair/article/download/13689/26825","This paper presents CoLLIE: a simple, yet effective model for continual learning of how language is grounded in vision. Given a pre-trained multimodal embedding model, where language and images are projected in the same semantic space (in this case CLIP by OpenAI), CoLLIE learns a transformation function that adjusts the language embeddings when needed to accommodate new language use. This is done by predicting the difference vector that needs to be applied, as well as a scaling factor for this vector, so that the adjustment is only applied when needed. Unlike traditional few-shot learning, the model does not just learn new classes and labels, but can also generalize to similar language use and leverage semantic compositionality. We verify the model’s performance on two different tasks of identifying the targets of referring expressions, where it has to learn new language use. The results show that the model can efficiently learn and generalize from only a few examples, with little interference with the model’s original zero-shot performance."
"13138","Learning Bayesian Networks Under Sparsity Constraints: A Parameterized Complexity Analysis","Niels Grüttemeier, Christian Komusiewicz","Philipps-Universität Marburg, Philipps-Universität Marburg","https://www.jair.org/index.php/jair/article/download/13138/26826","We study the problem of learning the structure of an optimal Bayesian network when additional constraints are posed on the network or on its moralized graph. More precisely, we consider the constraint that the network or its moralized graph are close, in terms of vertex or edge deletions, to a sparse graph class Π. For example, we show that learning an optimal network whose moralized graph has vertex deletion distance at most k from a graph with maximum degree 1 can be computed in polynomial time when k is constant. This extends previous work that gave an algorithm with such a running time for the vertex deletion distance to edgeless graphs. We then show that further extensions or improvements are presumably impossible. For example, we show that learning optimal networks where the network or its moralized graph have maximum degree 2 or connected components of size at most c, c ≥ 3, is NP-hard. Finally, we show that learning an optimal network with at most k edges in the moralized graph presumably has no f(k) · |I|O(1)-time algorithm and that, in contrast, an optimal network with at most k arcs can be computed in 2O(k) · |I|O(1) time where |I| is the total input size."
"13643","HEBO: Pushing The Limits of Sample-Efficient Hyper-parameter Optimisation","Alexander I. Cowen-Rivers, Wenlong Lyu, Rasul Tutunov, Zhi Wang, Antoine Grosnit, Ryan Rhys Griffiths, Alexandre Max Maraval, Hao Jianye, Jun Wang, Jan Peters, Haitham Bou-Ammar",", , , , , , , , , , ","https://www.jair.org/index.php/jair/article/download/13643/26827","In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input and output warping, admits exact marginal log-likelihood optimisation and is robust to the values of learned parameters. We demonstrate HEBO’s empirical efficacy on the NeurIPS 2020 Black-Box Optimisation challenge, where HEBO placed first. Upon further analysis, we observe that HEBO significantly outperforms existing black-box optimisers on 108 machine learning hyperparameter tuning tasks comprising the Bayesmark benchmark. Our findings indicate that the majority of hyper-parameter tuning tasks exhibit heteroscedasticity and non-stationarity, multiobjective acquisition ensembles with Pareto front solutions improve queried configurations, and robust acquisition maximisers afford empirical advantages relative to their non-robust counterparts. We hope these findings may serve as guiding principles for practitioners of Bayesian optimisation."
"13083","Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems","Evgeniia Razumovskaia, Goran Glavas, Olga Majewska, Edoardo M. Ponti, Anna Korhonen, Ivan Vulic","Language Technology Lab, University of Cambridge, UK, Data and Web Science Group, University of Mannheim, Germany, Language Technology Lab, University of Cambridge, UK, Mila - Quebec AI Institute and McGill University, Canada; University of Cambridge, UK, Language Technology Lab, University of Cambridge, UK, Language Technology Lab, University of Cambridge, UK; PolyAI Limited, UK","https://www.jair.org/index.php/jair/article/download/13083/26828","In task-oriented dialogue (ToD), a user holds a conversation with an artificial agent  with the aim of completing a concrete task. Although this technology represents one of  the central objectives of AI and has been the focus of ever more intense research and  development efforts, it is currently limited to a few narrow domains (e.g., food ordering,  ticket booking) and a handful of languages (e.g., English, Chinese). This work provides an  extensive overview of existing methods and resources in multilingual ToD as an entry point  to this exciting and emerging field. We find that the most critical factor preventing the  creation of truly multilingual ToD systems is the lack of datasets in most languages for  both training and evaluation. In fact, acquiring annotations or human feedback for each  component of modular systems or for data-hungry end-to-end systems is expensive and  tedious. Hence, state-of-the-art approaches to multilingual ToD mostly rely on (zero- or  few-shot) cross-lingual transfer from resource-rich languages (almost exclusively English),  either by means of (i) machine translation or (ii) multilingual representations. These  approaches are currently viable only for typologically similar languages and languages with  parallel / monolingual corpora available. 
On the other hand, their effectiveness beyond these  boundaries is doubtful or hard to assess due to the lack of linguistically diverse benchmarks  (especially for natural language generation and end-to-end evaluation). To overcome this  limitation, we draw parallels between components of the ToD pipeline and other NLP tasks,  which can inspire solutions for learning in low-resource scenarios. Finally, we list additional  challenges that multilinguality poses for related areas (such as speech, fluency in generated  text, and human-centred evaluation), and indicate future directions that hold promise to  further expand language coverage and dialogue capabilities of current ToD systems."
"13603","Recursion in Abstract Argumentation is Hard  ---  On the Complexity of Semantics Based on Weak Admissibility","Wolfgang Dvořák, Markus Ulbricht, Stefan Woltran","TU Wien, Leipzig University, TU Wien","https://www.jair.org/index.php/jair/article/download/13603/26829","We study the computational complexity of abstract argumentation semantics based on  weak admissibility, a recently introduced concept to deal with arguments of self-defeating  nature. Our results reveal that semantics based on weak admissibility are of much higher  complexity (under typical assumptions) compared to all argumentation semantics which  have been analysed in terms of complexity so far. In fact, we show PSPACE-completeness  of all non-trivial standard decision problems for weak-admissible based semantics. We then  investigate potential tractable fragments and show that restricting the frameworks under  consideration to certain graph-classes significantly reduces the complexity. We also show  that weak-admissibility based extensions can be computed by dividing the given graph into  its strongly connected components (SCCs). This technique ensures that the bottleneck  when computing extensions is the size of the largest SCC instead of the size of the graph  itself and therefore contributes to the search for fixed-parameter tractable implementations  for reasoning with weak admissibility."
"13338","Metric-Distortion Bounds under Limited Information","Ioannis Anagnostides, Dimitris Fotakis, Panagiotis Patsilinakos","Carnegie Mellon University, National Technical University of Athens, National Technical University of Athens","https://www.jair.org/index.php/jair/article/download/13338/26830","In this work, we study the metric distortion problem in voting theory under a limited amount of ordinal information. Our primary contribution is threefold. First, we consider mechanisms that perform a sequence of pairwise comparisons between candidates. We show that a popular deterministic mechanism employed in many knockout phases yields distortion O(log m) while eliciting only m − 1 out of the Θ(m2 ) possible pairwise comparisons, where m represents the number of candidates. Our analysis for this mechanism leverages a powerful technical lemma developed by Kempe (AAAI ‘20). We also provide a matching lower bound on its distortion. In contrast, we prove that any mechanism which performs fewer than m−1 pairwise comparisons is destined to have unbounded distortion. Moreover, we study the power of deterministic mechanisms under incomplete rankings. Most notably, when agents provide their k-top preferences we show an upper bound of 6m/k + 1 on the distortion, for any k ∈ {1, 2, . . . , m}. Thus, we substantially improve over the previous bound of 12m/k established by Kempe (AAAI ‘20), and we come closer to matching the best-known lower bound. Finally, we are concerned with the sample complexity required to ensure near-optimal distortion with high probability. Our main contribution is to show that a random sample of Θ(m/ϵ2 ) voters suffices to guarantee distortion 3 + ϵ with high probability, for any sufficiently small ϵ > 0. This result is based on analyzing the sensitivity of the deterministic mechanism introduced by Gkatzelis, Halpern, and Shah (FOCS ‘20). Importantly, all of our sample-complexity bounds are distribution-independent. 
From an experimental standpoint, we present several empirical findings on real-life voting applications, comparing the scoring systems employed in practice with a mechanism explicitly minimizing (metric) distortion. Interestingly, for our case studies, we find that the winner in the actual competition is typically the candidate who minimizes the distortion."
"13382","Improving Simulated Annealing for Clique Partitioning Problems","Jian Gao, Yiqi Lv, Minghao Liu, Shaowei Cai, Feifei Ma",", , , , ","https://www.jair.org/index.php/jair/article/download/13382/26832","The Clique Partitioning Problem (CPP) is essential in graph theory with a number of important applications. Due to its NP-hardness, efficient algorithms for solving this problem are very crucial for practical purposes, and simulated annealing is proved to be effective in state-of-the-art CPP algorithms. However, to make simulated annealing more efficient to solve large-scale CPPs, in this paper, we propose a new iterated simulated annealing algorithm. Several methods are proposed in our algorithm to improve simulated annealing. First, a new configuration checking strategy based on timestamp is presented and incorporated into simulated annealing to avoid search cycles. Afterwards, to enhance the local search ability of simulated annealing and speed up convergence, we combine our simulated annealing with a descent search method to solve the CPP. This method further improves solutions found by simulated annealing, and thus compensates for the local search effect. To further accelerate the convergence speed, we introduce a shrinking factor to decline initial temperature and then propose an iterated local search algorithm based on simulated annealing. Additionally, a restart strategy is adopted when the search procedure converges. Extensive experiments on benchmark instances of the CPP were carried out, and the results suggest that the proposed simulated annealing algorithm outperforms all the existing heuristic algorithms, including five state-of-the-art algorithms. Thus the best-known solutions for 34 instances out of 94 are updated. We also conduct comparative analyses of the proposed strategies and show their effectiveness."
"13666","Better Decision Heuristics in CDCL through Local Search and Target Phases","Shaowei Cai, Xindi Zhang, Mathias Fleury, Armin Biere","Institute of Software, Chinese Academy of Sciences, , , ","https://www.jair.org/index.php/jair/article/download/13666/26833","On practical applications, state-of-the-art SAT solvers dominantly use the conflict-driven clause learning (CDCL) paradigm. An alternative for satisfiable instances is local search solvers, which is more successful on random and hard combinatorial instances. Although there have been attempts to combine these methods in one framework, a tight integration which improves the state of the art on a broad set of application instances has been missing. We present a combination of techniques that achieves such an improvement. Our first contribution is to maximize in a local search fashion the assignment trail in CDCL, by sticking to and extending promising assignments via a technique called target phases. Second, we relax the CDCL framework by again extending promising branches to complete assignments while ignoring conflicts. These assignments are then used as starting point of local search which tries to find improved assignments with fewer unsatisfied clauses. Third, these improved assignments are imported back to the CDCL loop where they are used to determine the value assigned to decision variables. Finally, the conflict frequency of variables in local search can be exploited during variable selection in branching heuristics of CDCL. We implemented these techniques to improve three representative CDCL solvers (Glucose, MapleLcm DistChronoBT, and Kissat). Experiments on benchmarks from the main tracks of the last three SAT Competitions from 2019 to 2021 and an additional benchmark set from spectrum allocation show that the techniques bring significant improvements, particularly and not surprisingly, on satisfiable real-world application instances. 
We claim that these techniques were essential to the large increase in performance witnessed in the SAT Competition 2020 where Kissat and Relaxed LcmdCbDl NewTech were leading the field followed by CryptoMiniSAT-Ccnr, which also incorporated similar ideas."
"13981","Joint Optimization of Concave Scalarized Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm","Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal",", , Purdue University","https://www.jair.org/index.php/jair/article/download/13981/26834","Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. A policy-gradient based model-free algorithm is proposed for the problem. To compute an estimate of the gradient, an asymptotically biased estimator is proposed. The proposed algorithm is shown to achieve convergence to within an ε of the global optima after sampling O(M4 σ2/(1-γ)8ε4) trajectories where γ is the discount factor and M is the number of the agents, thus achieving the same dependence on ε as the policy gradient algorithm for the standard reinforcement learning."
"13768","Classical Planning in Deep Latent Space","Masataro Asai, Hiroshi Kajino, Alex Fukunaga, Christian Muise","MIT-IBM Watson AI Lab, IBM Research, IBM Research - Tokyo, Tokyo Japan, Graduate School of Arts and Sciences, University of Tokyo, Tokyo Japan, School of Computing, Queen’s University, Kingston Canada","https://www.jair.org/index.php/jair/article/download/13768/26835","Current domain-independent, classical planners require symbolic models of the problem domain and instance as input, resulting in a knowledge acquisition bottleneck. Meanwhile, although deep learning has achieved significant success in many fields, the knowledge is encoded in a subsymbolic representation which is incompatible with symbolic systems such as planners. We propose Latplan, an unsupervised architecture combining deep learning and classical planning. Given only an unlabeled set of image pairs showing a subset of transitions allowed in the environment (training inputs), Latplan learns a complete propositional PDDL action model of the environment. Later, when a pair of images representing the initial and the goal states (planning inputs) is given, Latplan finds a plan to the goal state in a symbolic latent space and returns a visualized plan execution. We evaluate Latplan using image-based versions of 6 planning domains: 8-puzzle, 15-Puzzle, Blocksworld, Sokoban and Two variations of LightsOut."
"13661","Threshold Treewidth and Hypertree Width","Robert Ganian, Andre Schidler, Manuel Sorge, Stefan Szeider","TU Wien, , , ","https://www.jair.org/index.php/jair/article/download/13661/26836","Treewidth and hypertree width have proven to be highly successful structural parameters in the context of the Constraint Satisfaction Problem (CSP). When either of these parameters is bounded by a constant, then CSP becomes solvable in polynomial time. However, here the order of the polynomial in the running time depends on the width, and this is known to be unavoidable; therefore, the problem is not fixed-parameter tractable parameterized by either of these width measures. Here we introduce an enhancement of tree and hypertree width through a novel notion of thresholds, allowing the associated decompositions to take into account information about the computational costs associated with solving the given CSP instance. Aside from introducing these notions, we obtain efficient theoretical as well as empirical algorithms for computing threshold treewidth and hypertree width and show that these parameters give rise to fixed-parameter algorithms for CSP as well as other, more general problems. We complement our theoretical results with experimental evaluations in terms of heuristics as well as exact methods based on SAT/SMT encodings."
"13816","C-Face: Using Compare Face on Face Hallucination for Low-Resolution Face Recognition","Feng Han, Xudong Wang, Furao Shen, Jian Zhao",", , , ","https://www.jair.org/index.php/jair/article/download/13816/26837","Face hallucination is a task of generating high-resolution (HR) face images from low-resolution (LR) inputs, which is a subfield of the general image super-resolution. However, most of the previous methods only consider the visual effect, ignoring how to maintain the identity of the face. In this work, we propose a novel face hallucination model, called C-Face network, which can generate HR images with high visual quality while preserving the identity information. A face recognition network is used to extract the identity features in the training process. In order to make the reconstructed face images keep the identity information to a great extent, a novel metric, i.e., C-Face loss, is proposed. We also propose a new training algorithm to deal with the convergence problem. Moreover, since our work mainly focuses on the recognition accuracy of the output, we integrate face recognition into the face hallucination process which ensures that the model can be used in real scenarios. Extensive experiments on two large scale face datasets demonstrate that our C-Face network has the best performance compared with other state-of-the-art methods."
"13487","Synthesis and Properties of Optimally Value-Aligned Normative Systems","Nieves Montes, Carles Sierra","Artificial Intelligence Research Institute (IIIA-CSIC), Artificial Intelligence Research Institute (IIIA-CSIC)","https://www.jair.org/index.php/jair/article/download/13487/26839","The value alignment problem is concerned with the design of systems that provably abide by our human values. One approach to this challenge is through the leverage of prescriptive norms that, if carefully designed, are able to steer a multiagent system away from harmful outcomes and towards more beneficial ones. In this work, we first present a general methodology for the automated synthesis of value aligned normative systems, based on a consequentialist view of values. In the second part, we provide analytical tools to examine such value aligned normative systems, namely the Shapley value of individual norms and the compatibility of several values under a fixed set of norms. We illustrate all of our contributions with a running example of a society of agents where taxes are collected and redistributed according to a set of parametrised norms."
"13547","The Computational Complexity of ReLU Network Training Parameterized by Data Dimensionality","Vincent Froese, Christoph Hertrich, Rolf Niedermeier",", TU Berlin, TU Berlin","https://www.jair.org/index.php/jair/article/download/13547/26841","Understanding the computational complexity of training simple neural networks with rectified linear units (ReLUs) has recently been a subject of intensive research. Closing gaps and complementing results from the literature, we present several results on the parameterized complexity of training two-layer ReLU networks with respect to various loss functions. After a brief discussion of other parameters, we focus on analyzing the influence of the dimension d of the training data on the computational complexity. We provide running time lower bounds in terms of W[1]-hardness for parameter d and prove that known brute-force strategies are essentially optimal (assuming the Exponential Time Hypothesis). In comparison with previous work, our results hold for a broad(er) range of loss functions, including lp-loss for all p ∈ [0, ∞]. In particular, we improve a known polynomial-time algorithm for constant d and convex loss functions to a more general class of loss functions, matching our running time lower bounds also in these cases."
"13704","Pricing Problems with Buyer Preselection","Vittorio Bilò, Michele Flammini, Gianpiero Monaco, Luca Moscardelli",", , , University of Chieti-Pescara","https://www.jair.org/index.php/jair/article/download/13704/26842","We investigate the problem of preselecting a subset of buyers (also called agents) participating in a market so as to optimize the performance of stable outcomes. We consider four scenarios arising from the combination of two stability notions, namely market envy-freeness and agent envy-freeness, with the two state-of-the-art objective functions of social welfare and seller’s revenue. When insisting on market envy-freeness, we prove that the problem cannot be approximated within n 1−ε (with n being the number of buyers) for any ε > 0, under both objective functions; we also provide approximation algorithms with an approximation ratio tight up to subpolynomial multiplicative factors for social welfare and the seller’s revenue. The negative result, in particular, holds even for markets with single-minded buyers. We also prove that maximizing the seller’s revenue is NP-hard even for a single buyer, thus closing a previous open question. Under agent envy-freeness and for both objective functions, instead, we design a polynomial time algorithm transforming any stable outcome for a market involving any subset of buyers into a stable outcome for the whole market without worsening its performance. This result creates an interesting middle-ground situation where, if on the one hand buyer preselection cannot improve the performance of agent envy-free outcomes, on the other one it can be used as a tool for simplifying the combinatorial structure of the buyers’ valuation functions in a given market. Finally, we consider the restricted case of multi-unit markets, where all items are of the same type and are assigned the same price. 
For these markets, we show that preselection may improve the performance of stable outcomes in all of the four considered scenarios, and design corresponding approximation algorithms."
"13482","Efficient Learning of Interpretable Classification Rules","Bishwamittra Ghosh, Dmitry Malioutov, Kuldeep S. Meel","National University of Singapore, , National University of Singapore","https://www.jair.org/index.php/jair/article/download/13482/26843","Machine learning has become omnipresent with applications in various safety-critical domains such as medical, law, and transportation. In these domains, high-stake decisions provided by machine learning necessitate researchers to design interpretable models, where the prediction is understandable to a human. In interpretable machine learning, rule-based classifiers are particularly effective in representing the decision boundary through a set of rules comprising input features. Examples of such classifiers include decision trees, decision lists, and decision sets. The interpretability of rule-based classifiers is in general related to the size of the rules, where smaller rules are considered more interpretable. To learn such a classifier, the brute-force direct approach is to consider an optimization problem that tries to learn the smallest classification rule that has close to maximum accuracy. This optimization problem is computationally intractable due to its combinatorial nature and thus, the problem is not scalable in large datasets. To this end, in this paper we study the triangular relationship among the accuracy, interpretability, and scalability of learning rule-based classifiers. The contribution of this paper is an interpretable learning framework IMLI, that is based on maximum satisfiability (MaxSAT) for synthesizing classification rules expressible in proposition logic. IMLI considers a joint objective function to optimize the accuracy and the interpretability of classification rules and learns an optimal rule by solving an appropriately designed MaxSAT query. 
Despite the progress of MaxSAT solving in the last decade, the straightforward MaxSAT-based solution cannot scale to practical classification datasets containing thousands to millions of samples. Therefore, we incorporate an efficient incremental learning technique inside the MaxSAT formulation by integrating mini-batch learning and iterative rule-learning. The resulting framework learns a classifier by iteratively covering the training data, wherein in each iteration, it solves a sequence of smaller MaxSAT queries corresponding to each mini-batch. In our experiments, IMLI achieves the best balance among prediction accuracy, interpretability, and scalability. For instance, IMLI attains a competitive prediction accuracy and interpretability w.r.t. existing interpretable classifiers and demonstrates impressive scalability on large datasets where both interpretable and non-interpretable classifiers fail. As an application, we deploy IMLI in learning popular interpretable classifiers such as decision lists and decision sets. The source code is available at https://github.com/meelgroup/mlic."
"13361","Motion Planning Under Uncertainty with Complex Agents and Environments via Hybrid Search","Daniel Strawser, Brian Williams","MIT, MIT","https://www.jair.org/index.php/jair/article/download/13361/26844","As autonomous systems and robots are applied to more real world situations, they must reason about uncertainty when planning actions. Mission success oftentimes cannot be guaranteed and the planner must reason about the probability of failure. Unfortunately, computing a trajectory that satisfies mission goals while constraining the probability of failure is difficult because of the need to reason about complex, multidimensional probability distributions. Recent methods have seen success using chance-constrained, model-based planning. However, the majority of these methods can only handle simple environment and agent models. We argue that there are two main drawbacks of current approaches to goal-directed motion planning under uncertainty. First, current methods suffer from an inability to deal with expressive environment models such as 3D non-convex obstacles. Second, most planners rely on considerable simplifications when computing trajectory risk including approximating the agent’s dynamics, geometry, and uncertainty. In this article, we apply hybrid search to the risk-bound, goal-directed planning problem. The hybrid search consists of a region planner and a trajectory planner. The region planner makes discrete choices by reasoning about geometric regions that the autonomous agent should visit in order to accomplish its mission. In formulating the region planner, we propose landmark regions that help produce obstacle-free paths. The region planner passes paths through the environment to a trajectory planner; the task of the trajectory planner is to optimize trajectories that respect the agent’s dynamics and the user’s desired risk of mission failure. 
We discuss three approaches to modeling trajectory risk: a CDF-based approach, a sampling-based collocation method, and an algorithm named Shooting Method Monte Carlo. These models allow computation of trajectory risk with more complex environments, agent dynamics, geometries, and models of uncertainty than past approaches. A variety of 2D and 3D test cases are presented including a linear case, a Dubins car model, and an underwater autonomous vehicle. The method is shown to outperform other methods in terms of speed and utility of the solution. Additionally, the models of trajectory risk are shown to better approximate risk in simulation."
"13999","sEMG-Based Upper Limb Movement Classifier: Current Scenario and Upcoming Challenges","Maurício Cagliari Tosin, Juliano Costa Machado, Alexandre Balbinot","Universidade Federal do Rio Grande do Sul, , ","https://www.jair.org/index.php/jair/article/download/13999/26845","Despite achieving accuracies higher than 90% on recognizing upper-limb movements through sEMG (surface Electromyography) signal with the state of art classifiers in the laboratory environment, there are still issues to be addressed for a myo-controlled prosthesis achieve similar performance in real environment conditions. Thereby, the main goal of this review is to expose the latest researches in terms of strategies in each block of the system, giving a global view of the current state of academic research. A systematic review was conducted, and the retrieved papers were organized according to the system step related to the proposed method. Then, for each stage of the upper limb motion recognition system, the works were described and compared in terms of strategy, methodology and issue addressed. An additional section was destined for the description of works related to signal contamination that is often neglected in reviews focused on sEMG based motion classifiers. Therefore, this section is the main contribution of this paper. Deep learning methods are a current trend for classification stage, providing strategies based on time-series and transfer learning to address the issues related to limb position, temporal/inter-subject variation, and electrode displacement. Despite the promising strategies presented for contaminant detection, identification, and removal, there are still some factors to be considered, such as the occurrence of simultaneous contaminants. This review exposes the current scenario of the movement classification system, providing valuable information for new researchers and guiding future works towards myo-controlled devices."
"13706","Altruistic Hedonic Games","Anna Maria Kerkmann, Nhan-Tam Nguyen, Anja Rey, Lisa Rey, Jörg Rothe, Lena Schend, Alessandra Wiechers",", , , , , , ","https://www.jair.org/index.php/jair/article/download/13706/26846","Hedonic games are coalition formation games in which players have preferences over the coalitions they can join. For a long time, all models of representing hedonic games were based upon selfish players only. Among the known ways of representing hedonic games compactly, we focus on friend-oriented hedonic games and propose a novel model for them that takes into account not only the players’ own preferences but also their friends’ preferences. Depending on the order in which players look at their own or their friends’ preferences, we distinguish three degrees of altruism: selfish-first, equal-treatment, and altruistic-treatment preferences. We study both the axiomatic properties of these games and the computational complexity of problems related to various common stability concepts."
"12862","Can We Automate Scientific Reviewing?","Weizhe Yuan, Pengfei Liu, Graham Neubig",", Carnegie Mellon University, ","https://www.jair.org/index.php/jair/article/download/12862/26847","The rapid development of science and technology has been accompanied by an exponential growth in peer-reviewed scientific publications. At the same time, the review of each paper is a laborious process that must be carried out by subject matter experts. Thus, providing high-quality reviews of this growing number of papers is a significant challenge. In this work, we ask the question “can we automate scientific reviewing? ”, discussing the possibility of using natural language processing (NLP) models to generate peer reviews for scientific papers. Because it is non-trivial to define what a “good” review is in the first place, we first discuss possible evaluation metrics that could be used to judge success in this task. We then focus on the machine learning domain and collect a dataset of papers in the domain, annotate them with different aspects of content covered in each review, and train targeted summarization models that take in papers as input and generate reviews as output. Comprehensive experimental results on the test set show that while system-generated reviews are comprehensive, touching upon more aspects of the paper than human-written reviews, the generated texts are less constructive and less factual than human-written reviews for all aspects except the explanation of the core ideas of the papers, which are largely factually correct. Given these results, we pose eight challenges in the pursuit of a good review generation system together with potential solutions, which, hopefully, will inspire more future research in this direction. We make relevant resource publicly available for use by future research: https://github. com/neulab/ReviewAdvisor. 
In addition, while our conclusion is that the technology is not yet ready for use in high-stakes review settings, we provide a system demo, ReviewAdvisor (http://review.nlpedia.ai/), showing the current capabilities and failings of state-of-the-art NLP models at this task (see demo screenshot in A.2). A review of this paper written by the system proposed in this paper can be found in A.1."
"13743","On Efficient Reinforcement Learning for Full-length Game of StarCraft II","Ruo-Ze Liu, Zhen-Jia Pang, Zhou-Yu Meng, Wenhai Wang, Yang Yu, Tong Lu","Nanjing University, , , , , ","https://www.jair.org/index.php/jair/article/download/13743/26848","StarCraft II (SC2) poses a grand challenge for reinforcement learning (RL), of which the main difficulties include huge state space, varying action space, and a long time horizon. In this work, we investigate a set of RL techniques for the full-length game of StarCraft II. We investigate a hierarchical RL approach, where the hierarchy involves two. One is the extracted macro-actions from experts’ demonstration trajectories to reduce the action space in an order of magnitude. The other is a hierarchical architecture of neural networks, which is modular and facilitates scale. We investigate a curriculum transfer training procedure that trains the agent from the simplest level to the hardest level. We train the agent on a single machine with 4 GPUs and 48 CPU threads. On a 64x64 map and using restrictive units, we achieve a win rate of 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a 93% win rate against the most difficult non-cheating level built-in AI (level-7). In this extended version of the paper, we improve our architecture to train the agent against the most difficult cheating level AIs (level-8, level-9, and level-10). We also test our method on different maps to evaluate the extensibility of our approach. By a final 3-layer hierarchical architecture and applying significant tricks to train SC2 agents, we increase the win rate against the level-8, level-9, and level-10 to 96%, 97%, and 94%, respectively. Our codes and models are all open-sourced now at https://github.com/liuruoze/HierNet-SC2. 
To provide a baseline based on AlphaStar for our work as well as for the research and open-source community, we reproduce a scaled-down version of it, mini-AlphaStar (mAS). The latest version of mAS is 1.07, which can be trained using supervised learning and reinforcement learning on the raw action space which has 564 actions. It is designed to run training on a single common machine, by making the hyper-parameters adjustable and some settings simplified. We then can compare our work with mAS using the same computing resources and training time. Our experimental results show that our method is more effective when using limited resources. The inference and training code of mini-AlphaStar is open-sourced at https://github.com/liuruoze/mini-AlphaStar. We hope our study could shed some light on the future research of efficient reinforcement learning on SC2 and other large-scale games."
"13575","On Tackling Explanation Redundancy in Decision Trees","Yacine Izza, Alexey Ignatiev, Joao Marques-Silva",", , ","https://www.jair.org/index.php/jair/article/download/13575/26849","Decision trees (DTs) epitomize the ideal of interpretability of machine learning (ML) models. The interpretability of decision trees motivates explainability approaches by so-called intrinsic interpretability, and it is at the core of recent proposals for applying interpretable ML models in high-risk applications. The belief in DT interpretability is justified by the fact that explanations for DT predictions are generally expected to be succinct. Indeed, in the case of DTs, explanations correspond to DT paths. Since decision trees are ideally shallow, and so paths contain far fewer features than the total number of features, explanations in DTs are expected to be succinct, and hence interpretable. This paper offers both theoretical and experimental arguments demonstrating that, as long as interpretability of decision trees equates with succinctness of explanations, then decision trees ought not be deemed interpretable. The paper introduces logically rigorous path explanations and path explanation redundancy, and proves that there exist functions for which decision trees must exhibit paths with explanation redundancy that is arbitrarily larger than the actual path explanation. The paper also proves that only a very restricted class of functions can be represented with DTs that exhibit no explanation redundancy. In addition, the paper includes experimental results substantiating that path explanation redundancy is observed ubiquitously in decision trees, including those obtained using different tree learning algorithms, but also in a wide range of publicly available decision trees. The paper also proposes polynomial-time algorithms for eliminating path explanation redundancy, which in practice require negligible time to compute. 
Thus, these algorithms serve to indirectly attain irreducible, and so succinct, explanations for decision trees. Furthermore, the paper includes novel results related to duality and enumeration of explanations, based on using SAT solvers as witness-producing NP-oracles."
"13818","Multi-Agent Path Finding: A New Boolean Encoding","Roberto Asín Achá, Rodrigo López, Sebastian Hagedorn, Jorge A. Baier","Universidad de Concepción, Universidad de Chile & Pontificia Universidad Católica de Chile, Pontificia Universidad Católica de Chile, Pontificia Universidad Católica de Chile","https://www.jair.org/index.php/jair/article/download/13818/26850","Multi-agent pathfinding (MAPF) is an NP-hard problem. As such, dense maps may be very hard to solve optimally. In such scenarios, compilation-based approaches, via Boolean satisfiability (SAT) and answer set programming (ASP), have been shown to outperform heuristic-search-based approaches, such as conflict-based search (CBS). In this paper, we propose a new Boolean encoding for MAPF, and show how to implement it in ASP and MaxSAT. A feature that distinguishes our encoding from existing ones is that swap and follow conflicts are encoded using binary clauses, which can be exploited by current conflict-driven clause learning (CDCL) solvers. In addition, the number of clauses used to encode swap and follow conflicts do not depend on the number of agents, allowing us to scale better. For MaxSAT, we study different ways in which we may combine the MSU3 and LSU algorithms for maximum performance. In our experimental evaluation, we used square grids, ranging from 20 x 20 to 50 x 50 cells, and warehouse maps, with a varying number of agents and obstacles. We compared against representative solvers of the state-of-the-art, including the search-based algorithm CBS, the ASP-based solver ASP-MAPF, and the branch-and-cut-and-price hybrid solver, BCP. We observe that the ASP implementation of our encoding, ASP-MAPF2 outperforms other solvers in most of our experiments. The MaxSAT implementation of our encoding, MtMS shows best performance in relatively small warehouse maps when the number of agents is large, which are the instances with closer resemblance to hard puzzle-like problems."
"13566","Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey","Danielle Saunders","SDL plc","https://www.jair.org/index.php/jair/article/download/13566/26851","The development of deep learning techniques has allowed Neural Machine Translation (NMT) models to become extremely powerful, given sufficient training data and training time. However, systems struggle when translating text from a new domain with a distinct style or vocabulary. Fine-tuning on in-domain data allows good domain adaptation, but requires sufficient relevant bilingual data. Even if this is available, simple fine-tuning can cause overfitting to new data and catastrophic forgetting of previously learned behaviour. We survey approaches to domain adaptation for NMT, particularly where a system may need to translate across multiple domains. We divide techniques into those revolving around data selection or generation, model architecture, parameter adaptation procedure, and inference procedure. We finally highlight the benefits of domain adaptation and multidomain adaptation techniques to other lines of NMT research."
"13676","A Survey of Methods for Automated Algorithm Configuration","Elias Schede, Jasmin Brandt, Alexander Tornede, Marcel Wever, Viktor Bengs, Eyke Hüllermeier, Kevin Tierney","Bielefeld University, Department of Computer Science,  Paderborn University, Department of Computer Science, Paderborn University,, Institute of Informatics, LMU Munich, Institute of Informatics, LMU Munich, Institute of Informatics, LMU Munich, Decision and Operation Technologies Group, Bielefeld University","https://www.jair.org/index.php/jair/article/download/13676/26852","Algorithm configuration (AC) is concerned with the automated search of the most suitable parameter configuration of a parametrized algorithm. There is currently a wide variety of AC problem variants and methods proposed in the literature. Existing reviews do not take into account all derivatives of the AC problem, nor do they offer a complete classification scheme. To this end, we introduce taxonomies to describe the AC problem and features of configuration methods, respectively. We review existing AC literature within the lens of our taxonomies, outline relevant design choices of configuration approaches, contrast methods and problem variants against each other, and describe the state of AC in industry. Finally, our review provides researchers and practitioners with a look at future research directions in the field of AC."
"13446","Planning with Perspectives -- Decomposing Epistemic Planning using Functional STRIPS","Guang Hu, Tim Miller, Nir Lipovetzky","The University of Melbourne, The University of Melbourne, The University of Melbourne","https://www.jair.org/index.php/jair/article/download/13446/26853","In this paper, we present a novel approach to epistemic planning called planning with perspectives (PWP) that is both more expressive and computationally more efficient than existing state-of-the-art epistemic planning tools. Epistemic planning — planning with knowledge and belief — is essential in many multi-agent and human-agent interaction domains. Most state-of-the-art epistemic planners solve epistemic planning problems by either compiling to propositional classical planning (for example, generating all possible knowledge atoms or compiling epistemic formulae to normal forms); or explicitly encoding Kripke-based semantics. However, these methods become computationally infeasible as problem sizes grow. In this paper, we decompose epistemic planning by delegating reasoning about epistemic formulae to an external solver. We do this by modelling the problem using Functional STRIPS, which is more expressive than standard STRIPS and supports the use of external, black-box functions within action models. Building on recent work that demonstrates the relationship between what an agent ‘sees’ and what it knows, we define the perspective of each agent using an external function, and build a solver for epistemic logic around this. Modellers can customise the perspective function of agents, allowing new epistemic logics to be defined without changing the planner. We ran evaluations on well-known epistemic planning benchmarks to compare an existing state-of-the-art planner, and on new scenarios that demonstrate the expressiveness of the PWP approach. 
The results show that our PWP planner scales significantly better than the state-of-the-art planner that we compared against, and can express problems more succinctly."
"13976","Planted Dense Subgraphs in Dense Random Graphs Can Be Recovered using Graph-based Machine Learning","Itay Levinas, Yoram Louzoun","Bar Ilan University, ","https://www.jair.org/index.php/jair/article/download/13976/26854","Multiple methods of finding the vertices belonging to a planted dense subgraph in a random dense G(n, p) graph have been proposed, with an emphasis on planted cliques. Such methods can identify the planted subgraph in polynomial time, but are all limited to several subgraph structures. Here, we present PYGON, a graph neural network-based algorithm, which is insensitive to the structure of the planted subgraph. This is the first algorithm that uses learning tools for recovering dense subgraphs. We show that PYGON can recover cliques of sizes Θ (√ n), where n is the size of the background graph, comparable with the state of the art. We also show that the same algorithm can recover multiple other planted subgraphs of size Θ (√ n), in both directed and undirected graphs. We suggest a conjecture that no polynomial time PAC-learning algorithm can detect planted dense subgraphs with size smaller than O ( √ n), even if in principle one could find dense subgraphs of logarithmic size."
"13833","Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning","Xiaoteng Ma, Shuai Ma, Li Xia, Qianchuan Zhao","Tsinghua University, , , ","https://www.jair.org/index.php/jair/article/download/13833/26855","Keeping risk under control is often more crucial than maximizing expected reward in real-world decision-making situations, such as finance, robotics, autonomous driving, etc. The most natural choice of risk measures is variance, while it penalizes the upside volatility as much as the downside part. Instead, the (downside) semivariance, which captures the negative deviation of a random variable under its mean, is more suitable for risk-averse proposes. This paper aims at optimizing the mean-semivariance (MSV) criterion in reinforcement learning w.r.t. steady rewards. Since semivariance is time-inconsistent and does not satisfy the standard Bellman equation, the traditional dynamic programming methods are inapplicable to MSV problems directly. To tackle this challenge, we resort to the Perturbation Analysis (PA) theory and establish the performance difference formula for MSV. We reveal that the MSV problem can be solved by iteratively solving a sequence of RL problems with a policy-dependent reward function. Further, we propose two on-policy algorithms based on the policy gradient theory and the trust region method. Finally, we conduct diverse experiments from simple bandit problems to continuous control tasks in MuJoCo, which demonstrate the effectiveness of our proposed methods."
"13854","Low-Rank Representation of Reinforcement Learning Policies","Bogdan Mazoure, Thang Doan, Tianyu Li, Vladimir Makarenkov, Joelle Pineau, Doina Precup, Guillaume Rabusseau","McGill University, McGill University, McGill University, UQÀM University, McGill University; Facebook AI Research; CIFAR AI Chair, McGill University; DeepMind; CIFAR AI Chair, Université de Montréal; CIFAR AI Chair","https://www.jair.org/index.php/jair/article/download/13854/26856","We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability and convergence guarantees. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly represented in a low-dimensional space while the embedded policy incurs almost no decrease in returns."
"13826","Communication-Aware Local Search for Distributed Constraint Optimization","Ben Rachmut, Roie  Zivan, William Yeoh",", Ben Gurion University of the Negev, Washington University in St. Louis","https://www.jair.org/index.php/jair/article/download/13826/26857","Most studies investigating models and algorithms for distributed constraint optimization problems (DCOPs) assume that messages arrive instantaneously and are never lost. Specifically, distributed local search DCOP algorithms, have been designed as synchronous algorithms (i.e., they perform in synchronous iterations in which each agent exchanges messages with all its neighbors), despite running in asynchronous environments. This is true also for an anytime mechanism that reports the best solution explored during the run of synchronous distributed local search algorithms. Thus, when the assumption of perfect communication is relaxed, the properties that were established for the state-of-the-art local search algorithms and the anytime mechanism may not necessarily apply. In this work, we address this limitation by: (1) Proposing a Communication-Aware DCOP model (CA-DCOP) that can represent scenarios with different communication disturbances; (2) Investigating the performance of existing local search DCOP algorithms, specifically Distributed Stochastic Algorithm (DSA) and Maximum Gain Messages (MGM), in the presence of message latency and message loss; (3) Proposing a latency-aware monotonic distributed local search DCOP algorithm; and (4) Proposing an asynchronous anytime framework for reporting the best solution explored by non-monotonic asynchronous local search DCOP algorithms. Our empirical results demonstrate that imperfect communication has a positive effect on distributed local search algorithms due to increased exploration. Furthermore, the asynchronous anytime framework we proposed allows one to benefit from algorithms with inherent explorative heuristics."
"13896","AAN+: Generalized Average Attention Network for Accelerating Neural Transformer","Biao Zhang, Deyi Xiong, Yubin Ge, Junfeng Yao, Hao Yue, Jinsong Su","University of Edinburgh, , , , , ","https://www.jair.org/index.php/jair/article/download/13896/26859","Transformer benefits from the high parallelization of attention networks in fast training, but it still suffers from slow decoding partially due to the linear dependency O(m) of the decoder self-attention on previous target words at inference. In this paper, we propose a generalized average attention network (AAN+) aiming at speeding up decoding by reducing the dependency from O(m) to O(1). We find that the learned self-attention weights in the decoder follow some patterns which can be approximated via a dynamic structure. Based on this insight, we develop AAN+, extending our previously proposed average attention (Zhang et al., 2018a, AAN) to support more general position- and content-based attention patterns. AAN+ only requires to maintain a small constant number of hidden states during decoding, ensuring its O(1) dependency. We apply AAN+ as a drop-in replacement of the decoder selfattention and conduct experiments on machine translation (with diverse language pairs), table-to-text generation and document summarization. With masking tricks and dynamic programming, AAN+ enables Transformer to decode sentences around 20% faster without largely compromising in the training speed and the generation performance. Our results further reveal the importance of the localness (neighboring words) in AAN+ and its capability in modeling long-range dependency."
"13754","DeepSym: Deep Symbol Generation and Rule Learning for Planning from Unsupervised Robot Interaction","Alper Ahmetoglu, M. Yunus Seker, Justus Piater, Erhan Oztop, Emre Ugur","Bogazici University, Bogazici University, University of Innsbruck, Osaka University, Ozyegin University, Bogazici University","https://www.jair.org/index.php/jair/article/download/13754/26858","Symbolic planning and reasoning are powerful tools for robots tackling complex tasks. However, the need to manually design the symbols restrict their applicability, especially for robots that are expected to act in open-ended environments. Therefore symbol formation and rule extraction should be considered part of robot learning, which, when done properly, will offer scalability, flexibility, and robustness. Towards this goal, we propose a novel general method that finds action-grounded, discrete object and effect categories and builds probabilistic rules over them for non-trivial action planning. Our robot interacts with objects using an initial action repertoire that is assumed to be acquired earlier and observes the effects it can create in the environment. To form action-grounded object, effect, and relational categories, we employ a binary bottleneck layer in a predictive, deep encoderdecoder network that takes the image of the scene and the action applied as input, and generates the resulting effects in the scene in pixel coordinates. After learning, the binary latent vector represents action-driven object categories based on the interaction experience of the robot. To distill the knowledge represented by the neural network into rules useful for symbolic reasoning, a decision tree is trained to reproduce its decoder function. 
Probabilistic rules are extracted from the decision paths of the tree and are represented in the Probabilistic Planning Domain Definition Language (PPDDL), allowing off-the-shelf planners to operate on the knowledge extracted from the sensorimotor experience of the robot. The deployment of the proposed approach for a simulated robotic manipulator enabled the discovery of discrete representations of object properties such as ‘rollable’ and ‘insertable’. In turn, the use of these representations as symbols allowed the generation of effective plans for achieving goals, such as building towers of the desired height, demonstrating the effectiveness of the approach for multi-step object manipulation. Finally, we demonstrate that the system is not only restricted to the robotics domain by assessing its applicability to the MNIST 8-puzzle domain in which learned symbols allow for the generation of plans that move the empty tile into any given position."
"13685","Solving the Watchman Route Problem with Heuristic Search","Shawn Skyler, Dor Atzmon, Tamir Yaffe, Ariel Felner","Ben-Gurion University, Ben-Gurion University, Ben-Gurion University, ","https://www.jair.org/index.php/jair/article/download/13685/26860","This paper solves the Watchman Route Problem (WRP) on a general discrete graph with Heuristic Search. Given a graph, a line-of-sight (LOS) function, and a start vertex, the task is to (offline) find a (shortest) path through the graph such that all vertices in the graph will be visually seen by at least one vertex on the path. WRP is reminiscent but different from graph covering and mapping problems, which are done online on an unknown graph. We formalize WRP as a heuristic search problem and solve it optimally with an A*-based algorithm. We develop a series of admissible heuristics with increasing difficulty and accuracy. Our heuristics abstract the underlying graph into a disjoint line-of-sight graph (GDLS) which is based on disjoint clusters of vertices such that vertices within the same cluster have LOS to the same specific vertex. We use solutions for the Minimum Spanning Tree (MST) and the Traveling Salesman Problem (TSP) of GDLS as admissible heuristics for WRP. We theoretically and empirically investigate these heuristics. Then, we show how the optimal methods can be modified (by intelligently pruning away large sub-trees) to obtain various suboptimal solvers with and without bound guarantees. These suboptimal solvers are much faster and expand fewer nodes than the optimal solver with only minor reduction in the quality of the solution."
"13787","Computational Short Cuts in Infinite Domain Constraint Satisfaction","Peter Jonsson, Victor Lagerkvist, Sebastian Ordyniak",", Linköping University, ","https://www.jair.org/index.php/jair/article/download/13787/26861","A backdoor in a finite-domain CSP instance is a set of variables where each possible instantiation moves the instance into a polynomial-time solvable class. Backdoors have found many applications in artificial intelligence and elsewhere, and the algorithmic problem of finding such backdoors has consequently been intensively studied. Sioutis and Janhunen have proposed a generalised backdoor concept suitable for infinite-domain CSP instances over binary constraints. We generalise their concept into a large class of CSPs that allow for higher-arity constraints. We show that this kind of infinite-domain backdoors have many of the positive computational properties that finite-domain backdoors have: the associated computational problems are fixed parameter tractable whenever the underlying constraint language is finite. On the other hand, we show that infinite languages make the problems considerably harder: the general backdoor detection problem is W[2]-hard and fixed-parameter tractability is ruled out under standard complexity-theoretic assumptions. We demonstrate that backdoors may have suboptimal behaviour on binary constraints—this is detrimental from an AI perspective where binary constraints are predominant in, for instance, spatiotemporal applications. In response to this, we introduce sidedoors as an alternative to backdoors. The fundamental computational problems for sidedoors remain fixed-parameter tractable for finite constraint language (possibly also containing non-binary relations). Moreover, the sidedoor approach has appealing computational properties that sometimes leads to faster algorithms than the backdoor approach."
"14019","Interpretable Local Concept-based Explanation with Human Feedback to Predict All-cause Mortality","Radwa EL Shawi, Mouaz H. Al-Mallah","Tartu University, ","https://www.jair.org/index.php/jair/article/download/14019/26862","Machine learning models are incorporated in different fields and disciplines in which some of them require a high level of accountability and transparency, for example, the healthcare sector. With the General Data Protection Regulation (GDPR), the importance for plausibility and verifiability of the predictions made by machine learning models has become essential. A widely used category of explanation techniques attempts to explain models’ predictions by quantifying the importance score of each input feature. However, summarizing such scores to provide human-interpretable explanations is challenging. Another category of explanation techniques focuses on learning a domain representation in terms of high-level human-understandable concepts and then utilizing them to explain predictions. These explanations are hampered by how concepts are constructed, which is not intrinsically interpretable. To this end, we propose Concept-based Local Explanations with Feedback (CLEF), a novel local model agnostic explanation framework for learning a set of high-level transparent concept definitions in high-dimensional tabular data that uses clinician-labeled concepts rather than raw features. CLEF maps the raw input features to high-level intuitive concepts and then decompose the evidence of prediction of the instance being explained into concepts. In addition, the proposed framework generates counterfactual explanations, suggesting the minimum changes in the instance’s concept based explanation that will lead to a different prediction. We demonstrate with simulated user feedback on predicting the risk of mortality. 
Such direct feedback is more effective than other techniques that rely on hand-labelled or automatically extracted concepts in learning concepts that align with ground truth concept definitions."
"13864","Creative Problem Solving in Artificially Intelligent Agents: A Survey and Framework","Evana Gizzi, Lakshmi Nair, Sonia Chernova, Jivko Sinapov",", , , ","https://www.jair.org/index.php/jair/article/download/13864/26863","Creative Problem Solving (CPS) is a sub-area within Artificial Intelligence (AI) that focuses on methods for solving off-nominal, or anomalous problems in autonomous systems. Despite many advancements in planning and learning, resolving novel problems or adapting existing knowledge to a new context, especially in cases where the environment may change in unpredictable ways post deployment, remains a limiting factor in the safe and useful integration of intelligent systems. The emergence of increasingly autonomous systems dictates the necessity for AI agents to deal with environmental uncertainty through creativity. To stimulate further research in CPS, we present a definition and a framework of CPS, which we adopt to categorize existing AI methods in this field. Our framework consists of four main components of a CPS problem, namely, 1) problem formulation, 2) knowledge representation, 3) method of knowledge manipulation, and 4) method of evaluation. We conclude our survey with open research questions, and suggested directions for the future."
"13778","Fair in the Eyes of Others","Parham Shams, Aurélie Beynier, Sylvain Bouveret, Nicolas Maudet","LIP6, Sorbonne Université, , LIG, Université Grenoble Alpes, LIP6, Sorbonne University","https://www.jair.org/index.php/jair/article/download/13778/26864","Envy-freeness is a widely studied notion in resource allocation, capturing some aspects of fairness. The notion of envy being inherently subjective though, it might be the case that an agent envies another agent, but that from the other agents' point of view, she has no reason to do so. The difficulty here is to define the notion of objectivity, since no ground-truth can properly serve as a basis of this definition. A natural approach is to consider the judgement of the other agents as a proxy for objectivity. Building on previous work by Parijs (who introduced ""unanimous envy"") we propose the notion of approval envy: an agent ai experiences approval envy towards aj if she is envious of aj, and sufficiently many agents agree that this should be the case, from their own perspectives. Another thoroughly studied notion in resource allocation is proportionality. The same variant can be studied, opening natural questions regarding the links between these two notions. We exhibit several properties of these notions. Computing the minimal threshold guaranteeing approval envy and approval non-proportionality clearly inherits well-known intractable results from envy-freeness and proportionality, but (i) we identify some tractable cases such as house allocation; and (ii) we provide a general method based on a mixed integer programming encoding of the problem, which proves to be efficient in practice. This allows us in particular to show experimentally that existence of such allocations, with a rather small threshold, is very often observed."
"14015","Initialization of Feature Selection Search for Classification","Maria Luque-Rodriguez, Jose Molina-Baena, Alfonso Jimenez-Vilchez, Antonio Arauzo-Azofra","Universidad de Cordoba, Universidad de Cordoba, Universidad de Cordoba, Universidad de Cordoba","https://www.jair.org/index.php/jair/article/download/14015/26865","Selecting the best features in a dataset improves accuracy and efficiency of classifiers  in a learning process. Datasets generally have more features than necessary, some of  them being irrelevant or redundant to others. For this reason, numerous feature selection  methods have been developed, in which different evaluation functions and measures are  applied. This paper proposes the systematic application of individual feature evaluation  methods to initialize search-based feature subset selection methods. An exhaustive review  of the starting methods used by genetic algorithms from 2014 to 2020 has been carried out.  Subsequently, an in-depth empirical study has been carried out evaluating the proposal for  different search-based feature selection methods (Sequential forward and backward selection,  Las Vegas filter and wrapper, Simulated Annealing and Genetic Algorithms). Since  the computation time is reduced and the classification accuracy with the selected features  is improved, the initialization of feature selection proposed in this work is proved to be  worth considering while designing any feature selection algorithms."
"13794","Reinforcement Learning from Optimization Proxy for Ride-Hailing Vehicle Relocation","Enpeng Yuan, Wenbo Chen, Pascal Van Hentenryck",", Georgia Tech, Georgia Tech","https://www.jair.org/index.php/jair/article/download/13794/26867","Idle vehicle relocation is crucial for addressing demand-supply imbalance that frequently arises in the ride-hailing system. Current mainstream methodologies - optimization and reinforcement learning - suffer from obvious computational drawbacks. Optimization models need to be solved in real-time and often trade off model fidelity (hence quality of solutions) for computational efficiency. Reinforcement learning is expensive to train and often struggles to achieve coordination among a large fleet. This paper designs a hybrid approach that leverages the strengths of the two while overcoming their drawbacks. Specifically, it trains an optimization proxy, i.e., a machine-learning model that approximates an optimization model, and then refines the proxy with reinforcement learning. This Reinforcement Learning from Optimization Proxy (RLOP) approach is computationally efficient to train and deploy, and achieves better results than RL or optimization alone. Numerical experiments on the New York City dataset show that the RLOP approach reduces both the relocation costs and computation time significantly compared to the optimization model, while pure reinforcement learning fails to converge due to computational complexity."
"13769","Asymmetric Action Abstractions for Planning in Real-Time Strategy Games","Rubens O. Moraes, Mario A. Nascimento, Levi H.S. Lelis",", , University of Alberta","https://www.jair.org/index.php/jair/article/download/13769/26868","Action abstractions restrict the number of legal actions available for real-time planning in zero-sum extensive-form games, thus allowing algorithms to focus their search on a set of promising actions. Even though unabstracted game trees can lead to optimal policies, due to real-time constraints and the tree size, they are not a practical choice. In this context, we introduce an action abstraction scheme which we call asymmetric action abstraction. Asymmetric abstractions allow search algorithms to “pay more attention” to some aspects of the game by unevenly dividing the algorithm’s search effort amongst different aspects of the game. We also introduce four algorithms that search in asymmetrically abstracted game trees to evaluate the effectiveness of our abstraction schemes. Two of our algorithms are adaptations of algorithms developed for searching in action-abstracted spaces, Portfolio Greedy Search and Stratified Strategy Selection, and the other two are adaptations of an algorithm developed for searching in unabstracted spaces, NaïveMCTS. An extensive set of experiments in a real-time strategy game shows that search algorithms using asymmetric abstractions are able to outperform all other search algorithms tested."
"13734","Learning to Design Fair and Private Voting Rules","Farhad Mohsin, Ao Liu, Pin-Yu Chen, Francesca Rossi, Lirong Xia","Rensselaer Polytechnic Institute, , IBM Research, IBM Research, Rensselaer Polytechnic Institute","https://www.jair.org/index.php/jair/article/download/13734/26869","Voting is used widely to identify a collective decision for a group of agents, based on their preferences. In this paper, we focus on evaluating and designing voting rules that support both the privacy of the voting agents and a notion of fairness over such agents. To do this, we introduce a novel notion of group fairness and adopt the existing notion of local differential privacy. We then evaluate the level of group fairness in several existing voting rules, as well as the trade-offs between fairness and privacy, showing that it is not possible to always obtain maximal economic efficiency with high fairness or high privacy levels. Then, we present both a machine learning and a constrained optimization approach to design new voting rules that are fair while maintaining a high level of economic efficiency. Finally, we empirically examine the effect of adding noise to create local differentially private voting rules and discuss the three-way trade-off between economic efficiency, fairness, and privacy. This paper appears in the special track on AI & Society."
"13865","Strategy Graphs for Influence Diagrams","Eric A. Hansen, Jinchuan Shi, James Kastrantas","Mississippi State University, Mississippi State University, Mississippi State University","https://www.jair.org/index.php/jair/article/download/13865/26870","An influence diagram is a graphical model of a Bayesian decision problem that is solved by finding a strategy that maximizes expected utility. When an influence diagram is solved by variable elimination or a related dynamic programming algorithm, it is traditional to represent a strategy as a sequence of policies, one for each decision variable, where a policy maps the relevant history for a decision to an action. We propose an alternative representation of a strategy as a graph, called a strategy graph, and show how to modify a variable elimination algorithm so that it constructs a strategy graph. We consider both a classic variable elimination algorithm for influence diagrams and a recent extension of this algorithm that has more relaxed constraints on elimination order that allow improved performance. We consider the advantages of representing a strategy as a graph and, in particular, how to simplify a strategy graph so that it is easier to interpret and analyze."
"13511","First-Order Rewritability and Complexity of Two-Dimensional Temporal Ontology-Mediated Queries","Alessandro Artale, Roman Kontchakov, Alisa Kovtunova, Vladislav Ryzhikov, Frank Wolter, Michael Zakharyaschev",", , , , , ","https://www.jair.org/index.php/jair/article/download/13511/26871","Aiming at ontology-based data access to temporal data, we design two-dimensional temporal ontology and query languages by combining logics from the (extended) DL-Lite family with linear temporal logic LTL over discrete time (Z,<). Our main concern is first-order rewritability of ontology-mediated queries (OMQs) that consist of a 2D ontology and a positive temporal instance query. Our target languages for FO-rewritings are two-sorted FO(<)—first-order logic with sorts for time instants ordered by the built-in precedence relation < and for the domain of individuals—its extension FO(<,≡) with the standard congruence predicates t ≡ 0 (mod n), for any fixed n > 1, and FO(RPR) that admits relational primitive recursion. In terms of circuit complexity, FO(<,≡)- and FO(RPR)-rewritability guarantee answering OMQs in uniform AC0 and NC1, respectively. We proceed in three steps. First, we define a hierarchy of 2D DL-Lite/LTL ontology languages and investigate the FO-rewritability of OMQs with atomic queries by constructing projections onto 1D LTL OMQs and employing recent results on the FO-rewritability of propositional LTL OMQs. As the projections involve deciding consistency of ontologies and data, we also consider the consistency problem for our languages. While the undecidability of consistency for 2D ontology languages with expressive Boolean role inclusions might be expected, we also show that, rather surprisingly, the restriction to Krom and Horn role inclusions leads to decidability (and ExpSpace-completeness), even if one admits full Booleans on concepts. 
As a final step, we lift some of the rewritability results for atomic OMQs to OMQs with expressive positive temporal instance queries. The lifting results are based on an in-depth study of the canonical models and only concern Horn ontologies."
"13639","Towards Evidence Retrieval Cost Reduction in Abstract Argumentation Frameworks with Fallible Evidence","Andrea Cohen, Sebastian Gottifredi, Alejandro J. García, Guillermo R. Simari","Universidad Nacional del Sur, , , ","https://www.jair.org/index.php/jair/article/download/13639/26872","Arguments in argumentation systems cannot always be considered as standalone entities, requiring the consideration of the pieces of evidence they rely on. This evidence might have to be retrieved from external sources such as databases or the web, and each attempt to retrieve a piece of evidence comes with an associated cost. Moreover, a piece of evidence may be available in a given scenario but not in others, and this is not known beforehand. As a result, the collection of active arguments (whose entire set of evidence is available) that can be used by the argumentation machinery of the system may vary from one scenario to another. In this work, we consider an Abstract Argumentation Framework with Fallible Evidence that accounts for these issues, and propose a heuristic measure used as part of the acceptability calculus (specifically, for building pruned dialectical trees) with the aim of minimizing the evidence retrieval cost of the arguments involved in the reasoning process. We provide an algorithmic solution that is empirically tested against two baselines and formally show the correctness of our approach."
"13636","Chance-constrained Static Schedules for Temporally Probabilistic Plans","Cheng Fang, Andrew J. Wang, Brian C. Williams","Massachusetts Institute of Technology, MIT, CSAIL, MIT","https://www.jair.org/index.php/jair/article/download/13636/26873","Time management under uncertainty is essential to large scale projects. From space exploration to industrial production, there is a need to schedule and perform activities. given complex specifications on timing. In order to generate schedules that are robust to uncertainty in the duration of activities, prior work has focused on a problem framing that uses an interval-bounded uncertainty representation. However, such approaches are unable to take advantage of known probability distributions over duration. In this paper we concentrate on a probabilistic formulation of temporal problems with uncertain duration, called the probabilistic simple temporal problem. As distributions often have an unbounded range of outcomes, we consider chance-constrained solutions, with guarantees on the probability of meeting temporal constraints. By considering distributions over uncertain duration, we are able to use risk as a resource, reason over the relative likelihood of outcomes, and derive higher utility solutions. We first demonstrate our approach by encoding the problem as a convex program. We then develop a more efficient hybrid algorithm whose parent solver generates risk allocations and whose child solver generates schedules for a particular risk allocation. The child is made efficient by leveraging existing interval-bounded scheduling algorithms, while the parent is made efficient by extracting conflicts over risk allocations. We perform numerical experiments to show the advantages of reasoning over probabilistic uncertainty, by comparing the utility of schedules generated with risk allocation against those generated from reasoning over bounded uncertainty. 
We also empirically show that solution time is greatly reduced by incorporating conflict-directed risk allocation."
"13811","Proofs and Certificates for Max-SAT","Matthieu Py, Mohamed Sami Cherif, Djamal Habet","Aix-Marseille University, LIS, , ","https://www.jair.org/index.php/jair/article/download/13811/26877","Current Max-SAT solvers are able to efficiently compute the optimal value of an input instance but they do not provide any certificate of its validity. In this paper, we present a tool, called MS-Builder, which generates certificates for the Max-SAT problem in the particular form of a sequence of equivalence-preserving transformations. To generate a certificate, MS-Builder iteratively calls a SAT oracle to get a SAT resolution refutation which is handled and adapted into a sound refutation for Max-SAT. In particular, we prove that the size of the computed Max-SAT refutation is linear with respect to the size of the initial refutation if it is semi-read-once, tree-like regular, tree-like or semi-tree-like. Additionally, we propose an extendable tool, called MS-Checker, able to verify the validity of any Max-SAT certificate using Max-SAT inference rules. Both tools are evaluated on the unweighted and weighted benchmark instances of the 2020 Max-SAT Evaluation."
"13673","Towards Continual Reinforcement Learning: A Review and Perspectives","Khimya  Khetarpal, Matthew Riemer, Irina Rish, Doina Precup",", IBM Research, Mila, University of Montreal, , ","https://www.jair.org/index.php/jair/article/download/13673/26878","In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of different continual RL formulations by mathematically characterizing two key properties of non-stationarity, namely, the scope and driver non-stationarity. This offers a unified view of various formulations. Next, we review and present a taxonomy of continual RL approaches. We go on to discuss evaluation of continual RL agents, providing an overview of benchmarks used in the literature and important metrics for understanding agent performance. Finally, we highlight open problems and challenges in bridging the gap between the current state of continual RL and findings in neuroscience. While still in its early days, the study of continual RL has the promise to develop better incremental reinforcement learners that can function in increasingly realistic applications where non-stationarity plays a vital role. These include applications such as those in the fields of healthcare, education, logistics, and robotics."
"14034","The LM-Cut Heuristic Family for Optimal Numeric Planning with Simple Conditions","Ryo Kuroiwa, Alexander Shleyfman, Chiara Piacentini, Margarita P. Castro, J. Christopher Beck","University of Toronto, Technion - Israel Institute of Technology, Augmenta Inc, Pontificia Universidad Católica de Chile, University of Toronto","https://www.jair.org/index.php/jair/article/download/14034/26880","The LM-cut heuristic, both alone and as part of the operator counting framework, represents one of the most successful heuristics for classical planning. In this paper, we generalize LM-cut and its use in operator counting to optimal numeric planning with simple conditions and simple numeric effects, i.e., linear expressions over numeric state variables and actions that increase or decrease such variables by constant quantities. We introduce a variant of hmaxhbd (a previously proposed numeric hmax heuristic) based on the delete-relaxed version of such planning tasks and show that, although inadmissible by itself, our variant yields a numeric version of the classical LM-cut heuristic which is admissible. We classify the three existing families of heuristics for this class of numeric planning tasks and introduce the LM-cut family, proving dominance or incomparability between all pairs of existing max and LM-cut heuristics for numeric planning with simple conditions. Our extensive empirical evaluation shows that the new LM-cut heuristic, both on its own and as part of the operator counting framework, is the state-of-the-art for this class of numeric planning problem."
"13683","Data-Driven Revision of Conditional Norms in Multi-Agent Systems","Davide Dell'Anna, Natasha Alechina, Fabiano Dalpiaz, Mehdi Dastani, Brian Logan","Utrecht University, , , , ","https://www.jair.org/index.php/jair/article/download/13683/26879","In multi-agent systems, norm enforcement is a mechanism for steering the behavior of individual agents in order to achieve desired system-level objectives. Due to the dynamics of multi-agent systems, however, it is hard to design norms that guarantee the achievement of the objectives in every operating context. Also, these objectives may change over time, thereby making previously defined norms ineffective. In this paper, we investigate the use of system execution data to automatically synthesise and revise conditional prohibitions with deadlines, a type of norms aimed at prohibiting agents from exhibiting certain patterns of behaviors. We propose DDNR (Data-Driven Norm Revision), a data-driven approach to norm revision that synthesises revised norms with respect to a data set of traces describing the behavior of the agents in the system. We evaluate DDNR using a state-of-the-art, off-the-shelf urban traffic simulator. The results show that DDNR synthesises revised norms that are significantly more accurate than the original norms in distinguishing adequate and inadequate behaviors for the achievement of the system-level objectives."
"13791","TOOLTANGO: Common sense Generalization in Predicting Sequential Tool Interactions for Robot Plan Synthesis","Shreshth Tuli, Rajas Bansal, Rohan Paul, Mausam",", Stanford University, Indian Institute of Technology Delhi, Indian Institute of Technology Delhi","https://www.jair.org/index.php/jair/article/download/13791/26881","Robots assisting us in environments such as factories or homes must learn to make use of objects as tools to perform tasks, for instance, using a tray to carry objects. We consider the problem of learning common sense knowledge of when a tool may be useful and how its use may be composed with other tools to accomplish a high-level task instructed by a human. Specifically, we introduce a novel neural model, termed TOOLTANGO, that first predicts the next tool to be used, and then uses this information to predict the next action. We show that this joint model can inform learning of a fine-grained policy enabling the robot to use a particular tool in sequence and adds a significant value in making the model more accurate. TOOLTANGO encodes the world state, comprising objects and symbolic relationships between them, using a graph neural network and is trained using demonstrations from human teachers instructing a virtual robot in a physics simulator. The model learns to attend over the scene using knowledge of the goal and the action history, finally decoding the symbolic action to execute. Crucially, we address generalization to unseen environments where some known tools are missing, but unseen alternative tools are present. We show that by augmenting the representation of the environment with pre-trained embeddings derived from a knowledge-base, the model can generalize effectively to novel environments. Experimental results show at least 48.8-58.1% absolute improvement over the baselines in predicting successful symbolic plans for a simulated mobile manipulator in novel environments with unseen objects. 
This work takes a step in the direction of enabling robots to rapidly synthesize robust plans for complex tasks, particularly in novel settings."
"13922","Automated Dynamic Algorithm Configuration","Steven Adriaensen, André Biedenkapp, Gresa Shala, Noor Awad, Theresa Eimer, Marius Lindauer, Frank Hutter","University of Freiburg, Machine Learning Lab, University of Freiburg, Machine Learning Lab, University of Freiburg, Machine Learning Lab, University of Freiburg, Machine Learning Lab, Leibniz University Hannover, Institute for Information Processing, Leibniz University Hannover, Institute for Information Processing, University of Freiburg, Machine Learning Lab & Bosch Center for Artificial Intelligence","https://www.jair.org/index.php/jair/article/download/13922/26882","The performance of an algorithm often critically depends on its parameter configuration. While a variety of automated algorithm configuration methods have been proposed to relieve users from the tedious and error-prone task of manually tuning parameters, there is still a lot of untapped potential as the learned configuration is static, i.e., parameter settings remain fixed throughout the run. However, it has been shown that some algorithm parameters are best adjusted dynamically during execution. Thus far, this is most commonly achieved through hand-crafted heuristics. A promising recent alternative is to automatically learn such dynamic parameter adaptation policies from data. In this article, we give the first comprehensive account of this new field of automated dynamic algorithm configuration (DAC), present a series of recent advances, and provide a solid foundation for future research in this field. Specifically, we (i) situate DAC in the broader historical context of AI research; (ii) formalize DAC as a computational problem; (iii) identify the methods used in prior art to tackle this problem; and (iv) conduct empirical case studies for using DAC in evolutionary optimization, AI planning, and machine learning."
"14195","The Complexity of Network Satisfaction Problems for Symmetric Relation Algebras with a Flexible Atom","Manuel Bodirsky, Simon Knäuer",", TU Dresden","https://www.jair.org/index.php/jair/article/download/14195/26883","Robin Hirsch posed in 1996 the Really Big Complexity Problem: classify the computational complexity of the network satisfaction problem for all finite relation algebras A. We provide a complete classification for the case that A is symmetric and has a fexible atom; in this case, the problem is NP-complete or in P. The classification task can be reduced to the case where A is integral. If a finite integral relation algebra has a flexible atom, then it has a normal representation B. We can then study the computational complexity of the network satisfaction problem of A using the universal-algebraic approach, via an analysis of the polymorphisms of B. We also use a Ramsey-type result of Nešetřil and Rödl and a complexity dichotomy result of Bulatov for conservative finite-domain constraint satisfaction problems."
