"index","Title","Authors","Affiliations","pdf_url","abstract"
"15195","Undesirable Biases in NLP: Addressing Challenges of Measurement","Oskar van der Wal, Dominik Bachmann, Alina Leidinger, Leendert van Maanen, Willem Zuidema, Katrin Schulz","Institute for Logic, Language and Computation, University of Amsterdam, Institute for Logic, Language and Computation, University of Amsterdam and Experimental Psychology, Helmholtz Institute, Utrecht University, Institute for Logic, Language and Computation, University of Amsterdam and Experimental Psychology, Helmholtz Institute, Utrecht University, Experimental Psychology, Helmholtz Institute, Utrecht University, Institute for Logic, Language and Computation, University of Amsterdam and Experimental Psychology, Helmholtz Institute, Utrecht University, Institute for Logic, Language and Computation, University of Amsterdam","https://www.jair.org/index.php/jair/article/download/15195/26998","As Large Language Models and Natural Language Processing (NLP) technology rapidly develop and spread into daily life, it becomes crucial to anticipate how their use could harm people. One problem that has received a lot of attention in recent years is that this technology has displayed harmful biases, from generating derogatory stereotypes to producing disparate outcomes for different social groups. Although a lot of effort has been invested in assessing and mitigating these biases, our methods of measuring the biases of NLP models have serious problems and it is often unclear what they actually measure. In this paper, we provide an interdisciplinary approach to discussing the issue of NLP model bias by adopting the lens of psychometrics — a field specialized in the measurement of concepts like bias that are not directly observable. In particular, we will explore two central notions from psychometrics, the construct validity and the reliability of measurement tools, and discuss how they can be applied in the context of measuring model bias. 
Our goal is to provide NLP practitioners with methodological tools for designing better bias measures, and to inspire them more generally to explore tools from psychometrics when working on bias measurement tools. This article appears in the AI & Society track."
"15315","The AI Race: Why Current Neural Network-based Architectures are a Poor Basis for Artificial General Intelligence","Jérémie Sublime","ISEP - School of Digital Engineers","https://www.jair.org/index.php/jair/article/download/15315/26999","Artificial General Intelligence is the idea that someday an hypothetical agent will arise from artificial intelligence (AI) progresses, and will surpass by far the brightest and most gifted human minds. This idea has been around since the early development of AI. Since then, scenarios on how such AI may behave towards humans have been the subject of many fictional and research works. This paper analyzes the current state of artificial intelligence progresses, and how the current AI race with the ever faster release of impressive new AI methods (that can deceive humans, outperform them at tasks we thought impossible to tackle by AI a mere decade ago, and that disrupt the job market) have raised concerns that Artificial General Intelligence (AGI) might be coming faster that we thought. In particular, we focus on 3 specific families of modern AIs to develop the idea that deep neural networks, which are the current backbone of nearly all artificial intelligence methods, are poor candidates for any AGI to arise due to their many limitations, and therefore that any threat coming from the recent AI race does not lie in AGI but in the limitations, uses, and lack of regulations of our current models and algorithms. This article appears in the AI & Society track."
"14879","Principles and their Computational Consequences for Argumentation Frameworks with Collective Attacks","Wolfgang Dvořák, Matthias König, Markus Ulbricht, Stefan Woltran","TU Wien, , Leipzig University, Department of Computer Science, TU Wien","https://www.jair.org/index.php/jair/article/download/14879/27000","Argumentation frameworks (AFs) are a key formalism in AI research. Their semantics have been investigated in terms of principles, which define characteristic properties in order to deliver guidance for analyzing established and developing new semantics. Because of the simple structure of AFs, many desired properties hold almost trivially, at the same time hiding interesting concepts behind syntactic notions. We extend the principle-based approach to argumentation frameworks with collective attacks (SETAFs) and provide a comprehensive overview of common principles for their semantics. Our analysis shows that investigating principles based on decomposing the given SETAF (e.g. directionality or SCC-recursiveness) poses additional challenges in comparison to usual AFs. We introduce the notion of the reduct as well as the modularization principle for SETAFs which will prove beneficial for this kind of investigation. We then demonstrate how our findings can be utilized for incremental computation of extensions and show how we can use graph properties of the frameworks to speed up these algorithms."
"15057","Right Place, Right Time: Proactive Multi-Robot Task Allocation Under Spatiotemporal Uncertainty","Charlie Street, Bruno Lacerda, Manuel Mühlig, Nick Hawes","School of Computer Science, University of Birmingham, Oxford Robotics Institute, University of Oxford, Honda Research Institute Europe GmbH, Oxford Robotics Institute, University of Oxford","https://www.jair.org/index.php/jair/article/download/15057/27001","For many multi-robot problems, tasks are announced during execution, where task announcement times and locations are uncertain. To synthesise multi-robot behaviour that is robust to early announcements and unexpected delays, multi-robot task allocation methods must explicitly model the stochastic processes that govern task announcement. In this paper, we model task announcement using continuous-time Markov chains which predict when and where tasks will be announced. We then present a task allocation framework which uses the continuous-time Markov chains to allocate tasks proactively, such that robots are near or at the task location upon its announcement. Our method seeks to minimise the expected total waiting duration for each task, i.e. the duration between task announcement and a robot beginning to service the task. Our framework can be applied to any multi-robot task allocation problem where robots complete spatiotemporal tasks which are announced stochastically. We demonstrate the efficacy of our approach in simulation, where we outperform baselines which do not allocate tasks proactively, or do not fully exploit our task announcement models."
"15185","Visually Grounded Language Learning: A Review of Language Games, Datasets, Tasks, and Models","Alessandro Suglia, Ioannis Konstas, Oliver Lemon",", Heriot-Watt University, Heriot-Watt University","https://www.jair.org/index.php/jair/article/download/15185/27002","In recent years, several machine learning models have been proposed. They are trained with a language modelling objective on large-scale text-only data. With such pretraining, they can achieve impressive results on many Natural Language Understanding and Generation tasks. However, many facets of meaning cannot be learned by “listening to the radio” only. In the literature, many Vision+Language (V+L) tasks have been defined with the aim of creating models that can ground symbols in the visual modality. In this work, we provide a systematic literature review of several tasks and models proposed in the V+L field. We rely on Wittgenstein’s idea of ‘language games’ to categorise such tasks into 3 different families: 1) discriminative games, 2) generative games, and 3) interactive games. Our analysis of the literature provides evidence that future work should be focusing on interactive games where communication in Natural Language is important to resolve ambiguities about object referents and action plans and that physical embodiment is essential to understand the semantics of situations and events. Overall, these represent key requirements for developing grounded meanings in neural models."
"14752","Query-driven Qualitative Constraint Acquisition","Mohamed-Bachir Belaid, Nassim Belmecheri, Arnaud Gotlieb, Nadjib Lazaar, Helge Spieker","NILU, Norwegian Institute for Air Research, PO Box 100, 2027, Kjeller, Norway., Simula Research Laboratory, Oslo, Norway, Simula Research Laboratory, Oslo, Norway, LIRMM, University of Montpellier, CNRS, Montpellier, France., Simula Research Laboratory, Oslo, Norway.","https://www.jair.org/index.php/jair/article/download/14752/27003","Many planning, scheduling or multi-dimensional packing problems involve the design of subtle logical combinations of temporal or spatial constraints. Recently, we introduced GEQCA-I, which stands for Generic Qualitative Constraint Acquisition, as a new active constraint acquisition method for learning qualitative constraints using qualitative queries. In this paper, we revise and extend GEQCA-I to GEQCA-II with a new type of query, universal query, for qualitative constraint acquisition, with a deeper query-driven acquisition algorithm. Our extended experimental evaluation shows the efficiency and usefulness of the concept of universal query in learning randomly-generated qualitative networks, including both temporal networks based on Allen’s algebra and spatial networks based on region connection calculus. We also show the effectiveness of GEQCA-II in learning the qualitative part of real scheduling problems."
"15762","Detecting Change Intervals with Isolation Distributional Kernel","Yang Cao, Ye Zhu, Kai Ming Ting, Flora D. Salim, Hong Xian Li, Luxing Yang, Gang Li",", Deakin University, , , , , ","https://www.jair.org/index.php/jair/article/download/15762/27004","Detecting abrupt changes in data distribution is one of the most significant tasks in streaming data analysis. Although many unsupervised Change-Point Detection (CPD) methods have been proposed recently to identify those changes, they still suffer from missing subtle changes, poor scalability, or/and sensitivity to outliers. To meet these challenges, we are the first to generalise the CPD problem as a special case of the Change-Interval Detection (CID) problem. Then we propose a CID method, named iCID, based on a recent Isolation Distributional Kernel (IDK). iCID identifies the change interval if there is a high dissimilarity score between two non-homogeneous temporal adjacent intervals. The data-dependent property and finite feature map of IDK enabled iCID to efficiently identify various types of change-points in data streams with the tolerance of outliers. Moreover, the proposed online and offline versions of iCID have the ability to optimise key parameter settings. The effectiveness and efficiency of iCID have been systematically verified on both synthetic and real-world datasets."
"15248","Boolean Observation Games","Hans van Ditmarsch, Sunil Simon",", IIT Kanpur","https://www.jair.org/index.php/jair/article/download/15248/27005","We introduce Boolean Observation Games, a subclass of multi-player finite strategic games with incomplete information and qualitative objectives. In Boolean observation games, each player is associated with a finite set of propositional variables of which only it can observe the value, and it controls whether and to whom it can reveal that value. It does not control the given, fixed, value of variables. Boolean observation games are a generalization of Boolean games, a well-studied subclass of strategic games but with complete information, and wherein each player controls the value of its variables. In Boolean observation games, player goals describe multi-agent knowledge of variables. As in classical strategic games, players choose their strategies simultaneously and therefore observation games capture aspects of both imperfect and incomplete information. They require reasoning about sets of outcomes given sets of indistinguishable valuations of variables. An outcome relation between such sets determines what the Nash equilibria are. We present various outcome relations, including a qualitative variant of ex-post equilibrium. We identify conditions under which, given an outcome relation, Nash equilibria are guaranteed to exist. We also study the complexity of checking for the existence of Nash equilibria and of verifying if a strategy profile is a Nash equilibrium. We further study the subclass of Boolean observation games with ‘knowing whether’ goal formulas, for which the satisfaction does not depend on the value of variables. We show that each such Boolean observation game corresponds to a Boolean game and vice versa, by a different correspondence, and that both correspondences are precise in terms of existence of Nash equilibria."
"15348","Human-in-the-Loop Reinforcement Learning: A Survey and Position on Requirements, Challenges, and Opportunities","Carl Orge Retzlaff, Srijita Das, Christabel Wayllace, Payam Mousavi, Mohammad Afshari, Tianpei Yang, Anna Saranti, Alessa Angerschmid, Matthew E. Taylor, Andreas Holzinger","University of Life Sciences Vienna, , , , , , , , , ","https://www.jair.org/index.php/jair/article/download/15348/27006","Artificial intelligence (AI) and especially reinforcement learning (RL) have the potential to enable agents to learn and perform tasks autonomously with superhuman performance. However, we consider RL as fundamentally a Human-in-the-Loop (HITL) paradigm, even when an agent eventually performs its task autonomously.  In cases where the reward function is challenging or impossible to define, HITL approaches are considered particularly advantageous. The application of Reinforcement Learning from Human Feedback (RLHF) in systems such as ChatGPT demonstrates the effectiveness of optimizing for user experience and integrating their feedback into the training loop. In HITL RL, human input is integrated during the agent’s learning process, allowing iterative updates and fine-tuning based on human feedback, thus enhancing the agent’s performance. Since the human is an essential part of this process, we argue that human-centric approaches are the key to successful RL, a fact that has not been adequately considered in the existing literature. This paper aims to inform readers about current explainability methods in HITL RL. It also shows how the application of explainable AI (xAI) and specific improvements to existing explainability approaches can enable a better human-agent interaction in HITL RL for all types of users, whether for lay people, domain experts, or machine learning specialists. 
Accounting for the workflow in HITL RL and based on software and machine learning methodologies, this article identifies four phases for human involvement for creating HITL RL systems: (1) Agent Development, (2) Agent Learning, (3) Agent Evaluation, and (4) Agent Deployment. We highlight human involvement, explanation requirements, new challenges, and goals for each phase. We furthermore identify low-risk, high-return opportunities for explainability research in HITL RL and present long-term research goals to advance the field. Finally, we propose a vision of human-robot collaboration that allows both parties to reach their full potential and cooperate effectively."
"15278","Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges","Giorgio Franceschelli, Mirco Musolesi","University of Bologna, University College London; University of Bologna","https://www.jair.org/index.php/jair/article/download/15278/27007","Generative Artificial Intelligence (AI) is one of the most exciting developments in Computer Science of the last decade. At the same time, Reinforcement Learning (RL) has emerged as a very successful paradigm for a variety of machine learning tasks. In this survey, we discuss the state of the art, opportunities and open research questions in applying RL to generative AI. In particular, we will discuss three types of applications, namely, RL as an alternative way for generation without specified objectives; as a way for generating outputs while concurrently maximizing an objective function; and, finally, as a way of embedding desired characteristics, which cannot be easily captured by means of an objective function, into the generative process. We conclude the survey with an in-depth discussion of the opportunities and challenges in this fascinating emerging area."
"15075","Weighted, Circular and Semi-Algebraic Proofs","Ilario Bonacina, Maria Luisa Bonet, Jordi Levy","UPC Barcelona Tech, , ","https://www.jair.org/index.php/jair/article/download/15075/27008","In recent years there has been an increasing interest in studying proof systems stronger than Resolution, with the aim of building more efficient SAT solvers based on them. In defining these proof systems, we try to find a balance between the power of the proof system (the size of the proofs required to refute a formula) and the difficulty of finding the proofs. In this paper we consider the proof systems circular Resolution, Sherali-Adams, Nullstellensatz and Weighted Resolution and we study their relative power from a theoretical perspective. We prove that circular Resolution, Sherali-Adams and Weighted Resolution are polynomially equivalent proof systems. We also prove that Nullstellensatz is polynomially equivalent to a restricted version of Weighted Resolution. The equivalences carry on also for versions of the systems where the coefficients/weights are expressed in unary. The practical interest in these systems comes from the fact that they admit efficient algorithms to find proofs in case these have small width/degree."
"15249","An Algorithm with Improved Complexity for Pebble Motion Multi-Agent Path Finding on Trees","Stefano Ardizzoni, Irene Saccani, Luca Consolini, Marco Locatelli, Bernhard Nebel","Università di Parma, , , , ","https://www.jair.org/index.php/jair/article/download/15249/27009","The pebble motion on trees (PMT) problem consists in finding a feasible sequence of moves that repositions a set of pebbles to assigned target vertices. This problem has been widely studied because, in many cases, the more general Multi-Agent path finding (MAPF) problem on graphs can be reduced to PMT. We propose a simple and easy to implement procedure, which finds solutions of length O(|P|nc + n2), where n is the number of nodes, P is the set of pebbles, and c the maximum length of corridors in the tree. This complexity result is more detailed than the current best known result O(n3), which is equal to our result in the worst case, but does not capture the dependency on c and |P|."
"15071","On Mitigating the Utility-Loss in Differentially Private Learning:  A New Perspective by a Geometrically Inspired Kernel Approach","Mohit Kumar, Bernhard A. Moser, Lukas Fischer","University of Rostock, , ","https://www.jair.org/index.php/jair/article/download/15071/27011","Privacy-utility tradeoff remains as one of the fundamental issues of differentially private machine learning. This paper introduces a geometrically inspired kernel-based approach to mitigate the accuracy-loss issue in classification. In this approach, a representation of the affine hull of given data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads to a novel distance measure that hides privacy-sensitive information about individual data points and improves the privacy-utility tradeoff via significantly reducing the risk of membership inference attacks. The effectiveness of the approach is demonstrated through experiments on MNIST dataset, Freiburg groceries dataset, and a real biomedical dataset. It is verified that the approach remains computationally practical. The application of the approach to federated learning is considered and it is observed that the accuracy-loss due to data being distributed is either marginal or not significantly high."
"15170","Exploring the Tradeoff Between System Profit and Income Equality Among Ride-hailing Drivers","Evan Yifan Xu, Pan Xu",", New Jersey Institute of Technology","https://www.jair.org/index.php/jair/article/download/15170/27012","This paper examines the income inequality among rideshare drivers resulting from discriminatory cancellations by riders, considering the impact of demographic factors such as gender, age, and race. We investigate the tradeoff between income inequality, referred to as the fairness objective, and system efficiency, known as the profit objective. To address this issue, we propose an online bipartite-matching model that captures the sequential arrival of riders according to a known distribution. The model incorporates the notion of acceptance rates between driver-rider types, which are defined based on demographic characteristics. Specifically, we analyze the probabilities of riders accepting or canceling their assigned drivers, reflecting the level of acceptance between different rider and driver types. We construct a bi-objective linear program as a valid benchmark and propose two LP-based parameterized online algorithms. Rigorous analysis of online competitive ratios is conducted to illustrate the flexibility and efficiency of our algorithms in achieving a balance between fairness and profit. Furthermore, we present experimental results based on real-world and synthetic datasets, validating the theoretical predictions put forth in our study."
"14323","Practical and Parallelizable Algorithms for Non-Monotone Submodular Maximization with Size Constraint","Yixin Chen, Alan Kuhnle","Texas A&M University, Texas A&M University","https://www.jair.org/index.php/jair/article/download/14323/27013","We present combinatorial and parallelizable algorithms for the maximization of a submodular function, not necessarily monotone, with respect to a size constraint. We improve the best approximation factor achieved by an algorithm that has optimal adaptivity and nearly optimal query complexity to 1/6 − ε, and even further to 0.193 − ε by increasing the adaptivity by a factor of O(log(k)). The conference version of this work mistakenly employed a subroutine that does not work for non-monotone, submodular functions. In this version, we propose a fixed and improved subroutine to add a set with high average marginal gain, ThreshSeq, which returns a solution in O(log(n)) adaptive rounds with high probability. Moreover, we provide two approximation algorithms. The first has approximation ratio 1/6 − ε, adaptivity O(log(n)), and query complexity O(n log(k)), while the second has approximation ratio 0.193 − ε, adaptivity O(log(n) log(k)), and query complexity O(n log(k)). Our algorithms are empirically validated to use a low number of adaptive rounds and total queries while obtaining solutions with high objective value in comparison with state-of-the-art approximation algorithms, including continuous algorithms that use the multilinear extension."
"14747","Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML","Hilde Weerts, Florian Pfisterer, Matthias Feurer, Katharina Eggensperger, Edward Bergman, Noor Awad, Joaquin Vanschoren, Mykola Pechenizkiy, Bernd Bischl, Frank Hutter","Eindhoven University of Technology, Ludwig Maximilians Universität München, Munich Center for Machine Learning, Albert-Ludwigs-Universität Freiburg, Albert-Ludwigs-Universität Freiburg, Eberhard Karls Universität Tübingen, Albert-Ludwigs-Universität Freiburg, Albert-Ludwigs-Universität Freiburg, Eindhoven University of Technology, Eindhoven University of Technology, Ludwig Maximilians Universität München, Munich Center for Machine Learning, Albert-Ludwigs-Universität Freiburg, Bosch Center for Artificial Intelligence","https://www.jair.org/index.php/jair/article/download/14747/27014","The field of automated machine learning (AutoML) introduces techniques that automate parts of the development of machine learning (ML) systems, accelerating the process and reducing barriers for novices. However, decisions derived from ML models can reproduce, amplify, or even introduce unfairness in our societies, causing harm to (groups of) individuals. In response, researchers have started to propose AutoML systems that jointly optimize fairness and predictive performance to mitigate fairness-related harm. However, fairness is a complex and inherently interdisciplinary subject, and solely posing it as an optimization problem can have adverse side effects. With this work, we aim to raise awareness among developers of AutoML systems about such limitations of fairness-aware AutoML, while also calling attention to the potential of AutoML as a tool for fairness research. We present a comprehensive overview of different ways in which fairness-related harm can arise and the ensuing implications for the design of fairness-aware AutoML. 
We conclude that while fairness cannot be automated, fairness-aware AutoML can play an important role in the toolbox of ML practitioners. We highlight several open technical challenges for future work in this direction. Additionally, we advocate for the creation of more user-centered assistive systems designed to tackle challenges encountered in fairness work. This article appears in the AI & Society track."
"15702","Multi-Objective Reinforcement Learning Based on Decomposition: A Taxonomy and Framework","Florian Felten, El-Ghazali Talbi, Grégoire Danoy","SnT, University of Luxembourg, CNRS/CRIStAL, University of Lille, FSTM/DCS, University of Luxembourg, FSTM/DCS, SnT, University of Luxembourg","https://www.jair.org/index.php/jair/article/download/15702/27015","Multi-objective reinforcement learning (MORL) extends traditional RL by seeking policies making different compromises among conflicting objectives. The recent surge of interest in MORL has led to diverse studies and solving methods, often drawing from existing knowledge in multi-objective optimization based on decomposition (MOO/D). Yet, a clear categorization based on both RL and MOO/D is lacking in the existing literature. Consequently, MORL researchers face difficulties when trying to classify contributions within a broader context due to the absence of a standardized taxonomy. To tackle such an issue, this paper introduces multi-objective reinforcement learning based on decomposition (MORL/D), a novel methodology bridging the literature of RL and MOO. A comprehensive taxonomy for MORL/D is presented, providing a structured foundation for categorizing existing and potential MORL works. The introduced taxonomy is then used to scrutinize MORL research, enhancing clarity and conciseness through well-defined categorization. Moreover, a flexible framework derived from the taxonomy is introduced. This framework accommodates diverse instantiations using tools from both RL and MOO/D. Its versatility is demonstrated by implementing it in different configurations and assessing it on contrasting benchmark problems. Results indicate MORL/D instantiations achieve comparable performance to current state-of-the-art approaches on the studied problems. By presenting the taxonomy and framework, this paper offers a comprehensive perspective and a unified vocabulary for MORL. 
This not only facilitates the identification of algorithmic contributions but also lays the groundwork for novel research avenues in MORL."
"15826","Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach","Daniele Meli, Alberto Castellini, Alessandro Farinelli","University of Verona, University of Verona, University of Verona","https://www.jair.org/index.php/jair/article/download/15826/27016","Partially Observable Markov Decision Processes (POMDPs) are a powerful framework for planning under uncertainty. They allow to model state uncertainty as a belief probability distribution. Approximate solvers based on Monte Carlo sampling show great success to relax the computational demand and perform online planning. However, scaling to complex realistic domains with many actions and long planning horizons is still a major challenge, and a key point to achieve good performance is guiding the action-selection process with domain-dependent policy heuristics which are tailored for the specific application domain.  We propose to learn high-quality heuristics from POMDP traces of executions generated by any solver.  We convert the belief-action pairs to a logical semantics, and exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications, which are then used as online heuristics. We evaluate thoroughly our methodology on two notoriously challenging POMDP problems, involving large action spaces and long planning horizons, namely, rocksample and pocman. Considering different state-of-the-art online POMDP solvers, including POMCP, DESPOT and AdaOPS, we show that learned heuristics expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics within lower computational time. Moreover, they well generalize to more challenging scenarios not experienced in the training phase (e.g., increasing rocks and grid size in rocksample, incrementing the size of the map and the aggressivity of ghosts in pocman)."
"15423","Performative Ethics From Within the Ivory Tower: How CS Practitioners Uphold Systems of Oppression","Zari McFadden, Lauren Alvarez","North Carolina State University, North Carolina State University","https://www.jair.org/index.php/jair/article/download/15423/27017","This paper analyzes where Artificial Intelligence (AI) ethics research fails and breaks down the dangers of well-intentioned but ultimately performative ethics research. A large majority of AI ethics research is criticized for not providing a comprehensive analysis of  how AI is interconnected with sociological systems of oppression and power. Our work contributes to the handful of research that presents intersectional, Western systems of oppression and power as a framework for examining AI ethics work and the complexities of building less harmful technology; directly connecting technology to named systems such as capitalism and classism, colonialism, racism and white supremacy, patriarchy, and ableism. We then explore current AI ethics rhetoric’s effect on the AI ethics domain. We conclude by providing an applied example to contextualize intersectional systems of oppression and AI interventions in the US justice system and present actionable steps for AI practitioners to participate in a less performative, critical analysis of AI. This article appears in the AI & Society track."
"14849","conDENSE: Conditional Density Estimation for Time Series Anomaly Detection","Alex Moore, Davide Morelli","Huma Therapeutics ltd, Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford","https://www.jair.org/index.php/jair/article/download/14849/27018","In recent years deep learning methods, based on reconstruction errors, have facilitated huge improvements in unsupervised anomaly detection. These methods make the limiting assumption that the greater the distance between an observation and a prediction the lower the likelihood of that observation. In this paper we propose conDENSE, a novel anomaly detection algorithm, which does not use reconstruction errors but rather uses conditional density estimation in masked autoregressive flows. By directly estimating the likelihood of data, our model moves beyond approximating expected behaviour with a single point estimate, as is the case in reconstruction error models. We show how conditioning on a dense representation of the current trajectory, extracted from a variational autoencoder with a gated recurrent unit (GRU VAE), produces a model that is suitable for periodic datasets, while also improving performance on non-periodic datasets. Experiments on 31 time-series, including real-world anomaly detection benchmark datasets and synthetically generated data, show that the model can outperform state-of-the-art deep learning methods."
"15301","Multi-Modal Attentive Prompt Learning for Few-shot Emotion Recognition in Conversations","Xingwei Liang, Geng Tu, Jiachen Du, Ruifeng Xu",", Harbin Institute of Tehnology, Harbin Institute of Tehnology, Harbin Institute of Tehnology","https://www.jair.org/index.php/jair/article/download/15301/27019","Emotion recognition in conversations (ERC) has emerged as an important research area in Natural Language Processing and Affective Computing, focusing on accurately identifying emotions within the conversational utterance. Conventional approaches typically rely on labeled training samples for fine-tuning pre-trained language models (PLMs) to enhance classification performance. However, the limited availability of labeled data in real-world scenarios poses a significant challenge, potentially resulting in diminished model performance. In response to this challenge, we present the Multi-modal Attentive Prompt (MAP) learning framework, tailored specifically for few-shot emotion recognition in conversations. The MAP framework consists of four integral modules: multi-modal feature extraction for the sequential embedding of text, visual, and acoustic inputs; a multi-modal prompt generation module that creates six manually-designed multi-modal prompts; an attention mechanism for prompt aggregation; and an emotion inference module for emotion prediction. To evaluate our proposed model’s efficacy, we conducted extensive experiments on two widely recognized benchmark datasets, MELD and IEMOCAP. Our results demonstrate that the MAP framework outperforms state-of-the-art ERC models, yielding notable improvements of 3.5% and 0.4% in micro F1 scores. These findings highlight the MAP learning framework’s ability to effectively address the challenge of limited labeled data in emotion recognition, offering a promising strategy for improving ERC model performance."
"15849","A Principled Distributional Approach to Trajectory Similarity Measurement and its Application to Anomaly Detection","Yufan Wang, Zijing Wang, Kai Ming Ting, Yuanyi Shang",", Nanjing University, , ","https://www.jair.org/index.php/jair/article/download/15849/27020","This paper aims to solve two enduring challenges in existing trajectory similarity measures: computational inefficiency and the absence of the ‘uniqueness’ property that should be guaranteed in a distance function: dist(X, Y ) = 0 if and only if X = Y , where X and Y are two trajectories. In this work, we present a novel approach utilizing a distributional kernel for trajectory representation and similarity measurement, based on the kernel mean embedding framework. It is the very first time a distributional kernel is used for trajectory representation and similarity measurement. Our method does not rely on point-to-point distances which are used in most existing distances for trajectories. Unlike prevalent learning and deep learning approaches, our method requires no learning. We show the generality of this new approach in anomalous trajectory and sub-trajectory detection. We identify that the distributional kernel has (i) a data-dependent property and the ‘uniqueness’ property which are the key factors that lead to its superior task-specific performance, and (ii) runtime orders of magnitude faster than existing distance measures."
"15167","Learning to Resolve Social Dilemmas: A Survey","Shaheen Fatima, Nicholas R. Jennings, Michael Wooldridge","Loughborough University, Loughborough University, Oxford University","https://www.jair.org/index.php/jair/article/download/15167/27021","Social dilemmas are situations of inter-dependent decision making in which individual rationality can lead to outcomes with poor social qualities. The ubiquity of social dilemmas in social, biological, and computational systems has generated substantial research across these diverse disciplines into the study of mechanisms for avoiding deficient outcomes by promoting and maintaining mutual cooperation. Much of this research is focused on studying how individuals faced with a dilemma can learn to cooperate by adapting their behaviours according to their past experience. In particular, three types of learning approaches have been studied: evolutionary game-theoretic learning, reinforcement learning, and best-response learning. This article is a comprehensive integrated survey of these learning approaches in the context of dilemma games. We formally introduce dilemma games and their inherent challenges. We then outline the three learning approaches and, for each approach, provide a survey of the solutions proposed for dilemma resolution. Finally, we provide a comparative summary and discuss directions in which further research is needed."
"14888","Cultural Bias in Explainable AI Research: A Systematic Analysis","Uwe Peters, Mary Carman","University of Cambridge, University of the Witwatersrand","https://www.jair.org/index.php/jair/article/download/14888/27022","For synergistic interactions between humans and artificial intelligence (AI) systems, AI outputs often need to be explainable to people. Explainable AI (XAI) systems are commonly tested in human user studies. However, whether XAI researchers consider potential cultural differences in human explanatory needs remains unexplored. We highlight psychological research that found significant differences in human explanations between many people from Western, commonly individualist countries and people from non-Western, often collectivist countries. We argue that XAI research currently overlooks these variations and that many popular XAI designs implicitly and problematically assume that Western explanatory needs are shared cross-culturally. Additionally, we systematically reviewed over 200 XAI user studies and found that most studies did not consider relevant cultural variations, sampled only Western populations, but drew conclusions about human-XAI interactions more generally. We also analyzed over 30 literature reviews of XAI studies. Most reviews did not mention cultural differences in explanatory needs or flag overly broad cross-cultural extrapolations of XAI user study results. Combined, our analyses provide evidence of a cultural bias toward Western populations in XAI research, highlighting an important knowledge gap regarding how culturally diverse users may respond to widely used XAI systems that future work can and should address."
"15329","Removing Bias and Incentivizing Precision in Peer-grading","Anujit Chakraborty, Jatin Jindal, Swaprava Nath",", , IIT Bombay","https://www.jair.org/index.php/jair/article/download/15329/27024","Most peer-evaluation practices rely on the evaluator’s goodwill and model them as potentially noisy evaluators. But what if graders are competitive, i.e., enjoy higher utility when their peers get lower scores? We model the setting as a multi-agent incentive design problem and propose a new mechanism, PEQA, that incentivizes these agents (peer-graders) through a score-assignment rule and a grading performance score. PEQA is designed in such a way that it makes grader-bias irrelevant and ensures grader-utility to be monotonically increasing with the grading-precision, despite competitiveness. When grading is costly and costs are private information of the individual graders, a modified version of PEQA implements the socially optimal grading-choices in equilibrium. Data from our classroom experiments is consistent with our theoretical assumptions and show that PEQA outperforms the popular median mechanism, which is used in several massive open online courses (MOOCs)."
"14924","Iterative Train Scheduling under Disruption with Maximum Satisfiability","Alexandre Lemos, Filipe Gouveia, Pedro T. Monteiro, Inês Lynce",", INESC-ID, , ","https://www.jair.org/index.php/jair/article/download/14924/27025","This paper proposes an iterative Maximum Satisfiability (MaxSAT) approach designed to solve train scheduling optimization problems. The generation of railway timetables is known to be intractable for a single track. We consider hundreds of trains on interconnected multi-track railway networks with complex connections between trains. Furthermore, the proposed algorithm is incremental to reduce the impact of time discretization. The performance of our approach is evaluated with the real-world Swiss Federal Railway (SBB) Crowd Sourcing Challenge benchmark and Periodic Event Scheduling Problems benchmark (PESPLib). The execution time of the proposed approach is shown to be, on average, twice as fast as the best existing solution for the SBB instances. In addition, we achieve a significant improvement over SAT-based solutions for solving the PESPLib instances. We also analyzed real schedule data from Switzerland and the Netherlands to create a disruption generator based on probability distributions. The novel incremental algorithm allows solving the train scheduling problem under disruptions with better performance than traditional algorithms."
"15698","DIGCN: A Dynamic Interaction Graph Convolutional Network Based on Learnable Proposals for Object Detection","Pingping Cao, Yanping Zhu, Yuhao Jin, Benkun Ruan, Qiang Niu","China University of Mining and Technology, , , , ","https://www.jair.org/index.php/jair/article/download/15698/27026","We propose a Dynamic Interaction Graph Convolutional Network (DIGCN), an image object detection method based on learnable proposals and GCN. Existing object detection methods usually work on dense candidates, resulting in redundant and near-duplicate results. Meanwhile, non-maximum suppression post-processing operations are required to eliminate negative effects, which increases the computational complexity. Although the existing sparse detector avoids cumbersome post-processing operations, it ignores the potential relationship between objects and proposals, which hinders detection accuracy improvement. Therefore, we propose a dynamic interaction GCN module in the DIGCN, which performs dynamic interaction and relational modeling on the proposal boxes and proposal features to improve the object detection accuracy. In addition, we introduce a learnable proposal method with a sparse set of learned object proposals to eliminate a huge number of hand-designed object candidates, avoiding complicated tasks such as object candidate design and many-to-one label assignment, and reducing object detection model complexity to a certain extent. DIGCN demonstrates accuracy and run-time performance on par with the well-established and highly optimized detector baselines on the challenging COCO dataset, e.g. with the ResNet-101FPN as the backbone our method attains the accuracy of 46.5 AP while processing 13 frames per second. Our work provides a new method for object detection research."
"15213","A Map of Diverse Synthetic Stable Matching Instances","Niclas Boehmer, Klaus Heeger, Stanisław Szufa","TU Berlin, , ","https://www.jair.org/index.php/jair/article/download/15213/27027","Focusing on Stable Roommates (SR), we contribute to the toolbox for conducting experiments for stable matching problems. We introduce the polynomial-time computable mutual attraction distance to measure the similarity of SR instances, analyze its properties, and use it to create a map of SR instances. This map visualizes 460 synthetic SR instances (each sampled from one of ten different statistical cultures) as follows: Each instance is a point in the plane, and two points are close on the map if the corresponding SR instances are similar with respect to our mutual attraction distance to each other. Subsequently, we conduct several illustrative experiments and depict their results on the map, illustrating the map’s usefulness as a non-aggregate visualization tool, the diversity of our generated dataset, and the need to use instances sampled from different statistical cultures. Lastly, we extend our approach to the bipartite Stable Marriage problem."
"15703","Structure in Deep Reinforcement Learning: A Survey and Open Problems","Aditya Mohan, Amy Zhang, Marius Lindauer","Leibniz University of Hannover, , ","https://www.jair.org/index.php/jair/article/download/15703/27028","Reinforcement Learning (RL), bolstered by the expressive capabilities of Deep Neural Networks (DNNs) for function approximation, has demonstrated considerable success in numerous applications. However, its practicality in addressing various real-world scenarios, characterized by diverse and unpredictable dynamics, noisy signals, and large state and action spaces, remains limited. This limitation stems from poor data efficiency, limited generalization capabilities, a lack of safety guarantees, and the absence of interpretability, among other factors. To overcome these challenges and improve performance across thesecrucial metrics, one promising avenue is to incorporate additional structural information about the problem into the RL learning process. Various sub-fields of RL have proposed methods for incorporating such inductive biases. We amalgamate these diverse methodologies under a unified framework, shedding light on the role of structure in the learning problem, and classify these methods into distinct patterns of incorporating structure. By leveraging this comprehensive framework, we provide valuable insights into the challenges of structured RL and lay the groundwork for a design pattern perspective on RL research. This novelperspective paves the way for future advancements and aids in developing more effective and efficient RL algorithms that can potentially handle real-world scenarios better."
"15819","USN: A Robust Imitation Learning Method against Diverse Action Noise","Xingrui Yu, Bo Han, Ivor W. Tsang","Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore, Department of Computer Science Hong Kong Baptist University, Hong Kong SAR, ","https://www.jair.org/index.php/jair/article/download/15819/27029","Learning from imperfect demonstrations is a crucial challenge in imitation learning (IL). Unlike existing works that still rely on the enormous effort of expert demonstrators, we consider a more cost-effective option for obtaining a large number of demonstrations. That is, hire annotators to label actions for existing image records in realistic scenarios. However, action noise can occur when annotators are not domain experts or encounter confusing states. In this work, we introduce two particular forms of action noise, i.e., state-independent and state-dependent action noise. Previous IL methods fail to achieve expert-level performance when the demonstrations contain action noise, especially the state-dependent action noise.  To mitigate the harmful effects of action noises, we propose a robust learning paradigm called USN (Uncertainty-aware Sample-selection with Negative learning). The model first estimates the predictive uncertainty for all demonstration data and then selects sampleswith high loss based on the uncertainty measures. Finally, it updates the model parameters with additional negative learning on the selected samples. Empirical results in Box2D tasks and Atari games show that USN consistently improves the final rewards of behavioral cloning, online imitation learning, and offline imitation learning methods under various action noises. The ratio of significant improvements is up to 94.44%. Moreover, our method scales to conditional imitation learning with real-world noisy commands in urban driving"
"15748","Collision Avoiding Max-Sum for Mobile Sensor Teams","Arseniy Pertzovsky, Roie Zivan, Noa Agmon","Ben-Gurion University of the Negev, Ben-Gurion University of the Negev, Bar-Ilan University","https://www.jair.org/index.php/jair/article/download/15748/27030","Recent advances in technology have large teams of robots with limited computation skills work together in order to achieve a common goal. Their personal actions need to contribute to the joint effort, however, they also must assure that they do not harm the efforts of the other members of the team, e.g., as a result of collisions. We focus on the distributed target coverage problem, in which the team must cooperate in order to maximize utility from sensed targets, while avoiding collisions with other agents. State of the art solutions focus on the distributed optimization of the coverage task in the team level, while neglecting to consider collision avoidance, which could have far reaching consequences on the overall performance. Therefore, we propose CAMS: a collision-avoiding version of the Max-sum algorithm, for solving problems including mobile sensors. In CAMS, a factor-graph that includes two types of constraints (represented by function-nodes) is being iteratively generated and solved. The first type represents the task-related requirements, and the second represents collision avoidance constraints. We prove that consistent beliefs are sent by target representing function-nodes during the run of the algorithm, and identify factor-graph structures on which CAMS is guaranteed to converge to an optimal (collision-free) solution. We present an experimental evaluation in extensive simulations, showing that CAMS produces high quality collision-free coverage also in large and complex scenarios. We further present evidence from experiments in a real multi-robot system that CAMS outperforms the state of the art in terms of convergence time."
"15317","Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks","Resmi Ramachandranpillai, Md Fahim Sikder, David Bergström, Fredrik Heintz","Postdoctoral Research Associate, Northeastern University, Department of Computer and Information Science, Linköping University, Sweden, Department of Computer and Information Science, Linköping University, Sweden, Department of Computer and Information Science, Linköping University, Sweden","https://www.jair.org/index.php/jair/article/download/15317/27031","Synthetic data generation offers a promising solution to enhance the usefulness of Electronic Healthcare Records (EHR) by generating realistic de-identified data. However, the existing literature primarily focuses on the quality of synthetic health data, neglecting the crucial aspect of fairness in downstream predictions. Consequently, models trained on synthetic EHR have faced criticism for producing biased outcomes in target tasks. These biases can arise from either spurious correlations between features or the failure of models to accurately represent sub-groups. To address these concerns, we present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain. In order to tackle spurious correlations (i), we propose an information-constrained Data Generation Process (DGP) that enables the generator to learn a fair deterministic transformation based on a well-defined notion of algorithmic fairness. To overcome the challenge of capturing exact sub-group representations (ii), we incentivize the generator to preserve sub-group densities through score-based weighted sampling. This approach compels the generator to learn from underrepresented regions of the data manifold. To evaluate the effectiveness of our proposed method, we conduct extensive experiments using the Medical Information Mart for Intensive Care (MIMIC-III) database. 
Our results demonstrate that Bt-GAN achieves state-of-the-art accuracy while significantly improving fairness and minimizing bias amplification. Furthermore, we perform an in-depth explainability analysis to provide additional evidence supporting the validity of our study. In conclusion, our research introduces a novel and professional approach to addressing the limitations of synthetic data generation in the healthcare domain. By incorporating fairness considerations and leveraging advanced techniques such as GANs, we pave the way for more reliable and unbiased predictions in healthcare applications."
"15414","Computing Pareto-Optimal and Almost Envy-Free Allocations of Indivisible Goods","Jugal Garg, Aniket Murhekar","University of Illinois, Urbana-Champaign, University of Illinois, Urbana-Champaign","https://www.jair.org/index.php/jair/article/download/15414/27032","We study the problem of fair and efficient allocation of a set of indivisible goods to agents with additive valuations using the popular fairness notions of envy-freeness up to one good (EF1) and equitability up to one good (EQ1) in conjunction with Pareto-optimality (PO). There exists a pseudo-polynomial time algorithm to compute an EF1+PO allocation and a non-constructive proof of the existence of allocations that are both EF1 and fractionally Pareto-optimal (fPO), which is a stronger notion than PO. We present a pseudopolynomial time algorithm to compute an EF1+fPO allocation, thereby improving the earlier results. Our techniques also enable us to show that an EQ1+fPO allocation always exists when the values are positive and that it can be computed in pseudo-polynomial time. We also consider the class of k-ary instances where k is a constant, i.e., each agent has at most k different values for the goods. For such instances, we show that an EF1+fPO allocation can be computed in strongly polynomial time. When all values are positive, we show that an EQ1+fPO allocation for such instances can be computed in strongly polynomial time. Next, we consider instances where the number of agents is constant and show that an EF1+PO (likewise, an EQ1+PO) allocation can be computed in polynomial time. These results significantly extend the polynomial-time computability beyond the known cases of binary or identical valuations. We also design a polynomial-time algorithm that computes a Nash welfare maximizing allocation when there are constantly many agents with constant many different values for the goods. 
Finally, on the complexity side, we show that the problem of computing an EF1+fPO allocation lies in the complexity class PLS."
"15326","Estimating Agent Skill in Continuous Action Domains","Christopher Archibald, Delma Nieves-Rivera",", Mississippi State University","https://www.jair.org/index.php/jair/article/download/15326/27033","Actions in most real-world continuous domains cannot be executed exactly. An agent’s performance in these domains is influenced by two critical factors: the ability to select effective actions (decision-making skill), and how precisely it can execute those selected actions (execution skill). This article addresses the problem of estimating the execution and decision-making skill of an agent, given observations. Several execution skill estimation methods are presented, each of which utilize different information from the observations and make assumptions about the agent’s decision-making ability. A final novel method forgoes these assumptions about decision-making and instead estimates the execution and decision-making skills simultaneously under a single Bayesian framework. Experimental results in several domains evaluate the estimation accuracy of the estimators, especially focusing on how robust they are as agents and their decision-making methods are varied. These results demonstrate that reasoning about both types of skill together significantly improves the robustness and accuracy of execution skill estimation. A case study is presented using the proposed methods to estimate the skill of Major League Baseball pitchers, demonstrating how these methods can be applied to real-world data sources."
"15642","Experimental Design of Extractive Question-Answering Systems: Influence of Error Scores and Answer Length","Amer Farea, Frank Emmert-Streib","Tampere University, Tampere University","https://www.jair.org/index.php/jair/article/download/15642/27034","Question-answering (QA) systems are becoming more and more important because they enable human-computer communication in a natural language. In recent years, significant progress has been made with transformer-based models that leverage deep learning in combination with large amounts of text data. However, a significant challenge with QA systems lies in their complexity rooted in the ambiguity and flexibility of a natural language. This makes even their evaluation a formidable task. For this reason, in this study, we focus on the evaluation of extractive question-answering (EQA) systems by conducting a large-scale analysis of distilBERT using benchmark data provided by the Stanford Question Answering Dataset (SQuAD). Specifically, the main objectives of this paper are fourfold. First, we study the influence of the answer length on the performance and we demonstrate that there is an inverse correlation between both. Second, we study differences in exact match (EM) measures because there are different definitions commonly used in the literature. As a result, we find that despite the fact that all of those measures are named ”exact match” these measures are actually different from each other. Third, we study the practical relevance of these different definitions because due to the ambivalent meaning of ”exact match” in the literature, it is often unclear if reported improvements are genuine or only due to a change in the exact match measure. Importantly, our results show that differences between differently defined EM measures are in the same order of magnitude as reported differences found in the literature. This raises concerns about the robustness of reported results. 
Fourth, we provide guidelines to improve the experimental design of general EQA studies, aiming to enhance performance evaluation and minimize the potential for spurious results."
"14741","Effectiveness of Tree-based Ensembles for Anomaly Discovery: Insights, Batch and Streaming Active Learning","Shubhomoy Das, Md Rakibul Islam, Nitthilan Kannappan Jayakodi, Janardhan Rao Doppa",", Washington State University, , Washington State University","https://www.jair.org/index.php/jair/article/download/14741/27035","Anomaly detection (AD) task corresponds to identifying the true anomalies among a given set of data instances. AD algorithms score the data instances and produce a ranked list of candidate anomalies. The ranked list of anomalies is then analyzed by a human to discover the true anomalies. Ensemble of tree-based anomaly detectors trained in an unsupervised manner and scoring based on uniform weights for ensembles are shown to work well in practice. However, the manual process of analysis can be laborious for the human analyst when the number of false-positives is very high. Therefore, in many real-world AD applications including computer security and fraud prevention, the anomaly detector must be configurable by the human analyst to minimize the effort on false positives. One important way to configure the detector is by providing true labels (nominal or anomaly) for a few instances. Recent work on active anomaly discovery has shown that greedily querying the top-scoring instance and tuning the weights of ensembles based on label feedback allows us to quickly discover true anomalies. This paper makes four main contributions to improve the state-of-the-art in anomaly discovery using tree-based ensembles. First, we provide an important insight that explains the practical successes of unsupervised tree-based ensembles and active learning based on greedy query selection strategy. We also show empirical results on real-world data to support our insights and theoretical analysis to support active learning. 
Second, we develop a novel batch active learning algorithm to improve the diversity of discovered anomalies based on a formalism called compact description to describe the discovered anomalies. Third, we develop a novel active learning algorithm to handle streaming data setting. We present a data drift detection algorithm that not only detects the drift robustly, but also allows us to take corrective actions to adapt the anomaly detector in a principled manner. Fourth, we present extensive experiments to evaluate our insights and our tree-based active anomaly discovery algorithms in both batch and streaming data settings. Our results show that active learning allows us to discover significantly more anomalies than state-of-the-art unsupervised baselines, our batch active learning algorithm discovers diverse anomalies, and our algorithms under the streaming-data setup are competitive with the batch setup."
"15821","Expressing and Exploiting Subgoal Structure in Classical Planning Using Sketches","Dominik Drexler, Jendrik Seipp, Hector Geffner","Linköping University, Linköping University, RWTH Aachen University","https://www.jair.org/index.php/jair/article/download/15821/27036","Width-based planning methods deal with conjunctive goals by decomposing problems into subproblems of low width. Algorithms like SIW thus fail when the goal is not easily serializable in this way or when some of the subproblems have a high width. In this work, we address these limitations by using a simple but powerful language for expressing finer problem decompositions introduced recently by Bonet and Geffner, called policy sketches. A policy sketch R over a set of Boolean and numerical features is a set of sketch rules C → E that express how the values of these features are supposed to change. Like general policies, policy sketches are domain general, but unlike policies, the changes captured by sketch rules do not need to be achieved in a single step. We show that many planning domains that cannot be solved by SIW are provably solvable in low polynomial time with the SIWR algorithm, the version of SIW that employs user-provided policy sketches. Policy sketches are thus shown to be a powerful language for expressing domain-specific knowledge in a simple and compact way and a convenient alternative to languages such as HTNs or temporal logics. Furthermore, they make it easy to express general problem decompositions and prove key properties of them like their width and complexity."
"14435","Block Domain Knowledge-Driven Learning of Chain Graphs Structure","Shujing Yang, Fuyuan Cao",", Shanxi  University","https://www.jair.org/index.php/jair/article/download/14435/27037","As the interdependence between arbitrary objects in the real world grows, it becomes gradually important to use chain graphs containing directed and undirected edges to learn the structure among objects. However, independence among some variables corresponds to multiple structures and the direction of edges among variables cannot be uniquely determined. This limitation restricts existing chain graphs structure learning algorithms to only learning their Markov equivalence class. To alleviate this limitation, we de ne the block domain knowledge and propose a block domain knowledge-driven learning chain graphs structure algorithm (KDLCG). The KDLCG algorithm learns the adjacencies and spouses of all variables, which are utilized to directly construct the skeleton and orient the edges of the complexes, thereby learning the Markov equivalence class of the chain graphs. Subsequently, the KDLCG algorithm then updates some edges with Meek rules, guided by block domain knowledge. Finally, the KDLCG algorithm directs some edges by estimating causal effects between two variables, driven by block domain knowledge. Meanwhile, we conduct theoretical analysis to prove the correctness of our algorithm and compare it with the LCD algorithm and MBLWF algorithm on synthetic and real-world datasets. The experimental results validate the effectiveness of our algorithm."
"15742","Understanding Sample Generation Strategies for Learning Heuristic Functions in Classical Planning","Rafael V. Bettker, Pedro P. Minini, André G. Pereira, Marcus Ritt",", , Federal University of Rio Grande do Sul, ","https://www.jair.org/index.php/jair/article/download/15742/27039","We study the problem of learning good heuristic functions for classical planning tasks with neural networks based on samples represented by states with their cost-to-goal estimates. The heuristic function is learned for a state space and goal condition with the number of samples limited to a fraction of the size of the state space, and must generalize well for all states of the state space with the same goal condition. Our main goal is to better understand the influence of sample generation strategies on the performance of a greedy best-first heuristic search (GBFS) guided by a learned heuristic function. In a set of controlled experiments, we find that two main factors determine the quality of the learned heuristic: the algorithm used to generate the sample set and how close the sample estimates to the perfect cost-to-goal are. These two factors are dependent: having perfect cost-to-goal estimates is insufficient if the samples are not well distributed across the state space. We also study other effects, such as adding samples with high-value estimates. Based on our findings, we propose practical strategies to improve the quality of learned heuristics: three strategies that aim to generate more representative states and two strategies that improve the cost-to-goal estimates. Our practical strategies result in a learned heuristic that, when guiding a GBFS algorithm, increases by more than 30% the mean coverage compared to a baseline learned heuristic."
"15191","On the Trade-off between Redundancy and Cohesiveness in Extractive Summarization","Ronald Cardenas, Matthias Galle, Shay B. Cohen","university of edinburgh, Cohere, University of Edinburgh","https://www.jair.org/index.php/jair/article/download/15191/27040","Extractive summaries are usually presented as lists of sentences with no expected cohesion between them and with plenty of redundant information if not accounted for. In this paper, we investigate the trade-offs incurred when aiming to control for inter-sentential cohesion and redundancy in extracted summaries, and their impact on their informativeness. As case study, we focus on the summarization of long, highly redundant documents and consider two optimization scenarios, reward-guided and with no supervision. In the reward-guided scenario, we compare systems that control for redundancy and cohesiveness during sentence scoring. In the unsupervised scenario, we introduce two systems that aim to control all three properties --informativeness, redundancy, and cohesiveness-- in a principled way. Both systems implement a psycholinguistic theory that simulates how humans keep track of relevant content units and how cohesiveness and non-redundancy constraints are applied in short-term memory during reading. Extensive automatic and human evaluations reveal that systems optimizing for --among other properties-- cohesiveness are capable of better organizing content in summaries compared to systems that optimize only for redundancy, while maintaining comparable informativeness. We find that the proposed unsupervised systems manage to extract highly cohesive summaries across varying levels of document redundancy, although sacrificing informativeness in the process. Finally, we lay evidence as to how simulated cognitive processes impact the trade-off between the analysed summary properties."
"14972","Scalable Primal Heuristics Using Graph Neural Networks for Combinatorial Optimization","Furkan Cantürk, Taha Varol, Reyhan Aydoğan, Okan Örsan Özener","Ozyegin University, Ozyegin University, Ozyegin University, Ozyegin University","https://www.jair.org/index.php/jair/article/download/14972/27041","By examining the patterns of solutions obtained for various instances, one can gain insights into the structure and behavior of combinatorial optimization (CO) problems and develop efficient algorithms for solving them. Machine learning techniques, especially Graph Neural Networks (GNNs), have shown promise in parametrizing and automating this laborious design process. The inductive bias of GNNs allows for learning solutions to mixed-integer programming (MIP) formulations of constrained CO problems with a relational representation of decision variables and constraints. The trained GNNs can be leveraged with primal heuristics to construct high-quality feasible solutions to CO problems quickly. However, current GNN-based end-to-end learning approaches have limitations for scalable training and generalization on larger-scale instances; therefore, they have been mostly evaluated over small-scale instances. Addressing this issue, our study builds on supervised learning of optimal solutions to the downscaled instances of given large-scale CO problems. We introduce several improvements on a recent GNN model for CO to generalize on instances of a larger scale than those used in training. We also propose a two-stage primal heuristic strategy based on uncertainty-quantification to automatically configure how solution search relies on the predicted decision values. Our models can generalize on 16x upscaled instances of commonly benchmarked five CO problems. Unlike the regressive performance of existing GNN-based CO approaches as the scale of problems increases, the CO pipelines using our models offer an incremental performance improvement relative to CPLEX. 
The proposed uncertainty-based primal heuristics provide 6-75% better optimality gap values and 45-99% better primal gap values for the 16x upscaled instances and bring an immense speedup in obtaining high-quality solutions. All these gains are achieved through a computationally efficient modeling approach without sacrificing solution quality."
"15693","Similarity-Based Adaptation for Task-Aware and Task-Free Continual Learning","Tameem Adel","University of Cambridge, UK & National Physical Laboratory","https://www.jair.org/index.php/jair/article/download/15693/27042","Continual learning (CL) is a paradigm which addresses the issue of how to learn from sequentially arriving tasks. The goal of this paper is to introduce a CL framework which can both learn from a global multi-task architecture and locally adapt this learning to the task at hand. In addition to the global knowledge, we conjecture that it is also beneficial to further focus on the most relevant pieces of previous knowledge. Using a prototypical network as a proxy, the proposed framework bases its adaptation on the similarity between the current data stream and the previously encountered data. We develop two algorithms, one for the standard task-aware CL and another for the more challenging task-free setting where boundaries between tasks are unknown. We correspondingly derive a generalization upper bound on the error of an upcoming task. Experiments demonstrate that the introduced algorithms lead to improved performance on several CL benchmarks."
"14947","Exploiting Contextual Target Attributes for Target Sentiment Classification","Bowen Xing, Ivor W. Tsang","University of Tecnology Sydney, ","https://www.jair.org/index.php/jair/article/download/14947/27043","In the past few years, pre-trained language models (PTLMs) have brought significant improvements to target sentiment classification (TSC). Existing PTLM-based models can be categorized into two groups: 1) fine-tuning-based models that adopt PTLM as the context encoder; 2) prompting-based models that transfer the classification task to the text/word generation task. Despite the improvements achieved by these models, we argue that they have their respective limitations. For fine-tuning-based models, they cannot make the best use of the PTLMs’ strong language modeling ability because the pre-train task and downstream fine-tuning task are not consistent. For prompting-based models, although they can sufficiently leverage the language modeling ability, it is hard to explicitly model the target-context interactions, which are widely realized as a crucial point of this task. In this paper, we present a new perspective of leveraging PTLM for TSC: simultaneously leveraging the merits of both language modeling and explicit target-context interactions via contextual target attributes. Specifically, we design the domain- and target-constrained cloze test, which can leverage the PTLMs’ strong language modeling ability to generate the given target’s attributes pertaining to the review context. The attributes contain the background and property information of the target, which can help to enrich the semantics of the review context and the target. 
To exploit the attributes for tackling TSC, we first construct a heterogeneous information graph by treating the attributes as nodes and combining them with (1) the syntax graph automatically produced by the off-the-shelf dependency parser and (2) the semantics graph of the review context, which is derived from the self-attention mechanism. Then we propose a heterogeneous information gated graph convolutional network to model the interactions among the attribute information, the syntactic information, and the contextual information. The experimental results on three benchmark datasets demonstrate the superiority of our model, which achieves new state-of-the-art performance."
"15155","Mitigating Value Hallucination in Dyna-Style Planning via Multistep Predecessor Models","Farzane Aminmansour, Taher Jafferjee, Ehsan Imani, Erin J. Talvitie, Michael Bowling, Martha White","University of Alberta, , , , , ","https://www.jair.org/index.php/jair/article/download/15155/27044","Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model. However, it is often difficult to learn accurate models of environment dynamics, and even small errors may result in failure of Dyna agents. In this paper, we highlight that one potential cause of that failure is bootstrapping off of the values of simulated states, and introduce a new Dyna algorithm to avoid this failure. We discuss a design space of Dyna algorithms, based on using successor or predecessor models---simulating forwards or backwards---and using one-step or multi-step updates. Three of the variants have been explored, but surprisingly the fourth variant has not: using predecessor models with multi-step updates. We present the \emph{Hallucinated Value Hypothesis} (HVH): updating the values of real states towards values of simulated states can result in misleading action values which adversely affect the control policy. We discuss and evaluate all four variants of Dyna amongst which three update real states toward simulated states --- so potentially toward hallucinated values --- and our proposed approach, which does not. The experimental results provide evidence for the HVH, and suggest that using predecessor models with multi-step updates is a fruitful direction toward developing Dyna algorithms that are more robust to model error."
"15581","General Policies, Subgoal Structure, and Planning Width","Blai Bonet, Hector Geffner","UNIVERSIDAD SIMON BOLIVAR, ","https://www.jair.org/index.php/jair/article/download/15581/27045","It has been observed that many classical planning domains with atomic goals can be solved by means of a simple polynomial exploration procedure, called IW, that runs in time exponential in the problem width, which in these cases is bounded and small. Yet, while the notion of width has become part of state-of-the-art planning algorithms such as BFWS, there is no good explanation for why so many benchmark domains have bounded width when atomic goals are considered. In this work, we address this question by relating bounded width with the existence of general optimal policies that in each planning instance are represented by tuples of atoms of bounded size. We also define the notions of (explicit) serializations and serialized width that have a broader scope, as many domains have a bounded serialized width but no bounded width. Such problems are solved nonoptimally in polynomial time by a variant of the Serialized IW algorithm. Finally, the language of general policies and the semantics of serializations are combined to yield a simple, meaningful, and expressive language for specifying serializations in compact form in the form of sketches, which can be used for encoding domain control knowledge by hand or for learning it from examples. Sketches express general problem decompositions in terms of subgoals, and terminating sketches of bounded width express problem decompositions that can be solved in polynomial time."
"15313","Computing Unsatisfiable Cores for LTLf Specifications","Marco Roveri, Claudio Di Ciccio, Chiara Di Francescomarino, Chiara Ghidini","DISI - University of Trento, Sapienza University of Rome, University of Trento, Fondazione Bruno Kessler","https://www.jair.org/index.php/jair/article/download/15313/27046","Linear-time temporal logic on finite traces (LTLf) is rapidly becoming a de-facto standard to produce specifications in many application domains (including planning, business process management, run-time monitoring, and reactive synthesis). Several studies have challenged the satisfiability problem thus far. In this paper, we focus instead on unsatisfiable LTLf specifications, with the objective of extracting the subset of formulae that cause inconsistencies within them, i.e., the unsatisfiable cores. We provide four algorithms to this end, which leverage the adaptation of a range of state-of-the-art algorithms to LTLf satisfiability checking. We implement those algorithms extending the respective implementations and carry out an experimental evaluation on a set of reference benchmarks, restricting to the unsatisfiable specifications. The results put in evidence that the different algorithms and tools exhibit complementary features determining their efficiency and efficacy. Indeed, our findings suggest exploring different strategies and algorithmic solutions for the extraction of unsatisfiable cores from LTLf specifications, thus confirming the challenging and multi-faceted nature of this problem."
"15595","Best of Both Worlds: Agents with Entitlements","Martin Hoefer, Marco Schmalhofer, Giovanna Varricchio",", Goethe University Frankfurt, ","https://www.jair.org/index.php/jair/article/download/15595/27047","Fair division of indivisible goods is a central challenge in artificial intelligence. For many prominent fairness criteria including envy-freeness (EF) or proportionality (PROP), no allocations satisfying these criteria might exist. Two popular remedies to this problem are randomization or relaxation of fairness concepts. A timely research direction is to combine the advantages of both, commonly referred to as Best of Both Worlds (BoBW). We consider fair division with entitlements, which allows to adjust notions of fairness to heterogeneous priorities among agents. This is an important generalization to standard fair division models and is not well-understood in terms of BoBW results. Our main result is a lottery for additive valuations and different entitlements that is ex-ante weighted envy-free (WEF), as well as ex-post weighted proportional up to one good (WPROP1) and weighted transfer envy-free up to one good (WEF(1, 1)). We show that this result is tight – ex-ante WEF is incompatible with any stronger ex-post WEF relaxation. In addition, we extend BoBW results on group fairness to entitlements and explore generalizations of our results to instances with more expressive valuation functions."
"14676","Methods for Recovering Conditional Independence Graphs: A Survey","Harsh Shrivastava, Urszula Chajewska","Microsoft Research, Microsoft Research","https://www.jair.org/index.php/jair/article/download/14676/27048","Conditional Independence (CI) graphs are a type of Probabilistic Graphical Models that are primarily used to gain insights about feature relationships. Each edge represents the partial correlation between the connected features which gives information about their direct dependence. In this survey, we list different methods and study the advances in techniques developed to recover CI graphs. We cover traditional optimization methods as well as recently developed deep learning architectures along with their recommended implementations. To facilitate wider adoption, we include preliminaries that consolidate associated operations, for example techniques to obtain covariance matrix for mixed datatypes."
"14945","The TOAD System for Totally Ordered HTN Planning","Daniel Höller","Saarland University, Saarland Informatics Campus, Saarbrücken, Germany","https://www.jair.org/index.php/jair/article/download/14945/27049","We present an approach for translating Totally Ordered Hierarchical Task Network (HTN) planning problems to classical planning problems. While this enables the use of sophisticated classical planning systems to find solutions, we need to overcome the differences in expressiveness of these two planning formalisms. Prior work on this topic did this by translating bounded HTN problems. In contrast, we approximate them, i.e., we change the problem such that every action sequence that is a solution to the HTN problem is also a solution for the classical problem, but the latter might have more solutions. To obtain a sound overall approach, we verify solutions returned by the classical planning system to ensure that they are also solutions to the HTN problem. For translation and approximation, we use techniques introduced to approximate Context-Free Languages by using Finite Automata. We named our system Toad (Totally Ordered HTN Approximation using DFA). For a subset of HTN problems the translation is even possible without approximation. Whether or not it is necessary is decided based on the property of self-embedding, which comes also from the field of formal languages. We investigate the theoretical connection of self-embedding and tail-recursiveness, a property from the HTN literature used to identify a subclass of HTN planning problems that can be translated to classical planning, and show that it is more general. To guide the classical planner, we introduce a novel heuristic tailored towards our models. We evaluate Toad on the benchmark set of the 2020 International Planning Competition. Our evaluation shows that (1) most problems can be translated without approximation and that (2) Toad is competitive with the state of the art in HTN planning."
"15604","Using Constraint Propagation to Bound Linear Programs","Tomáš Dlask, Tomáš Werner","Faculty of Electrical Engineering, Czech Technical University in Prague, Faculty of Electrical Engineering, Czech Technical University in Prague","https://www.jair.org/index.php/jair/article/download/15604/27050","We present an approach to compute bounds on the optimal value of linear programs based on constraint propagation. Given a feasible dual solution, we apply constraint propagation to the complementary slackness conditions and, if propagation succeeds to prove these conditions infeasible, the infeasibility certificate (in the sense of Farkas’ lemma) is reconstructed from the propagation history. This certificate is a dual-improving direction, which allows us to improve the bound. As constraint propagation need not always detect infeasibility of a linear inequality system, the method is not guaranteed to converge to a global solution of the linear program but only to an upper bound, whose tightness depends on the structure of the program and the used propagation method. The approach is suited for large sparse linear programs (such as LP relaxations of combinatorial optimization problems), for which the classical LP algorithms may be infeasible, if only for their super-linear space complexity. The approach can be seen as a generalization of the Virtual Arc Consistency (VAC) algorithm to bound the LP relaxation of the Weighted CSP (WCSP). We newly apply it to the LP relaxation of the Weighted Max-SAT problem, experimentally showing that the obtained bounds are often not far from optima of the relaxation and proving that they are exact for known tractable subclasses of Weighted Max-SAT."
"15451","Robust Average-Reward Reinforcement Learning","Yue Wang, Alvaro Velasquez, George Atia, Ashley Prater-Bennette, Shaofeng Zou","University of Central Florida, , , , ","https://www.jair.org/index.php/jair/article/download/15451/27051","Robust Markov decision processes (MDPs) aim to find a policy that optimizes the worst-case performance over an uncertainty set of MDPs. Existing studies mostly have focused on the robust MDPs under the discounted reward criterion, leaving the ones under the average-reward criterion largely unexplored. In this paper, we develop the first comprehensive and systematic study of robust average-reward MDPs, where the goal is to optimize the long-term average performance under the worst case. Our contributions are four-folds: (1) we prove the uniform convergence of the robust discounted value function to the robust average-reward function as the discount factor γ goes to 1; (2) we derive the robust average-reward Bellman equation, characterize the structure of its solution set, and prove the equivalence between solving the robust Bellman equation and finding the optimal robust policy; (3) we design robust dynamic programming algorithms, and theoretically characterize their convergence to the optimal policy; and (4) we design two model-free algorithms unitizing the multi-level Monte-Carlo approach, and prove their asymptotic convergence"
"16210","Counting Complexity for Reasoning in Abstract Argumentation","Johannes K. Fichte, Markus Hecher, Arne Meier","Linköping University, MIT Computer Science, Leibniz Universität Hannover","https://www.jair.org/index.php/jair/article/download/16210/27052","In this paper, we consider counting and projected model counting of extensions in abstract argumentation for various semantics, including credulous reasoning. When asking for projected counts, we are interested in counting the number of extensions of a given argumentation framework, while multiple extensions that are identical when restricted to the projected arguments count as only one projected extension. We establish classical complexity results and parameterized complexity results when the problems are parameterized by the treewidth of the undirected argumentation graph. To obtain upper bounds for counting projected extensions, we introduce novel algorithms that exploit small treewidth of the undirected argumentation graph of the input instance by dynamic programming. Our algorithms run in double or triple exponential time in the treewidth, depending on the semantics under consideration. Finally, we establish lower bounds of bounded treewidth algorithms for counting extensions and projected extension under the exponential time hypothesis (ETH)."
"15579","Simulating Counterfactuals","Juha Karvanen, Santtu Tikka, Matti Vihola","University of Jyväskylä, , ","https://www.jair.org/index.php/jair/article/download/15579/27053","Counterfactual inference considers a hypothetical intervention in a parallel world that shares some evidence with the factual world. If the evidence specifies a conditional distribution on a manifold, counterfactuals may be analytically intractable. We present an algorithm for simulating values from a counterfactual distribution where conditions can be set on both discrete and continuous variables. We show that the proposed algorithm can be presented as a particle filter leading to asymptotically valid inference. The algorithm is applied to fairness analysis in credit-scoring."
"15236","Individual Fairness, Base Rate Tracking and the Lipschitz Condition","Benjamin Eva","Duke University","https://www.jair.org/index.php/jair/article/download/15236/27055","In recent years, there has been a proliferation of competing conceptions of what it means for a predictive algorithm to treat its subjects fairly. Most approaches focus on explicating a notion of group fairness, i.e. of what it means for an algorithm to treat one group unfairly in comparison to another. In contrast, Dwork et al. (2012) attempt to carve out a formalised conception of individual fairness, i.e. of what it means for an algorithm to treat an individual fairly or unfairly. In this paper, I demonstrate that the conception of individual fairness advocated by Dwork et al. is closely related to a criterion of group fairness, called ‘base rate tracking’, introduced in Eva (2022). I subsequently show that base rate tracking solves some fundamental conceptual problems associated with the Lipschitz criterion, before arguing that group level fairness criteria are at least as powerful as their individual level counterparts when it comes to diagnosing algorithmic bias."
"15956","SAT-based Decision Tree Learning for Large Data Sets","Andre Schidler, Stefan Szeider","TU Wien, TU Wien","https://www.jair.org/index.php/jair/article/download/15956/27056","Decision trees of low depth are beneficial for understanding and interpreting the data they represent. Unfortunately, finding a decision tree of lowest complexity (depth or size) that correctly represents given data is NP-hard. Hence known algorithms either (i) utilize heuristics that do not minimize the depth or (ii) are exact but scale only to small or medium-sized instances. We propose a new hybrid approach to decision tree learning, combining heuristic and exact methods in a novel way. More specifically, we employ SAT encodings repeatedly to local parts of a decision tree provided by a standard heuristic, leading to an overall reduction in complexity. This allows us to scale the power of exact SAT-based methods to comparatively very large data sets. We evaluate our new approach experimentally on a range of real-world instances that contain up to several thousand samples. In almost all cases, our method successfully decreases the complexity of the initial decision tree; often, the decrease is significant."
"15916","Viewpoint: Hybrid Intelligence Supports Application Development for Diabetes Lifestyle Management","Bernd J. W. Dudzik, Jasper S. van der Waa, Pei-Yu Chen, Roel Dobbe, Íñigo M.D.R.  de Troya, Roos M. Bakker, Maaike  H. T.  de Boer, Quirine T.S.  Smit, Davide Dell'Anna, Emre Erdogan, Pinar Yolum, Shihan Wang, Selene Baez Santamaria, Lea Krause, Bart A. Kamphorst","Delft University of Technology, TNO, Delft University of Technology, Delft University of Technology, Delft University of Technology, TNO, TNO, TNO, Utrecht University, Utrecht University, Utrecht University, Utrecht University, Vrije Universiteit Amsterdam, Vrije Universiteit Amsterdam, Wageningen University & Research","https://www.jair.org/index.php/jair/article/download/15916/27057","Type II diabetes is a complex health condition requiring patients to closely and continuously collaborate with healthcare professionals and other caretakers on lifestyle changes. While intelligent products have tremendous potential to support such Diabetes Lifestyle Management (DLM), existing products are typically conceived from a technology-centered perspective that insufficiently acknowledges the degree to which collaboration and inclusion of stakeholders is required. In this article, we argue that the emergent design philosophy of Hybrid Intelligence (HI) forms a suitable alternative lens for research and development. In particular, we (1) highlight a series of pragmatic challenges for effective AI-based DLM support based on results from an expert focus group, and (2) argue for HI’s potential to address these by outlining relevant research trajectories."
"15986","Unifying SAT-Based Approaches to Maximum Satisfiability Solving","Hannes Ihalainen, Jeremias Berg, Matti Järvisalo",", University of Helsinki, ","https://www.jair.org/index.php/jair/article/download/15986/27058","Maximum satisfiability (MaxSAT), employing propositional logic as the declarative language of choice, has turned into a viable approach to solving NP-hard optimization problems arising from artificial intelligence and other real-world settings. A key contributing factor to the success of MaxSAT is the rise of increasingly effective exact solvers that are based on iterative calls to a Boolean satisfiability (SAT) solver. The three types of SAT-based MaxSAT solving approaches, each with its distinguishing features, implemented in current state-of-the-art MaxSAT solvers are the core-guided, the implicit hitting set (IHS), and the objective-bounding approaches. The objective-bounding approach is based on directly searching over the objective function range by iteratively querying a SAT solver if the MaxSAT instance at hand has a solution under different bounds on the objective. In contrast, both core-guided and IHS are so-called unsatisfiability-based approaches that employ a SAT solver as an unsatisfiable core extractor to determine sources of inconsistencies, but critically differ in how the found unsatisfiable cores are made use of towards finding a provably optimal solution. Furthermore, a variety of different algorithmic variants of the core-guided approach in particular have been proposed and implemented in solvers. It is well-acknowledged that each of the three approaches has its advantages and disadvantages, which is also witnessed by instance and problem-domain specific runtime performance differences (and at times similarities) of MaxSAT solvers implementing variants of the approaches. 
However, the questions of to what extent the approaches are fundamentally different and how the benefits of the individual methods could be combined in a single algorithmic approach are currently not fully understood. In this work, we approach these questions by developing UniMaxSAT, a general unifying algorithmic framework. Based on the recent notion of abstract cores, UniMaxSAT captures in general core-guided, IHS and objective-bounding computations. The framework offers a unified way of establishing quite generally the correctness of the current approaches. We illustrate this by formally showing that UniMaxSAT can simulate the computations of various algorithmic instantiations of the three types of MaxSAT solving approaches. Furthermore, UniMaxSAT can be instantiated in novel ways giving rise to new algorithmic variants of the approaches. We illustrate this aspect by developing a prototype implementation of an algorithmic variant for MaxSAT based on the framework."
"15786","Axiomatization of Non-Recursive Aggregates in First-Order Answer Set Programming","Jorge Fandinno, Zachary Hansen, Yuliya Lierler",", University of Nebraska Omaha, University of Nebraska Omaha","https://www.jair.org/index.php/jair/article/download/15786/27059","This paper contributes to the development of theoretical foundations of answer set programming. Groundbreaking work on the SM operator by Ferraris, Lee, and Lifschitz proposed a definition/semantics for logic (answer set) programs based on a syntactic transformation similar to parallel circumscription. That definition radically differed from its predecessors by using classical (second-order) logic and avoiding reference to either grounding or fixpoints. Yet, the work lacked the formalization of crucial and commonly used answer set programming language constructs called aggregates. In this paper, we present a characterization of logic programs with aggregates based on a many-sorted generalization of the SM operator. This characterization introduces new function symbols for aggregate operations and aggregate elements, whose meaning can be fixed by adding appropriate axioms to the result of the SM transformation. We prove that our characterization coincides with the ASP-Core-2 semantics for logic programs and, if we allow non-positive recursion through aggregates, it coincides with the semantics of the answer set solver CLINGO."
"15461","Does CLIP Know My Face?","Dominik Hintersdorf, Lukas Struppek, Manuel Brack, Felix Friedrich, Patrick Schramowski, Kristian Kersting","Technical University of Darmstadt, , , , , ","https://www.jair.org/index.php/jair/article/download/15461/27060","With the rise of deep learning in various applications, privacy concerns around the protection of training data have become a critical area of research. Whereas prior studies have focused on privacy risks in single-modal models, we introduce a novel method to assess privacy for multi-modal models, specifically vision-language models like CLIP. The proposed Identity Inference Attack (IDIA) reveals whether an individual was included in the training data by querying the model with images of the same person. Letting the model choose from a wide variety of possible text labels, the model reveals whether it recognizes the person and, therefore, was used for training. Our large-scale experiments on CLIP demonstrate that individuals used for training can be identified with very high accuracy. We confirm that the model has learned to associate names with depicted individuals, implying the existence of sensitive information that can be extracted by adversaries. Our results highlight the need for stronger privacy protection in large-scale models and suggest that IDIAs can be used to prove the unauthorized use of data for training and to enforce privacy laws. This article appears in the AI & Society track."
"15305","On the Convergence of Swap Dynamics to Pareto-Optimal Matchings","Felix Brandt, Anaëlle Wilczynski","TUM, ","https://www.jair.org/index.php/jair/article/download/15305/27061","We study whether Pareto-optimal stable matchings can be reached via pairwise swaps in one-to-one matching markets with initial assignments. We consider housing markets, marriage markets, and roommate markets as well as three different notions of swap rationality. Our main results are as follows. While it can be efficiently determined whether a Pareto-optimal stable matching can be reached when defining swaps via blocking pairs, checking whether this is the case for all such sequences is computationally intractable. When defining swaps such that all involved agents need to be better off, even deciding whether a Pareto-optimal stable matching can be reached via some sequence is intractable. This confirms and extends a conjecture made by Damamme, Beynier, Chevaleyre, and Maudet (2015) who have shown that convergence to a Pareto-optimal matching is guaranteed in housing markets with single-peaked preferences. We prove that in marriage and roommate markets, single-peakedness is not sufficient for this to hold, but the stronger restriction of one-dimensional Euclidean preferences is."
"14063","Symbolic Task Inference in Deep Reinforcement Learning","Hosein Hasanbeig, Natasha Yogananda Jeppu, Alessandro Abate, Tom Melham, Daniel Kroening","University of Oxford, University of Oxford, University of Oxford, University of Oxford, Amazon","https://www.jair.org/index.php/jair/article/download/14063/27062","This paper proposes DeepSynth, a method for effective training of deep reinforcement learning agents when the reward is sparse or non-Markovian, but at the same time progress towards the reward requires achieving an unknown sequence of high-level objectives. Our method employs a novel algorithm for synthesis of compact finite state automata to uncover this sequential structure automatically. We synthesise a human-interpretable automaton from trace data collected by exploring the environment. The state space of the environment is then enriched with the synthesised automaton, so that the generation of a control policy by deep reinforcement learning is guided by the discovered structure encoded in the automaton. The proposed approach is able to cope with both high-dimensional, low-level features and unknown sparse or non-Markovian rewards. We have evaluated DeepSynth’s performance in a set of experiments that includes the Atari game Montezuma’s Revenge, known to be challenging. Compared to approaches that rely solely on deep reinforcement learning, we obtain a reduction of two orders of magnitude in the iterations required for policy synthesis, and a significant improvement in scalability."
"15884","Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination","Yang Li, Shao Zhang, Jichen Sun, Wenhao Zhang, Yali Du, Ying Wen, Xinbing Wang, Wei Pan","Department of Computer Science, The University of Manchester, , , , , , , University of Manchester","https://www.jair.org/index.php/jair/article/download/15884/27064","Securing coordination between AI agent and teammates (human players or AI agents) in contexts involving unfamiliar humans continues to pose a significant challenge in Zero-Shot Coordination. The issue of cooperative incompatibility becomes particularly prominent when an AI agent is unsuccessful in synchronizing with certain previously unknown partners. Traditional algorithms have aimed to collaborate with partners by optimizing fixed objectives within a population, fostering diversity in strategies and behaviors. However, these techniques may lead to learning loss and an inability to cooperate with specific strategies within the population, a phenomenon named cooperative incompatibility in learning. In order to solve cooperative incompatibility in learning and effectively address the problem in the context of ZSC, we introduce the Cooperative Open-ended LEarning (COLE) framework, which formulates open-ended objectives in cooperative games with two players using perspectives of graph theory to evaluate and pinpoint the cooperative capacity of each strategy. We present two practical algorithms, specifically COLESV and COLER, which incorporate insights from game theory and graph theory. We also show that COLE could effectively overcome the cooperative incompatibility from theoretical and empirical analysis. Subsequently, we created an online Overcooked human-AI experiment platform, the COLE platform, which enables easy customization of questionnaires, model weights, and other aspects. Utilizing the COLE platform, we enlist 130 participants for human experiments. 
Our findings reveal a preference for our approach over state-of-the-art methods using a variety of subjective metrics. Moreover, objective experimental outcomes in the Overcooked game environment indicate that our method surpasses existing ones when coordinating with previously unencountered AI agents and the human proxy model. Our code and demo are publicly available at https://sites.google.com/download/cole-2023."
"15135","A Hybrid Intelligence Method for Argument Mining","Michiel van der Meer, Enrico Liscio, Catholijn M. Jonker, Aske Plaat, Piek Vossen, Pradeep K. Murukannaiah","Leiden Institute of Advanced Computer Science (LIACS), TU Delft, TU Delft, Leiden Institute of Advanced Computer Science, Vrije Universiteit Amsterdam, TU Delft","https://www.jair.org/index.php/jair/article/download/15135/27065","Large-scale survey tools enable the collection of citizen feedback in opinion corpora. Extracting the key arguments from a large and noisy set of opinions helps in understanding the opinions quickly and accurately. Fully automated methods can extract arguments but (1) require large labeled datasets that induce large annotation costs and (2) work well for known viewpoints, but not for novel points of view. We propose HyEnA, a hybrid (human + AI) method for extracting arguments from opinionated texts, combining the speed of automated processing with the understanding and reasoning capabilities of humans. We evaluate HyEnA on three citizen feedback corpora. We find that, on the one hand, HyEnA achieves higher coverage and precision than a state-of-the-art automated method when compared to a common set of diverse opinions, justifying the need for human insight. On the other hand, HyEnA requires less human effort and does not compromise quality compared to (fully manual) expert analysis, demonstrating the benefit of combining human and artificial intelligence."
"15333","From Single-Objective to Bi-Objective Maximum Satisfiability Solving","Christoph Jabs, Jeremias Berg, Andreas Niskanen, Matti Järvisalo","University of Helsinki, University of Helsinki, University of Helsinki, University of Helsinki","https://www.jair.org/index.php/jair/article/download/15333/27066","The declarative approach is key to efficiently finding optimal solutions to various types of NP-hard real-world combinatorial optimization problems. Most work on practical declarative solvers—ranging from classical integer programming to finite-domain constraint optimization and maximum satisfiability (MaxSAT)—has focused on optimization under a single objective; fewer advances have been made towards efficient declarative techniques for multi-objective optimization problems. Motivated by significant recent advances in practical solvers for MaxSAT, in this work we develop BiOptSat, an exact declarative approach for finding Pareto-optimal solutions to bi-objective optimization problems, with propositional logic as the underlying constraint language. BiOptSat can be viewed as an instantiation of the lexicographic method. The approach makes use of a single Boolean satisfiability solver that is incrementally employed throughout the entire search procedure, allowing for finding a single Pareto-optimal solution, finding one representative solution for each non-dominated point, and enumerating all Pareto-optimal solutions. We detail several algorithmic instantiations of BiOptSat, each building on recent algorithms proposed for single-objective MaxSAT. We empirically evaluate the instantiations compared to recently-proposed alternative approaches to multi-objective MaxSAT solving on several real-world domains from the literature, showing the practical benefits of our approach."
"15407","Computational Argumentation-based Chatbots: A Survey","Federico Castagna, Nadin Kökciyan, Isabel Sassoon, Simon Parsons, Elizabeth Sklar","University of Lincoln, , , , ","https://www.jair.org/index.php/jair/article/download/15407/27067","Chatbots are conversational software applications designed to interact dialectically with users for a plethora of different purposes. Surprisingly, these colloquial agents have only recently been coupled with computational models of arguments (i.e. computational argumentation), whose aim is to formalise, in a machine-readable format, the ordinary exchange of information that characterises human communications. Chatbots may employ argumentation with different degrees and in a variety of manners. The present survey sifts through the literature to review papers concerning this kind of argumentation-based bot, drawing conclusions about the benefits and drawbacks that this approach entails in comparison with standard chatbots, while also envisaging possible future development and integration with the Transformer-based architecture and state-of-the-art Large Language models."
"14990","Towards Trustworthy AI-Enabled Decision Support Systems: Validation of the Multisource AI Scorecard Table (MAST)","Pouria Salehi, Yang Ba, Nayoung Kim, Ahmadreza Mosallanezhad, Anna Pan, Myke C. Cohen, Yixuan Wang, Jieqiong Zhao, Shawaiz Bhatti, James Sung, Erik Blasch, Michelle V. Mancenido, Erin K. Chiou",", , , , , , , , , , , , Arizona State University","https://www.jair.org/index.php/jair/article/download/14990/27068","The Multisource AI Scorecard Table (MAST) is a checklist tool to inform the design and evaluation of trustworthy AI systems based on the U.S. Intelligence Community’s analytic tradecraft standards. In this study, we investigate whether MAST can be used to differentiate between high and low trustworthy AI-enabled decision support systems (AI-DSSs). Evaluating trust in AI-DSSs poses challenges to researchers and practitioners. These challenges include identifying the components, capabilities, and potential of these systems, many of which are based on the complex deep learning algorithms that drive DSS performance and preclude complete manual inspection. Using MAST, we developed two interactive AI-DSS testbeds. One emulated an identity-verification task in security screening, and another emulated a text-summarization system to aid in an investigative task. Each testbed had one version designed to reach low MAST ratings, and another designed to reach high MAST ratings. We hypothesized that MAST ratings would be positively related to the trust ratings of these systems. A total of 177 subject-matter experts were recruited to interact with and evaluate these systems. Results generally show higher MAST ratings for the high-MAST compared to the low-MAST groups, and that measures of trust perception are highly correlated with the MAST ratings. 
We conclude that MAST can be a useful tool for designing and evaluating systems that will engender trust perceptions, including for AI-DSS that may be used to support visual screening or text summarization tasks. However, higher MAST ratings may not translate to higher joint performance, and the connection between MAST and appropriate trust or trustworthiness remains an open question."
"15550","The Complexity of Subelection Isomorphism Problems","Piotr Faliszewski, Krzysztof Sornat, Stanisław Szufa",", Dalle Molle Institute for Artificial Intelligence, AGH University, Jagiellonian University","https://www.jair.org/index.php/jair/article/download/15550/27069","We study extensions of the Election Isomorphism problem, focused on the existence of isomorphic subelections. Specifically, we propose the Subelection Isomorphism and the Maximum Common Subelection problems and study their computational complexity and approximability. Using our problems in experiments, we provide some insights into the nature of several statistical models of elections."
"15800","Mixed Fair Division: A Survey","Shengxin Liu, Xinhang Lu, Mashbat Suzuki, Toby Walsh","Harbin Institute of Technology, Shenzhen, China, UNSW Sydney, Australia, UNSW Sydney, Australia, UNSW Sydney, Australia","https://www.jair.org/index.php/jair/article/download/15800/27070","Fair division considers the allocation of scarce resources among agents in such a way that every agent gets a fair share. It is a fundamental problem in society and has received significant attention and rapid developments from the game theory and artificial intelligence communities in recent years. The majority of the fair division literature can be divided along at least two orthogonal directions: goods versus chores, and divisible versus indivisible resources. In this survey, besides describing the state of the art, we outline a number of interesting open questions and future directions in three mixed fair division settings: (i) indivisible goods and chores, (ii) divisible and indivisible goods (mixed goods), and (iii) indivisible goods with subsidy which can be viewed like a divisible good."
"15679","Probabilities of the Third Type: Statistical Relational Learning and Reasoning with Relative Frequencies","Felix Weitkämper","Ludwig-Maximilians-Universität München","https://www.jair.org/index.php/jair/article/download/15679/27071","Dependencies on the relative frequency of a state in the domain are common when modelling probabilistic dependencies on relational data. For instance, the likelihood of a school closure during an epidemic might depend on the proportion of infected pupils exceeding a threshold. Often, rather than depending on discrete thresholds, dependencies are continuous: for instance, the likelihood of any one mosquito bite transmitting an illness depends on the proportion of carrier mosquitoes. Current approaches usually only consider probabilities over possible worlds rather than over domain elements themselves. An exception are the recently introduced Lifted Bayesian Networks for Conditional Probability Logic, which express discrete dependencies on probabilistic data. We introduce functional lifted Bayesian networks, a formalism that explicitly incorporates continuous dependencies on relative frequencies into statistical relational artificial intelligence, and compare and contrast them with lifted Bayesian Networks for Conditional Probability Logic. Incorporating relative frequencies is not only beneficial to modelling; it also provides a more rigorous approach to learning problems where training and test or application domains have different sizes. To this end, we provide a representation of the asymptotic probability distributions induced by functional lifted Bayesian networks on domains of increasing sizes. Since that representation has well-understood scaling behaviour across domain sizes, it can be used to estimate parameters for a large domain consistently from randomly sampled subpopulations. 
Furthermore, we show that in parametric families of FLBN, convergence is uniform in the parameters, which ensures a meaningful dependence of the asymptotic probabilities on the parameters of the model."
"15827","MallobSat: Scalable SAT Solving by Clause Sharing","Dominik Schreiber, Peter Sanders","Karlsruhe Institute of Technology, Karlsruhe Institute of Technology","https://www.jair.org/index.php/jair/article/download/15827/27072","SAT solving in large distributed environments has previously led to some famous results and to impressive speedups for selected inputs. However, in terms of general-purpose SAT solving, prior approaches still cannot make efficient use of a large number of processors. We aim to address this issue with a complete and systematic overhaul of the distributed solver HordeSat with a focus on its algorithmic building blocks. In particular, we present a communication-efficient approach to clause sharing, careful buffering and filtering of produced clauses, and effective orchestration of state-of-the-art solver backends. In extensive evaluations, our approach named MallobSat significantly outperforms an updated HordeSat, doubling its mean speedup. Our clause sharing results in effective parallelization even if all threads execute identical solver programs that only differ based on which clauses they import at which times. We thus argue that MallobSat is not a portfolio solver with the added bonus of clause sharing but rather a clause-sharing solver where adding some explicit diversification is useful but not essential. We also discuss the last four iterations of the International SAT Competition (2020–2023), where our system ranked very favorably, and identify several previously unsolved competition problems that MallobSat solved successfully. Last but not least, our approach is malleable, i.e., supports running on a fluctuating set of resources, which allows us to combine parallel job processing and parallel SAT solving in a flexible manner for best resource efficiency."
"15865","Language-Models-as-a-Service: Overview of a New Paradigm and its Challenges","Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Raza Nazar, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge","University of Oxford, , , , , , , , ","https://www.jair.org/index.php/jair/article/download/15865/27073","Some of the most powerful language models currently are proprietary systems, accessible only via (typically restrictive) web or software programming interfaces. This is the Language-Models-as-a-Service (LMaaS) paradigm. In contrast with scenarios where full model access is available, as in the case of open-source models, such closed-off language models present specific challenges for evaluating, benchmarking, and testing them. This paper has two goals: on the one hand, we delineate how the aforementioned challenges act as impediments to the accessibility, reproducibility, reliability, and trustworthiness of LMaaS. We systematically examine the issues that arise from a lack of information about language models for each of these four aspects. We conduct a detailed analysis of existing solutions, put forth a number of recommendations, and highlight directions for future advancements. On the other hand, it serves as a synthesized overview of the licences and capabilities of the most popular LMaaS."
"15960","The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models","Moschoula Pternea, Prerna Singh, Abir Chakraborty, Yagna Oruganti, Mirco Milletari, Sayli Bapat, Kebei Jiang","Microsoft, Microsoft, Microsoft, Microsoft, Microsoft, Microsoft, Microsoft","https://www.jair.org/index.php/jair/article/download/15960/27074","In this work, we review research studies that combine Reinforcement Learning (RL) and Large Language Models (LLMs), two areas that owe their momentum to the development of Deep Neural Networks (DNNs). We propose a novel taxonomy of three main classes based on the way that the two model types interact with each other. The first class, RL4LLM, includes studies where RL is leveraged to improve the performance of LLMs on tasks related to Natural Language Processing (NLP). RL4LLM is divided into two sub-categories depending on whether RL is used to directly fine-tune an existing LLM or to improve the prompt of the LLM. In the second class, LLM4RL, an LLM assists the training of an RL model that performs a task that is not inherently related to natural language. We further break down LLM4RL based on the component of the RL training framework that the LLM assists or replaces, namely reward shaping, goal generation, and policy function. Finally, in the third class, RL+LLM, an LLM and an RL agent are embedded in a common planning framework without either of them contributing to training or fine-tuning of the other. We further branch this class to distinguish between studies with and without natural language feedback. We use this taxonomy to explore the motivations behind the synergy of LLMs and RL and explain the reasons for its success, while pinpointing potential shortcomings and areas where further research is needed, as well as alternative methodologies that serve the same goal."
"15484","Scalable Distributed Algorithms for Size-Constrained Submodular Maximization in the MapReduce and Adaptive Complexity Models","Yixin Chen, Tonmoy Dey, Alan Kuhnle","Texas A&M University, Florida State University, Texas A&M University","https://www.jair.org/index.php/jair/article/download/15484/27075","Distributed maximization of a submodular function in the MapReduce (MR) model has received much attention, culminating in two frameworks that allow a centralized algorithm to be run in the MR setting without loss of approximation, as long as the centralized algorithm satisfies a certain consistency property – which had previously only been known to be satisfied by the standard greedy and continuous greedy algorithms. A separate line of work has studied parallelizability of submodular maximization in the adaptive complexity model, where each thread may have access to the entire ground set. For the size-constrained maximization of a monotone and submodular function, we show that several sublinearly adaptive (highly parallelizable) algorithms satisfy the consistency property required to work in the MR setting, which yields practical, parallelizable and distributed algorithms. Separately, we develop the first distributed algorithm with linear query complexity for this problem. Finally, we provide a method to increase the maximum cardinality constraint for MR algorithms at the cost of additional MR rounds."
"15320","Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities","Jayanta Mandi, James Kotary, Senne Berden, Maxime Mulamba, Victor Bucarey, Tias Guns, Ferdinando Fioretto",", , , , , , ","https://www.jair.org/index.php/jair/article/download/15320/27076","Decision-focused learning (DFL) is an emerging paradigm that integrates machine learning (ML) and constrained optimization to enhance decision quality by training ML models in an end-to-end system. This approach shows significant potential to revolutionize combinatorial decision-making in real-world applications that operate under uncertainty, where estimating unknown parameters within decision models is a major challenge. This paper presents a comprehensive review of DFL, providing an in-depth analysis of both gradient-based and gradient-free techniques used to combine ML and constrained optimization. It evaluates the strengths and limitations of these techniques and includes an extensive empirical evaluation of eleven methods across seven problems. The survey also offers insights into recent advancements and future research directions in DFL."
"15566","The Goal after Tomorrow: Offline Goal Reasoning with Norms","Pere Pardo, Christian Strasser","University of Luxembourg, Institute of Philosophy II, Ruhr-Universität Bochum","https://www.jair.org/index.php/jair/article/download/15566/27079","Recent studies have focused on autonomous agents that select their own goals and then select actions to achieve these goals, using online Goal Reasoning (GR). GR agents can revise goals and plans at execution time if unexpected outcomes occur. However, for ethical or legal agent design, even the partial execution of an online plan may result in foreseeable norm violations. To prevent these violations, it is crucial to incorporate GR already at the planning phase. To this end, we design an offline GR system that can harbour normative systems or deontic logics for goal generation. Our main results include a characterization and comparison of the completeness classes for a variety of offline GR planners, and a discussion of the irreducibility of offline GR to pure planning methods."
"16422","Understanding What Affects the Generalization Gap in Visual Reinforcement Learning: Theory and Empirical Evidence","Jiafei Lyu, Le Wan, Xiu Li, Zongqing Lu","Tsinghua University, , , ","https://www.jair.org/index.php/jair/article/download/16422/27080","Recently, there are many efforts attempting to learn useful policies for continuous control in visual reinforcement learning (RL). In this scenario, it is important to learn a generalizable policy, as the testing environment may differ from the training environment, e.g., there exist distractors during deployment. Many practical algorithms are proposed to handle this problem. However, to the best of our knowledge, none of them provide a theoretical understanding of what affects the generalization gap and why their proposed methods work. In this paper, we bridge this issue by theoretically answering the key factors that contribute to the generalization gap when the testing environment has distractors. Our theories indicate that minimizing the representation distance between training and testing environments, which aligns with human intuition, is the most critical for the benefit of reducing the generalization gap. Our theoretical results are supported by the empirical evidence in the DMControl Generalization Benchmark (DMC-GB)."
"16653","The State of Computer Vision Research in Africa","Abdul-Hakeem Omotayo, Ashery Mbilinyi, Lukman Ismaila, Houcemeddine Turki, Mahmoud Abdien, Karim Gamal, Idriss Tondji, Yvan Pimi, Naome A. Etori, Marwa M. Matar, Clifford Broni-Bediako, Abigail Oppong, Mai Gamal, Eman Ehab, Gbetondji Dovonon, Zainab Akinjobi, Daniel Ajisafe, Oluwabukola G. Adegboro, Mennatullah Siam","University of California, Davis, University of British Columbia, Canada, Johns Hopkins University, University of Sfax, Tunisia, Queen’s University, Canada, Queen’s University, Canada, African Institute for Mathematical Sciences (AIMS/AMMI), Senegal, African Institute for Mathematical Sciences (AIMS/AMMI), Senegal, University of Minnesota-Twin Cities, USA, Al-Azhar University in Cairo, Egypt, RIKEN Center for Advanced Intelligence Project, Japan, Ashesi University, Ghana, German University in Cairo, Egypt, Nile University, Egypt, University College London, UK, New Mexico’s State University, USA, University of British Columbia, Canada, Dublin City University, Ireland, Ontario Tech University, Canada","https://www.jair.org/index.php/jair/article/download/16653/27081","Despite significant efforts to democratize artificial intelligence (AI), computer vision which is a sub-field of AI, still lags in Africa. A significant factor to this, is the limited access to computing resources, datasets, and collaborations. As a result, Africa’s contribution to top-tier publications in this field has only been 0.06% over the past decade. Towards improving the computer vision field and making it more accessible and inclusive, this study analyzes 63,000 Scopus-indexed computer vision publications from Africa. We utilize large language models to automatically parse their abstracts, to identify and categorize topics and datasets. This resulted in listing more than 100 African datasets. 
Our objective is to provide a comprehensive taxonomy of dataset categories to facilitate better understanding and utilization of these resources. We also analyze collaboration trends of researchers within and outside the continent. Additionally, we conduct a large-scale questionnaire among African computer vision researchers to identify the structural barriers they believe require urgent attention. In conclusion, our study offers a comprehensive overview of the current state of computer vision research in Africa, to empower marginalized communities to participate in the design and development of computer vision systems."
"15483","Separating and Collapsing Electoral Control Types","Benjamin Carleton, Michael C. Chavrimootoo, Lane A. Hemaspaandra, David E. Narváez, Conor Taliancich, Henry B. Welles","Cornell University, University of Rochester, University of Rochester, Virginia Tech, Property Matrix, University of Rochester","https://www.jair.org/index.php/jair/article/download/15483/27082","Electoral control refers to attacking elections by adding, deleting, or partitioning voters or candidates. Hemaspaandra, Hemaspaandra, and Menton recently discovered, for seven pairs (T, T′) of seemingly distinct standard electoral control types, that T and T′ are in practice identical: For each input I and each election system E, I is a “yes” instance of both T and T′ under E, or of neither. Surprisingly, this had previously gone undetected even as the field was score-carding how many standard control types various election systems were resistant to; various “different” cells on such score cards were, unknowingly, duplicate effort on the same issue. This naturally raises the worry that perhaps other pairs of control types are identical, and so work still is being needlessly duplicated. We completely determine, for all standard control types, which pairs are, for elections whose votes are linear orderings of the candidates, always identical. In particular, we prove that no identical control pairs exist beyond the known seven. We also for three central election systems completely determine which control pairs are identical (“collapse”) with respect to those particular election systems, and we also explore containment and incomparability relationships between control pairs. For approval voting, which has a different “type” for its votes, Hemaspaandra, Hemaspaandra, and Menton’s seven collapses still hold (since we observe that their argument applies to all election systems). 
However, we find 14 additional collapses that hold for approval voting but do not hold for some election systems whose votes are linear orderings of the candidates. We find one new collapse for veto elections and none for plurality. We prove that each of the three election systems mentioned have no collapses other than those inherited from Hemaspaandra, Hemaspaandra, and Menton or added in the present paper. We establish many new containment relationships between separating control pairs, and for each separating pair of standard control types classify its separation in terms of either containment (always, and strict on some inputs) or incomparability. Our work, for the general case and these three important election systems, clarifies the landscape of the 44 standard control types, for each pair collapsing or separating them, and also providing finer-grained information on the separations."
"15118","Opening the Analogical Portal to Explainability: Can Analogies Help Laypeople in AI-assisted Decision Making?","Gaole He, Agathe Balayn, Stefan Buijsman, Jie Yang, Ujwal Gadiraju","Delft University of Technology, , , , ","https://www.jair.org/index.php/jair/article/download/15118/27085","Concepts are an important construct in semantics, based on which humans understand the world with various levels of abstraction. With the recent advances in explainable artificial intelligence (XAI), concept-level explanations are receiving an increasing amount of attention from the broad research community. However, laypeople may find such explanations difficult to digest due to the potential knowledge gap and the concomitant cognitive load. Inspired by prior work that has explored analogies and sensemaking, we argue that augmenting concept-level explanations with analogical inference information from commonsense knowledge can be a potential solution to tackle this issue. To investigate the validity of our proposition, we first designed an effective analogy-based explanation generation method and collected 600 analogy-based explanations from 100 crowd workers. Next, we proposed a set of structured dimensions for the qualitative assessment of such explanations, and conducted an empirical evaluation of the generated analogies with experts. Our findings revealed significant positive correlations between the qualitative dimensions of analogies and the perceived helpfulness of analogy-based explanations, suggesting the effectiveness of the dimensions. To understand the practical utility and the effectiveness of analogy-based explanations in assisting human decision-making, we conducted a follow-up empirical study (N = 280) on a skin cancer detection task with non-expert humans and an imperfect AI system. Thus, we designed a between-subjects study spanning five different experimental conditions with varying types of explanations. 
The results of our study confirmed that a knowledge gap can prevent participants from understanding concept-level explanations. Consequently, when only the target domain of our designed analogy-based explanation was provided (in a specific experimental condition), participants demonstrated relatively more appropriate reliance on the AI system. In contrast to our expectations, we found that analogies were not effective in fostering appropriate reliance. We carried out a qualitative analysis of the open-ended responses from participants in the study regarding their perceived usefulness of explanations and analogies. Our findings suggest that human intuition and the perceived plausibility of analogies may have played a role in affecting user reliance on the AI system. We also found that the understanding of commonsense explanations varied with the varying experience of the recipient user, which points out the need for further work on personalization when leveraging commonsense explanations. In summary, although we did not find quantitative support for our hypotheses around the benefits of using analogies, we found considerable qualitative evidence suggesting the potential of high-quality analogies in aiding non-expert users in their decision making with AI-assistance. These insights can inform the design of future methods for the generation and use of effective analogy-based explanations."
"16174","Digraph k-Coloring Games: New Algorithms and Experiments","Andrea D'Ascenzo, Mattia D'Emidio, Michele Flammini, Gianpiero Monaco","University of L'Aquila, University of L'Aquila, Gran Sasso Science Institute, University of Chieti-Pescara","https://www.jair.org/index.php/jair/article/download/16174/27086","We study digraph k-coloring games where strategic agents are vertices of a digraph and arcs represent agents' mutual unidirectional conflicts/idiosyncrasies. Each agent can select, as strategy, one of k different colors, and her payoff in a given state (a k-coloring) is given by the number of outgoing neighbors with a color different from her one. Such games model lots of strategic real-world scenarios and are related to several fundamental classes of anti-coordination games. Unfortunately, the problem of understanding whether an instance of the game admits a pure Nash equilibrium (NE), i.e., a state where no agent can improve her payoff by changing strategy, is NP-complete. Thus, in this paper, we focus on algorithms to compute an approximate NE: informally, a coloring is an approximate γ-NE, for some γ ≥ 1, if no agent can improve her payoff, by changing strategy, by a multiplicative factor of γ.  Our contribution is manifold and of both theoretical and experimental nature. First, we characterize the hardness of finding pure and approximate equilibria in both general and special classes of digraphs. 
Second, we design and analyze three approximation algorithms with different theoretical guarantees on the approximation ratio, under different conditions: (i) algorithm APPROX-1 which computes, for any k ≥ 3, a Δo-NE for any n-vertex graph having a maximum outdegree of Δo, in polynomial time; (ii) algorithm LLL-SPE, a randomized algorithm that, for any constant k ≥ 2, determines a γ-NE for some constant γ but only in digraphs whose minimum outdegree is sufficiently large, in polynomial time in expectation; (iii) algorithm APPROX-3 which, for any ε, computes a (1+ε)-NE by using O(log(n)/ε) colors, for any n-vertex digraph. Note that the latter shows that a (1+ε)-NE exists and can be computed in polynomial time for k = O(log(n)). Finally, to assess how proposed algorithms behave in the typical case, we complete our study with an extensive experimental evaluation showing that, while newly introduced algorithms achieve bounded worst case behavior, they generally perform poorly in practice. Motivated by such unsatisfactory performance, we shift our attention to the best-response paradigm, successfully applied to other classes of games, and design and experimentally evaluate a heuristic based on this paradigm. Our experiments provide strong evidence that this approach outperforms, in terms of approximation and computational time, all other methods, and hence identify it as the most suitable candidate for practical usage. More remarkably, it is also able to compute exact, pure NE in the great majority of cases. This suggests that, while these games are known to not always possess a pure NE, such an equilibrium often exists and can be efficiently computed, even by a distributed uncoordinated interaction of the agents."
"15932","The Effect of Preferences in Abstract Argumentation under a Claim-Centric View","Michael Bernreiter, Wolfgang Dvořák, Anna Rapberger, Stefan Woltran","TU Wien, TU Wien, Imperial College London, TU Wien","https://www.jair.org/index.php/jair/article/download/15932/27087","In this paper, we study the effect of preferences in abstract argumentation under a claim-centric perspective. Recent work has revealed that semantical and computational properties can change when reasoning is performed on claim-level rather than on the argument-level, while under certain natural restrictions (arguments with the same claims have the same outgoing attacks) these properties are conserved. We now investigate these effects when, in addition, preferences have to be taken into account and consider four prominent reductions to handle preferences between arguments. As we shall see, these reductions give rise to four new classes of claim-augmented argumentation frameworks. These classes behave differently from each other with respect to semantic properties and computational complexity, but also in connection with structured argumentation formalisms such as assumption-based argumentation. This strengthens the view that the actual choice for handling preferences has to be taken with care."
"15665","The Human in Interactive Machine Learning: Analysis and Perspectives for Ambient Intelligence","Kevin Delcourt, Sylvie Trouilhet, Jean-Paul Arcangeli, Françoise Adreit","Institut de Recherche en Informatique de Toulouse, , , ","https://www.jair.org/index.php/jair/article/download/15665/27088","As the vision of Ambient Intelligence (AmI) becomes more feasible, the challenge of designing effective and usable human-machine interaction in this context becomes increasingly important. Interactive Machine Learning (IML) offers a set of techniques and tools to involve end-users in the machine learning process, making it possible to build more trustworthy and adaptable ambient systems. In this paper, our focus is on exploring approaches to effectively integrate and assist human users within ML-based AmI systems. Through a survey of key IML-related contributions, we identify principles for designing effective human-AI interaction in AmI applications. We apply them to the case of Opportunistic Composition, which is an approach to achieve AmI, to enhance collaboration between humans and Artificial Intelligence. Our study highlights the need for user-centered and context-aware design, and provides insights into the challenges and opportunities of integrating IML techniques into AmI systems."
"16041","Uncertainty as a Fairness Measure","Selim Kuzucu, Jiaee Cheong, Hatice Gunes, Sinan Kalkan",", University of Cambridge, , ","https://www.jair.org/index.php/jair/article/download/16041/27089","Unfair predictions of machine learning (ML) models impede their broad acceptance in real-world settings. Tackling this arduous challenge first necessitates defining what it means for an ML model to be fair. This has been addressed by the ML community with various measures of fairness that depend on the prediction outcomes of the ML models, either at the group-level or the individual-level. These fairness measures are limited in that they utilize point predictions, neglecting their variances, or uncertainties, making them susceptible to noise, missingness and shifts in data. In this paper, we first show that a ML model may appear to be fair with existing point-based fairness measures but biased against a demographic group in terms of prediction uncertainties. Then, we introduce new fairness measures based on different types of uncertainties, namely, aleatoric uncertainty and epistemic uncertainty. We demonstrate on many datasets that (i) our uncertaintybased measures are complementary to existing measures of fairness, and (ii) they provide more insights about the underlying issues leading to bias."
"15024","Efficient and Fair Healthcare Rationing","Haris Aziz, Florian Brandl",", University of Bonn, Germany","https://www.jair.org/index.php/jair/article/download/15024/27090","The rationing of healthcare resources has emerged as an important issue, which has been discussed by medical experts, policy-makers, and the general public. We consider a rationing problem where medical units are to be allocated to patients. Each unit is reserved for one of several categories, and each category has a priority ranking over the patients. We present a class of allocation rules that respect the priorities, comply with the eligibility requirements, allocate the largest feasible number of units, and do not penalize agents for rising in the priority ranking of a category. The rules characterize all possible allocations that satisfy the first three properties and are polynomial-time computable."
"15244","Inverting Cryptographic Hash Functions via Cube-and-Conquer","Oleg Zaikin","Swansea University","https://www.jair.org/index.php/jair/article/download/15244/27091","MD4 and MD5 are fundamental cryptographic hash functions proposed in the early 1990s. MD4 consists of 48 steps and produces a 128-bit hash given a message of arbitrary finite size. MD5 is a more secure 64-step extension of MD4. Both MD4 and MD5 are vulnerable to practical collision attacks, yet it is still not realistic to invert them, i.e., to find a message given a hash. In 2007, the 39-step version of MD4 was inverted by reducing to SAT and applying a CDCL solver along with the so-called Dobbertin’s constraints. As for MD5, in 2012 its 28-step version was inverted via a CDCL solver for one specified hash without adding any extra constraints. In this study, Cube-and-Conquer (a combination of CDCL and lookahead) is applied to invert step-reduced versions of MD4 and MD5. For this purpose, two algorithms are proposed. The first one generates inverse problems for MD4 by gradually modifying the Dobbertin’s constraints. The second algorithm tries the cubing phase of Cube-and-Conquer with different cutoff thresholds to find the one with the minimum runtime estimate of the conquer phase. This algorithm operates in two modes: (i) estimating the hardness of a given propositional Boolean formula; (ii) incomplete SAT solving of a given satisfiable propositional Boolean formula. While the first algorithm is focused on inverting step-reduced MD4, the second one is not area-specific and is therefore applicable to a variety of classes of hard SAT instances. In this study, 40-, 41-, 42-, and 43-step MD4 are inverted for the first time via the first algorithm and the estimating mode of the second algorithm. Also, 28-step MD5 is inverted for four hashes via the incomplete SAT solving mode of the second algorithm. For three hashes out of them, it is done for the first time."
"15178","A Fortiori Case-Based Reasoning: From Theory to Data","Wijnand van Woerkom, Davide Grossi, Henry Prakken, Bart Verheij","Universiteit Utrecht, , , ","https://www.jair.org/index.php/jair/article/download/15178/27092","The widespread application of uninterpretable machine learning systems for sensitive purposes has spurred research into elucidating the decision-making process of these systems. These efforts have their background in many different disciplines, one of which is the field of AI & law. In particular, recent works have observed that machine learning training data can be interpreted as legal cases. Under this interpretation, the formalism developed to study case law, called the theory of precedential constraint, can be used to analyze the way in which machine learning systems draw on training data—or should draw on them—to make decisions. In the present work, we advance the theory underlying these explanation methods, by relating it to order theory and logic. This allows us to write a software implementation of the theory that can be used to compute with the definitions and give automatic proofs of the properties of the model. We use this implementation to evaluate the model on a series of datasets. Through this analysis, we characterize the types of datasets that are more, or less, suitable to be described by the theory."
"16299","Expected 1.x Makespan-Optimal Multi-Agent Path Finding on Grid Graphs in Low Polynomial Time","Teng Guo, Jingjin Yu",", ","https://www.jair.org/index.php/jair/article/download/16299/27094","Multi-Agent Path Finding (MAPF) is NP-hard to solve optimally, even on graphs, suggesting no polynomial-time algorithms can compute exact optimal solutions for them. This raises a natural question: How optimal can polynomial-time algorithms reach? Whereas algorithms for computing constant-factor optimal solutions have been developed, the constant factor is generally very large, limiting their application potential. In this work, among other breakthroughs, we propose the first low-polynomial-time MAPF algorithms delivering 1-1.5 (resp., 1-1.67) asymptotic makespan optimality guarantees for 2D (resp., 3D) grids for random instances at a very high 1/3 agent density, with high probability. Moreover, when regularly distributed obstacles are introduced, our methods experience no performance degradation. These methods generalize to support 100% agent density. Regardless of the dimensionality and density, our high-quality methods are enabled by a unique hierarchical integration of two key building blocks. At the higher level, we apply the labeled Grid Rearrangement Algorithm (GRA), capable of performing efficient reconfiguration on grids through row/column shuffles. At the lower level, we devise novel methods that efficiently simulate row/column shuffles returned by GRA. Our implementations of GRA-based algorithms are highly effective in extensive numerical evaluations, demonstrating excellent scalability compared to other SOTA methods. For example, in 3D settings, GRA-based algorithms readily scale to grids with over 370,000 vertices and over 120,000 agents and consistently achieve conservative makespan optimality approaching 1.5, as predicted by our theoretical analysis."
"16457","Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness","Xiaoyu Wen, Xudong Yu, Rui Yang, Haoyuan Chen, Chenjia Bai, Zhen Wang","Northwestern Polytechnical University, Harbin Institute of Technology, The Hong Kong University of Science and Technology, Northwestern Polytechnical University, Shanghai Artificial Intelligence Laboratory and Shenzhen Research Institute of Northwestern Polytechnical University, Northwestern Polytechnical University","https://www.jair.org/index.php/jair/article/download/16457/27095","To obtain a near-optimal policy with fewer interactions in Reinforcement Learning (RL), a promising approach involves the combination of offline RL, which enhances sample efficiency by leveraging offline datasets, and online RL, which explores informative transitions by interacting with the environment. Offline-to-Online RL provides a paradigm for improving an offline-trained agent within limited online interactions. However, due to the significant distribution shift between online experiences and offline data, most offline RL algorithms suffer from performance drops and fail to achieve stable policy improvement in offline-to-online adaptation. To address this problem, we propose the Robust Offlineto-Online (RO2O) algorithm, designed to enhance offline policies through uncertainty and smoothness, and to mitigate the performance drop in online adaptation. Specifically, RO2O incorporates Q-ensemble for uncertainty penalty and adversarial samples for policy and value smoothness, which enable RO2O to maintain a consistent learning procedure in online adaptation without requiring special changes to the learning objective. Theoretical analyses in linear MDPs demonstrate that the uncertainty and smoothness lead to tighter optimality bound in offline-to-online against distribution shift. 
Experimental results illustrate the superiority of RO2O in facilitating stable offline-to-online learning and achieving significant improvement with limited online interactions."
"16019","A Unified Perspective on Value Backup and Exploration in Monte-Carlo Tree Search","Tuan Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen",", , , ","https://www.jair.org/index.php/jair/article/download/16019/27096","Monte-Carlo Tree Search (MCTS) is a class of methods for solving complex decisionmaking problems through the synergy of Monte-Carlo planning and Reinforcement Learning (RL). The highly combinatorial nature of the problems commonly addressed by MCTS requires the use of efficient exploration strategies for navigating the planning tree and quickly convergent value backup methods. These crucial problems are particularly evident in recent advances that combine MCTS with deep neural networks for function approximation. In this work, we propose two methods for improving the convergence rate and exploration based on a newly introduced backup operator and entropy regularization. We provide strong theoretical guarantees to bound convergence rate, approximation error, and regret of our methods. Moreover, we introduce a mathematical framework based on the use of the α-divergence for backup and exploration in MCTS. We show that this theoretical formulation unifies different approaches, including our newly introduced ones, under the same mathematical framework, allowing to obtain different methods by simply changing the value of α. In practice, our unified perspective offers a flexible way to balance between exploration and exploitation by tuning the single α parameter according to the problem at hand. We validate our methods through a rigorous empirical study from basic toy problems to the complex Atari games, and including both MDP and POMDP problems."
"16373","Effi ciently Adapt to New Dynamic via Meta-Model","Kaixin Huang, Chen Zhao, Chun Yuan",", , ","https://www.jair.org/index.php/jair/article/download/16373/27098","We delve into the realm of offline meta-reinforcement learning (OMRL), a practical paradigm in the field of reinforcement learning that leverages offline data to adapt to new tasks. While prior approaches have not explored the utilization of context-based dynamical models to tackle OMRL problems, our research endeavors to fill this gap. Our investigation uncovers shortcomings in existing context-based methods, primarily related to distribution shifts during offline learning and challenges in establishing stable task representations. To address these issues, we formulate the problem as Hidden-Parameter MDPs and propose a framework for effective model adaptation using meta-models plus latent variables, which is inferred by the transformer-based system recognition module trained in an unsupervised fashion. Through extensive experimentation encompassing diverse simulated robotics and control tasks, we validate the efficacy of our approach and demonstrate its superior generalization ability compared to existing schemes, and explore multiple strategies for obtaining policies with personalized models. Our method achieves a model with reduced prediction error, outperforming previous methods in policy performance, and facilitating efficient adaptation when compared to prior dynamic model generalization methods and OMRL algorithms."
"15273","Truth-tracking with Non-expert Information Sources","Joseph Singleton, Richard Booth",", ","https://www.jair.org/index.php/jair/article/download/15273/27099","We study what can be learned when receiving propositional reports from multiple nonexpert information sources. We suppose that sources report all that they consider possible, given their expertise. This may result in false and inconsistent reports when sources lack expertise on a topic. A learning method is truth-tracking, roughly speaking, if it eventually converges to correct beliefs about the “actual” world. This involves finding both the actual state of affairs in the domain described by the sources, and finding the extent of the expertise of the sources themselves. We investigate the extent to which truth-tracking is possible, and describe what information can be learned even if the actual world cannot be pinned down uniquely. We find that a broad spread of expertise among the sources allows the actual state of affairs to be found, even if no individual source is an expert on all topics. On the other hand, narrower expertise at the individual level allows the actual expertise to be found more easily. Finally, we turn to learning methods themselves: we provide a postulate-based characterisation of truth-tracking for general methods under mild assumptions, before looking at a couple of specific classes of methods from the belief change literature."
"16374","Approximate Counting of Linear Extensions in Practice","Topi Talvitie, Mikko Koivisto","University of Helsinki, ","https://www.jair.org/index.php/jair/article/download/16374/27100","We investigate the problem of computing the number of linear extensions of a given partial order on n elements. The problem has applications in numerous areas, such as sorting, planning, and learning graphical models. The problem is #P-hard but admits fully polynomial-time approximation schemes. However, the polynomial complexity bounds of the known schemes involve high degrees and large constant factors, rendering the schemes only feasible when n is some dozens. We present novel schemes, which stem from the idea of not requiring provable polynomial worst-case running time bounds. Using various new algorithmic techniques and implementation optimizations, we discover schemes that yield speedups by several orders of magnitude, enabling accurate approximations even when n is in several hundreds."
"15985","Differentially Private Neural Tangent Kernels (DP-NTK) for Privacy-Preserving Data Generation","Yilin Yang, Kamil Adamczewski, Xiaoxiao Li, Danica J. Sutherland, Mijung Park",", , , , Technical University of Denmark","https://www.jair.org/index.php/jair/article/download/15985/27101","Maximum mean discrepancy (MMD) is a particularly useful distance metric for differentially private data generation: when used with finite-dimensional features, it allows us to summarize and privatize the data distribution once, which we can repeatedly use during generator training without further privacy loss. An important question in this framework is, then, what features are useful to distinguish between real and synthetic data distributions, and whether those enable us to generate quality synthetic data. This work considers using the features of neural tangent kernels (NTKs), more precisely empirical NTKs (e-NTKs). We find that, perhaps surprisingly, the expressiveness of the untrained e-NTK features is comparable to that of the features taken from pre-trained perceptual features using public data. As a result, our method improves the privacy-accuracy trade-off compared to other state-of-the-art methods, without relying on any public data, as demonstrated on several tabular and image benchmark datasets."
"16882","A Scoping Study on AI Affordances in Early Childhood Education: Mapping the Global Landscape, Identifying Research Gaps, and  Charting Future Research Directions","Jennifer J. Chen","","https://www.jair.org/index.php/jair/article/download/16882/27102","Artificial intelligence (AI), manifested in the forms of technologies, systems, tools, and applications, has advanced rapidly, especially in recent years. It has permeated many aspects of human behavior and nearly all sectors of society, such as healthcare and education. In the context of early childhood education (ECE), AI has afforded valuable opportunities that directly and indirectly enhance children’s learning and development. While there are already two existing reviews of the literature on AI in ECE, they show either a lack of descriptive information concerning selected studies or inconsistencies between inclusion/exclusion criteria and selected studies, thereby raising concerns about their rigor. Representing a more methodologically rigorous effort and a significant contribution to the field of AI in ECE, this scoping study aimed to achieve three main goals: (1) “mapping” the global landscape of the current extent, range, and nature of relevant studies on the affordances of AI for use in ECE, (2) identifying potential research gaps, and (3) charting future research directions. Specifically, it addressed this overarching research question: What is the global landscape of the current state of knowledge concerning the affordances of AI for use in ECE? Specifically, the state of knowledge here refers to three aspects: (1) extent, (2) range, and (3) nature. First, regarding the extent aspect, the empirical knowledge was derived from 18 research articles in 11 countries and 16 peer-reviewed academic journals between 2005 and 2023, with 14 of these articles published in the past four years (2020–2023). 
Second, with respect to the range of study populations, it covered 15,081 children in early childhood (ages 2 to 8 years) across these 11 countries. Third, thematic analysis of these studies revealed four areas of AI affordances: (1) AI as tangible and intangible tools for interactive learning and information retrieval, (2) AI as technology for predicting/classifying children’s conditions, (3) AI as the object for learning by adapting to and personalizing children’s learning, and (4) AI as the subject for children's learning about it. Based on these findings, this scoping review identified three research gaps for future studies: (1) interviewing and/or surveying education stakeholders (parents, educators, policymakers) to explore the affordances of appropriate AI for use with, by, and for children bearing ethical considerations; (2) conducting group comparisons to investigate contextual factors contributing to the “AI divide” among children from different socioeconomic backgrounds; and (3) comparing sociocultural influences on AI use in ECE across cultures."
"15522","QCDCL vs QBF Resolution: Further Insights","Benjamin Böhm, Olaf Beyersdorff",", Friedrich Schiller University Jena","https://www.jair.org/index.php/jair/article/download/15522/27103","We continue the investigation on the relations of QCDCL and QBF resolution systems. In particular, we introduce QCDCL versions that tightly characterise QU-Resolution and (a slight variant of) long-distance Q-Resolution. We show that most QCDCL variants – parameterised by different policies for decisions, unit propagations and reductions – lead to incomparable systems for almost all choices of these policies."
"15736","Cross-domain Constituency Parsing by Leveraging Heterogeneous Data","Peiming Guo, Meishan Zhang, Yulong Chen, Jianling Li, Min Zhang, Yue Zhang",", Harbin Institute of Technology (Shenzhen), China, Westlake University, Tianjin University, Harbin Institute of Technology (Shenzhen), China, Westlake University","https://www.jair.org/index.php/jair/article/download/15736/27104","Knowledge transfer is investigated in various natural language processing tasks except cross-domain constituency parsing. In this paper, we leverage heterogeneous data to transfer cross-domain and cross-task knowledge to constituency parsing. Concretely, we first select language modeling, named entity recognition, CCG supertagging and dependency parsing as auxiliary tasks and collect the corpora of these tasks covering various domains as cross-domain and cross-task heterogeneous data. Second, we exploit three types of prefixes: shared, task and domain prefix, to merge cross-domain and cross-task data and decompose the general, task and domain representation in the pretrained language model. Third, we convert the data formats of multi-source heterogeneous datasets and loss objectives of the auxiliary tasks into a consistent formalization closer to constituency parsing. Finally, we jointly train the model to transfer task and domain knowledge to cross-domain constituency parsing. We verify the effectiveness of our proposed model on five target domains of MCTB. Experimental results show that our knowledge transfer model outperforms various baseline models, including conventional chart-based and transition-based parsers and the current large-scale language model for zero-shot and few-shot settings."
"16595","Declarative Approaches to Outcome Determination in Judgment Aggregation","Ari Conati, Andreas Niskanen, Matti Järvisalo","University of Helsinki, University of Helsinki, University of Helsinki","https://www.jair.org/index.php/jair/article/download/16595/27105","Judgment aggregation (JA) offers a generic formal framework for modeling various settings involving information aggregation by social choice mechanisms. For many judgment aggregation rules, computing collective judgments is computationally notoriously hard. The central outcome determination problem, in particular, is often complete for higher levels of the polynomial hierarchy. This complexity barrier makes it challenging to develop practical exact algorithms to outcome determination. Taking on this challenge, in this work we develop practical exact algorithms for outcome determination under a range of the most central JA rules—namely Kemeny, Slater, MaxHamming, Young, Dodgson, Reversal scoring, Condorcet, Ranked agenda, and LexiMax—by harnessing the declarative approach, in particular, Boolean satisfiability (SAT) and integer programming techniques. For the Kemeny, Slater, MaxHamming, Young, and Dodgson rules, we detail direct approaches based on maximum satisfiability (MaxSAT) and integer programming. For the Reversal scoring, Condorcet, Ranked agenda, and LexiMax rules, we develop iterative algorithms, including algorithms based on the counterexample-guided abstraction refinement (CEGAR) paradigm, making use of recent advances in incremental MaxSAT solving and preferential SAT-based reasoning. We provide an open-source implementation of the algorithms, and empirically evaluate them using real-world preference data. We compare the performance of our implementation to a recent approach which makes use of declarative solver technology for answer set programming (ASP). 
The results demonstrate that our approaches scale significantly beyond the reach of the ASP-based algorithms for all of the judgment aggregation rules considered."
"15710","Proof Theory and Decision Procedures for Deontic STIT Logics","Tim S. Lyon, Kees van Berkel",", ","https://www.jair.org/index.php/jair/article/download/15710/27106","This paper provides a set of cut-free complete sequent-style calculi for deontic STIT (‘See To It That’) logics used to formally reason about choice-making, obligations, and norms in a multi-agent setting. We leverage these calculi to write a proof-search algorithm deciding deontic, multi-agent STIT logics with (un)limited choice and introduce a loop-checking mechanism to ensure the termination of the algorithm. Despite the acknowledged potential for deontic reasoning in the context of autonomous, multi-agent scenarios, this work is the first to provide a syntactic decision procedure for this class of logics. Our proofsearch procedure is designed to provide verifiable witnesses/certificates of the (in)validity of formulae, which permits an analysis of the (non)theoremhood of formulae and act as explanations thereof. We show how the proof system and decision algorithm can be used to automate normative reasoning tasks such as duty checking (viz. determining an agent’s obligations relative to a given knowledge base), compliance checking (viz. determining if a choice, considered by an agent as potential conduct, complies with the given knowledge base), and joint fulfillment checking (viz. determining whether under a specified factual context an agent can jointly fulfill all their duties)."
"16527","Selfishly Prepaying in Financial Credit Networks","Hao Zhou, Yongzhao Wang, Konstantinos Varsos, Nicholas Bishop, Rahul  Savani, Anisoara Calinescu, Michael Wooldridge",", , , , , , ","https://www.jair.org/index.php/jair/article/download/16527/27107","In financial credit networks, prepayments enable a firm to settle its debt obligations ahead of an agreed-upon due date. Prepayments have a transformative impact on the structure of networks, influencing the financial well-being (utility) of individual firms. This study investigates prepayments from both theoretical and empirical perspectives. We first establish the computational complexity of finding prepayments that maximize welfare, assuming global coordination among firms in the financial network. Subsequently, our focus shifts to understanding the strategic behavior of individual firms in the presence of prepayments. We introduce a prepayment game where firms strategically make prepayments, delineating the existence of pure strategy Nash equilibria and analyzing the price of anarchy (stability) within this game. Recognizing the computational challenges associated with determining Nash equilibria in prepayment games, we use a simulation-based approach, known as empirical game-theoretic analysis (EGTA). Through EGTA, we are able to find Nash equilibria among a carefully-chosen set of heuristic strategies. By examining the equilibrium behavior of firms, we outline the characteristics of high-performing strategies for strategic prepayments and establish connections between our empirical and theoretical findings."
"16694","Preserving Fairness in AI under Domain Shift","Serban Stan, Mohammad Rostami",", University of Pennsylvania","https://www.jair.org/index.php/jair/article/download/16694/27108","Existing algorithms for ensuring fairness in AI use a single-shot training strategy, where an AI model is trained on an annotated training dataset with sensitive attributes and then fielded for utilization. This training strategy is effective in problems with stationary distributions, where both the training and testing data are drawn from the same distribution. However, it is vulnerable with respect to distributional shifts in the input space that may occur after the initial training phase. As a result, the time-dependent nature of data can introduce biases and performance degradation into the model predictions, even if the model is initially fair. Model retraining from scratch using a new annotated dataset is a naive solution that is expensive and time-consuming. We develop an algorithm to adapt a fair model to remain fair and generalizable under domain shift using solely new unannotated data points. We recast this learning setting as an unsupervised domain adaptation (UDA) problem. Our algorithm is based on updating the model such that the internal representation of data remains unbiased despite distributional shifts in the input space. We provide empirical validation on three common fairness datasets to show that the challenge exists in practical setting and to demonstrate the effectiveness of our algorithm."
"14476","Human Activity Recognition in an Open World","Derek S. Prijatelj, Samuel Grieggs, Jin Huang, Dawei Du, Ameya Shringi, Christopher Funk, Adam Kaufman, Eric Robertson, Walter J. Scheirer","University of Notre Dame, University of Notre Dame, University of Notre Dame, Kitware, Inc., Kitware, Inc., Kitware, Inc., PAR Government, PAR Government, ","https://www.jair.org/index.php/jair/article/download/14476/27109","Managing novelty in perception-based human activity recognition (HAR) is critical in realistic settings to improve task performance over time and ensure solution generalization outside of prior seen samples. Novelty manifests in HAR as unseen samples, activities, objects, environments, and sensor changes, among other ways. Novelty may be task-relevant, such as a new class or new features, or task-irrelevant resulting in nuisance novelty, such as never before seen noise, blur, or distorted video recordings. To perform HAR optimally, algorithmic solutions must be tolerant to nuisance novelty, and learn over time in the face of novelty. This paper 1) formalizes the definition of novelty in HAR building upon the prior definition of novelty in classification tasks, 2) proposes an incremental open world learning (OWL) protocol and applies it to the Kinetics datasets to generate a new benchmark KOWL-718, 3) analyzes the performance of current stateof-the-art HAR models when novelty is introduced over time, 4) provides a containerized and packaged pipeline for reproducing the OWL protocol and for modifying for any future updates to Kinetics. The experimental analysis includes an ablation study of how the different models perform under various conditions as annotated by Kinetics-AVA. The code may be used to analyze different annotations and subsets of the Kinetics datasets in an incremental open world fashion, as well as be extended as further updates to Kinetics are released."
"16167","Quantization Aware Factorization for Deep Neural Network Compression","Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak","Skolkovo Institute of Science and Technology, Skolkovo Institute of Science and Technology, Skolkovo Institute of Science and Technology, Skolkovo Institute of Science and Technology, , ","https://www.jair.org/index.php/jair/article/download/16167/27110","Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOP in neural networks. Due to memory and power consumption limitations of mobile or embedded devices, the quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds tensor approximation directly with quantized factors and thus benefit from both compression techniques while keeping the prediction quality of the model. Namely, we propose to use Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with a devised algorithm and evaluate it’s prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achiving a desirable quality-performance tradeoff."
"16163","Satisfiability Modulo User Propagators","Katalin Fazekas, Aina Niemetz, Mathias Preiner, Markus Kirchweger, Stefan Szeider, Armin Biere","TU Wien, Stanford University, Stanford University, TU Wien, TU Wien, University of Freiburg","https://www.jair.org/index.php/jair/article/download/16163/27111","Modern SAT solvers are often integrated as sub-reasoning engines into more complex tools to address problems beyond the Boolean satisfiability problem. Consider, for example, solvers for Satisfiability Modulo Theories (SMT), combinatorial optimization, model enumeration, and model counting. There, the SAT solver can often provide relevant information beyond the satisfiability answer and the domain knowledge of the embedding system, such as symmetry properties or theory axioms, may benefit the CDCL search. However, this knowledge can often not be efficiently represented in clausal form. This paper proposes a general interface to inspect and influence the internal behaviour of CDCL SAT solvers. The aim is to capture the essential functionalities that simplify and improve use cases requiring a more fine-grained interaction with the SAT solver than provided via the standard IPASIR interface. For our experiments, the state-of-the-art SAT solver CaDiCaL is extended with the proposed interface and evaluated on two representative use cases: enumerating graphs within the SAT modulo Symmetries framework (SMS), and as the main CDCL(T) SAT engine of the SMT solver cvc5."
"16905","Improving Reproducibility in AI Research: Four Mechanisms Adopted by JAIR","Odd Erik Gundersen, Malte Helmert, Holger Hoos",", University of Basel, RWTH Aachen University & Universiteit Leiden","https://www.jair.org/index.php/jair/article/download/16905/27112","Background: Lately, the reproducibility of scientific results has become an increasing worry in the scientific community. Several studies show that artificial intelligence research is not spared from reproducibility issues. Objectives: As a pioneer in open and transparent research published on the Internet, the Journal of Artificial Intelligence Research (JAIR) seeks to promote good research practices and close the feedback loop between the original researchers and those reproducing their research.  Methods: Four different mechanisms will be adopted immediately by JAIR. These are: 1) reproducibility checklists, 2) structured abstracts, 3) reproducibility badges and 4) reproducibility reports. Results: All authors submitting articles to JAIR fill out a reproducibility checklist and are encouraged to use structured abstracts. Articles that fulfill certain criteria will receive reproducibility badges, and reproducibility reports can be submitted by anyone for any article published in JAIR. Conclusions: We believe that adopting the four mechanisms outlined in this paper will improve the reproducibility of research published in JAIR and thus make a contribution to addressing the broader reproducibility issue in artificial intelligence. We hope that JAIR’s reproducibility initiative will inspire similar efforts at other top-tier journals."
"16759","Bias Mitigation Methods: Applicability, Legality, and Recommendations for Development","Madeleine Waller, Odinaldo Rodrigues, Michelle Seng Ah Lee, Oana Cocarascu","King's College London, King's College London, University of Cambridge, King's College London","https://www.jair.org/index.php/jair/article/download/16759/27113","As algorithmic decision-making systems (ADMS) are increasingly deployed across various sectors, the importance of research on fairness in Artificial Intelligence (AI) continues to grow. In this paper we highlight a number of significant practical limitations and regulatory compliance issues associated with the application of existing bias mitigation methods to ADMS. We present an example of an algorithmic system used in recruitment to illustrate these limitations. Our analysis of existing methods indicates a pressing need for a change in the approach to the development of new methods. In order to address the limitations, we provide recommendations for key factors to consider in the development of new bias mitigation methods that aim to be effective in real-world scenarios and comply with legal requirements in the European Union, United Kingdom and United States, such as non-discrimination, data protection and sector-specific regulations. Further, we suggest a checklist relating to these recommendations that should be included with the development of new bias mitigation methods."
