Keywords: value alignment, verification, AI safety
TL;DR: We validate the claims made in the "Value Alignment Verification" (ICML 2021) paper and extend the experimental setup with a larger action space, a non-linear mapping from state features to reward, and additional ablation studies.
Abstract: Scope of Reproducibility: The main goal of the paper "Value Alignment Verification" is to efficiently test whether a robot's behavior aligns with human expectations by constructing a minimal set of questions. To this end, the authors propose algorithms and heuristics for constructing such a questionnaire. They validate their claims on a wide range of gridworld environments and a continuous autonomous driving domain. We additionally explore value alignment verification for gridworlds with a non-linear mapping from state features to reward and with an extended action space. Methodology: We re-implemented the pipeline in Python using numerical libraries such as NumPy and SciPy. We spent approximately two months reproducing the targeted claims: the first month was devoted to reproducing the results of the algorithms and heuristics for exact value alignment verification, and the second to extending the action space, running additional experiments, and refining the structure of our code. Since our experiments were not computationally expensive, we ran them on CPU. Results: The techniques proposed by the authors successfully address the value alignment verification problem in different settings. We empirically demonstrate their effectiveness by performing exhaustive experiments with several variations of their original experimental setup. We show high accuracy and low false positive and false negative rates in the value alignment verification task with a minimal number of questions for different algorithms and heuristics. What was easy: The problem statement and the implementation of the algorithms and heuristics were straightforward. We also consulted the original repository published with the paper. However, we implemented the entire pipeline from scratch and incorporated several variations into our code to perform the additional experiments we designed.
What was difficult: Understanding the algorithms and heuristics proposed in prior work, along with their mathematical formulation and the reasons for their success on the given task, was considerably difficult. Additionally, the original code base contained several redundant files, which caused initial confusion. We revisited and discussed the arguments in the paper and prior work several times to thoroughly understand the pipeline. Nevertheless, once the basics were clear, the implementation was comparatively simple. Communication with original authors: We reached out to the authors numerous times via email to seek clarifications and additional implementation details. The authors were very receptive to our inquiries, and we appreciate their thorough and prompt responses.
Paper Url: https://icml.cc/Conferences/2021/Schedule?showEvent=9548
Paper Venue: ICML 2021
Supplementary Material: zip