Scaling Up Decision-Making Under Uncertainty

Mohit Rajpal

Published: 30 Oct 2024, Last Modified: 23 Dec 2025, Venue: OpenReview Archive Direct Upload, License: CC BY-NC-ND 4.0

Abstract: Recent advances in deep learning and reinforcement learning have demonstrated impressively scalable performance on many tasks. This naturally raises the question of whether approaches rooted in probabilistic methods can also be scaled up to be competitive with them. We consider this question in both the deep learning and reinforcement learning settings, through the problem of decision-making under uncertainty, a general problem setting that arises in both.

In deep learning, an important problem setting is the early pruning of neural network elements. Here, early pruning means pruning ineffective network elements during training in order to speed up the training process. This is posed as a decision-making problem where decisions are made on which elements to keep or prune and when to prune them. In this setting, we make the following contributions:

- We formalize the early pruning problem as a constrained optimization problem.
- We prove several helper lemmas to show how this constrained optimization problem can be solved efficiently.
- We use a Multi-Output Gaussian Process to infer the performance of neural network elements.
- Using this inference model and the helper lemmas, we propose an algorithm that performs early pruning with theoretical guarantees on its performance.
- We perform extensive validation, showing that our approach preserves network performance better than approaches from the deep learning literature when a significant portion of network elements is pruned.
- In addition, our approach is robust to changes in hyperparameters.

In reinforcement learning, an important problem setting is finding the optimal policy for a given task. This is posed as a decision-making problem where decisions are made on which policy to attempt next, given the history of policies attempted so far. In this setting, we make the following contributions:

- We propose a parameter-efficient policy model that is well suited for use on memory-constrained devices such as Internet of Things (IoT) devices.
- We propose a variant of Bayesian Optimization to optimize this policy model that scales to a higher number of dimensions.
- We show that our Bayesian Optimization variant comes with strong regret guarantees.
- We perform extensive validation showing that our proposed approach outperforms competing reinforcement learning approaches in sparse or malformed reward scenarios.

We make progress on the above decision-making problems and show superior performance to competing approaches under specific scenarios. To conclude the thesis, we propose the study of Adversarially Designed Games, built upon our work in the reinforcement learning setting. Adversarially Designed Games address the question of whether there exist games that exhibit large suboptimality gaps when one attempts to solve them with reinforcement learning approaches. In this setting, we make the following contributions:

- We design a family of games that are difficult for reinforcement learning methods to solve.
- We prove theoretical results showing the difficulty of solving these games with reinforcement learning and the relative ease of solving them with Bayesian Optimization, thus indicating a suboptimality gap.
- We validate this suboptimality gap by showing poor performance with reinforcement learning methods and good performance with Bayesian Optimization, thus empirically confirming the gap.

These works demonstrate recent developments in, and the value of, scaling up decision-making under uncertainty. Our proposal of Adversarially Designed Games also opens up further avenues for research in this direction.
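As a rough illustration of the kind of loop the abstract describes (choosing which policy to attempt next given the history of policies attempted so far), the following is a minimal, generic Gaussian Process upper-confidence-bound sketch. It is not the Bayesian Optimization variant proposed in the thesis. The objective `toy_return`, the one-dimensional parameter grid, the RBF kernel, and the values of `length_scale`, `noise`, and `beta` are all assumptions made purely for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two sets of row vectors."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k_oo, k_oq)        # K^{-1} k(x_obs, x_query)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_oq * solved, axis=0)   # prior variance of this kernel is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_return(theta):
    """Hypothetical stand-in for the return of a policy with parameters theta."""
    return float(-np.sum((theta - 0.7) ** 2))

def bayes_opt(objective, candidates, n_iters=15, beta=2.0, seed=0):
    """Repeatedly evaluate the candidate with the highest upper confidence bound."""
    rng = np.random.default_rng(seed)
    first = int(rng.integers(len(candidates)))
    x_obs = candidates[first:first + 1]
    y_obs = np.array([objective(x_obs[0])])
    for _ in range(n_iters):
        mean, std = gp_posterior(x_obs, y_obs, candidates)
        nxt = int(np.argmax(mean + beta * std))  # optimism under uncertainty
        x_obs = np.vstack([x_obs, candidates[nxt]])
        y_obs = np.append(y_obs, objective(candidates[nxt]))
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]

# Candidate "policies" are points on a 1-D grid in this toy setting.
candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
best_theta, best_value = bayes_opt(toy_return, candidates)
```

In this sketch each evaluation of `toy_return` plays the role of attempting one policy, and the surrogate model is refit on the full history before the next choice. A real policy-search setting would replace `toy_return` with an environment rollout and would typically need a higher-dimensional search strategy than a fixed grid.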