TL;DR: A distributionally robust multi-agent reinforcement learning framework that combines group DRO with a contextual bandit to optimize chute mapping in robotic sortation warehouses, reducing package recirculation by an average of 80% under varying induction rates.
Abstract: In Amazon robotic warehouses, the destination-to-chute mapping problem is crucial for efficient package sorting. Often, however, this problem is complicated by uncertain and dynamic package induction rates, which can lead to increased package recirculation. To tackle this challenge, we introduce a Distributionally Robust Multi-Agent Reinforcement Learning (DRMARL) framework that learns a destination-to-chute mapping policy that is resilient to adversarial variations in induction rates. Specifically, DRMARL relies on group distributionally robust optimization (DRO) to learn a policy that performs well not only on average but also on each individual subpopulation of induction rates within the group, capturing, for example, different seasonalities or operating modes of the system. This approach is then combined with a novel contextual bandit-based estimator of the worst-case induction distribution for each state-action pair, significantly reducing the cost of exploration and thereby increasing the learning efficiency and scalability of our framework. Extensive simulations demonstrate that DRMARL achieves robust chute mapping in the presence of varying induction distributions, reducing package recirculation by an average of 80% in simulation.
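The core mechanism the abstract describes, a group-DRO objective paired with a contextual bandit that estimates the worst-case induction-rate group per context, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the group count, the `true_cost` table standing in for policy rollouts, and the epsilon-greedy scheme are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N_GROUPS = 3    # hypothetical induction-rate subpopulations (e.g., seasons)
N_CONTEXTS = 5  # discretized state-action contexts
N_STEPS = 2000
EPS = 0.1       # epsilon-greedy exploration rate

# Hypothetical per-(context, group) mean cost of the current policy; in the
# real system this would come from rolling out the policy under that group's
# induction distribution.
true_cost = rng.uniform(0.0, 1.0, size=(N_CONTEXTS, N_GROUPS))

# Contextual-bandit estimate of each group's cost for each context.
q = np.zeros((N_CONTEXTS, N_GROUPS))
counts = np.zeros((N_CONTEXTS, N_GROUPS))

for t in range(N_STEPS):
    ctx = rng.integers(N_CONTEXTS)
    # Epsilon-greedy: mostly evaluate the estimated worst-case group, so the
    # group-DRO update needs one rollout per step instead of one per group.
    if rng.random() < EPS:
        g = rng.integers(N_GROUPS)
    else:
        g = int(np.argmax(q[ctx]))
    cost = true_cost[ctx, g] + 0.05 * rng.normal()  # noisy rollout cost
    counts[ctx, g] += 1
    q[ctx, g] += (cost - q[ctx, g]) / counts[ctx, g]  # running-mean update
    # A full group-DRO step would now update the policy to reduce `cost`
    # for this (ctx, g), i.e., minimize the max over groups.

worst = q.argmax(axis=1)  # estimated worst-case group per context
```

The bandit concentrates its samples on the group it currently believes is worst for each context, which is the exploration-cost saving the abstract attributes to the contextual-bandit estimator.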
Lay Summary: In robotic warehouses, packages arrive at unpredictable rates throughout the day, especially during seasonal peaks or special promotions. These fluctuating package induction rates make it difficult to assign destinations to eject chutes efficiently, often causing packages to be recirculated multiple times.
We developed a new reinforcement learning framework that learns robust chute assignment policies capable of handling these changing conditions. Our method, called Distributionally Robust Multi-Agent Reinforcement Learning (DRMARL), prepares for the worst-case induction patterns using historical data. To make training efficient, we also created a contextual bandit-based tool that estimates which package arrival patterns are most problematic for each decision.
This allows the system to adapt to future package arrival behaviors it hasn't seen before, improving throughput and reducing the number of times packages need to be recirculated. In simulations, our method reduced recirculation by up to 80% and remained stable across many different induction scenarios. This research improves the resilience and efficiency of warehouse automation, ensuring smoother operation even under unexpected load spikes.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Reinforcement Learning->Multi-agent
Keywords: Distributionally Robust Reinforcement Learning, Multi-agent Reinforcement Learning, Group Distributionally Robust Optimization, Robotic Sortation Warehouse
Submission Number: 4942