EMERGENT ROAD RULES IN MULTI-AGENT DRIVING ENVIRONMENTS

ICLR 2021 Submission

Abstract

In order for autonomous vehicles to share the road safely with human drivers, autonomous vehicles must abide by certain "road rules" that human drivers have agreed all road users must follow. "Road rules" include rules that drivers are required to follow by law -- such as the requirement that vehicles stop at red lights -- as well as more subtle social rules -- such as the implicit designation of "fast lanes" on the highway. In this paper, we provide empirical evidence that suggests that -- instead of hard-coding these road rules into self-driving algorithms -- a scalable alternative may be to design multi-agent environments such that agents within the environments "discover for themselves" that these road rules are mutually beneficial to follow. We analyze what components of our chosen multi-agent environment cause the emergence of such behavior and find that two crucial factors are noisy perception and the spatial density of agents. We provide qualitative and quantitative evidence of the emergence of seven social driving behaviors, ranging from stopping at a traffic signal to following lanes. Our results add empirical support for the social road rules that countries around the world have agreed on for safe driving.

Emergent Social Driving Rules

1. Stopping at a Traffic Signal


In a 4-way intersection, agents learn to obey traffic signals to safely navigate to the opposite road in minimum time. Note that the agents merely observe a ternary value representing the traffic light's state, not its color. To make the visualizations, we visually inspect rollouts for each converged policy to find a permutation of the ternary states that aligns with human red/yellow/green traffic light conventions.


Panels: Lidar Noise = 0%, 25%, 50%, 75%
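The relabeling of ternary states described above is done by visual inspection. As a rough illustration only, the sketch below shows one way such a permutation could be proposed automatically. It assumes a hypothetical per-state statistic (`mean_speed_by_state`, the mean speed of agents near the intersection while they observe each raw state) and assumes that the slowest state corresponds to red and the fastest to green. Neither the statistic nor the function name comes from the original work.

```python
COLORS = ("red", "yellow", "green")  # ordered from slowest to fastest (assumed convention)


def align_signal_states(mean_speed_by_state):
    """Map raw ternary states to human traffic light colors.

    mean_speed_by_state: dict from raw state (e.g. 0, 1, 2) to the mean speed
    of agents observing that state. This statistic is a hypothetical input,
    not something defined in the original text.
    """
    if len(mean_speed_by_state) != len(COLORS):
        raise ValueError("expected exactly three signal states")
    # Sort raw states by how fast agents drive while seeing them.
    ordered = sorted(mean_speed_by_state, key=mean_speed_by_state.get)
    # Slowest state -> red, middle -> yellow, fastest -> green.
    return dict(zip(ordered, COLORS))


# Made-up numbers for illustration.
labeling = align_signal_states({0: 4.1, 1: 0.3, 2: 2.0})
print(labeling)
```

A proposal produced this way would still need to be checked against the rollouts, since a policy might not treat the intermediate state like a human yellow light.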

Transfer from Synthetic Map to a Real World Map

In this experiment we show that policies trained on the synthetic intersection above transfer to real-world intersections found in the nuScenes dataset.

2. Emergence of Lanes


When the agents are trained in an environment containing 4 agents, they follow a consistent lane until they cross the intersection.

Panels: Lidar Noise = 25%, 50%, 75%

Panels: Lidar Noise = 0%, 100%

When the number of agents during training is increased, the agents tend to follow lanes consistently until they reach their destination. Additionally, with 8 agents, we see the formation of multi-lane tracks. Qualitatively, agents starting from the left side of the road tend to take the lane closer to the center, while those starting on the right side take the extreme right lane. This also allows smoother traffic flow.

Panels: Number of Training Agents = 4, 8, 8


Spatial locations of the agents when trained in the 8-agent environment

3. Fast Lanes on a Highway


We observe that, depending on how fast the agents are, they choose to travel along the left-hand side or the right-hand side of the road. This is similar to the fast lanes found on highways. (Darker shades denote faster cars.)
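One way to put a number on this qualitative observation is to correlate each agent's speed with its lateral position on the road. The sketch below is illustrative only: the per-agent lists `lateral_offsets` and `speeds` are hypothetical inputs with made-up values, and Pearson correlation is our choice of statistic, not one named in the original text. A correlation far from zero would indicate that faster agents favor one side.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: lateral offset from the road centerline (negative = left)
# and average speed per agent. Values are invented for illustration.
lateral_offsets = [-1.5, -0.5, 0.5, 1.5]
speeds = [4.0, 3.0, 2.0, 1.0]

r = pearson(lateral_offsets, speeds)
print(f"speed vs. lateral offset correlation: {r:.2f}")
```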

4. Stopping at a Crosswalk


The agents detect the pedestrians (small green boxes) walking along the crosswalk and slow down as they approach it.

5. Communication


We denote the signals sent by the agents with the agents' colors. In the left video, where there is no perception noise, the agents' signals are not correlated with their actions or headings. In the right video, the signals correlate with turning direction: agents turning in one direction tend to be colored black, while those turning in the other direction tend to be colored red.

Panels: Lidar Noise = 0% (left), Lidar Noise = 100% (right)
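The relationship between emitted signals and turning direction could be summarized with a conditional frequency table. The following is a minimal sketch under stated assumptions: the `(signal, turn)` pairs are hypothetical logged observations, the signal labels 0 and 1 are placeholders rather than the colors shown in the videos, and the function name is ours.

```python
from collections import Counter, defaultdict


def turn_given_signal(observations):
    """Estimate P(turn | signal) from (signal, turn) pairs."""
    counts = defaultdict(Counter)
    for signal, turn in observations:
        counts[signal][turn] += 1
    table = {}
    for signal, turn_counts in counts.items():
        total = sum(turn_counts.values())
        table[signal] = {turn: c / total for turn, c in turn_counts.items()}
    return table


# Invented observations for illustration only.
observations = [
    (0, "left"), (0, "left"), (0, "left"),
    (1, "right"), (1, "right"), (1, "left"),
]
table = turn_given_signal(observations)
for signal, dist in sorted(table.items()):
    print(signal, dist)
```

If signals carried no information about heading, each row of the table would look roughly like the overall turn frequencies. Rows that concentrate on one direction suggest the signal is being used to communicate intent.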

6. Rollouts on nuScenes


In these experiments, we attempt to show that agents learn to maintain a minimum distance from one another as a function of their relative velocity. Additionally, we observe the emergence of right of way: the agent that arrives first at the intersection gets to leave it first.
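The right-of-way behavior described above amounts to first-in, first-out ordering at the intersection. As a hedged illustration, the sketch below scores how often departure order matches arrival order across all pairs of agents. The dictionaries `arrivals` and `departures` (agent id to timestamp) and the metric itself are assumptions made for this example, not measurements or definitions from the original work.

```python
from itertools import combinations


def fifo_agreement(arrivals, departures):
    """Fraction of agent pairs whose departure order matches their arrival order.

    Returns 1.0 when the intersection is served strictly first-come,
    first-served, and lower values as agents increasingly jump the queue.
    """
    pairs = list(combinations(arrivals, 2))
    if not pairs:
        raise ValueError("need at least two agents")
    agree = sum(
        (arrivals[a] < arrivals[b]) == (departures[a] < departures[b])
        for a, b in pairs
    )
    return agree / len(pairs)


# Invented timestamps (seconds) for three agents.
arrivals = {"a": 0.0, "b": 1.0, "c": 2.0}
departures = {"a": 3.0, "b": 5.0, "c": 4.0}

score = fifo_agreement(arrivals, departures)
print(f"FIFO agreement: {score:.2f}")
```

In this made-up example agent c leaves before agent b despite arriving later, so two of the three pairs agree with arrival order.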

Rollouts from more nuScenes Intersections


Episode Return over Time



Obeying Traffic Signal


Episode return over time for the toy intersection with 4 agents.