X       X       P       X       X       

O                       ↑1      O       

X       ↑0                      X       

X       D       X       S       X       


Timestep: 1
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       ↑1      O       

X       ↑0                      X       

X       D       X       S       X       



Timestep: 2
Joint action taken: ('←', '→') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 3
Joint action taken: ('←', '→') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 4
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 5
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 6
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 7
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 8
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 9
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 10
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 11
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 12
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 13
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 14
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 15
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 16
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 17
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 18
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 19
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



Timestep: 20
Joint action taken: ('stay', 'stay') 	 Reward: 0 + shaping_factor * [0, 0]
Action probs by index: [None, None]  
X       X       P       X       X       

O                       →1      O       

X       ←0                      X       

X       D       X       S       X       



