2025-05-09 22:15:58,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac
2025-05-09 22:15:58,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac
2025-05-09 22:15:58,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x7ebc78e40f70>}
2025-05-09 22:15:58,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1111 [DEBUG]: using device: cpu
2025-05-09 22:15:58,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-09 22:15:58,823 baseline-sac-noisy-walker2d:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=17, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-09 22:15:58,823 baseline-sac-noisy-walker2d:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-09 22:15:58,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-09 22:15:58,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-09 22:18:12,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:18:12,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: -4.63640 ± 7.686
2025-05-09 22:18:12,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [-13.07456, -9.996544, -0.30376405, 12.588349, -0.055925053, -11.801523, -4.849387, 1.4672188, -8.670544, -11.667306]
2025-05-09 22:18:12,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 13.0, 37.0, 19.0, 18.0, 20.0, 13.0, 22.0, 20.0]
2025-05-09 22:18:12,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (-4.64) for latency MM1Queue_a033_s075
2025-05-09 22:18:12,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 22:18:12,824 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:18:12,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 40 minutes, 51 seconds)
2025-05-09 22:20:43,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:20:43,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 5.21867 ± 11.460
2025-05-09 22:20:43,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [5.841715, -0.96120316, 1.8198873, 13.571374, -8.173797, 35.468727, 4.228409, -3.4033697, 2.8211246, 0.97379476]
2025-05-09 22:20:43,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 41.0, 69.0, 41.0, 64.0, 67.0, 19.0, 26.0, 20.0, 16.0]
2025-05-09 22:20:43,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (5.22) for latency MM1Queue_a033_s075
2025-05-09 22:20:43,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 22:20:43,744 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:20:43,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 52 minutes, 34 seconds)
2025-05-09 22:23:08,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:23:09,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 25.40589 ± 25.968
2025-05-09 22:23:09,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [-5.887037, 36.74664, 4.934671, 34.878773, 28.36444, 57.44478, -1.159456, 79.21531, 10.309001, 9.2118025]
2025-05-09 22:23:09,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 68.0, 29.0, 83.0, 58.0, 157.0, 38.0, 89.0, 77.0, 21.0]
2025-05-09 22:23:09,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (25.41) for latency MM1Queue_a033_s075
2025-05-09 22:23:09,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 22:23:09,012 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:23:09,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 51 minutes, 44 seconds)
2025-05-09 22:25:34,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:25:35,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 71.41424 ± 63.827
2025-05-09 22:25:35,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [134.94, 160.31082, 11.862241, 190.16585, 16.640966, 83.19789, 50.49442, 35.33177, 10.960143, 20.2383]
2025-05-09 22:25:35,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [105.0, 100.0, 138.0, 112.0, 41.0, 75.0, 103.0, 94.0, 65.0, 53.0]
2025-05-09 22:25:35,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (71.41) for latency MM1Queue_a033_s075
2025-05-09 22:25:35,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 22:25:35,234 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:25:35,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 50 minutes, 30 seconds)
2025-05-09 22:28:02,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:28:03,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 85.94231 ± 98.836
2025-05-09 22:28:03,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [239.7023, 6.0374594, 201.39319, 256.65353, 57.019436, 17.251938, 43.143272, 8.30748, 44.077774, -14.163323]
2025-05-09 22:28:03,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [140.0, 27.0, 99.0, 171.0, 72.0, 58.0, 64.0, 60.0, 77.0, 23.0]
2025-05-09 22:28:03,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (85.94) for latency MM1Queue_a033_s075
2025-05-09 22:28:03,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 22:28:03,327 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:28:03,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 49 minutes, 22 seconds)
2025-05-09 22:30:30,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:30:31,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 61.09752 ± 77.423
2025-05-09 22:30:31,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [11.6615095, 2.2376506, 164.5738, 179.58269, -0.3735532, 48.12711, 4.123715, 187.9555, 1.5313522, 11.555412]
2025-05-09 22:30:31,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [47.0, 51.0, 130.0, 154.0, 12.0, 64.0, 19.0, 93.0, 39.0, 77.0]
2025-05-09 22:30:31,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 51 minutes, 31 seconds)
2025-05-09 22:32:59,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:33:01,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 192.08495 ± 136.749
2025-05-09 22:33:01,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [330.81213, 3.9469504, 46.60908, 266.38416, 371.89505, 8.131151, 260.2366, 270.28394, 300.67633, 61.874176]
2025-05-09 22:33:01,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [203.0, 14.0, 94.0, 169.0, 264.0, 24.0, 148.0, 184.0, 188.0, 110.0]
2025-05-09 22:33:01,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (192.08) for latency MM1Queue_a033_s075
2025-05-09 22:33:01,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 22:33:01,517 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:33:01,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 48 minutes, 42 seconds)
2025-05-09 22:35:30,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:35:32,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 211.25090 ± 136.780
2025-05-09 22:35:32,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [203.54712, 25.079756, 322.5049, 226.40999, 2.4037247, 268.28857, 234.20988, 307.53955, 63.66537, 458.8599]
2025-05-09 22:35:32,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [205.0, 36.0, 133.0, 302.0, 14.0, 163.0, 114.0, 186.0, 69.0, 538.0]
2025-05-09 22:35:32,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (211.25) for latency MM1Queue_a033_s075
2025-05-09 22:35:32,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 22:35:32,901 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:35:32,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 48 minutes, 7 seconds)
2025-05-09 22:38:05,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:38:06,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 127.79917 ± 87.586
2025-05-09 22:38:06,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [212.33136, 42.156033, 294.38474, 182.40797, 154.20265, 90.4697, 11.313297, 188.42891, 65.52605, 36.771023]
2025-05-09 22:38:06,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [270.0, 83.0, 193.0, 259.0, 121.0, 113.0, 51.0, 123.0, 56.0, 58.0]
2025-05-09 22:38:06,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 47 minutes, 55 seconds)
2025-05-09 22:40:35,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:40:36,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 174.77483 ± 143.167
2025-05-09 22:40:36,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [231.50113, 32.66996, 157.87039, 16.440569, 265.24933, 13.513193, 27.689922, 321.25708, 449.8736, 231.68304]
2025-05-09 22:40:36,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [141.0, 93.0, 98.0, 45.0, 135.0, 48.0, 50.0, 174.0, 308.0, 145.0]
2025-05-09 22:40:36,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 46 minutes, 3 seconds)
2025-05-09 22:43:19,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:43:21,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 196.99101 ± 97.992
2025-05-09 22:43:21,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [268.6497, 89.7751, 163.01634, 178.68604, 70.87265, 328.52643, 320.11642, 252.40903, 253.44215, 44.41623]
2025-05-09 22:43:21,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [149.0, 177.0, 128.0, 242.0, 88.0, 188.0, 200.0, 139.0, 131.0, 86.0]
2025-05-09 22:43:21,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 48 minutes, 21 seconds)
2025-05-09 22:45:38,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:45:39,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 169.31438 ± 197.287
2025-05-09 22:45:39,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [185.38237, 241.00362, 17.789928, 21.440706, 49.84418, 693.2531, 21.676205, 58.96414, 117.88449, 285.90503]
2025-05-09 22:45:39,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 127.0, 44.0, 38.0, 98.0, 318.0, 64.0, 63.0, 165.0, 175.0]
2025-05-09 22:45:39,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 42 minutes, 23 seconds)
2025-05-09 22:48:14,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:48:17,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 300.35855 ± 240.731
2025-05-09 22:48:17,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [161.3639, 99.87636, -6.486427, 305.75763, 379.7652, 141.13933, 921.9199, 278.71182, 366.04602, 355.49173]
2025-05-09 22:48:17,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 158.0, 48.0, 167.0, 174.0, 65.0, 978.0, 157.0, 209.0, 184.0]
2025-05-09 22:48:17,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (300.36) for latency MM1Queue_a033_s075
2025-05-09 22:48:17,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 22:48:17,390 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:48:17,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 41 minutes, 42 seconds)
2025-05-09 22:50:45,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:50:48,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 309.26791 ± 98.271
2025-05-09 22:50:48,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [122.64265, 215.2222, 381.65607, 421.1138, 300.75494, 353.14658, 306.16376, 185.40405, 411.01938, 395.556]
2025-05-09 22:50:48,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 145.0, 232.0, 260.0, 170.0, 195.0, 177.0, 257.0, 276.0, 234.0]
2025-05-09 22:50:48,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (309.27) for latency MM1Queue_a033_s075
2025-05-09 22:50:48,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 22:50:48,589 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:50:48,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 38 minutes, 25 seconds)
2025-05-09 22:53:17,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:53:19,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 160.94557 ± 103.651
2025-05-09 22:53:19,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [328.23737, 64.95299, 165.77979, 109.514984, 82.428894, 68.09811, 359.87906, 103.674675, 233.84958, 93.04019]
2025-05-09 22:53:19,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [227.0, 116.0, 207.0, 86.0, 159.0, 131.0, 202.0, 178.0, 276.0, 152.0]
2025-05-09 22:53:19,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 36 minutes, 6 seconds)
2025-05-09 22:55:50,800 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:55:52,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 246.67659 ± 106.001
2025-05-09 22:55:52,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [337.82745, 72.56627, 260.48285, 291.7969, 415.77646, 242.25784, 137.6928, 298.629, 93.456635, 316.27988]
2025-05-09 22:55:52,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 117.0, 143.0, 167.0, 239.0, 115.0, 200.0, 183.0, 150.0, 164.0]
2025-05-09 22:55:52,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 30 minutes, 22 seconds)
2025-05-09 22:58:33,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:58:35,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 194.10579 ± 128.755
2025-05-09 22:58:35,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [92.16613, 400.6118, 253.5989, 41.52261, 20.519344, 266.09293, 343.68292, 146.24672, 302.05136, 74.565216]
2025-05-09 22:58:35,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [101.0, 174.0, 136.0, 76.0, 50.0, 145.0, 213.0, 206.0, 180.0, 120.0]
2025-05-09 22:58:35,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 34 minutes, 35 seconds)
2025-05-09 23:00:56,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:00:58,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 194.23628 ± 126.998
2025-05-09 23:00:58,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [165.50021, 386.0385, 115.22878, 167.66649, 307.63422, 53.41175, 73.44487, 407.8085, 42.5813, 223.04817]
2025-05-09 23:00:58,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [138.0, 245.0, 166.0, 225.0, 183.0, 109.0, 125.0, 241.0, 99.0, 115.0]
2025-05-09 23:00:58,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 27 minutes, 55 seconds)
2025-05-09 23:03:28,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:03:30,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 185.26515 ± 141.249
2025-05-09 23:03:30,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [66.008484, 366.03024, 428.91318, 244.87479, 225.73376, 298.1331, 31.184904, 102.526794, 95.47663, -6.230347]
2025-05-09 23:03:30,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [84.0, 159.0, 244.0, 217.0, 129.0, 140.0, 49.0, 124.0, 131.0, 39.0]
2025-05-09 23:03:30,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 25 minutes, 43 seconds)
2025-05-09 23:06:01,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:06:03,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 232.50888 ± 121.025
2025-05-09 23:06:03,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [329.28723, 233.11832, 350.32498, 371.4746, 135.82234, 309.47366, 102.06589, 354.8267, 28.001947, 110.69337]
2025-05-09 23:06:03,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 137.0, 378.0, 179.0, 103.0, 178.0, 138.0, 163.0, 55.0, 136.0]
2025-05-09 23:06:03,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 23 minutes, 36 seconds)
2025-05-09 23:08:32,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:08:35,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 234.47420 ± 270.619
2025-05-09 23:08:35,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [118.56232, 425.00912, 255.54175, 13.857506, 339.1511, 129.46251, 936.1016, 8.64848, 0.92955077, 117.47806]
2025-05-09 23:08:35,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 195.0, 144.0, 47.0, 168.0, 152.0, 972.0, 17.0, 48.0, 158.0]
2025-05-09 23:08:35,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 20 minutes, 51 seconds)
2025-05-09 23:11:04,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:11:08,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 323.86594 ± 243.887
2025-05-09 23:11:08,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [175.27347, 51.103123, 397.34875, 132.20493, 907.6121, 384.37982, 406.9682, 346.79178, 421.13928, 15.83805]
2025-05-09 23:11:08,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [130.0, 62.0, 228.0, 160.0, 940.0, 198.0, 194.0, 177.0, 211.0, 64.0]
2025-05-09 23:11:08,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (323.87) for latency MM1Queue_a033_s075
2025-05-09 23:11:08,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 23:11:08,054 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 23:11:08,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 15 minutes, 43 seconds)
2025-05-09 23:13:39,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:13:41,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 258.74161 ± 163.676
2025-05-09 23:13:41,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [380.412, 419.27112, 213.97981, 441.6756, 140.35066, 115.79648, 18.241756, 31.211117, 407.926, 418.5516]
2025-05-09 23:13:41,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [248.0, 218.0, 118.0, 166.0, 243.0, 167.0, 36.0, 46.0, 225.0, 221.0]
2025-05-09 23:13:41,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 16 minutes, 1 second)
2025-05-09 23:16:10,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:16:12,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 294.38715 ± 187.310
2025-05-09 23:16:12,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [519.5283, 73.17346, 375.3337, 212.6455, 382.90704, 31.511953, 7.711561, 388.1984, 519.95886, 432.90262]
2025-05-09 23:16:12,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [318.0, 135.0, 181.0, 126.0, 164.0, 90.0, 38.0, 193.0, 245.0, 194.0]
2025-05-09 23:16:12,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 13 minutes, 8 seconds)
2025-05-09 23:18:43,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:18:45,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 265.74191 ± 98.712
2025-05-09 23:18:45,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [171.65274, 312.24823, 437.00943, 378.3316, 183.69164, 290.5422, 311.36743, 82.81835, 262.98273, 226.7748]
2025-05-09 23:18:45,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [201.0, 236.0, 197.0, 206.0, 196.0, 171.0, 151.0, 92.0, 156.0, 128.0]
2025-05-09 23:18:45,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 10 minutes, 35 seconds)
2025-05-09 23:21:15,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:21:17,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 172.99794 ± 154.677
2025-05-09 23:21:17,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [28.075832, 307.93448, 258.84622, 6.839044, 329.37003, 388.84406, -7.74647, 80.93702, 5.9062004, 330.97293]
2025-05-09 23:21:17,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [42.0, 191.0, 124.0, 32.0, 171.0, 201.0, 47.0, 115.0, 17.0, 179.0]
2025-05-09 23:21:17,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 7 minutes, 51 seconds)
2025-05-09 23:23:49,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:23:51,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 268.48605 ± 153.409
2025-05-09 23:23:51,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [540.94147, 325.64035, 347.00385, 40.498505, 392.0738, 279.2275, 300.16983, 115.374405, 315.54977, 28.381208]
2025-05-09 23:23:51,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [292.0, 162.0, 162.0, 99.0, 180.0, 132.0, 145.0, 104.0, 168.0, 77.0]
2025-05-09 23:23:51,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 5 minutes, 40 seconds)
2025-05-09 23:26:18,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:26:22,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 395.50040 ± 141.289
2025-05-09 23:26:22,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [359.5392, 287.4068, 320.05225, 252.85162, 460.69485, 333.19165, 318.02582, 766.69135, 485.4696, 371.08054]
2025-05-09 23:26:22,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [170.0, 158.0, 174.0, 205.0, 246.0, 251.0, 186.0, 798.0, 306.0, 179.0]
2025-05-09 23:26:22,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (395.50) for latency MM1Queue_a033_s075
2025-05-09 23:26:22,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 23:26:22,398 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 23:26:22,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 2 minutes, 31 seconds)
2025-05-09 23:28:52,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:28:54,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 219.48296 ± 125.918
2025-05-09 23:28:54,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [69.10516, 119.76986, 313.12463, 287.9735, 369.1772, 317.38446, 334.17114, 31.48249, 295.31693, 57.324223]
2025-05-09 23:28:54,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [131.0, 186.0, 156.0, 146.0, 211.0, 161.0, 258.0, 115.0, 150.0, 74.0]
2025-05-09 23:28:54,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 15 seconds)
2025-05-09 23:31:28,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:31:30,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 305.33170 ± 202.999
2025-05-09 23:31:30,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [622.9704, 6.3858395, 239.76811, 26.79149, 403.20523, 200.0114, 189.68713, 465.367, 310.50552, 588.62463]
2025-05-09 23:31:30,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [295.0, 57.0, 143.0, 67.0, 188.0, 137.0, 149.0, 218.0, 134.0, 308.0]
2025-05-09 23:31:30,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 58 minutes, 30 seconds)
2025-05-09 23:33:59,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:34:01,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 358.09894 ± 124.847
2025-05-09 23:34:01,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [550.87054, 415.4381, 206.9491, 422.9214, 386.43427, 406.97687, 92.367134, 462.37872, 334.56598, 302.08728]
2025-05-09 23:34:01,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [419.0, 171.0, 120.0, 186.0, 194.0, 206.0, 172.0, 224.0, 176.0, 178.0]
2025-05-09 23:34:01,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 55 minutes, 50 seconds)
2025-05-09 23:36:30,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:36:33,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 328.04398 ± 114.298
2025-05-09 23:36:33,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [395.01898, 119.81509, 417.17447, 400.71158, 483.46017, 195.37396, 214.26683, 377.83698, 416.41898, 260.3625]
2025-05-09 23:36:33,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [168.0, 162.0, 196.0, 193.0, 236.0, 258.0, 167.0, 180.0, 232.0, 236.0]
2025-05-09 23:36:33,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 52 minutes, 43 seconds)
2025-05-09 23:39:14,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:39:16,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 315.29736 ± 131.426
2025-05-09 23:39:16,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [109.058235, 323.35767, 415.63495, 311.59164, 408.16534, 383.39526, 344.8872, 342.34805, 479.12628, 35.409058]
2025-05-09 23:39:16,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [131.0, 152.0, 195.0, 173.0, 199.0, 185.0, 186.0, 158.0, 216.0, 47.0]
2025-05-09 23:39:16,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 52 minutes, 51 seconds)
2025-05-09 23:41:36,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:41:38,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 322.43491 ± 156.851
2025-05-09 23:41:38,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [437.4909, 464.46906, 56.87376, 28.531513, 350.77417, 379.13754, 209.41864, 391.53226, 456.10867, 450.01257]
2025-05-09 23:41:38,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [220.0, 205.0, 114.0, 72.0, 175.0, 170.0, 293.0, 209.0, 235.0, 220.0]
2025-05-09 23:41:38,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 48 minutes, 7 seconds)
2025-05-09 23:44:09,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:44:12,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 409.77264 ± 115.562
2025-05-09 23:44:12,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [507.74207, 457.0827, 487.94382, 398.62122, 155.78432, 233.59035, 435.28125, 407.0903, 496.20312, 518.387]
2025-05-09 23:44:12,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [226.0, 225.0, 228.0, 198.0, 220.0, 162.0, 205.0, 243.0, 239.0, 209.0]
2025-05-09 23:44:12,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (409.77) for latency MM1Queue_a033_s075
2025-05-09 23:44:12,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 23:44:12,009 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 23:44:12,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 45 minutes)
2025-05-09 23:46:43,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:46:46,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 313.00803 ± 121.675
2025-05-09 23:46:46,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [157.14645, 408.83902, 366.0011, 356.32736, 233.06502, 435.67746, 151.28825, 411.6399, 466.9748, 143.12076]
2025-05-09 23:46:46,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [239.0, 198.0, 210.0, 159.0, 159.0, 210.0, 198.0, 218.0, 212.0, 187.0]
2025-05-09 23:46:46,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 43 minutes, 6 seconds)
2025-05-09 23:49:16,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:49:19,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 367.36377 ± 172.156
2025-05-09 23:49:19,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [614.80975, 105.010284, 80.08329, 390.5579, 512.23706, 207.71352, 354.27173, 438.27203, 442.1089, 528.5731]
2025-05-09 23:49:19,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [371.0, 156.0, 134.0, 199.0, 261.0, 151.0, 145.0, 231.0, 210.0, 213.0]
2025-05-09 23:49:19,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 40 minutes, 53 seconds)
2025-05-09 23:51:52,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:51:54,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 385.44980 ± 90.915
2025-05-09 23:51:54,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [453.1434, 341.91837, 227.67406, 411.1552, 456.80237, 387.83835, 390.232, 221.91875, 473.1396, 490.67566]
2025-05-09 23:51:54,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 186.0, 169.0, 333.0, 238.0, 162.0, 222.0, 154.0, 184.0, 271.0]
2025-05-09 23:51:54,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 36 minutes, 44 seconds)
2025-05-09 23:54:21,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:54:23,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 366.35480 ± 116.237
2025-05-09 23:54:23,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [425.58118, 74.94276, 328.32498, 350.72574, 430.33588, 331.1946, 440.00906, 345.27658, 548.86786, 388.28912]
2025-05-09 23:54:23,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [174.0, 125.0, 171.0, 180.0, 175.0, 155.0, 210.0, 176.0, 261.0, 194.0]
2025-05-09 23:54:23,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 35 minutes, 33 seconds)
2025-05-09 23:56:55,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:56:57,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 342.98642 ± 123.687
2025-05-09 23:56:57,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [317.0077, 395.11194, 420.82065, 384.80835, 342.63177, 18.864674, 409.533, 245.6641, 426.53738, 468.88467]
2025-05-09 23:56:57,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [148.0, 187.0, 177.0, 168.0, 161.0, 43.0, 181.0, 172.0, 200.0, 226.0]
2025-05-09 23:56:57,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 32 minutes, 59 seconds)
2025-05-09 23:59:27,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:59:29,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 457.69498 ± 93.812
2025-05-09 23:59:29,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [624.83905, 474.49057, 440.4081, 398.11377, 535.1338, 512.6683, 277.6424, 525.6019, 364.18033, 423.87152]
2025-05-09 23:59:29,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [247.0, 215.0, 191.0, 154.0, 205.0, 233.0, 131.0, 289.0, 232.0, 190.0]
2025-05-09 23:59:29,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (457.69) for latency MM1Queue_a033_s075
2025-05-09 23:59:29,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-09 23:59:29,623 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 23:59:29,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 30 minutes, 7 seconds)
2025-05-10 00:02:05,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:02:07,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 413.27631 ± 156.180
2025-05-10 00:02:07,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [184.30142, 584.3962, 504.65436, 477.90955, 636.59625, 414.9966, 409.80505, 100.81321, 376.95465, 442.33594]
2025-05-10 00:02:07,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [123.0, 282.0, 231.0, 257.0, 264.0, 196.0, 210.0, 156.0, 167.0, 186.0]
2025-05-10 00:02:07,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 28 minutes, 32 seconds)
2025-05-10 00:04:33,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:04:37,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 430.25171 ± 219.883
2025-05-10 00:04:37,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [365.5117, 393.38635, 331.5929, 995.72046, 548.8265, 421.05026, 73.88899, 431.23328, 360.5497, 380.75674]
2025-05-10 00:04:37,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 185.0, 188.0, 1000.0, 336.0, 205.0, 86.0, 248.0, 181.0, 165.0]
2025-05-10 00:04:37,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 24 minutes, 49 seconds)
2025-05-10 00:07:10,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:07:12,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 465.03320 ± 153.182
2025-05-10 00:07:12,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [798.6779, 578.3183, 496.87015, 372.80658, 422.2446, 458.0222, 182.67847, 557.49927, 396.05603, 387.15866]
2025-05-10 00:07:12,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [304.0, 342.0, 254.0, 195.0, 222.0, 225.0, 106.0, 288.0, 168.0, 164.0]
2025-05-10 00:07:12,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (465.03) for latency MM1Queue_a033_s075
2025-05-10 00:07:12,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 00:07:12,894 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 00:07:12,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 23 minutes, 33 seconds)
2025-05-10 00:09:42,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:09:44,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 359.72748 ± 206.297
2025-05-10 00:09:44,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [151.35565, 169.16037, 304.67825, 666.56555, 521.853, 435.91473, 213.4231, 678.08887, 56.09989, 400.1356]
2025-05-10 00:09:44,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [175.0, 239.0, 133.0, 261.0, 246.0, 184.0, 154.0, 277.0, 100.0, 216.0]
2025-05-10 00:09:44,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 20 minutes, 46 seconds)
2025-05-10 00:12:16,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:12:19,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 478.83716 ± 117.664
2025-05-10 00:12:19,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [386.63376, 583.99664, 496.96484, 502.02994, 397.64197, 329.68488, 700.9243, 569.1001, 304.80704, 516.58777]
2025-05-10 00:12:19,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [160.0, 203.0, 178.0, 216.0, 205.0, 324.0, 263.0, 307.0, 160.0, 245.0]
2025-05-10 00:12:19,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (478.84) for latency MM1Queue_a033_s075
2025-05-10 00:12:19,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 00:12:19,346 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 00:12:19,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 18 minutes, 33 seconds)
2025-05-10 00:14:47,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:14:50,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 446.88217 ± 156.830
2025-05-10 00:14:50,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [437.6081, 749.5248, 530.96423, 523.1248, 436.89337, 128.24426, 253.02579, 478.85855, 449.71298, 480.86508]
2025-05-10 00:14:50,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [192.0, 432.0, 296.0, 246.0, 216.0, 181.0, 130.0, 223.0, 409.0, 223.0]
2025-05-10 00:14:50,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 14 minutes, 50 seconds)
2025-05-10 00:17:22,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:17:25,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 395.28183 ± 82.443
2025-05-10 00:17:25,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [469.97147, 335.5987, 368.5454, 318.0844, 261.5352, 431.2705, 415.37976, 406.23282, 372.26715, 573.9331]
2025-05-10 00:17:25,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [234.0, 274.0, 188.0, 120.0, 117.0, 187.0, 177.0, 187.0, 167.0, 267.0]
2025-05-10 00:17:25,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 13 minutes, 6 seconds)
2025-05-10 00:19:56,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:19:59,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 498.60684 ± 100.718
2025-05-10 00:19:59,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [589.56537, 421.32666, 517.60724, 717.1083, 485.0419, 384.35608, 415.72995, 452.6475, 593.15173, 409.53354]
2025-05-10 00:19:59,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [323.0, 233.0, 220.0, 369.0, 311.0, 170.0, 211.0, 227.0, 328.0, 184.0]
2025-05-10 00:19:59,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (498.61) for latency MM1Queue_a033_s075
2025-05-10 00:19:59,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 00:19:59,222 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 00:19:59,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 10 minutes, 16 seconds)
2025-05-10 00:22:30,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:22:32,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 377.39084 ± 111.304
2025-05-10 00:22:32,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [364.81396, 307.54703, 310.38354, 499.6205, 216.88763, 631.4236, 422.98425, 296.8523, 355.84323, 367.55252]
2025-05-10 00:22:32,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 150.0, 170.0, 204.0, 157.0, 409.0, 201.0, 148.0, 172.0, 191.0]
2025-05-10 00:22:32,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 8 minutes)
2025-05-10 00:25:00,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:25:03,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 430.02667 ± 76.004
2025-05-10 00:25:03,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [331.4869, 458.8661, 600.10443, 464.97287, 424.672, 429.59363, 322.9587, 398.32083, 482.26944, 387.0214]
2025-05-10 00:25:03,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [181.0, 231.0, 261.0, 217.0, 238.0, 190.0, 152.0, 235.0, 305.0, 174.0]
2025-05-10 00:25:03,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 4 minutes, 46 seconds)
2025-05-10 00:27:36,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:27:39,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 438.61566 ± 199.523
2025-05-10 00:27:39,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [605.9966, 382.68347, 516.8864, 612.7972, 465.12097, 533.74, 575.20135, 567.28357, 106.1298, 20.317215]
2025-05-10 00:27:39,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [252.0, 177.0, 259.0, 348.0, 214.0, 246.0, 224.0, 314.0, 107.0, 59.0]
2025-05-10 00:27:39,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 2 minutes, 56 seconds)
2025-05-10 00:30:08,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:30:10,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 440.70978 ± 84.560
2025-05-10 00:30:10,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [391.61176, 444.44162, 383.09808, 601.07886, 335.1201, 305.92136, 472.0856, 527.29535, 465.1584, 481.2865]
2025-05-10 00:30:10,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [211.0, 173.0, 167.0, 335.0, 145.0, 130.0, 191.0, 199.0, 232.0, 209.0]
2025-05-10 00:30:10,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 59 minutes, 55 seconds)
2025-05-10 00:32:39,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:32:42,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 355.60489 ± 168.098
2025-05-10 00:32:42,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [449.4931, 96.560745, 118.14631, 600.48956, 423.1721, 452.87497, 117.48621, 465.06525, 426.38895, 406.37186]
2025-05-10 00:32:42,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [190.0, 118.0, 134.0, 238.0, 201.0, 188.0, 177.0, 227.0, 196.0, 217.0]
2025-05-10 00:32:42,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 56 minutes, 58 seconds)
2025-05-10 00:35:12,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:35:14,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 377.06119 ± 170.023
2025-05-10 00:35:14,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [142.84744, 109.3425, 493.9848, 442.29214, 284.42975, 321.54865, 717.9867, 495.33713, 344.65967, 418.18307]
2025-05-10 00:35:14,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [110.0, 161.0, 196.0, 240.0, 157.0, 141.0, 277.0, 260.0, 183.0, 181.0]
2025-05-10 00:35:14,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 54 minutes, 13 seconds)
2025-05-10 00:37:44,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:37:47,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 498.47955 ± 83.944
2025-05-10 00:37:47,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [476.30038, 502.93933, 401.04483, 462.38272, 462.78668, 476.94357, 473.89093, 734.9664, 475.74506, 517.79596]
2025-05-10 00:37:47,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [239.0, 235.0, 191.0, 237.0, 212.0, 226.0, 187.0, 287.0, 222.0, 286.0]
2025-05-10 00:37:47,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 52 minutes, 7 seconds)
2025-05-10 00:40:20,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:40:23,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 448.04965 ± 142.051
2025-05-10 00:40:23,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [145.92645, 531.4233, 664.9735, 341.46143, 455.9201, 450.53674, 308.74396, 461.63397, 545.06335, 574.8139]
2025-05-10 00:40:23,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [112.0, 292.0, 258.0, 212.0, 238.0, 191.0, 153.0, 258.0, 280.0, 256.0]
2025-05-10 00:40:23,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 49 minutes, 29 seconds)
2025-05-10 00:42:54,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:42:56,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 358.67883 ± 202.039
2025-05-10 00:42:56,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [528.97864, 487.7109, 0.18230969, 588.3625, 290.54178, 384.93842, 560.50555, 329.00696, 419.33148, -2.770298]
2025-05-10 00:42:56,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 238.0, 33.0, 243.0, 122.0, 177.0, 248.0, 160.0, 176.0, 19.0]
2025-05-10 00:42:56,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 47 minutes, 14 seconds)
2025-05-10 00:45:23,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:45:26,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 372.74982 ± 192.235
2025-05-10 00:45:26,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [688.3639, 67.10354, 371.26245, 486.43237, 383.85486, 25.184332, 436.61588, 564.99634, 371.16632, 332.5184]
2025-05-10 00:45:26,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [304.0, 122.0, 187.0, 184.0, 189.0, 46.0, 219.0, 322.0, 166.0, 167.0]
2025-05-10 00:45:26,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 44 minutes, 24 seconds)
2025-05-10 00:47:57,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:47:59,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 378.86444 ± 184.162
2025-05-10 00:47:59,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [99.22618, 95.38365, 474.545, 412.54654, 491.95398, 503.45935, 188.72647, 565.02124, 638.85767, 318.9245]
2025-05-10 00:47:59,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [93.0, 133.0, 254.0, 191.0, 249.0, 307.0, 146.0, 252.0, 248.0, 179.0]
2025-05-10 00:47:59,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 42 minutes, 3 seconds)
2025-05-10 00:50:28,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:50:31,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 419.41537 ± 163.294
2025-05-10 00:50:31,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [507.08337, 452.16605, 528.5807, 411.25195, 22.005175, 224.64014, 584.78503, 412.67465, 533.93475, 517.03217]
2025-05-10 00:50:31,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [277.0, 208.0, 205.0, 200.0, 43.0, 154.0, 331.0, 220.0, 243.0, 266.0]
2025-05-10 00:50:31,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 39 minutes, 14 seconds)
2025-05-10 00:53:02,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:53:04,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 414.71939 ± 139.410
2025-05-10 00:53:04,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [336.42648, 251.75267, 150.06293, 511.5167, 444.13464, 618.50684, 423.5906, 582.3305, 490.87317, 337.99963]
2025-05-10 00:53:04,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [174.0, 197.0, 118.0, 243.0, 210.0, 257.0, 241.0, 256.0, 224.0, 176.0]
2025-05-10 00:53:04,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 36 minutes, 29 seconds)
2025-05-10 00:55:33,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:55:36,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 486.15698 ± 179.451
2025-05-10 00:55:36,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [22.996094, 618.3292, 498.85742, 632.9786, 359.14352, 399.4699, 580.15875, 650.2111, 529.15497, 570.2706]
2025-05-10 00:55:36,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [41.0, 223.0, 251.0, 271.0, 175.0, 233.0, 280.0, 265.0, 280.0, 307.0]
2025-05-10 00:55:36,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 33 minutes, 45 seconds)
2025-05-10 00:58:08,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:58:10,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 339.68250 ± 140.263
2025-05-10 00:58:10,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [502.52026, 48.360874, 418.15912, 410.11737, 278.77902, 310.42993, 532.99615, 311.86673, 175.74257, 407.8528]
2025-05-10 00:58:10,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [202.0, 102.0, 204.0, 179.0, 141.0, 141.0, 226.0, 159.0, 192.0, 243.0]
2025-05-10 00:58:10,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 31 minutes, 43 seconds)
2025-05-10 01:00:40,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:00:42,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 368.46445 ± 169.913
2025-05-10 01:00:42,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [583.52246, 446.66165, 452.61475, 22.818216, 483.00327, 376.85184, 114.61384, 293.38177, 530.8807, 380.29626]
2025-05-10 01:00:42,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [209.0, 217.0, 205.0, 86.0, 221.0, 175.0, 125.0, 134.0, 220.0, 178.0]
2025-05-10 01:00:42,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 28 minutes, 58 seconds)
2025-05-10 01:03:16,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:03:18,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 408.02310 ± 196.841
2025-05-10 01:03:18,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [613.15137, 458.2039, 554.71844, 534.22253, 412.5964, 472.93314, 44.18934, 509.0907, 466.96317, 14.16223]
2025-05-10 01:03:18,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [407.0, 231.0, 228.0, 251.0, 192.0, 245.0, 69.0, 220.0, 189.0, 49.0]
2025-05-10 01:03:18,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 27 minutes)
2025-05-10 01:05:47,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:05:50,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 446.09747 ± 164.469
2025-05-10 01:05:50,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [424.6183, 353.10156, 576.3896, 667.2394, 617.26227, 446.46188, 84.90519, 442.4256, 292.39676, 556.174]
2025-05-10 01:05:50,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [208.0, 198.0, 234.0, 266.0, 247.0, 254.0, 111.0, 227.0, 183.0, 276.0]
2025-05-10 01:05:50,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 24 minutes, 11 seconds)
2025-05-10 01:08:19,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:08:22,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 538.81000 ± 94.134
2025-05-10 01:08:22,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [359.20227, 663.9046, 480.57022, 605.5994, 514.9769, 452.509, 468.35513, 572.69196, 626.10364, 644.1869]
2025-05-10 01:08:22,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 294.0, 230.0, 263.0, 219.0, 251.0, 191.0, 260.0, 251.0, 264.0]
2025-05-10 01:08:22,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (538.81) for latency MM1Queue_a033_s075
2025-05-10 01:08:22,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 01:08:22,625 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 01:08:22,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 21 minutes, 42 seconds)
2025-05-10 01:10:52,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:10:54,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 477.63062 ± 143.296
2025-05-10 01:10:54,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [352.18094, 290.1751, 627.0959, 738.4319, 549.5343, 446.04276, 460.38196, 304.30197, 618.52716, 389.6336]
2025-05-10 01:10:54,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [158.0, 148.0, 244.0, 277.0, 236.0, 185.0, 212.0, 153.0, 259.0, 171.0]
2025-05-10 01:10:54,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 18 minutes, 57 seconds)
2025-05-10 01:13:28,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:13:30,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 520.25671 ± 115.579
2025-05-10 01:13:30,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [385.7529, 367.20825, 517.1372, 572.3251, 785.39813, 575.82367, 408.64725, 567.8157, 534.4759, 487.98358]
2025-05-10 01:13:30,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [177.0, 194.0, 272.0, 238.0, 383.0, 244.0, 233.0, 215.0, 198.0, 246.0]
2025-05-10 01:13:30,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 16 minutes, 51 seconds)
2025-05-10 01:15:57,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:15:59,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 522.48669 ± 107.678
2025-05-10 01:15:59,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [486.92548, 433.7999, 530.1201, 440.63925, 634.8831, 683.8442, 658.4795, 507.18665, 316.19968, 532.7885]
2025-05-10 01:15:59,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [200.0, 185.0, 209.0, 185.0, 354.0, 280.0, 236.0, 218.0, 146.0, 240.0]
2025-05-10 01:15:59,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 13 minutes, 33 seconds)
2025-05-10 01:18:33,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:18:36,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 467.26016 ± 146.907
2025-05-10 01:18:36,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [726.51935, 570.7932, 443.06143, 414.7557, 499.2205, 291.9833, 361.1741, 537.3524, 212.71437, 615.0271]
2025-05-10 01:18:36,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [273.0, 308.0, 196.0, 189.0, 217.0, 193.0, 177.0, 236.0, 165.0, 224.0]
2025-05-10 01:18:36,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 11 minutes, 28 seconds)
2025-05-10 01:21:02,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:21:05,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 389.26019 ± 181.327
2025-05-10 01:21:05,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [367.22314, 537.679, 405.58136, 574.6343, 497.36688, 21.76353, 107.212296, 481.9517, 571.0204, 328.16934]
2025-05-10 01:21:05,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [186.0, 311.0, 174.0, 216.0, 281.0, 43.0, 84.0, 210.0, 261.0, 143.0]
2025-05-10 01:21:05,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 8 minutes, 38 seconds)
2025-05-10 01:23:36,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:23:38,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 391.54657 ± 146.289
2025-05-10 01:23:38,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [516.3864, 492.85794, 419.2065, 331.19492, 441.5333, 360.19394, 29.13249, 387.12897, 333.0264, 604.805]
2025-05-10 01:23:38,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [208.0, 231.0, 186.0, 144.0, 176.0, 142.0, 79.0, 169.0, 231.0, 224.0]
2025-05-10 01:23:38,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 6 minutes, 12 seconds)
2025-05-10 01:26:08,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:26:10,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 420.44638 ± 165.900
2025-05-10 01:26:10,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [419.79807, 514.8947, 371.86108, 496.2446, 8.668813, 434.37018, 471.521, 427.70786, 353.75262, 705.6448]
2025-05-10 01:26:10,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [177.0, 274.0, 156.0, 201.0, 37.0, 182.0, 221.0, 191.0, 151.0, 327.0]
2025-05-10 01:26:10,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 3 minutes, 19 seconds)
2025-05-10 01:28:43,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:28:45,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 345.44775 ± 177.856
2025-05-10 01:28:45,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [639.7503, 441.87473, 365.09564, 410.62466, 269.65253, 27.191036, 484.67804, 166.46815, 156.55745, 492.58514]
2025-05-10 01:28:45,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [236.0, 212.0, 250.0, 169.0, 135.0, 67.0, 186.0, 169.0, 179.0, 179.0]
2025-05-10 01:28:45,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 1 minute, 16 seconds)
2025-05-10 01:31:15,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:31:17,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 369.47998 ± 198.409
2025-05-10 01:31:17,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [666.1615, 381.66327, 492.75977, -0.104906976, 427.94684, 354.4352, 504.46375, 434.71814, 417.24963, 15.506876]
2025-05-10 01:31:17,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [377.0, 174.0, 241.0, 9.0, 174.0, 161.0, 240.0, 198.0, 225.0, 34.0]
2025-05-10 01:31:17,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 58 minutes, 22 seconds)
2025-05-10 01:33:45,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:33:48,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 457.98715 ± 135.249
2025-05-10 01:33:48,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [726.4714, 569.9211, 462.9555, 444.56213, 248.14888, 423.67175, 498.21487, 478.82605, 235.0853, 492.01477]
2025-05-10 01:33:48,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [325.0, 216.0, 236.0, 188.0, 97.0, 201.0, 230.0, 230.0, 160.0, 216.0]
2025-05-10 01:33:48,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 55 minutes, 58 seconds)
2025-05-10 01:36:16,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:36:19,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 516.87225 ± 175.703
2025-05-10 01:36:19,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [507.47507, 586.10767, 404.0878, 402.947, 506.84814, 855.24744, 295.6887, 429.4486, 371.1791, 809.6927]
2025-05-10 01:36:19,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [209.0, 322.0, 171.0, 202.0, 234.0, 306.0, 127.0, 216.0, 152.0, 277.0]
2025-05-10 01:36:19,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 53 minutes, 16 seconds)
2025-05-10 01:38:51,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:38:54,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 494.28253 ± 100.033
2025-05-10 01:38:54,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [496.55414, 484.1451, 482.91858, 510.9918, 480.95026, 576.423, 285.08463, 472.98914, 708.53094, 444.23755]
2025-05-10 01:38:54,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [242.0, 258.0, 192.0, 224.0, 236.0, 212.0, 138.0, 217.0, 285.0, 206.0]
2025-05-10 01:38:54,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 50 minutes, 53 seconds)
2025-05-10 01:41:24,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:41:26,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 394.80679 ± 203.097
2025-05-10 01:41:26,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [684.19257, 402.92178, 575.55756, 7.6686525, 470.77634, 425.8779, 384.34036, 255.87813, 614.14026, 126.71426]
2025-05-10 01:41:26,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [310.0, 225.0, 239.0, 28.0, 195.0, 197.0, 170.0, 142.0, 253.0, 65.0]
2025-05-10 01:41:26,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 48 minutes, 11 seconds)
2025-05-10 01:43:53,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:43:56,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 563.35974 ± 190.360
2025-05-10 01:43:56,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [459.23267, 595.43274, 338.0394, 329.5768, 669.4193, 1027.7628, 500.8243, 572.8831, 486.36438, 654.06177]
2025-05-10 01:43:56,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [210.0, 343.0, 177.0, 159.0, 243.0, 301.0, 219.0, 222.0, 170.0, 284.0]
2025-05-10 01:43:56,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (563.36) for latency MM1Queue_a033_s075
2025-05-10 01:43:56,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 01:43:56,369 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 01:43:56,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 45 minutes, 31 seconds)
2025-05-10 01:46:27,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:46:31,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 575.19055 ± 110.909
2025-05-10 01:46:31,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [495.3662, 705.2583, 560.49725, 665.96924, 554.01715, 488.75806, 430.27414, 765.6852, 652.5528, 433.52762]
2025-05-10 01:46:31,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [269.0, 389.0, 233.0, 250.0, 209.0, 259.0, 186.0, 355.0, 288.0, 174.0]
2025-05-10 01:46:31,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (575.19) for latency MM1Queue_a033_s075
2025-05-10 01:46:31,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 01:46:31,013 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 01:46:31,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 43 minutes, 12 seconds)
2025-05-10 01:49:02,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:49:05,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 505.97760 ± 183.246
2025-05-10 01:49:05,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [440.83557, 671.92114, 462.40588, 524.8456, 601.218, 570.28485, 737.6534, 21.592379, 521.6249, 507.39395]
2025-05-10 01:49:05,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 392.0, 234.0, 256.0, 300.0, 225.0, 322.0, 49.0, 273.0, 243.0]
2025-05-10 01:49:05,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 40 minutes, 50 seconds)
2025-05-10 01:51:33,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:51:36,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 535.60388 ± 174.302
2025-05-10 01:51:36,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [646.7417, 640.6195, 433.0746, 91.69584, 431.61087, 646.67413, 698.2072, 663.5955, 495.5089, 608.31006]
2025-05-10 01:51:36,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [254.0, 263.0, 215.0, 102.0, 216.0, 262.0, 274.0, 309.0, 254.0, 232.0]
2025-05-10 01:51:36,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 38 minutes, 6 seconds)
2025-05-10 01:54:08,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:54:10,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 440.10260 ± 189.563
2025-05-10 01:54:10,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [28.994509, 499.4708, 355.09497, 583.43085, 189.34566, 512.6847, 426.62067, 638.4157, 655.1172, 511.85062]
2025-05-10 01:54:10,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [56.0, 205.0, 149.0, 222.0, 133.0, 245.0, 185.0, 315.0, 290.0, 209.0]
2025-05-10 01:54:10,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 35 minutes, 40 seconds)
2025-05-10 01:56:42,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:56:44,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 486.11954 ± 171.045
2025-05-10 01:56:44,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [27.895748, 449.4583, 489.48926, 600.3492, 412.00662, 628.8412, 619.98956, 596.0349, 447.08017, 590.0509]
2025-05-10 01:56:44,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [54.0, 197.0, 236.0, 268.0, 204.0, 253.0, 284.0, 291.0, 215.0, 227.0]
2025-05-10 01:56:44,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 33 minutes, 17 seconds)
2025-05-10 01:59:11,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:59:14,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 551.68365 ± 198.622
2025-05-10 01:59:14,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [211.29642, 342.63547, 620.2339, 952.3795, 711.8948, 534.24066, 464.60403, 699.07965, 548.77313, 431.69885]
2025-05-10 01:59:14,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [108.0, 165.0, 238.0, 334.0, 254.0, 242.0, 226.0, 280.0, 265.0, 356.0]
2025-05-10 01:59:14,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 30 minutes, 31 seconds)
2025-05-10 02:01:44,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:01:47,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 495.90869 ± 247.520
2025-05-10 02:01:47,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [121.954216, 764.5067, 668.0192, 556.36566, 671.34436, 370.62878, 473.4799, -1.3471923, 581.72797, 752.4072]
2025-05-10 02:01:47,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [207.0, 270.0, 275.0, 252.0, 425.0, 241.0, 204.0, 38.0, 215.0, 312.0]
2025-05-10 02:01:47,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 27 minutes, 56 seconds)
2025-05-10 02:04:18,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:04:21,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 473.89532 ± 239.153
2025-05-10 02:04:21,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [610.719, 299.44244, 654.6716, 671.64594, 788.13885, 157.44502, 674.49713, 371.5765, 485.5501, 25.266308]
2025-05-10 02:04:21,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [268.0, 143.0, 273.0, 258.0, 304.0, 180.0, 307.0, 177.0, 240.0, 54.0]
2025-05-10 02:04:21,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 25 minutes, 30 seconds)
2025-05-10 02:06:50,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:06:52,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 405.82214 ± 219.083
2025-05-10 02:06:52,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [191.22609, 37.140663, 220.10304, 470.38873, 664.6986, 661.467, 216.05362, 363.76953, 627.95514, 605.4192]
2025-05-10 02:06:52,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [100.0, 55.0, 149.0, 208.0, 278.0, 274.0, 136.0, 188.0, 295.0, 313.0]
2025-05-10 02:06:52,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 22 minutes, 51 seconds)
2025-05-10 02:09:22,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:09:25,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 462.40878 ± 245.856
2025-05-10 02:09:25,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [941.67267, 480.27316, 446.01266, 596.60956, 494.95303, 541.94653, 2.335639, 544.4917, 101.53517, 474.25797]
2025-05-10 02:09:25,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [490.0, 221.0, 179.0, 252.0, 222.0, 221.0, 58.0, 264.0, 124.0, 202.0]
2025-05-10 02:09:25,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 20 minutes, 16 seconds)
2025-05-10 02:11:57,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:12:00,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 502.74139 ± 147.040
2025-05-10 02:12:00,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [495.17758, 545.8651, 738.4681, 452.69046, 250.32652, 411.86703, 552.1322, 314.60626, 554.3836, 711.89703]
2025-05-10 02:12:00,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [246.0, 306.0, 272.0, 218.0, 141.0, 180.0, 233.0, 156.0, 236.0, 259.0]
2025-05-10 02:12:00,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 17 minutes, 53 seconds)
2025-05-10 02:14:30,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:14:32,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 351.98380 ± 280.654
2025-05-10 02:14:32,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [165.81822, 16.536358, 883.0564, 400.50525, -7.1035485, 595.91425, 433.15082, 584.7115, 31.344791, 415.90375]
2025-05-10 02:14:32,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [80.0, 42.0, 376.0, 180.0, 33.0, 242.0, 190.0, 278.0, 50.0, 192.0]
2025-05-10 02:14:32,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 18 seconds)
2025-05-10 02:16:59,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:17:03,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 599.77112 ± 157.024
2025-05-10 02:17:03,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [892.2371, 363.05338, 535.59607, 435.6293, 744.41626, 692.3248, 474.79437, 606.0599, 509.9146, 743.68494]
2025-05-10 02:17:03,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [351.0, 185.0, 227.0, 221.0, 294.0, 332.0, 212.0, 260.0, 229.0, 351.0]
2025-05-10 02:17:03,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (599.77) for latency MM1Queue_a033_s075
2025-05-10 02:17:03,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 02:17:03,262 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 02:17:03,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 12 minutes, 41 seconds)
2025-05-10 02:19:35,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:19:38,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 580.86664 ± 123.746
2025-05-10 02:19:38,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [513.9216, 475.2577, 615.7412, 633.3548, 458.6925, 847.3699, 461.11856, 596.78546, 729.66644, 476.7586]
2025-05-10 02:19:38,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [271.0, 236.0, 263.0, 372.0, 220.0, 343.0, 260.0, 255.0, 306.0, 246.0]
2025-05-10 02:19:38,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 12 seconds)
2025-05-10 02:22:08,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:22:11,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 537.43085 ± 468.677
2025-05-10 02:22:11,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [594.42615, -12.800389, 621.3984, 280.0036, 690.9261, -1.9388671, 546.93945, 1331.8572, 1310.0804, 13.416337]
2025-05-10 02:22:11,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [239.0, 48.0, 248.0, 137.0, 293.0, 23.0, 244.0, 436.0, 559.0, 52.0]
2025-05-10 02:22:11,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 39 seconds)
2025-05-10 02:24:42,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:24:46,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 604.80261 ± 212.372
2025-05-10 02:24:46,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [840.514, 1034.6842, 534.47687, 368.92444, 557.44824, 716.0398, 396.06534, 411.3489, 758.9452, 429.57883]
2025-05-10 02:24:46,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [347.0, 410.0, 233.0, 174.0, 316.0, 340.0, 214.0, 176.0, 367.0, 226.0]
2025-05-10 02:24:46,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (604.80) for latency MM1Queue_a033_s075
2025-05-10 02:24:46,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 02:24:46,208 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 02:24:46,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 6 seconds)
2025-05-10 02:27:16,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:27:19,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 578.25427 ± 186.298
2025-05-10 02:27:19,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [617.7659, 801.4901, 460.82007, 809.10266, 625.355, 555.8582, 603.66754, 640.5743, 108.14762, 559.7615]
2025-05-10 02:27:19,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [224.0, 293.0, 198.0, 300.0, 296.0, 259.0, 253.0, 285.0, 121.0, 213.0]
2025-05-10 02:27:19,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 33 seconds)
2025-05-10 02:29:47,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:29:50,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 529.24390 ± 120.006
2025-05-10 02:29:50,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [366.52676, 477.91092, 440.3208, 773.3574, 470.23068, 394.80603, 640.56696, 639.5985, 524.46625, 564.6544]
2025-05-10 02:29:50,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [153.0, 255.0, 216.0, 334.0, 225.0, 181.0, 279.0, 271.0, 227.0, 242.0]
2025-05-10 02:29:50,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1251 [DEBUG]: Training session finished
