2025-05-10 16:03:51,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16
2025-05-10 16:03:51,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16
2025-05-10 16:03:51,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x75f1ca23df70>}
2025-05-10 16:03:51,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1111 [DEBUG]: using device: cpu
2025-05-10 16:03:51,862 baseline-sac-noisy-walker2d:77 [WARNING]: args.memorize_actions != args.horizon: 16 != 24
2025-05-10 16:03:51,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-10 16:03:51,874 baseline-sac-noisy-walker2d:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-10 16:03:51,874 baseline-sac-noisy-walker2d:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=119, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-10 16:03:52,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-10 16:03:52,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-10 16:06:26,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:06:27,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 13.32186 ± 5.321
2025-05-10 16:06:27,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [12.050088, 5.08329, 21.655699, 13.785176, 20.338297, 7.056106, 7.2917914, 14.247225, 17.501875, 14.209026]
2025-05-10 16:06:27,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [86.0, 84.0, 87.0, 82.0, 92.0, 83.0, 87.0, 87.0, 86.0, 87.0]
2025-05-10 16:06:27,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (13.32) for latency MM1Queue_a033_s075
2025-05-10 16:06:27,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 16:06:27,878 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 16:06:27,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 17 minutes, 5 seconds)
2025-05-10 16:09:14,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:09:16,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 47.39625 ± 58.826
2025-05-10 16:09:16,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [72.31604, 19.008152, 210.11523, 15.113552, 22.82343, 40.21286, 48.36752, 21.050728, 43.046654, -18.091671]
2025-05-10 16:09:16,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 34.0, 129.0, 165.0, 149.0, 100.0, 116.0, 97.0, 66.0, 132.0]
2025-05-10 16:09:16,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (47.40) for latency MM1Queue_a033_s075
2025-05-10 16:09:16,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 16:09:16,238 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 16:09:16,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 24 minutes, 44 seconds)
2025-05-10 16:12:01,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:12:02,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: -2.46962 ± 14.017
2025-05-10 16:12:02,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [-4.8907824, 9.233344, 10.838653, -30.243336, 7.190449, 15.095129, -22.49381, -0.3557363, -9.9021, 0.83194447]
2025-05-10 16:12:02,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [112.0, 90.0, 85.0, 135.0, 58.0, 132.0, 103.0, 92.0, 94.0, 92.0]
2025-05-10 16:12:02,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 24 minutes, 27 seconds)
2025-05-10 16:14:49,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:14:51,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 138.48410 ± 86.059
2025-05-10 16:14:51,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [50.56388, 25.701138, 239.17207, 79.71836, 57.0505, 274.878, 78.05735, 226.11487, 160.67052, 192.91428]
2025-05-10 16:14:51,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [114.0, 74.0, 129.0, 136.0, 149.0, 202.0, 167.0, 163.0, 115.0, 122.0]
2025-05-10 16:14:51,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (138.48) for latency MM1Queue_a033_s075
2025-05-10 16:14:51,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 16:14:51,970 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 16:14:51,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 23 minutes, 57 seconds)
2025-05-10 16:17:37,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:17:39,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 68.28742 ± 26.590
2025-05-10 16:17:39,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [87.890686, 68.724335, 91.47527, 22.779411, 53.51013, 83.48015, 119.89848, 42.635654, 53.400436, 59.07966]
2025-05-10 16:17:39,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [162.0, 96.0, 216.0, 87.0, 89.0, 133.0, 223.0, 114.0, 90.0, 171.0]
2025-05-10 16:17:39,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 22 minutes, 9 seconds)
2025-05-10 16:20:27,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:20:29,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 169.90326 ± 110.269
2025-05-10 16:20:29,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [173.69028, 259.85776, 267.32544, 7.82786, 338.29456, 253.08939, 86.26142, 47.001137, 226.65814, 39.026684]
2025-05-10 16:20:29,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [104.0, 138.0, 186.0, 19.0, 238.0, 130.0, 114.0, 112.0, 118.0, 118.0]
2025-05-10 16:20:29,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (169.90) for latency MM1Queue_a033_s075
2025-05-10 16:20:29,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 16:20:29,966 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 16:20:29,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 23 minutes, 51 seconds)
2025-05-10 16:23:15,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:23:18,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 214.29703 ± 81.085
2025-05-10 16:23:18,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [259.659, 268.6521, 131.13042, 282.46857, 137.25876, 219.04585, 52.55983, 343.305, 214.286, 234.60461]
2025-05-10 16:23:18,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [194.0, 180.0, 201.0, 165.0, 160.0, 131.0, 125.0, 248.0, 141.0, 155.0]
2025-05-10 16:23:18,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (214.30) for latency MM1Queue_a033_s075
2025-05-10 16:23:18,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 16:23:18,496 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 16:23:18,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 21 minutes, 6 seconds)
2025-05-10 16:26:06,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:26:08,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 215.05331 ± 59.507
2025-05-10 16:26:08,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [268.08984, 155.94739, 217.8785, 273.65738, 232.6808, 211.7492, 147.22179, 168.0974, 144.01836, 331.19257]
2025-05-10 16:26:08,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [145.0, 98.0, 120.0, 164.0, 126.0, 109.0, 92.0, 104.0, 87.0, 214.0]
2025-05-10 16:26:08,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (215.05) for latency MM1Queue_a033_s075
2025-05-10 16:26:08,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 16:26:08,581 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 16:26:08,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 19 minutes, 22 seconds)
2025-05-10 16:28:55,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:28:57,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 135.59361 ± 93.825
2025-05-10 16:28:57,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [35.378834, 70.19516, 74.23725, 279.188, 319.31613, 77.08507, 73.9593, 156.0611, 198.23714, 72.278175]
2025-05-10 16:28:57,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [81.0, 109.0, 107.0, 161.0, 216.0, 153.0, 127.0, 106.0, 212.0, 129.0]
2025-05-10 16:28:57,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 16 minutes, 24 seconds)
2025-05-10 16:31:46,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:31:49,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 213.02786 ± 101.136
2025-05-10 16:31:49,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [234.35336, 208.4384, 296.4756, 293.7283, 373.4794, 66.42561, 308.22205, 150.06158, 65.53301, 133.56128]
2025-05-10 16:31:49,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [148.0, 169.0, 159.0, 195.0, 231.0, 94.0, 175.0, 183.0, 84.0, 158.0]
2025-05-10 16:31:49,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 14 minutes, 43 seconds)
2025-05-10 16:34:34,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:34:37,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 278.46738 ± 96.175
2025-05-10 16:34:37,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [242.39586, 162.6323, 352.6562, 413.96652, 266.26498, 395.2877, 278.66122, 338.87375, 90.79806, 243.13724]
2025-05-10 16:34:37,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [155.0, 244.0, 208.0, 255.0, 159.0, 287.0, 160.0, 204.0, 172.0, 156.0]
2025-05-10 16:34:37,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (278.47) for latency MM1Queue_a033_s075
2025-05-10 16:34:37,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 16:34:37,232 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 16:34:37,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 11 minutes, 21 seconds)
2025-05-10 16:37:25,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:37:28,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 227.62285 ± 36.213
2025-05-10 16:37:28,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [308.6017, 200.62712, 193.36014, 189.42752, 220.53685, 258.1096, 222.21898, 203.48932, 214.43083, 265.42645]
2025-05-10 16:37:28,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [242.0, 130.0, 110.0, 108.0, 139.0, 157.0, 135.0, 127.0, 142.0, 162.0]
2025-05-10 16:37:28,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 9 minutes, 14 seconds)
2025-05-10 16:40:16,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:40:19,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 270.71005 ± 75.242
2025-05-10 16:40:19,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [179.52077, 259.85037, 227.76933, 358.72577, 298.4605, 223.45544, 449.2742, 238.79527, 246.11525, 225.13383]
2025-05-10 16:40:19,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 140.0, 153.0, 240.0, 232.0, 158.0, 348.0, 158.0, 147.0, 144.0]
2025-05-10 16:40:19,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 6 minutes, 45 seconds)
2025-05-10 16:43:05,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:43:09,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 311.40079 ± 74.878
2025-05-10 16:43:09,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [190.03062, 404.00598, 292.82614, 365.07263, 253.64413, 382.91013, 244.62836, 244.696, 421.71213, 314.48178]
2025-05-10 16:43:09,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [130.0, 203.0, 151.0, 546.0, 132.0, 248.0, 158.0, 154.0, 242.0, 185.0]
2025-05-10 16:43:09,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (311.40) for latency MM1Queue_a033_s075
2025-05-10 16:43:09,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 16:43:09,445 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 16:43:09,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 4 minutes, 17 seconds)
2025-05-10 16:45:54,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:45:56,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 271.98941 ± 52.198
2025-05-10 16:45:56,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [271.32953, 272.8179, 221.18837, 343.11026, 292.90262, 307.75754, 317.21344, 290.77118, 148.83372, 253.96956]
2025-05-10 16:45:56,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [140.0, 172.0, 113.0, 167.0, 166.0, 155.0, 167.0, 141.0, 89.0, 139.0]
2025-05-10 16:45:56,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 10 seconds)
2025-05-10 16:48:45,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:48:49,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 452.28256 ± 153.620
2025-05-10 16:48:49,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [357.34598, 466.45926, 349.089, 244.80208, 551.8072, 466.6214, 384.67264, 847.2071, 402.9548, 451.86636]
2025-05-10 16:48:49,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [264.0, 263.0, 164.0, 222.0, 288.0, 228.0, 292.0, 446.0, 203.0, 253.0]
2025-05-10 16:48:49,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (452.28) for latency MM1Queue_a033_s075
2025-05-10 16:48:49,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 16:48:49,561 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 16:48:49,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 58 minutes, 39 seconds)
2025-05-10 16:51:48,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:51:53,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 443.03555 ± 107.412
2025-05-10 16:51:53,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [414.9842, 563.6499, 465.47876, 564.1374, 412.36975, 607.74695, 362.46014, 243.81573, 455.01877, 340.69385]
2025-05-10 16:51:53,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [207.0, 460.0, 264.0, 329.0, 341.0, 521.0, 248.0, 124.0, 256.0, 198.0]
2025-05-10 16:51:53,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 59 minutes, 26 seconds)
2025-05-10 16:54:55,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:54:57,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 295.65155 ± 80.196
2025-05-10 16:54:57,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [384.34073, 256.25272, 348.54095, 369.66946, 303.4537, 278.3198, 130.12587, 190.11453, 381.554, 314.14346]
2025-05-10 16:54:57,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 185.0, 160.0, 218.0, 147.0, 124.0, 86.0, 96.0, 179.0, 136.0]
2025-05-10 16:54:57,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 4 seconds)
2025-05-10 16:57:52,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:57:56,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 412.32047 ± 181.425
2025-05-10 16:57:56,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [569.56305, 722.16223, 75.22183, 162.04997, 388.50693, 533.2071, 359.44678, 384.77866, 527.5661, 400.702]
2025-05-10 16:57:56,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [339.0, 460.0, 95.0, 143.0, 176.0, 293.0, 202.0, 173.0, 331.0, 240.0]
2025-05-10 16:57:56,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 59 minutes, 27 seconds)
2025-05-10 17:00:52,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:00:55,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 335.08463 ± 119.049
2025-05-10 17:00:55,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [268.08093, 299.3984, 323.45447, 362.58118, 633.80316, 156.9882, 362.04355, 287.86353, 253.9088, 402.7242]
2025-05-10 17:00:55,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [131.0, 156.0, 140.0, 162.0, 323.0, 75.0, 152.0, 135.0, 136.0, 212.0]
2025-05-10 17:00:55,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 59 minutes, 38 seconds)
2025-05-10 17:03:52,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:03:54,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 247.57642 ± 79.814
2025-05-10 17:03:54,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [142.2274, 140.03392, 366.0255, 276.6554, 321.8451, 296.2751, 137.45456, 308.2689, 278.36472, 208.61354]
2025-05-10 17:03:54,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [73.0, 73.0, 170.0, 125.0, 157.0, 124.0, 68.0, 150.0, 150.0, 132.0]
2025-05-10 17:03:54,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 58 minutes, 13 seconds)
2025-05-10 17:06:52,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:06:55,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 327.19836 ± 73.443
2025-05-10 17:06:55,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [274.72137, 385.52368, 388.20782, 420.83942, 308.40082, 258.22696, 178.566, 416.585, 322.54953, 318.363]
2025-05-10 17:06:55,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 220.0, 236.0, 201.0, 145.0, 116.0, 163.0, 206.0, 153.0, 134.0]
2025-05-10 17:06:55,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 54 minutes, 21 seconds)
2025-05-10 17:09:51,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:09:54,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 335.50250 ± 105.697
2025-05-10 17:09:54,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [430.0726, 295.25427, 299.4277, 550.65985, 300.35153, 304.83948, 149.93083, 265.63504, 315.36194, 443.49176]
2025-05-10 17:09:54,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [193.0, 164.0, 136.0, 310.0, 152.0, 144.0, 178.0, 140.0, 147.0, 243.0]
2025-05-10 17:09:54,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 50 minutes, 9 seconds)
2025-05-10 17:12:52,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:12:54,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 282.81131 ± 38.887
2025-05-10 17:12:54,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [295.6999, 276.21872, 221.9275, 261.78284, 323.67807, 293.08392, 308.31198, 336.797, 211.09114, 299.52197]
2025-05-10 17:12:54,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [143.0, 128.0, 114.0, 125.0, 156.0, 149.0, 156.0, 173.0, 111.0, 124.0]
2025-05-10 17:12:54,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 47 minutes, 36 seconds)
2025-05-10 17:15:51,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:15:53,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 290.56787 ± 110.695
2025-05-10 17:15:53,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [267.07016, 289.75552, 263.7163, 518.6893, 182.8798, 403.29526, 166.9283, 395.40155, 260.56064, 157.38213]
2025-05-10 17:15:53,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 126.0, 121.0, 208.0, 98.0, 196.0, 88.0, 175.0, 122.0, 80.0]
2025-05-10 17:15:53,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 44 minutes, 27 seconds)
2025-05-10 17:18:50,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:18:52,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 334.95215 ± 81.814
2025-05-10 17:18:52,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [248.56673, 365.89896, 451.1999, 270.71713, 269.9393, 299.42575, 235.35197, 339.06964, 381.9489, 487.4032]
2025-05-10 17:18:52,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 193.0, 211.0, 201.0, 142.0, 146.0, 122.0, 163.0, 176.0, 253.0]
2025-05-10 17:18:52,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 41 minutes, 41 seconds)
2025-05-10 17:21:51,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:21:54,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 390.15649 ± 107.360
2025-05-10 17:21:54,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [406.63495, 266.20282, 251.90013, 274.52258, 421.5565, 417.32623, 602.2561, 311.8252, 476.7042, 472.63617]
2025-05-10 17:21:54,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [217.0, 131.0, 124.0, 135.0, 200.0, 175.0, 306.0, 133.0, 238.0, 215.0]
2025-05-10 17:21:54,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 38 minutes, 49 seconds)
2025-05-10 17:24:51,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:24:54,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 414.75739 ± 175.892
2025-05-10 17:24:54,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [357.98022, 307.93906, 468.0505, 479.11005, 273.82138, 483.872, 842.4724, 472.93665, 163.45166, 297.9397]
2025-05-10 17:24:54,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 173.0, 230.0, 243.0, 119.0, 238.0, 411.0, 227.0, 101.0, 141.0]
2025-05-10 17:24:54,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 36 minutes, 3 seconds)
2025-05-10 17:27:54,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:27:57,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 421.96173 ± 85.048
2025-05-10 17:27:57,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [493.3162, 527.4822, 554.9994, 491.14258, 374.42667, 345.11337, 342.30994, 436.54858, 348.52496, 305.7532]
2025-05-10 17:27:57,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [236.0, 255.0, 290.0, 191.0, 191.0, 183.0, 174.0, 240.0, 153.0, 127.0]
2025-05-10 17:27:57,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 33 minutes, 41 seconds)
2025-05-10 17:30:53,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:30:56,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 368.54373 ± 62.933
2025-05-10 17:30:56,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [318.19592, 527.62274, 346.28674, 318.47812, 406.7074, 388.8233, 378.85678, 288.25903, 360.21417, 351.99326]
2025-05-10 17:30:56,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [143.0, 253.0, 152.0, 147.0, 183.0, 164.0, 191.0, 129.0, 188.0, 155.0]
2025-05-10 17:30:56,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 30 minutes, 47 seconds)
2025-05-10 17:33:53,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:33:57,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 361.55301 ± 31.771
2025-05-10 17:33:57,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [343.30905, 412.87213, 328.7911, 352.23676, 338.95795, 321.37082, 385.97778, 348.09195, 367.31226, 416.6104]
2025-05-10 17:33:57,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 217.0, 177.0, 201.0, 184.0, 170.0, 205.0, 184.0, 197.0, 219.0]
2025-05-10 17:33:57,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 27 minutes, 57 seconds)
2025-05-10 17:36:56,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:36:59,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 401.09402 ± 35.655
2025-05-10 17:36:59,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [313.4314, 391.0134, 427.60974, 419.98782, 420.70395, 375.56912, 387.3288, 437.80627, 400.43887, 437.05048]
2025-05-10 17:36:59,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [135.0, 181.0, 204.0, 188.0, 173.0, 155.0, 180.0, 201.0, 177.0, 183.0]
2025-05-10 17:36:59,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 25 minutes, 7 seconds)
2025-05-10 17:39:57,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:40:00,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 379.68408 ± 46.011
2025-05-10 17:40:00,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [347.3141, 437.88046, 384.25803, 371.57053, 359.21057, 354.31998, 482.39728, 395.41342, 320.2687, 344.20767]
2025-05-10 17:40:00,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 210.0, 186.0, 168.0, 170.0, 169.0, 240.0, 173.0, 153.0, 157.0]
2025-05-10 17:40:00,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 22 minutes, 10 seconds)
2025-05-10 17:42:56,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:42:59,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 331.67960 ± 36.563
2025-05-10 17:42:59,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [414.56668, 339.16653, 304.0024, 287.668, 284.81714, 348.19226, 360.19592, 332.3621, 335.22665, 310.59836]
2025-05-10 17:42:59,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 167.0, 146.0, 122.0, 123.0, 172.0, 168.0, 158.0, 170.0, 154.0]
2025-05-10 17:42:59,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 18 minutes, 18 seconds)
2025-05-10 17:45:59,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:46:03,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 500.07904 ± 129.716
2025-05-10 17:46:03,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [525.1653, 492.06708, 391.67078, 331.57938, 314.58685, 704.5777, 479.70474, 505.81558, 532.6296, 722.9935]
2025-05-10 17:46:03,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [311.0, 217.0, 180.0, 142.0, 126.0, 364.0, 215.0, 194.0, 214.0, 328.0]
2025-05-10 17:46:03,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (500.08) for latency MM1Queue_a033_s075
2025-05-10 17:46:03,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 17:46:03,016 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 17:46:03,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 16 minutes, 23 seconds)
2025-05-10 17:48:57,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:49:01,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 535.44928 ± 107.000
2025-05-10 17:49:01,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [537.39813, 634.04846, 593.73615, 744.07355, 512.6325, 341.69507, 460.97272, 600.48566, 465.6651, 463.78546]
2025-05-10 17:49:01,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [409.0, 293.0, 258.0, 390.0, 202.0, 144.0, 178.0, 257.0, 190.0, 202.0]
2025-05-10 17:49:01,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (535.45) for latency MM1Queue_a033_s075
2025-05-10 17:49:01,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 17:49:01,553 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 17:49:01,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 12 minutes, 56 seconds)
2025-05-10 17:51:59,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:52:05,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 657.05066 ± 207.886
2025-05-10 17:52:05,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [580.371, 533.4839, 793.82544, 510.58997, 1214.121, 592.976, 708.4852, 643.89813, 464.9472, 527.80853]
2025-05-10 17:52:05,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [248.0, 260.0, 457.0, 251.0, 670.0, 295.0, 389.0, 364.0, 210.0, 395.0]
2025-05-10 17:52:05,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (657.05) for latency MM1Queue_a033_s075
2025-05-10 17:52:05,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 17:52:05,240 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 17:52:05,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 10 minutes, 15 seconds)
2025-05-10 17:55:05,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:55:09,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 531.51843 ± 245.336
2025-05-10 17:55:09,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [493.7721, 1241.6847, 401.6703, 350.1869, 482.0571, 424.10458, 386.91302, 456.59024, 485.38852, 592.817]
2025-05-10 17:55:09,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [204.0, 609.0, 186.0, 149.0, 228.0, 186.0, 183.0, 203.0, 225.0, 289.0]
2025-05-10 17:55:09,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 8 minutes, 3 seconds)
2025-05-10 17:58:11,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:58:14,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 445.50067 ± 80.319
2025-05-10 17:58:14,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [537.0062, 370.03302, 375.16092, 597.1399, 451.73816, 374.41183, 549.8565, 405.86273, 386.35913, 407.43817]
2025-05-10 17:58:14,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [238.0, 176.0, 177.0, 273.0, 215.0, 171.0, 239.0, 193.0, 176.0, 190.0]
2025-05-10 17:58:14,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 6 minutes, 11 seconds)
2025-05-10 18:01:07,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:01:11,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 518.41724 ± 141.250
2025-05-10 18:01:11,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [547.2996, 371.7895, 476.56873, 311.0334, 861.1834, 575.00244, 448.98752, 517.9089, 481.49124, 592.9076]
2025-05-10 18:01:11,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [261.0, 174.0, 226.0, 148.0, 405.0, 246.0, 223.0, 239.0, 212.0, 268.0]
2025-05-10 18:01:11,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 1 minute, 42 seconds)
2025-05-10 18:04:13,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:04:17,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 562.63586 ± 145.733
2025-05-10 18:04:17,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [409.02576, 347.35318, 579.5515, 852.70105, 445.61798, 674.7955, 604.16235, 430.89096, 654.1021, 628.15814]
2025-05-10 18:04:17,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [156.0, 144.0, 254.0, 389.0, 194.0, 288.0, 260.0, 183.0, 275.0, 276.0]
2025-05-10 18:04:17,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 2 seconds)
2025-05-10 18:07:13,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:07:19,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 708.84216 ± 212.690
2025-05-10 18:07:19,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [560.9096, 626.2493, 633.374, 864.68677, 629.1455, 610.3352, 538.2762, 724.1274, 610.5481, 1290.7695]
2025-05-10 18:07:19,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [249.0, 291.0, 298.0, 389.0, 266.0, 276.0, 244.0, 332.0, 290.0, 622.0]
2025-05-10 18:07:19,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (708.84) for latency MM1Queue_a033_s075
2025-05-10 18:07:19,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 18:07:19,344 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 18:07:19,357 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 56 minutes, 43 seconds)
2025-05-10 18:10:19,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:10:25,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 782.82477 ± 376.462
2025-05-10 18:10:25,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [974.00757, 1709.8014, 907.34296, 292.0913, 569.8543, 701.27155, 538.62177, 549.962, 1015.14923, 570.1452]
2025-05-10 18:10:25,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [463.0, 841.0, 448.0, 120.0, 267.0, 372.0, 250.0, 258.0, 506.0, 269.0]
2025-05-10 18:10:25,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (782.82) for latency MM1Queue_a033_s075
2025-05-10 18:10:25,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 18:10:25,698 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 18:10:25,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 53 minutes, 59 seconds)
2025-05-10 18:12:53,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:12:58,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 707.66589 ± 242.566
2025-05-10 18:12:58,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [639.287, 922.13477, 410.0967, 553.1965, 654.7485, 543.2742, 579.89636, 528.4111, 1088.6472, 1156.967]
2025-05-10 18:12:58,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [287.0, 427.0, 169.0, 251.0, 320.0, 241.0, 248.0, 219.0, 484.0, 517.0]
2025-05-10 18:12:58,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 44 minutes, 53 seconds)
2025-05-10 18:15:20,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:15:26,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 773.90723 ± 237.630
2025-05-10 18:15:26,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1070.6952, 512.8611, 949.3724, 663.6888, 528.8768, 971.1179, 928.3973, 1094.5388, 496.1952, 523.32935]
2025-05-10 18:15:26,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [504.0, 235.0, 456.0, 314.0, 229.0, 453.0, 446.0, 500.0, 207.0, 243.0]
2025-05-10 18:15:26,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 36 minutes, 42 seconds)
2025-05-10 18:17:50,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:17:56,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 794.49036 ± 173.285
2025-05-10 18:17:56,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [652.4508, 1127.1256, 937.72565, 908.7863, 612.2304, 908.2568, 681.2025, 885.95764, 663.0148, 568.1527]
2025-05-10 18:17:56,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [329.0, 551.0, 454.0, 428.0, 286.0, 428.0, 317.0, 428.0, 314.0, 262.0]
2025-05-10 18:17:56,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (794.49) for latency MM1Queue_a033_s075
2025-05-10 18:17:56,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 18:17:56,249 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 18:17:56,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 27 minutes, 27 seconds)
2025-05-10 18:20:20,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:20:25,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 681.34271 ± 244.316
2025-05-10 18:20:25,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [891.30554, 866.3644, 359.99744, 489.6449, 562.5873, 742.30597, 700.40314, 415.1272, 576.5731, 1209.1183]
2025-05-10 18:20:25,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [437.0, 412.0, 146.0, 202.0, 230.0, 355.0, 320.0, 167.0, 257.0, 587.0]
2025-05-10 18:20:25,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 18 minutes, 53 seconds)
2025-05-10 18:22:49,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:22:54,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 626.35596 ± 84.833
2025-05-10 18:22:54,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [840.02405, 587.8261, 589.21893, 567.63794, 580.8401, 721.03217, 633.3987, 581.22906, 543.78436, 618.5686]
2025-05-10 18:22:54,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [396.0, 288.0, 286.0, 283.0, 289.0, 321.0, 304.0, 286.0, 255.0, 280.0]
2025-05-10 18:22:54,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 9 minutes, 42 seconds)
2025-05-10 18:25:16,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:25:21,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 631.78149 ± 149.735
2025-05-10 18:25:21,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [719.78436, 312.49557, 637.14984, 640.7811, 838.6451, 604.5902, 602.09283, 866.2016, 525.4997, 570.5747]
2025-05-10 18:25:21,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [332.0, 134.0, 325.0, 292.0, 378.0, 273.0, 275.0, 401.0, 240.0, 259.0]
2025-05-10 18:25:21,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 6 minutes, 18 seconds)
2025-05-10 18:27:46,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:27:51,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 685.61261 ± 123.073
2025-05-10 18:27:51,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [919.043, 603.1739, 476.1897, 609.17804, 705.46906, 601.8636, 838.5266, 624.72327, 750.9805, 726.97864]
2025-05-10 18:27:51,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [451.0, 314.0, 186.0, 287.0, 312.0, 300.0, 394.0, 289.0, 358.0, 322.0]
2025-05-10 18:27:51,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 4 minutes, 12 seconds)
2025-05-10 18:30:19,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:30:25,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 883.69629 ± 276.502
2025-05-10 18:30:25,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [887.10284, 870.48755, 690.31006, 1575.9194, 764.07623, 1116.9556, 485.81863, 783.1126, 780.5767, 882.604]
2025-05-10 18:30:25,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [440.0, 421.0, 333.0, 772.0, 392.0, 559.0, 207.0, 359.0, 392.0, 448.0]
2025-05-10 18:30:25,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (883.70) for latency MM1Queue_a033_s075
2025-05-10 18:30:25,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 18:30:25,652 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 18:30:25,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 2 minutes, 24 seconds)
2025-05-10 18:32:49,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:32:53,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 687.07404 ± 171.442
2025-05-10 18:32:53,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [958.7118, 921.8458, 898.0268, 487.4563, 647.9253, 501.21204, 743.9491, 582.2329, 550.3596, 579.0207]
2025-05-10 18:32:53,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [460.0, 455.0, 420.0, 211.0, 275.0, 199.0, 354.0, 264.0, 217.0, 304.0]
2025-05-10 18:32:53,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 59 minutes, 44 seconds)
2025-05-10 18:35:19,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:35:24,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 646.63928 ± 216.478
2025-05-10 18:35:24,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [771.13153, 589.43945, 825.8044, 419.21982, 1089.9211, 688.53546, 274.42047, 571.18005, 513.62683, 723.1132]
2025-05-10 18:35:24,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [370.0, 275.0, 395.0, 170.0, 531.0, 333.0, 119.0, 297.0, 215.0, 342.0]
2025-05-10 18:35:24,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 57 minutes, 33 seconds)
2025-05-10 18:37:49,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:37:54,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 732.30823 ± 129.833
2025-05-10 18:37:54,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [665.89465, 883.0918, 901.8007, 700.3075, 489.08203, 563.9438, 853.237, 818.4931, 692.32263, 754.9085]
2025-05-10 18:37:54,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [346.0, 378.0, 427.0, 340.0, 208.0, 250.0, 397.0, 339.0, 340.0, 371.0]
2025-05-10 18:37:55,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 55 minutes, 34 seconds)
2025-05-10 18:40:19,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:40:25,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 882.52576 ± 196.537
2025-05-10 18:40:25,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [874.18085, 836.5578, 963.6989, 1200.701, 571.0168, 1046.1759, 855.3704, 597.7331, 1118.8107, 761.01306]
2025-05-10 18:40:25,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [417.0, 395.0, 469.0, 558.0, 262.0, 466.0, 343.0, 310.0, 552.0, 357.0]
2025-05-10 18:40:25,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 53 minutes, 8 seconds)
2025-05-10 18:42:50,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:42:58,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1021.43054 ± 302.317
2025-05-10 18:42:58,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1773.4786, 1146.239, 693.11755, 1170.4325, 895.3302, 780.2932, 789.8048, 787.81976, 1013.28723, 1164.5028]
2025-05-10 18:42:58,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [876.0, 511.0, 348.0, 536.0, 402.0, 348.0, 340.0, 366.0, 457.0, 499.0]
2025-05-10 18:42:58,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1021.43) for latency MM1Queue_a033_s075
2025-05-10 18:42:58,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 18:42:58,232 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 18:42:58,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 50 minutes, 22 seconds)
2025-05-10 18:45:27,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:45:32,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 751.64160 ± 195.896
2025-05-10 18:45:32,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [766.409, 349.452, 791.88165, 620.83923, 768.30316, 747.76886, 929.846, 1131.2233, 811.72174, 598.97064]
2025-05-10 18:45:32,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [346.0, 147.0, 364.0, 296.0, 345.0, 346.0, 430.0, 507.0, 363.0, 278.0]
2025-05-10 18:45:32,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 48 minutes, 41 seconds)
2025-05-10 18:47:58,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:48:05,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1026.24512 ± 489.527
2025-05-10 18:48:05,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [726.8526, 465.29553, 640.43976, 1126.2084, 1968.4865, 688.67975, 908.6607, 1059.3022, 782.2264, 1896.2998]
2025-05-10 18:48:05,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [342.0, 340.0, 256.0, 498.0, 1000.0, 305.0, 387.0, 479.0, 375.0, 1000.0]
2025-05-10 18:48:05,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1026.25) for latency MM1Queue_a033_s075
2025-05-10 18:48:05,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 18:48:05,976 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 18:48:05,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 46 minutes, 37 seconds)
2025-05-10 18:50:34,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:50:40,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 741.65686 ± 277.992
2025-05-10 18:50:40,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [809.52686, 521.79517, 526.68365, 664.7346, 555.86426, 463.89957, 1114.9902, 1288.443, 977.82355, 492.80762]
2025-05-10 18:50:40,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [374.0, 302.0, 396.0, 282.0, 435.0, 283.0, 534.0, 628.0, 558.0, 288.0]
2025-05-10 18:50:40,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 44 minutes, 35 seconds)
2025-05-10 18:53:02,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:53:08,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 923.44452 ± 225.603
2025-05-10 18:53:08,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [848.4411, 878.12585, 717.4579, 850.5745, 1507.3496, 753.982, 975.0035, 695.4763, 1094.2216, 913.813]
2025-05-10 18:53:08,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [351.0, 367.0, 321.0, 363.0, 683.0, 333.0, 413.0, 302.0, 623.0, 392.0]
2025-05-10 18:53:08,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 41 minutes, 41 seconds)
2025-05-10 18:55:33,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:55:38,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 876.30725 ± 158.203
2025-05-10 18:55:38,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [735.98553, 1168.9401, 754.17114, 779.4086, 665.41693, 793.18384, 900.8037, 1011.66986, 853.9286, 1099.5632]
2025-05-10 18:55:38,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [331.0, 538.0, 344.0, 353.0, 313.0, 349.0, 379.0, 446.0, 375.0, 501.0]
2025-05-10 18:55:39,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 38 minutes, 53 seconds)
2025-05-10 18:58:02,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:58:12,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1280.52319 ± 543.843
2025-05-10 18:58:12,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [601.1362, 1353.8114, 1029.8701, 1027.7343, 2163.502, 417.45425, 1706.2651, 1161.0654, 2082.7446, 1261.6481]
2025-05-10 18:58:12,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [257.0, 626.0, 635.0, 454.0, 977.0, 165.0, 1000.0, 508.0, 1000.0, 573.0]
2025-05-10 18:58:12,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1280.52) for latency MM1Queue_a033_s075
2025-05-10 18:58:12,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 18:58:12,632 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 18:58:12,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 36 minutes, 19 seconds)
2025-05-10 19:00:37,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:00:45,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1097.47815 ± 453.738
2025-05-10 19:00:45,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [643.04956, 898.6601, 1866.6503, 931.9578, 644.8349, 979.52454, 1281.1455, 1889.1064, 565.3345, 1274.5175]
2025-05-10 19:00:45,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [328.0, 432.0, 895.0, 443.0, 307.0, 439.0, 567.0, 944.0, 242.0, 608.0]
2025-05-10 19:00:45,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 33 minutes, 40 seconds)
2025-05-10 19:03:09,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:03:17,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1071.28442 ± 497.536
2025-05-10 19:03:17,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [617.1243, 890.588, 1976.6263, 645.7824, 1060.6371, 541.18976, 1228.414, 671.2119, 1134.818, 1946.4523]
2025-05-10 19:03:17,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [317.0, 412.0, 1000.0, 311.0, 559.0, 227.0, 566.0, 314.0, 508.0, 1000.0]
2025-05-10 19:03:17,836 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 30 minutes, 53 seconds)
2025-05-10 19:05:47,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:05:54,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1018.63849 ± 387.774
2025-05-10 19:05:54,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [575.57306, 557.22614, 837.85925, 732.2207, 1141.0092, 1185.5126, 1494.3296, 1846.4847, 929.1078, 887.06274]
2025-05-10 19:05:54,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [250.0, 243.0, 374.0, 390.0, 502.0, 532.0, 675.0, 835.0, 431.0, 377.0]
2025-05-10 19:05:54,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 29 minutes, 20 seconds)
2025-05-10 19:08:15,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:08:23,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1138.41333 ± 504.762
2025-05-10 19:08:23,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [791.5304, 1959.7006, 1337.0211, 1155.3823, 1437.1909, 1811.3755, 469.63144, 617.013, 489.8347, 1315.4529]
2025-05-10 19:08:23,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [324.0, 885.0, 589.0, 490.0, 648.0, 772.0, 197.0, 265.0, 204.0, 587.0]
2025-05-10 19:08:23,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 26 minutes, 37 seconds)
2025-05-10 19:10:51,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:10:58,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 988.76581 ± 426.372
2025-05-10 19:10:58,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [679.9368, 906.7828, 793.33984, 470.59552, 709.6368, 1446.924, 815.14545, 828.34564, 1249.4426, 1987.5089]
2025-05-10 19:10:58,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [297.0, 425.0, 341.0, 209.0, 315.0, 643.0, 354.0, 323.0, 611.0, 894.0]
2025-05-10 19:10:58,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 24 minutes, 15 seconds)
2025-05-10 19:13:22,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:13:33,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1436.52881 ± 480.056
2025-05-10 19:13:33,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1877.3369, 960.5123, 1662.1853, 925.2344, 1991.4088, 1940.9277, 955.44403, 2048.0452, 1116.3959, 887.799]
2025-05-10 19:13:33,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 444.0, 797.0, 406.0, 1000.0, 884.0, 421.0, 1000.0, 525.0, 372.0]
2025-05-10 19:13:33,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1436.53) for latency MM1Queue_a033_s075
2025-05-10 19:13:33,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 19:13:33,469 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 19:13:33,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 21 minutes, 54 seconds)
2025-05-10 19:16:06,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:16:11,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 783.62579 ± 133.154
2025-05-10 19:16:11,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [638.3248, 668.6234, 883.8882, 647.89905, 879.44293, 881.52277, 852.8664, 1039.7064, 686.5899, 657.3939]
2025-05-10 19:16:11,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [293.0, 289.0, 370.0, 272.0, 371.0, 375.0, 353.0, 438.0, 294.0, 270.0]
2025-05-10 19:16:11,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 19 minutes, 54 seconds)
2025-05-10 19:18:27,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:18:33,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 846.91327 ± 175.113
2025-05-10 19:18:33,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1028.328, 653.3591, 887.9491, 911.2447, 807.0232, 680.95636, 924.90497, 909.7983, 523.8501, 1141.719]
2025-05-10 19:18:33,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [437.0, 282.0, 373.0, 364.0, 316.0, 305.0, 378.0, 386.0, 211.0, 473.0]
2025-05-10 19:18:33,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 15 minutes, 53 seconds)
2025-05-10 19:21:00,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:21:08,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1095.68115 ± 639.585
2025-05-10 19:21:08,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2105.5107, 1729.3508, 394.55716, 571.71185, 714.6381, 1026.9161, 2205.5078, 437.6793, 873.3894, 897.5502]
2025-05-10 19:21:08,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [931.0, 774.0, 154.0, 212.0, 300.0, 426.0, 1000.0, 174.0, 388.0, 362.0]
2025-05-10 19:21:08,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 13 minutes, 55 seconds)
2025-05-10 19:23:32,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:23:41,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1257.98145 ± 489.647
2025-05-10 19:23:41,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [876.9718, 637.25574, 1461.0833, 2073.5054, 992.81854, 1115.6383, 2089.7373, 757.5604, 1036.4504, 1538.7928]
2025-05-10 19:23:41,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [371.0, 249.0, 638.0, 1000.0, 446.0, 511.0, 1000.0, 333.0, 485.0, 748.0]
2025-05-10 19:23:41,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 11 minutes, 11 seconds)
2025-05-10 19:26:11,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:26:19,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1176.14917 ± 499.710
2025-05-10 19:26:19,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1905.3976, 948.43823, 814.6917, 635.76044, 1069.5795, 2219.5686, 582.69135, 1038.0182, 1259.6022, 1287.7446]
2025-05-10 19:26:19,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [817.0, 405.0, 331.0, 271.0, 452.0, 1000.0, 241.0, 441.0, 519.0, 548.0]
2025-05-10 19:26:19,357 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 8 minutes, 55 seconds)
2025-05-10 19:28:44,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:28:56,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1609.52222 ± 535.475
2025-05-10 19:28:56,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [901.1415, 1588.9393, 2062.041, 2128.3794, 1451.1849, 2091.936, 2107.1177, 845.85486, 2074.705, 843.9221]
2025-05-10 19:28:56,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [385.0, 705.0, 1000.0, 1000.0, 674.0, 1000.0, 1000.0, 373.0, 1000.0, 377.0]
2025-05-10 19:28:56,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1609.52) for latency MM1Queue_a033_s075
2025-05-10 19:28:56,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 19:28:56,482 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 19:28:56,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 6 minutes, 20 seconds)
2025-05-10 19:31:17,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:31:25,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1103.14221 ± 197.404
2025-05-10 19:31:25,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1309.725, 1307.2693, 1154.403, 941.9898, 1009.7094, 892.0087, 950.1034, 1508.769, 1026.2682, 931.1754]
2025-05-10 19:31:25,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [540.0, 535.0, 472.0, 397.0, 419.0, 382.0, 407.0, 618.0, 436.0, 400.0]
2025-05-10 19:31:25,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 4 minutes, 18 seconds)
2025-05-10 19:34:00,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:34:06,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1054.03125 ± 444.278
2025-05-10 19:34:06,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [558.46643, 567.86487, 1099.2811, 704.49634, 1268.683, 2135.8384, 949.16705, 910.9694, 958.7434, 1386.8014]
2025-05-10 19:34:06,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 216.0, 472.0, 293.0, 504.0, 918.0, 381.0, 385.0, 417.0, 615.0]
2025-05-10 19:34:06,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 2 minutes, 18 seconds)
2025-05-10 19:36:26,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:36:35,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1314.49365 ± 465.566
2025-05-10 19:36:35,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [900.19635, 677.8644, 1625.8567, 1761.8982, 1840.232, 2107.945, 1103.4088, 1177.9623, 758.5745, 1190.9994]
2025-05-10 19:36:35,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [398.0, 309.0, 706.0, 869.0, 864.0, 1000.0, 497.0, 521.0, 331.0, 506.0]
2025-05-10 19:36:35,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 59 minutes, 22 seconds)
2025-05-10 19:39:06,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:39:12,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 936.85760 ± 284.385
2025-05-10 19:39:12,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [687.9844, 775.6563, 955.55597, 896.76874, 874.1092, 1618.2397, 887.6378, 600.4564, 795.9087, 1276.2594]
2025-05-10 19:39:12,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [300.0, 335.0, 388.0, 373.0, 375.0, 732.0, 372.0, 241.0, 341.0, 567.0]
2025-05-10 19:39:12,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 56 minutes, 42 seconds)
2025-05-10 19:41:29,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:41:34,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 792.33411 ± 181.244
2025-05-10 19:41:34,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [609.5091, 880.5642, 600.91974, 750.4212, 565.3679, 1133.1622, 768.52814, 1058.2126, 709.1849, 847.47125]
2025-05-10 19:41:34,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [266.0, 369.0, 251.0, 315.0, 232.0, 500.0, 344.0, 447.0, 307.0, 345.0]
2025-05-10 19:41:34,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 53 minutes, 3 seconds)
2025-05-10 19:44:01,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:44:05,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 733.85968 ± 109.639
2025-05-10 19:44:05,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [804.7236, 599.7259, 830.63855, 632.0658, 620.9586, 635.73047, 843.10645, 892.0141, 834.8684, 644.7646]
2025-05-10 19:44:05,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [323.0, 269.0, 351.0, 274.0, 272.0, 259.0, 356.0, 378.0, 350.0, 275.0]
2025-05-10 19:44:05,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 50 minutes, 43 seconds)
2025-05-10 19:46:34,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:46:39,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 741.49719 ± 111.401
2025-05-10 19:46:39,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [612.57135, 693.00653, 841.0744, 595.21545, 829.8348, 912.74097, 773.6776, 868.6017, 660.45416, 627.7947]
2025-05-10 19:46:39,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [256.0, 294.0, 354.0, 259.0, 361.0, 391.0, 330.0, 366.0, 284.0, 272.0]
2025-05-10 19:46:39,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 47 minutes, 39 seconds)
2025-05-10 19:48:59,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:49:03,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 679.04755 ± 146.086
2025-05-10 19:49:03,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [594.61646, 729.4207, 635.6837, 552.45404, 512.2969, 581.1285, 968.8495, 615.06256, 676.236, 924.7273]
2025-05-10 19:49:03,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [245.0, 299.0, 261.0, 228.0, 222.0, 228.0, 391.0, 254.0, 287.0, 388.0]
2025-05-10 19:49:03,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 44 minutes, 52 seconds)
2025-05-10 19:51:33,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:51:38,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 770.89673 ± 130.422
2025-05-10 19:51:38,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [623.261, 919.39026, 843.88586, 664.51685, 882.3836, 937.2754, 900.5308, 614.4558, 708.22253, 615.0452]
2025-05-10 19:51:38,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [265.0, 370.0, 365.0, 275.0, 374.0, 390.0, 381.0, 248.0, 302.0, 266.0]
2025-05-10 19:51:38,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 42 minutes, 15 seconds)
2025-05-10 19:54:02,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:54:06,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 722.16937 ± 180.473
2025-05-10 19:54:06,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [928.2683, 337.5059, 695.5056, 619.7148, 899.9792, 540.2118, 650.58167, 796.64197, 898.2519, 855.03253]
2025-05-10 19:54:06,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [373.0, 144.0, 285.0, 238.0, 365.0, 202.0, 259.0, 339.0, 370.0, 352.0]
2025-05-10 19:54:06,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 40 minutes, 7 seconds)
2025-05-10 19:56:30,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:56:36,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 965.99670 ± 371.316
2025-05-10 19:56:36,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [494.62323, 800.40234, 681.3194, 786.55676, 1791.2843, 805.49335, 1156.1703, 1229.2085, 1276.3988, 638.5095]
2025-05-10 19:56:36,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 340.0, 285.0, 337.0, 706.0, 348.0, 489.0, 511.0, 534.0, 271.0]
2025-05-10 19:56:36,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 37 minutes, 32 seconds)
2025-05-10 19:59:00,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 19:59:04,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 674.07068 ± 159.758
2025-05-10 19:59:04,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [644.5512, 631.58606, 420.40085, 886.64667, 610.8117, 600.13513, 1031.4147, 619.79364, 620.18634, 675.1804]
2025-05-10 19:59:04,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [271.0, 266.0, 188.0, 350.0, 248.0, 253.0, 418.0, 264.0, 260.0, 275.0]
2025-05-10 19:59:04,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 34 minutes, 46 seconds)
2025-05-10 20:01:31,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:01:36,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 831.34894 ± 177.591
2025-05-10 20:01:36,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [773.17456, 721.2964, 807.53827, 1001.9432, 1015.6907, 625.891, 887.48627, 676.7004, 1182.0692, 621.6993]
2025-05-10 20:01:36,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [323.0, 284.0, 324.0, 399.0, 427.0, 258.0, 368.0, 268.0, 464.0, 260.0]
2025-05-10 20:01:36,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 32 minutes, 36 seconds)
2025-05-10 20:04:01,836 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:04:08,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 990.67950 ± 295.765
2025-05-10 20:04:08,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [629.8539, 831.5825, 1518.3218, 801.0918, 814.543, 740.9469, 1004.7292, 1233.5667, 857.2126, 1474.9465]
2025-05-10 20:04:08,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [277.0, 353.0, 638.0, 352.0, 354.0, 322.0, 437.0, 520.0, 352.0, 637.0]
2025-05-10 20:04:08,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 29 minutes, 59 seconds)
2025-05-10 20:06:31,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:06:40,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1305.55347 ± 320.740
2025-05-10 20:06:40,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [852.7891, 1216.5938, 1135.9271, 1448.1825, 1790.843, 1769.2142, 1163.365, 1475.4818, 798.3216, 1404.8169]
2025-05-10 20:06:40,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [357.0, 514.0, 468.0, 597.0, 761.0, 761.0, 488.0, 625.0, 342.0, 601.0]
2025-05-10 20:06:40,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 27 minutes, 37 seconds)
2025-05-10 20:09:08,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:09:14,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 866.52130 ± 198.347
2025-05-10 20:09:14,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [805.43036, 885.73486, 1325.0151, 676.2041, 798.0358, 615.07745, 839.46484, 805.1805, 793.3686, 1121.701]
2025-05-10 20:09:14,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [351.0, 366.0, 570.0, 287.0, 323.0, 261.0, 370.0, 330.0, 321.0, 451.0]
2025-05-10 20:09:14,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 25 minutes, 15 seconds)
2025-05-10 20:11:36,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:11:44,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1168.14648 ± 313.838
2025-05-10 20:11:44,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [876.23724, 1083.6553, 1145.1012, 1378.2019, 1351.3446, 759.64453, 960.53534, 859.15674, 1827.7961, 1439.792]
2025-05-10 20:11:44,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [338.0, 441.0, 458.0, 554.0, 532.0, 339.0, 401.0, 350.0, 743.0, 599.0]
2025-05-10 20:11:44,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 22 minutes, 47 seconds)
2025-05-10 20:14:12,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:14:18,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1016.64221 ± 316.821
2025-05-10 20:14:18,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1130.6869, 902.4485, 775.5931, 962.608, 1719.0117, 692.1751, 879.3121, 599.7009, 1168.2462, 1336.6392]
2025-05-10 20:14:18,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [457.0, 350.0, 314.0, 392.0, 702.0, 267.0, 343.0, 233.0, 484.0, 530.0]
2025-05-10 20:14:18,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 20 minutes, 19 seconds)
2025-05-10 20:16:54,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:17:04,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1497.40991 ± 705.983
2025-05-10 20:17:04,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [693.0989, 2454.047, 848.3187, 816.65955, 1700.5472, 1111.7039, 889.1484, 1477.9071, 2515.206, 2467.463]
2025-05-10 20:17:04,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [278.0, 1000.0, 350.0, 339.0, 681.0, 462.0, 358.0, 597.0, 1000.0, 1000.0]
2025-05-10 20:17:04,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 18 minutes, 6 seconds)
2025-05-10 20:19:18,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:19:27,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1433.06848 ± 626.843
2025-05-10 20:19:27,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [823.38055, 1206.8912, 2432.6929, 1665.597, 830.0628, 1660.7009, 2380.8699, 1820.6716, 941.10046, 568.7171]
2025-05-10 20:19:27,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [323.0, 517.0, 972.0, 685.0, 332.0, 699.0, 1000.0, 756.0, 384.0, 228.0]
2025-05-10 20:19:27,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 20 seconds)
2025-05-10 20:21:57,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:22:02,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 809.36658 ± 304.816
2025-05-10 20:22:02,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [844.3416, 500.54535, 1618.0797, 765.23425, 538.22327, 824.9211, 652.20105, 564.30066, 938.13293, 847.6864]
2025-05-10 20:22:02,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [325.0, 192.0, 628.0, 331.0, 209.0, 328.0, 274.0, 234.0, 360.0, 333.0]
2025-05-10 20:22:02,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 12 minutes, 47 seconds)
2025-05-10 20:24:23,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:24:38,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2005.76294 ± 341.302
2025-05-10 20:24:38,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2323.816, 2225.085, 2146.8098, 1423.5303, 2203.5955, 2225.0913, 2169.5552, 1661.6215, 1409.8605, 2268.6643]
2025-05-10 20:24:38,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 571.0, 1000.0, 1000.0, 1000.0, 706.0, 639.0, 1000.0]
2025-05-10 20:24:38,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (2005.76) for latency MM1Queue_a033_s075
2025-05-10 20:24:38,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-10 20:24:38,095 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 20:24:38,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 19 seconds)
2025-05-10 20:27:06,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:27:16,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1490.51538 ± 623.142
2025-05-10 20:27:16,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1140.734, 864.58887, 2359.6182, 1199.8015, 2450.3833, 939.6174, 2109.1116, 1856.9578, 1373.3555, 610.9861]
2025-05-10 20:27:16,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [470.0, 334.0, 1000.0, 502.0, 1000.0, 394.0, 1000.0, 748.0, 547.0, 245.0]
2025-05-10 20:27:16,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 46 seconds)
2025-05-10 20:29:41,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:29:52,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1540.27173 ± 894.448
2025-05-10 20:29:52,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [842.7882, 659.7937, 2537.774, 2398.4, 2430.7646, 445.24588, 2311.8206, 901.6508, 439.53418, 2434.9443]
2025-05-10 20:29:52,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [329.0, 261.0, 1000.0, 1000.0, 1000.0, 178.0, 1000.0, 354.0, 172.0, 1000.0]
2025-05-10 20:29:52,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 7 seconds)
2025-05-10 20:32:19,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:32:28,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1417.68335 ± 482.320
2025-05-10 20:32:28,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [968.6347, 1328.3, 1331.1643, 2232.0356, 910.0901, 1564.92, 2352.441, 969.77875, 1405.3741, 1114.0948]
2025-05-10 20:32:28,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [398.0, 505.0, 512.0, 1000.0, 362.0, 629.0, 1000.0, 404.0, 569.0, 485.0]
2025-05-10 20:32:28,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 36 seconds)
2025-05-10 20:34:51,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 20:35:01,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1544.81262 ± 767.555
2025-05-10 20:35:01,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2489.686, 904.2218, 827.1278, 922.66235, 1428.7855, 2438.4539, 2565.2253, 2263.3499, 1135.8405, 472.77298]
2025-05-10 20:35:01,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 348.0, 360.0, 369.0, 601.0, 1000.0, 1000.0, 1000.0, 460.0, 184.0]
2025-05-10 20:35:01,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1251 [DEBUG]: Training session finished
