2025-05-11 20:37:21,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4
2025-05-11 20:37:21,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4
2025-05-11 20:37:21,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x7bf16fe40f70>}
2025-05-11 20:37:21,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1111 [DEBUG]: using device: cpu
2025-05-11 20:37:21,863 baseline-sac-noisy-walker2d:77 [WARNING]: args.memorize_actions != args.horizon: 4 != 24
2025-05-11 20:37:21,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-11 20:37:21,882 baseline-sac-noisy-walker2d:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=41, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-11 20:37:21,882 baseline-sac-noisy-walker2d:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=47, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 20:37:22,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-11 20:37:22,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-11 20:39:53,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:39:54,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: -14.57052 ± 8.884
2025-05-11 20:39:54,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [-16.301706, -21.747324, -24.053484, -5.61412, -19.301777, -11.017665, -5.6043167, -28.39735, 1.4814407, -15.148884]
2025-05-11 20:39:54,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [91.0, 114.0, 93.0, 111.0, 128.0, 104.0, 105.0, 98.0, 120.0, 70.0]
2025-05-11 20:39:54,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (-14.57) for latency MM1Queue_a033_s075
2025-05-11 20:39:54,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:39:54,771 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 20:39:54,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 11 minutes, 56 seconds)
2025-05-11 20:42:40,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:42:42,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 60.84432 ± 86.299
2025-05-11 20:42:42,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [87.725975, 254.61331, 44.620815, -2.6855447, 5.166, -11.675959, 4.963322, 182.84508, -13.460539, 56.330727]
2025-05-11 20:42:42,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [70.0, 146.0, 85.0, 138.0, 125.0, 116.0, 121.0, 185.0, 147.0, 204.0]
2025-05-11 20:42:42,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (60.84) for latency MM1Queue_a033_s075
2025-05-11 20:42:42,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:42:42,293 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 20:42:42,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 21 minutes, 30 seconds)
2025-05-11 20:45:30,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:45:31,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 62.47016 ± 76.137
2025-05-11 20:45:31,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [49.52337, 23.328732, -3.3926122, 15.961071, 272.9509, 38.74665, 18.84057, 65.34512, 109.25566, 34.142124]
2025-05-11 20:45:31,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [87.0, 73.0, 22.0, 203.0, 175.0, 55.0, 90.0, 137.0, 172.0, 86.0]
2025-05-11 20:45:31,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (62.47) for latency MM1Queue_a033_s075
2025-05-11 20:45:31,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:45:31,842 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 20:45:31,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 23 minutes, 55 seconds)
2025-05-11 20:48:18,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:48:21,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 165.03244 ± 183.934
2025-05-11 20:48:21,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [21.95493, 96.441444, 583.8837, 43.0977, 78.247894, 161.60349, 48.540203, 110.94021, 456.9986, 48.616188]
2025-05-11 20:48:21,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [34.0, 202.0, 475.0, 130.0, 81.0, 259.0, 124.0, 259.0, 320.0, 110.0]
2025-05-11 20:48:21,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (165.03) for latency MM1Queue_a033_s075
2025-05-11 20:48:21,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:48:21,419 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 20:48:21,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 23 minutes, 44 seconds)
2025-05-11 20:51:04,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:51:06,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 181.10461 ± 139.378
2025-05-11 20:51:06,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [38.649002, 400.1425, 288.8872, 355.49994, 306.71188, 174.16641, 71.05654, 144.20984, 10.710093, 21.012724]
2025-05-11 20:51:06,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [79.0, 483.0, 176.0, 215.0, 174.0, 141.0, 169.0, 184.0, 96.0, 33.0]
2025-05-11 20:51:06,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (181.10) for latency MM1Queue_a033_s075
2025-05-11 20:51:06,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:51:06,761 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 20:51:06,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 21 minutes, 9 seconds)
2025-05-11 20:53:56,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:53:59,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 251.08633 ± 181.122
2025-05-11 20:53:59,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [43.448444, 252.54163, 74.086, 292.56378, 89.856346, 236.37447, 84.2098, 369.4768, 642.68585, 425.6201]
2025-05-11 20:53:59,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [61.0, 198.0, 111.0, 197.0, 155.0, 332.0, 116.0, 296.0, 359.0, 305.0]
2025-05-11 20:53:59,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (251.09) for latency MM1Queue_a033_s075
2025-05-11 20:53:59,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:53:59,464 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 20:53:59,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 24 minutes, 40 seconds)
2025-05-11 20:56:50,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:56:52,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 176.75439 ± 137.295
2025-05-11 20:56:52,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [154.18408, 308.77982, 58.687588, 43.278866, 255.48598, 380.10675, 81.72352, 394.8658, 49.99509, 40.43652]
2025-05-11 20:56:52,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [202.0, 183.0, 90.0, 149.0, 142.0, 251.0, 98.0, 271.0, 97.0, 86.0]
2025-05-11 20:56:52,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 23 minutes, 35 seconds)
2025-05-11 20:59:39,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:59:42,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 273.27899 ± 260.464
2025-05-11 20:59:42,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [252.13446, 987.73944, 269.92148, 97.443306, 85.689285, 110.14581, 309.5018, 219.20137, 27.368402, 373.64465]
2025-05-11 20:59:42,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [142.0, 1000.0, 155.0, 146.0, 135.0, 171.0, 176.0, 157.0, 68.0, 221.0]
2025-05-11 20:59:42,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (273.28) for latency MM1Queue_a033_s075
2025-05-11 20:59:42,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:59:42,550 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 20:59:42,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 20 minutes, 53 seconds)
2025-05-11 21:02:30,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:02:32,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 153.68436 ± 126.534
2025-05-11 21:02:32,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [325.87796, 146.54286, 390.08774, 46.61022, 68.07438, 61.09775, 292.86877, 11.933797, 121.71052, 72.039474]
2025-05-11 21:02:32,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [177.0, 76.0, 262.0, 109.0, 130.0, 120.0, 170.0, 90.0, 184.0, 132.0]
2025-05-11 21:02:32,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 18 minutes, 10 seconds)
2025-05-11 21:05:22,884 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:05:28,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 473.44247 ± 323.341
2025-05-11 21:05:28,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [167.5422, 75.34609, 403.95056, 999.461, 489.75665, 5.498519, 475.42593, 598.0747, 521.70087, 997.6681]
2025-05-11 21:05:28,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [88.0, 98.0, 188.0, 1000.0, 270.0, 19.0, 241.0, 282.0, 256.0, 1000.0]
2025-05-11 21:05:28,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (473.44) for latency MM1Queue_a033_s075
2025-05-11 21:05:28,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:05:28,116 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 21:05:28,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 18 minutes, 24 seconds)
2025-05-11 21:08:17,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:08:23,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 777.21674 ± 473.603
2025-05-11 21:08:23,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1621.7972, 601.688, 949.25867, 11.219775, 671.1771, 863.8449, 979.0518, 70.97094, 1335.9232, 667.23553]
2025-05-11 21:08:23,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [772.0, 362.0, 485.0, 19.0, 359.0, 468.0, 503.0, 116.0, 638.0, 376.0]
2025-05-11 21:08:23,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (777.22) for latency MM1Queue_a033_s075
2025-05-11 21:08:23,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:08:23,541 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 21:08:23,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 16 minutes, 20 seconds)
2025-05-11 21:11:16,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:11:21,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 387.79688 ± 362.027
2025-05-11 21:11:21,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [389.67215, 7.4995346, 894.067, 357.245, 48.885204, 715.86365, 1016.28064, 18.24212, -2.915584, 433.1291]
2025-05-11 21:11:21,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [233.0, 19.0, 536.0, 211.0, 65.0, 755.0, 1000.0, 38.0, 23.0, 249.0]
2025-05-11 21:11:21,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 14 minutes, 48 seconds)
2025-05-11 21:14:07,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:14:11,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 466.29865 ± 123.769
2025-05-11 21:14:11,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [610.2902, 293.7935, 476.45856, 452.056, 196.79504, 499.7818, 615.5806, 476.07086, 509.8097, 532.3502]
2025-05-11 21:14:11,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [384.0, 249.0, 267.0, 241.0, 139.0, 317.0, 376.0, 298.0, 336.0, 317.0]
2025-05-11 21:14:11,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 11 minutes, 54 seconds)
2025-05-11 21:16:59,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:17:02,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 398.14868 ± 155.209
2025-05-11 21:17:02,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [469.55762, 22.937786, 411.19385, 413.76596, 497.28552, 524.7042, 476.63635, 460.50354, 187.32768, 517.5741]
2025-05-11 21:17:02,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [264.0, 44.0, 257.0, 236.0, 277.0, 287.0, 271.0, 250.0, 243.0, 285.0]
2025-05-11 21:17:02,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 9 minutes, 31 seconds)
2025-05-11 21:19:55,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:19:59,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 500.92236 ± 59.465
2025-05-11 21:19:59,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [495.76233, 370.09097, 526.99414, 530.3348, 542.93884, 400.8391, 528.06726, 539.03705, 540.1185, 535.0405]
2025-05-11 21:19:59,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [269.0, 204.0, 282.0, 306.0, 301.0, 217.0, 287.0, 303.0, 286.0, 281.0]
2025-05-11 21:19:59,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 6 minutes, 47 seconds)
2025-05-11 21:22:48,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:22:52,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 417.83185 ± 234.812
2025-05-11 21:22:52,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [722.1714, 176.24057, 467.65427, 44.45563, 693.1616, 553.6118, 50.210407, 435.5267, 451.0844, 584.2016]
2025-05-11 21:22:52,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [381.0, 237.0, 260.0, 48.0, 365.0, 359.0, 65.0, 237.0, 250.0, 306.0]
2025-05-11 21:22:52,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 3 minutes, 14 seconds)
2025-05-11 21:25:42,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:25:47,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 657.53723 ± 390.478
2025-05-11 21:25:47,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [727.4488, 825.74713, 464.91327, 16.99861, 597.57904, 815.9336, 1003.8498, 314.74805, 331.15833, 1476.9958]
2025-05-11 21:25:47,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [395.0, 452.0, 240.0, 38.0, 322.0, 506.0, 581.0, 150.0, 166.0, 834.0]
2025-05-11 21:25:47,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 59 minutes, 37 seconds)
2025-05-11 21:28:57,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:29:04,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 699.58752 ± 317.775
2025-05-11 21:29:04,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [764.2559, 1072.085, 1136.4913, 582.86444, 847.5421, 407.1968, 650.34296, 13.59458, 567.7675, 953.7342]
2025-05-11 21:29:04,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [414.0, 592.0, 767.0, 304.0, 462.0, 207.0, 333.0, 40.0, 295.0, 560.0]
2025-05-11 21:29:04,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 4 minutes, 3 seconds)
2025-05-11 21:32:34,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:32:38,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 409.19437 ± 245.336
2025-05-11 21:32:38,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [626.3563, 73.13005, 534.5736, 611.3298, 787.4356, 99.652596, 282.9101, 430.59402, 83.12174, 562.83984]
2025-05-11 21:32:38,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [351.0, 197.0, 315.0, 299.0, 400.0, 152.0, 149.0, 246.0, 157.0, 309.0]
2025-05-11 21:32:38,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 12 minutes, 33 seconds)
2025-05-11 21:35:27,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:35:30,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 488.65372 ± 471.101
2025-05-11 21:35:30,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [53.617813, -0.96030974, 912.20135, 716.13916, 1196.9341, 246.46518, 1217.9445, 4.1792545, 531.47723, 8.53902]
2025-05-11 21:35:30,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [99.0, 24.0, 457.0, 339.0, 603.0, 130.0, 629.0, 61.0, 271.0, 17.0]
2025-05-11 21:35:30,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 8 minutes, 28 seconds)
2025-05-11 21:38:23,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:38:27,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 445.60074 ± 382.048
2025-05-11 21:38:27,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [877.5327, 674.8576, 5.8421464, 74.21327, 64.8381, 21.197025, 620.17975, 847.10626, 1022.772, 247.46815]
2025-05-11 21:38:27,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [472.0, 343.0, 15.0, 72.0, 66.0, 63.0, 320.0, 441.0, 570.0, 129.0]
2025-05-11 21:38:27,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 6 minutes, 12 seconds)
2025-05-11 21:41:35,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:41:39,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 540.31213 ± 161.571
2025-05-11 21:41:39,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [596.73865, 609.6337, 493.86432, 655.7558, 546.31195, 596.56836, 554.4207, 84.60165, 701.41376, 563.8123]
2025-05-11 21:41:39,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [287.0, 290.0, 234.0, 314.0, 254.0, 284.0, 268.0, 149.0, 397.0, 276.0]
2025-05-11 21:41:39,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 7 minutes, 34 seconds)
2025-05-11 21:44:27,960 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:44:31,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 533.15009 ± 180.436
2025-05-11 21:44:31,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [634.52795, 291.89996, 546.96, 681.0468, 92.34836, 552.6191, 648.204, 635.18835, 619.03357, 629.67285]
2025-05-11 21:44:31,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [332.0, 140.0, 269.0, 344.0, 180.0, 284.0, 337.0, 356.0, 311.0, 325.0]
2025-05-11 21:44:31,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 58 minutes, 8 seconds)
2025-05-11 21:47:17,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:47:21,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 569.46497 ± 146.241
2025-05-11 21:47:21,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [607.9693, 705.0901, 589.4035, 548.8301, 606.1976, 602.07605, 614.9405, 687.61426, 580.90826, 151.62083]
2025-05-11 21:47:21,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [306.0, 392.0, 344.0, 267.0, 293.0, 304.0, 313.0, 376.0, 285.0, 84.0]
2025-05-11 21:47:21,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 43 minutes, 44 seconds)
2025-05-11 21:50:09,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:50:16,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 915.00519 ± 212.130
2025-05-11 21:50:16,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [972.1164, 911.85297, 1182.371, 732.384, 880.16296, 673.99493, 1386.1317, 898.08765, 827.4481, 685.50226]
2025-05-11 21:50:16,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [529.0, 516.0, 652.0, 387.0, 484.0, 321.0, 755.0, 499.0, 446.0, 363.0]
2025-05-11 21:50:16,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (915.01) for latency MM1Queue_a033_s075
2025-05-11 21:50:16,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:50:16,902 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 21:50:16,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 41 minutes, 30 seconds)
2025-05-11 21:53:05,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:53:09,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 518.09045 ± 241.760
2025-05-11 21:53:09,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [619.61163, 200.11778, 896.1253, 741.24384, 624.1878, 650.3705, 575.7361, 354.67386, 49.94943, 468.8883]
2025-05-11 21:53:09,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [321.0, 167.0, 460.0, 399.0, 295.0, 314.0, 319.0, 197.0, 48.0, 240.0]
2025-05-11 21:53:09,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 37 minutes, 36 seconds)
2025-05-11 21:55:57,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:56:01,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 505.65240 ± 175.605
2025-05-11 21:56:01,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [536.06067, 462.1705, 512.44147, 571.5752, 11.918546, 716.24457, 557.5567, 558.2276, 580.96796, 549.361]
2025-05-11 21:56:01,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [271.0, 330.0, 253.0, 297.0, 108.0, 366.0, 285.0, 303.0, 315.0, 281.0]
2025-05-11 21:56:01,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 29 minutes, 43 seconds)
2025-05-11 21:58:48,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 21:58:50,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 380.00372 ± 264.801
2025-05-11 21:58:50,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [607.853, 228.47934, 704.8225, 15.423819, 706.5748, 38.38895, 446.72656, 475.77124, 546.8351, 29.161823]
2025-05-11 21:58:50,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [319.0, 124.0, 388.0, 33.0, 380.0, 54.0, 269.0, 259.0, 281.0, 40.0]
2025-05-11 21:58:50,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 26 minutes, 10 seconds)
2025-05-11 22:01:39,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:01:43,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 567.10657 ± 57.862
2025-05-11 22:01:43,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [586.2957, 592.97345, 586.9965, 621.75885, 399.8771, 589.0317, 589.05414, 574.77985, 559.4171, 570.88135]
2025-05-11 22:01:43,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [315.0, 346.0, 305.0, 313.0, 208.0, 303.0, 300.0, 306.0, 314.0, 297.0]
2025-05-11 22:01:43,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 23 minutes, 55 seconds)
2025-05-11 22:04:28,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:04:34,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 747.12976 ± 263.486
2025-05-11 22:04:34,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [213.587, 615.16327, 601.95966, 614.2387, 752.5763, 773.0985, 789.0805, 1287.9714, 957.73785, 865.88434]
2025-05-11 22:04:34,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [301.0, 310.0, 286.0, 315.0, 395.0, 421.0, 434.0, 691.0, 587.0, 480.0]
2025-05-11 22:04:34,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 20 minutes, 5 seconds)
2025-05-11 22:07:24,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:07:29,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 703.27075 ± 335.833
2025-05-11 22:07:29,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [207.22487, 75.61141, 718.6622, 968.2867, 1159.5696, 620.0809, 719.85565, 1140.3824, 784.8046, 638.229]
2025-05-11 22:07:29,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [306.0, 109.0, 351.0, 513.0, 645.0, 305.0, 396.0, 602.0, 395.0, 316.0]
2025-05-11 22:07:29,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 17 minutes, 53 seconds)
2025-05-11 22:10:14,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:10:19,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 811.57214 ± 144.559
2025-05-11 22:10:19,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [791.3748, 531.04626, 708.31854, 814.7865, 909.75616, 866.60126, 745.5274, 1089.7003, 939.27246, 719.3374]
2025-05-11 22:10:19,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [400.0, 299.0, 363.0, 416.0, 469.0, 434.0, 344.0, 558.0, 519.0, 372.0]
2025-05-11 22:10:19,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 14 minutes, 31 seconds)
2025-05-11 22:13:11,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:13:17,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 808.04718 ± 388.406
2025-05-11 22:13:17,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1031.6505, 94.03839, 671.6188, 1009.3938, 1216.8219, 1248.4985, 657.1264, 994.1, 150.90538, 1006.3181]
2025-05-11 22:13:17,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [571.0, 174.0, 352.0, 538.0, 659.0, 678.0, 341.0, 528.0, 198.0, 566.0]
2025-05-11 22:13:17,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 13 minutes, 34 seconds)
2025-05-11 22:16:01,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:16:08,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 868.42072 ± 385.625
2025-05-11 22:16:08,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [4.735629, 1279.58, 771.80273, 969.32587, 880.53296, 1124.2114, 1251.4957, 384.7346, 826.1284, 1191.6594]
2025-05-11 22:16:08,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 693.0, 394.0, 599.0, 477.0, 636.0, 696.0, 249.0, 439.0, 630.0]
2025-05-11 22:16:08,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 10 minutes, 22 seconds)
2025-05-11 22:18:54,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:18:58,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 443.45786 ± 391.005
2025-05-11 22:18:58,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [694.75726, 109.16318, 601.4557, 1087.8346, 1138.0226, 281.65228, 114.44782, 97.24575, 82.09515, 227.9043]
2025-05-11 22:18:58,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [344.0, 152.0, 284.0, 587.0, 575.0, 147.0, 162.0, 153.0, 129.0, 274.0]
2025-05-11 22:18:58,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 7 minutes, 9 seconds)
2025-05-11 22:21:47,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:21:51,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 568.78796 ± 213.138
2025-05-11 22:21:51,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [657.41095, 66.91831, 832.63086, 639.62305, 608.12085, 283.4962, 649.9535, 579.3283, 645.3151, 725.082]
2025-05-11 22:21:51,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [324.0, 137.0, 428.0, 308.0, 286.0, 125.0, 312.0, 262.0, 314.0, 384.0]
2025-05-11 22:21:51,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 3 minutes, 46 seconds)
2025-05-11 22:24:38,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:24:42,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 658.76306 ± 63.378
2025-05-11 22:24:42,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [603.207, 643.8392, 653.45233, 698.1099, 816.32074, 610.69446, 589.19696, 704.5241, 634.88293, 633.4031]
2025-05-11 22:24:42,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [298.0, 311.0, 312.0, 364.0, 407.0, 301.0, 293.0, 328.0, 309.0, 309.0]
2025-05-11 22:24:42,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 1 minute, 12 seconds)
2025-05-11 22:27:26,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:27:31,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 620.13483 ± 100.910
2025-05-11 22:27:31,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [389.33188, 562.103, 732.7214, 580.0874, 758.93494, 726.25757, 610.7862, 624.3582, 609.58185, 607.18555]
2025-05-11 22:27:31,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [189.0, 277.0, 365.0, 291.0, 368.0, 399.0, 294.0, 308.0, 290.0, 301.0]
2025-05-11 22:27:31,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 56 minutes, 21 seconds)
2025-05-11 22:30:16,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:30:19,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 558.54999 ± 91.675
2025-05-11 22:30:19,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [545.09674, 560.73, 611.78143, 609.0058, 542.96454, 605.37695, 295.27646, 592.68787, 610.42596, 612.15405]
2025-05-11 22:30:19,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [269.0, 274.0, 304.0, 304.0, 238.0, 298.0, 141.0, 298.0, 328.0, 278.0]
2025-05-11 22:30:19,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 53 minutes, 3 seconds)
2025-05-11 22:33:05,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:33:10,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 734.23792 ± 141.955
2025-05-11 22:33:10,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [651.5723, 823.29895, 910.66174, 712.1007, 662.83826, 1043.7386, 611.09674, 666.3577, 710.6304, 550.08325]
2025-05-11 22:33:10,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [343.0, 442.0, 496.0, 369.0, 363.0, 564.0, 310.0, 364.0, 389.0, 279.0]
2025-05-11 22:33:10,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 50 minutes, 32 seconds)
2025-05-11 22:35:54,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:35:59,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 623.31848 ± 188.883
2025-05-11 22:35:59,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [667.153, 76.74043, 644.8394, 620.5511, 821.1059, 680.7497, 685.0394, 675.3501, 687.0241, 674.6313]
2025-05-11 22:35:59,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [341.0, 112.0, 327.0, 312.0, 414.0, 333.0, 347.0, 340.0, 369.0, 343.0]
2025-05-11 22:35:59,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 46 minutes, 44 seconds)
2025-05-11 22:38:53,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:38:59,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 723.00397 ± 65.308
2025-05-11 22:38:59,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [676.92633, 693.41656, 678.2702, 731.1908, 712.22144, 631.62537, 742.3122, 754.3767, 720.104, 889.5964]
2025-05-11 22:38:59,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [349.0, 367.0, 350.0, 369.0, 359.0, 318.0, 384.0, 416.0, 372.0, 457.0]
2025-05-11 22:38:59,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 45 minutes, 36 seconds)
2025-05-11 22:42:14,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:42:18,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 749.73340 ± 129.053
2025-05-11 22:42:18,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [677.8166, 1011.5048, 728.7526, 748.91266, 970.8202, 642.07886, 761.824, 607.3696, 680.96857, 667.2855]
2025-05-11 22:42:18,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [317.0, 542.0, 338.0, 358.0, 514.0, 335.0, 405.0, 300.0, 348.0, 323.0]
2025-05-11 22:42:18,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 48 minutes, 41 seconds)
2025-05-11 22:45:11,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:45:18,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 739.85657 ± 29.436
2025-05-11 22:45:18,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [781.07825, 737.50214, 764.04565, 679.4898, 742.37537, 728.0025, 723.8189, 778.7263, 750.26776, 713.2589]
2025-05-11 22:45:18,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [363.0, 385.0, 366.0, 328.0, 364.0, 363.0, 372.0, 407.0, 369.0, 343.0]
2025-05-11 22:45:18,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 47 minutes, 43 seconds)
2025-05-11 22:48:23,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:48:27,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 643.34705 ± 79.979
2025-05-11 22:48:27,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [667.2782, 652.9167, 583.2589, 425.29144, 673.0794, 676.6793, 722.6491, 691.56885, 676.402, 664.34656]
2025-05-11 22:48:27,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [320.0, 301.0, 278.0, 186.0, 319.0, 311.0, 343.0, 307.0, 335.0, 324.0]
2025-05-11 22:48:27,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 47 minutes, 59 seconds)
2025-05-11 22:51:15,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:51:23,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 930.39581 ± 368.903
2025-05-11 22:51:23,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [627.8951, 1600.8716, 764.3395, 654.29034, 927.1144, 1027.6244, 1651.1847, 709.0341, 630.43036, 711.1735]
2025-05-11 22:51:23,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [321.0, 1000.0, 404.0, 329.0, 485.0, 594.0, 1000.0, 381.0, 320.0, 353.0]
2025-05-11 22:51:23,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (930.40) for latency MM1Queue_a033_s075
2025-05-11 22:51:23,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:51:23,341 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 22:51:23,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 46 minutes, 22 seconds)
2025-05-11 22:54:19,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:54:23,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 742.32825 ± 96.159
2025-05-11 22:54:23,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [812.82666, 747.98706, 830.466, 695.5575, 848.22217, 748.20215, 694.45953, 494.61697, 789.80066, 761.1441]
2025-05-11 22:54:23,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [387.0, 374.0, 384.0, 317.0, 393.0, 351.0, 322.0, 238.0, 384.0, 360.0]
2025-05-11 22:54:23,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 43 minutes, 22 seconds)
2025-05-11 22:57:08,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 22:57:12,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 686.66418 ± 86.423
2025-05-11 22:57:12,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [682.0701, 721.7739, 594.8475, 687.0308, 804.0526, 729.932, 767.1093, 644.98895, 743.02716, 491.80887]
2025-05-11 22:57:12,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [318.0, 329.0, 274.0, 299.0, 377.0, 331.0, 344.0, 303.0, 362.0, 225.0]
2025-05-11 22:57:12,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 34 minutes, 56 seconds)
2025-05-11 23:00:05,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:00:09,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 602.34534 ± 90.574
2025-05-11 23:00:09,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [634.51984, 662.89734, 428.4938, 579.60486, 464.7947, 648.0405, 670.26404, 615.2629, 744.0696, 575.5063]
2025-05-11 23:00:09,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [303.0, 321.0, 212.0, 269.0, 237.0, 324.0, 325.0, 306.0, 349.0, 276.0]
2025-05-11 23:00:09,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 31 minutes, 28 seconds)
2025-05-11 23:02:56,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:03:01,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 726.58923 ± 182.975
2025-05-11 23:03:01,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [656.54755, 525.97675, 688.9032, 650.1362, 627.42456, 1091.4504, 663.9825, 1069.3204, 595.704, 696.44666]
2025-05-11 23:03:01,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [311.0, 254.0, 310.0, 306.0, 303.0, 592.0, 318.0, 577.0, 286.0, 354.0]
2025-05-11 23:03:01,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 25 minutes, 43 seconds)
2025-05-11 23:05:59,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:06:04,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 684.56750 ± 67.829
2025-05-11 23:06:04,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [634.67554, 666.4108, 625.07715, 751.16406, 607.4561, 727.7716, 661.98065, 845.157, 672.2375, 653.7447]
2025-05-11 23:06:04,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [302.0, 305.0, 285.0, 346.0, 274.0, 376.0, 286.0, 446.0, 317.0, 304.0]
2025-05-11 23:06:04,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 23 minutes, 51 seconds)
2025-05-11 23:08:58,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:09:02,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 684.35413 ± 113.681
2025-05-11 23:09:02,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [552.05237, 744.633, 533.97656, 942.29553, 606.0846, 675.2494, 697.77576, 682.08563, 628.1181, 781.2707]
2025-05-11 23:09:02,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [266.0, 346.0, 266.0, 432.0, 278.0, 317.0, 310.0, 303.0, 314.0, 368.0]
2025-05-11 23:09:02,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 20 minutes, 38 seconds)
2025-05-11 23:11:52,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:11:56,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 723.81311 ± 64.760
2025-05-11 23:11:56,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [648.07947, 691.9182, 731.58606, 711.4978, 717.93335, 772.1459, 696.6888, 838.95264, 810.6943, 618.63495]
2025-05-11 23:11:56,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [321.0, 322.0, 338.0, 331.0, 327.0, 352.0, 318.0, 375.0, 384.0, 288.0]
2025-05-11 23:11:56,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 18 minutes, 31 seconds)
2025-05-11 23:14:41,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:14:46,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 810.29993 ± 88.830
2025-05-11 23:14:46,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [844.2658, 897.16486, 966.3211, 843.5574, 829.0271, 768.69226, 714.0651, 805.252, 807.05884, 627.59467]
2025-05-11 23:14:46,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [375.0, 391.0, 404.0, 364.0, 348.0, 344.0, 331.0, 362.0, 345.0, 258.0]
2025-05-11 23:14:46,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 14 minutes, 27 seconds)
2025-05-11 23:17:35,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:17:39,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 627.41467 ± 42.345
2025-05-11 23:17:39,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [607.3016, 721.24347, 642.2873, 599.2453, 634.575, 598.3376, 648.3147, 656.7045, 611.2008, 554.9361]
2025-05-11 23:17:39,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [273.0, 399.0, 298.0, 274.0, 293.0, 270.0, 307.0, 318.0, 284.0, 246.0]
2025-05-11 23:17:39,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 11 minutes, 38 seconds)
2025-05-11 23:20:29,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:20:35,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 861.03192 ± 148.153
2025-05-11 23:20:35,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [560.7047, 909.262, 807.5438, 958.8385, 913.2831, 1118.59, 981.89215, 875.2416, 706.7137, 778.2495]
2025-05-11 23:20:35,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [264.0, 432.0, 366.0, 414.0, 414.0, 449.0, 457.0, 414.0, 342.0, 371.0]
2025-05-11 23:20:35,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 7 minutes, 44 seconds)
2025-05-11 23:23:18,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:23:23,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 868.16974 ± 229.793
2025-05-11 23:23:23,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [616.85803, 1302.0112, 751.44696, 1196.1398, 1109.045, 752.2317, 735.95715, 770.2177, 811.40436, 636.3849]
2025-05-11 23:23:23,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 617.0, 337.0, 554.0, 471.0, 337.0, 329.0, 330.0, 368.0, 301.0]
2025-05-11 23:23:23,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 3 minutes, 24 seconds)
2025-05-11 23:26:14,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:26:21,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1005.60217 ± 362.438
2025-05-11 23:26:21,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1234.0864, 527.6987, 703.6591, 690.95667, 1055.8601, 1114.6478, 1733.8384, 729.698, 824.871, 1440.7057]
2025-05-11 23:26:21,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [573.0, 390.0, 321.0, 300.0, 481.0, 574.0, 781.0, 560.0, 359.0, 655.0]
2025-05-11 23:26:21,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1005.60) for latency MM1Queue_a033_s075
2025-05-11 23:26:21,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:26:21,236 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 23:26:21,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 1 minute)
2025-05-11 23:29:07,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:29:14,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1136.94128 ± 337.461
2025-05-11 23:29:14,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [901.57666, 1224.8661, 1344.6385, 491.21735, 1431.7979, 1309.6832, 864.0457, 944.21844, 1099.3287, 1758.0403]
2025-05-11 23:29:14,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [386.0, 604.0, 587.0, 210.0, 674.0, 609.0, 397.0, 408.0, 515.0, 760.0]
2025-05-11 23:29:14,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1136.94) for latency MM1Queue_a033_s075
2025-05-11 23:29:14,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:29:14,859 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 23:29:14,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 58 minutes, 44 seconds)
2025-05-11 23:32:10,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:32:15,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 956.28369 ± 274.419
2025-05-11 23:32:15,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [923.8292, 673.2056, 720.6154, 955.69934, 889.1729, 819.2483, 1223.2461, 1662.069, 856.25226, 839.4986]
2025-05-11 23:32:15,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [390.0, 311.0, 301.0, 419.0, 375.0, 355.0, 527.0, 656.0, 378.0, 375.0]
2025-05-11 23:32:15,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 56 minutes, 53 seconds)
2025-05-11 23:35:01,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:35:11,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1341.79712 ± 400.847
2025-05-11 23:35:11,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1120.8008, 1064.0951, 1650.17, 1229.3041, 1771.1807, 828.108, 857.3537, 1941.2277, 1094.5901, 1861.1404]
2025-05-11 23:35:11,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [462.0, 578.0, 775.0, 602.0, 1000.0, 476.0, 393.0, 1000.0, 521.0, 1000.0]
2025-05-11 23:35:11,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1341.80) for latency MM1Queue_a033_s075
2025-05-11 23:35:11,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:35:11,293 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 23:35:11,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 53 minutes, 54 seconds)
2025-05-11 23:38:08,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:38:22,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1829.40210 ± 297.291
2025-05-11 23:38:22,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1100.5876, 1562.2515, 1923.6443, 2031.3586, 1907.6201, 2004.371, 2111.6987, 1618.5507, 2030.4929, 2003.4463]
2025-05-11 23:38:22,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [491.0, 798.0, 1000.0, 1000.0, 830.0, 1000.0, 1000.0, 833.0, 1000.0, 1000.0]
2025-05-11 23:38:22,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1829.40) for latency MM1Queue_a033_s075
2025-05-11 23:38:22,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:38:22,419 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 23:38:22,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 53 minutes, 48 seconds)
2025-05-11 23:41:09,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:41:16,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1158.28052 ± 525.526
2025-05-11 23:41:16,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1124.3177, 887.70825, 795.03516, 2328.3228, 2032.3978, 841.389, 947.8692, 1008.2681, 735.1252, 882.37244]
2025-05-11 23:41:16,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [459.0, 453.0, 351.0, 950.0, 927.0, 382.0, 418.0, 465.0, 332.0, 405.0]
2025-05-11 23:41:16,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 50 minutes, 27 seconds)
2025-05-11 23:44:05,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:44:14,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1318.52734 ± 488.726
2025-05-11 23:44:14,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [749.1461, 917.3509, 1077.112, 2203.4248, 2285.5112, 1201.9159, 1090.1006, 1119.0774, 1178.9014, 1362.7327]
2025-05-11 23:44:14,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [356.0, 408.0, 452.0, 1000.0, 1000.0, 521.0, 471.0, 493.0, 500.0, 563.0]
2025-05-11 23:44:14,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 47 minutes, 54 seconds)
2025-05-11 23:47:12,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:47:27,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2067.68042 ± 282.776
2025-05-11 23:47:27,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2124.348, 2209.6292, 2189.2053, 1414.466, 2282.9246, 2145.1584, 2115.48, 2301.8528, 2260.0444, 1633.695]
2025-05-11 23:47:27,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 851.0, 1000.0, 1000.0, 1000.0, 1000.0, 951.0, 700.0]
2025-05-11 23:47:27,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (2067.68) for latency MM1Queue_a033_s075
2025-05-11 23:47:27,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:47:27,130 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 23:47:27,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 46 minutes, 18 seconds)
2025-05-11 23:50:10,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:50:25,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1993.28345 ± 332.069
2025-05-11 23:50:25,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2115.3538, 1023.5832, 2110.1619, 2011.0204, 2073.7925, 2165.0522, 2279.9531, 2038.9286, 2007.4579, 2107.531]
2025-05-11 23:50:25,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 529.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 23:50:25,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 43 minutes, 37 seconds)
2025-05-11 23:53:17,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:53:33,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2282.59521 ± 56.810
2025-05-11 23:53:33,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2288.971, 2230.862, 2359.564, 2166.8772, 2359.8445, 2286.0935, 2315.2773, 2301.3655, 2229.5195, 2287.5774]
2025-05-11 23:53:33,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 23:53:33,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (2282.60) for latency MM1Queue_a033_s075
2025-05-11 23:53:33,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:53:33,601 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 23:53:33,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 40 minutes, 13 seconds)
2025-05-11 23:56:19,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:56:35,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2352.71729 ± 42.494
2025-05-11 23:56:35,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2385.9966, 2374.6592, 2408.4055, 2297.2065, 2360.4211, 2305.3413, 2413.7373, 2363.2737, 2296.109, 2322.0205]
2025-05-11 23:56:35,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 23:56:35,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (2352.72) for latency MM1Queue_a033_s075
2025-05-11 23:56:35,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:56:35,486 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 23:56:35,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 37 minutes, 59 seconds)
2025-05-11 23:59:21,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 23:59:36,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2170.29834 ± 390.254
2025-05-11 23:59:36,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2194.1653, 2362.7817, 2357.218, 2230.6057, 1029.6266, 2155.0625, 2443.5383, 2219.753, 2354.9688, 2355.2625]
2025-05-11 23:59:36,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 923.0, 435.0, 1000.0, 996.0, 978.0, 1000.0, 1000.0]
2025-05-11 23:59:36,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 35 minutes, 19 seconds)
2025-05-12 00:02:26,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:02:40,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2271.56445 ± 392.733
2025-05-12 00:02:40,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2494.8025, 2449.747, 2452.2427, 2541.0562, 1506.4646, 2493.9885, 2465.301, 2401.185, 2438.3704, 1472.4852]
2025-05-12 00:02:40,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 957.0, 1000.0, 1000.0, 568.0, 1000.0, 1000.0, 1000.0, 1000.0, 574.0]
2025-05-12 00:02:40,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 31 minutes, 22 seconds)
2025-05-12 00:05:26,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:05:42,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2417.92139 ± 81.411
2025-05-12 00:05:42,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2532.0095, 2515.0542, 2339.1216, 2353.612, 2399.5518, 2531.1677, 2425.2458, 2436.4807, 2355.7727, 2291.1995]
2025-05-12 00:05:42,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:05:42,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (2417.92) for latency MM1Queue_a033_s075
2025-05-12 00:05:42,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 00:05:42,150 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-12 00:05:42,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 28 minutes, 36 seconds)
2025-05-12 00:08:46,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:09:02,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2466.30078 ± 86.864
2025-05-12 00:09:02,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2481.9255, 2501.825, 2370.2688, 2476.8523, 2388.6548, 2382.471, 2491.8408, 2674.2603, 2506.397, 2388.5127]
2025-05-12 00:09:02,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:09:02,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (2466.30) for latency MM1Queue_a033_s075
2025-05-12 00:09:02,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 00:09:02,139 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-12 00:09:02,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 26 minutes, 39 seconds)
2025-05-12 00:11:37,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:11:51,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2136.09424 ± 568.404
2025-05-12 00:11:51,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [932.6468, 2433.622, 1080.0948, 2423.299, 2477.4358, 2304.4812, 2492.2979, 2354.4236, 2467.821, 2394.8206]
2025-05-12 00:11:51,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [429.0, 1000.0, 412.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:11:51,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 22 minutes, 23 seconds)
2025-05-12 00:14:44,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:15:00,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2091.20068 ± 328.704
2025-05-12 00:15:00,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2247.5415, 2218.8467, 2116.475, 1131.9346, 2126.897, 2393.5571, 2140.0923, 2181.271, 2196.9016, 2158.4875]
2025-05-12 00:15:00,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 551.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:15:00,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 20 minutes, 2 seconds)
2025-05-12 00:18:00,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:18:13,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2077.61670 ± 380.974
2025-05-12 00:18:13,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2414.3796, 1495.6329, 1737.1553, 2287.623, 2316.2966, 1335.2522, 2449.6963, 2202.63, 2223.6194, 2313.882]
2025-05-12 00:18:13,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 647.0, 735.0, 1000.0, 1000.0, 547.0, 971.0, 912.0, 1000.0, 1000.0]
2025-05-12 00:18:13,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 17 minutes, 44 seconds)
2025-05-12 00:20:53,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:21:04,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1713.12366 ± 631.279
2025-05-12 00:21:04,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [956.21716, 1006.68115, 2484.555, 1563.1029, 2248.2761, 2310.2065, 1023.7904, 980.77454, 2255.0754, 2302.557]
2025-05-12 00:21:04,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [443.0, 454.0, 1000.0, 779.0, 1000.0, 928.0, 429.0, 403.0, 1000.0, 1000.0]
2025-05-12 00:21:04,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 13 minutes, 46 seconds)
2025-05-12 00:23:51,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:24:06,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2112.96533 ± 435.041
2025-05-12 00:24:06,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2322.5093, 957.28864, 2291.4165, 1937.2977, 2336.6777, 2552.4502, 2097.9106, 2420.7053, 2343.5579, 1869.8397]
2025-05-12 00:24:06,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 520.0, 1000.0, 789.0, 1000.0, 1000.0, 919.0, 1000.0, 1000.0, 844.0]
2025-05-12 00:24:06,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 9 minutes, 17 seconds)
2025-05-12 00:26:59,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:27:17,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2236.80322 ± 197.720
2025-05-12 00:27:17,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2303.8713, 2246.182, 2239.4287, 2340.045, 1663.1063, 2272.6528, 2358.3752, 2394.5916, 2238.2283, 2311.5515]
2025-05-12 00:27:17,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 693.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:27:17,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 7 minutes, 56 seconds)
2025-05-12 00:30:12,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:30:24,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2003.83826 ± 566.451
2025-05-12 00:30:24,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2433.4387, 2345.3613, 846.66064, 1647.9585, 2435.994, 1156.083, 1930.9286, 2282.0706, 2469.1794, 2490.71]
2025-05-12 00:30:24,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 892.0, 323.0, 654.0, 1000.0, 473.0, 746.0, 915.0, 1000.0, 1000.0]
2025-05-12 00:30:24,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 4 minutes, 42 seconds)
2025-05-12 00:33:12,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:33:24,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1908.89380 ± 644.499
2025-05-12 00:33:24,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1865.2432, 2393.5474, 2448.9197, 921.53723, 2409.8855, 2378.3596, 625.9919, 1511.5409, 2023.0762, 2510.8362]
2025-05-12 00:33:24,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [760.0, 1000.0, 1000.0, 408.0, 1000.0, 1000.0, 236.0, 630.0, 860.0, 1000.0]
2025-05-12 00:33:24,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 44 seconds)
2025-05-12 00:36:15,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:36:30,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2252.74146 ± 405.569
2025-05-12 00:36:30,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2350.5398, 2376.9597, 2440.1387, 1046.4916, 2450.6155, 2376.3318, 2278.353, 2387.6733, 2351.5964, 2468.7144]
2025-05-12 00:36:30,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 428.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:36:30,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 58 minutes, 37 seconds)
2025-05-12 00:39:21,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:39:36,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2289.40503 ± 411.563
2025-05-12 00:39:36,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2804.2275, 2400.7349, 2401.1199, 2281.4688, 2339.2896, 2454.5276, 2357.8342, 1124.3412, 2348.7131, 2381.795]
2025-05-12 00:39:36,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 463.0, 965.0, 990.0]
2025-05-12 00:39:36,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 55 minutes, 50 seconds)
2025-05-12 00:42:33,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:42:49,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2395.04102 ± 76.611
2025-05-12 00:42:49,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2418.1147, 2395.98, 2379.3113, 2479.7224, 2195.4456, 2402.0098, 2357.9346, 2410.5598, 2424.969, 2486.364]
2025-05-12 00:42:49,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 844.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:42:49,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 52 minutes, 47 seconds)
2025-05-12 00:45:34,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:45:50,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2545.01636 ± 95.545
2025-05-12 00:45:50,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2432.8494, 2517.2542, 2583.2917, 2378.55, 2723.6643, 2628.8748, 2464.9692, 2595.1377, 2574.2114, 2551.3623]
2025-05-12 00:45:50,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 928.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:45:50,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (2545.02) for latency MM1Queue_a033_s075
2025-05-12 00:45:50,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 00:45:50,646 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-12 00:45:50,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 49 minutes, 23 seconds)
2025-05-12 00:48:46,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:49:01,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2447.96240 ± 289.232
2025-05-12 00:49:01,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2579.8826, 2465.4146, 2502.4438, 2645.0833, 1616.4349, 2469.2678, 2509.3572, 2706.1597, 2565.8486, 2419.7354]
2025-05-12 00:49:01,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 706.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:49:01,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 46 minutes, 49 seconds)
2025-05-12 00:51:43,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:51:57,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2255.54834 ± 436.393
2025-05-12 00:51:57,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2515.1396, 2510.725, 2408.407, 2386.4265, 2484.6365, 1745.367, 2493.2195, 2450.3928, 1119.9067, 2441.2659]
2025-05-12 00:51:57,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 679.0, 1000.0, 1000.0, 487.0, 1000.0]
2025-05-12 00:51:57,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 43 minutes, 18 seconds)
2025-05-12 00:54:55,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:55:10,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2336.58398 ± 431.742
2025-05-12 00:55:10,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2535.313, 2532.2783, 1453.1519, 2501.7114, 2596.028, 2560.5002, 1497.2745, 2522.0774, 2580.841, 2586.6626]
2025-05-12 00:55:10,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 533.0, 1000.0, 1000.0, 1000.0, 544.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:55:10,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 40 minutes, 28 seconds)
2025-05-12 00:58:17,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 00:58:28,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1654.13245 ± 754.999
2025-05-12 00:58:28,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [824.87085, 1173.0835, 2513.927, 1323.2915, 1171.0508, 971.99945, 2460.3496, 2860.9187, 2370.9995, 870.8313]
2025-05-12 00:58:28,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [358.0, 446.0, 1000.0, 562.0, 496.0, 409.0, 1000.0, 1000.0, 1000.0, 365.0]
2025-05-12 00:58:28,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 37 minutes, 34 seconds)
2025-05-12 01:00:50,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:01:01,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2597.55273 ± 95.239
2025-05-12 01:01:01,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2626.4443, 2637.979, 2767.5735, 2494.3782, 2638.4927, 2581.7434, 2614.4236, 2643.6904, 2387.4495, 2583.352]
2025-05-12 01:01:01,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 857.0, 1000.0]
2025-05-12 01:01:01,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (2597.55) for latency MM1Queue_a033_s075
2025-05-12 01:01:01,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 01:01:01,481 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-12 01:01:01,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 33 minutes, 23 seconds)
2025-05-12 01:03:01,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:03:10,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1944.85718 ± 704.405
2025-05-12 01:03:10,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2194.1038, 1382.9158, 1271.9662, 2621.1892, 2532.7073, 2554.6506, 2574.5144, 779.8495, 2523.424, 1013.25244]
2025-05-12 01:03:10,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [805.0, 551.0, 487.0, 1000.0, 1000.0, 1000.0, 1000.0, 331.0, 1000.0, 400.0]
2025-05-12 01:03:10,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 28 minutes, 17 seconds)
2025-05-12 01:05:20,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:05:31,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2625.84888 ± 131.586
2025-05-12 01:05:31,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2616.2366, 2594.3926, 2669.8354, 2543.742, 2953.8762, 2640.0854, 2641.444, 2618.4333, 2394.0522, 2586.388]
2025-05-12 01:05:31,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 888.0, 1000.0]
2025-05-12 01:05:31,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (2625.85) for latency MM1Queue_a033_s075
2025-05-12 01:05:31,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 01:05:31,207 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-walker2d/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-12 01:05:31,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 24 minutes, 23 seconds)
2025-05-12 01:07:32,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:07:39,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1546.31860 ± 847.048
2025-05-12 01:07:39,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2560.0684, 726.3508, 2017.3438, 1632.5967, 142.39474, 826.35345, 965.3356, 2553.4788, 2711.0938, 1328.1708]
2025-05-12 01:07:39,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 304.0, 792.0, 629.0, 84.0, 318.0, 372.0, 1000.0, 1000.0, 502.0]
2025-05-12 01:07:39,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 19 minutes, 57 seconds)
2025-05-12 01:09:54,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:10:04,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2489.91821 ± 190.764
2025-05-12 01:10:04,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2302.6465, 2517.521, 2578.7637, 2629.2686, 2597.692, 2483.4897, 2619.7341, 2685.9292, 2477.628, 2006.5103]
2025-05-12 01:10:04,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [793.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 788.0]
2025-05-12 01:10:04,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 15 seconds)
2025-05-12 01:12:07,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:12:18,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2394.41626 ± 298.855
2025-05-12 01:12:18,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1925.8147, 2540.06, 2603.0808, 2540.8674, 2740.436, 2568.3533, 2565.9558, 2393.426, 2304.0261, 1762.1418]
2025-05-12 01:12:18,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [787.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 675.0]
2025-05-12 01:12:18,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 32 seconds)
2025-05-12 01:14:35,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:14:46,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2453.83398 ± 159.797
2025-05-12 01:14:46,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2301.964, 2298.6287, 2373.099, 2315.9094, 2430.7676, 2381.7043, 2702.5525, 2733.4458, 2372.4807, 2627.788]
2025-05-12 01:14:46,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 01:14:46,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 36 seconds)
2025-05-12 01:16:47,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:16:58,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2614.15967 ± 236.379
2025-05-12 01:16:58,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2655.2793, 2634.6567, 2740.071, 2784.4468, 1944.6486, 2539.0137, 2643.8672, 2704.8083, 2828.9092, 2665.8962]
2025-05-12 01:16:58,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 688.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 01:16:58,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 10 seconds)
2025-05-12 01:19:08,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:19:17,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2071.65552 ± 735.620
2025-05-12 01:19:17,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1314.2809, 2507.905, 2572.4766, 2674.636, 2601.2825, 2528.7595, 2392.3384, 647.70465, 966.82117, 2510.3525]
2025-05-12 01:19:17,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [528.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 877.0, 273.0, 428.0, 1000.0]
2025-05-12 01:19:17,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 58 seconds)
2025-05-12 01:21:17,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:21:29,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2496.67041 ± 62.573
2025-05-12 01:21:29,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2573.1313, 2490.5955, 2444.9792, 2542.2366, 2440.6587, 2491.482, 2446.576, 2392.3162, 2556.5435, 2588.183]
2025-05-12 01:21:29,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 01:21:29,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 33 seconds)
2025-05-12 01:23:37,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:23:47,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2181.32104 ± 438.385
2025-05-12 01:23:47,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2206.2612, 2235.0217, 2275.863, 2265.5051, 2363.6726, 2320.4329, 900.80975, 2411.2344, 2568.5737, 2265.837]
2025-05-12 01:23:47,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 394.0, 1000.0, 1000.0, 1000.0]
2025-05-12 01:23:48,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 17 seconds)
2025-05-12 01:25:58,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-12 01:26:08,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2281.25879 ± 375.511
2025-05-12 01:26:08,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2332.9397, 1169.0742, 2375.8037, 2331.3274, 2451.519, 2465.2014, 2529.928, 2401.108, 2403.6606, 2352.0256]
2025-05-12 01:26:08,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 481.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 01:26:08,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1251 [DEBUG]: Training session finished
