2025-05-11 20:08:35,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2
2025-05-11 20:08:35,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2
2025-05-11 20:08:35,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7e70c6bcc3d0>}
2025-05-11 20:08:35,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1111 [DEBUG]: using device: cpu
2025-05-11 20:08:35,912 baseline-sac-noisy-walker2d:77 [WARNING]: args.memorize_actions != args.horizon: 2 != 24
2025-05-11 20:08:35,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1133 [INFO]: Creating new trainer
2025-05-11 20:08:35,930 baseline-sac-noisy-walker2d:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=29, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-11 20:08:35,930 baseline-sac-noisy-walker2d:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 20:08:36,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1194 [DEBUG]: Starting training session...
2025-05-11 20:08:36,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 1/100
2025-05-11 20:11:10,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:11:12,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 103.35514 ± 42.318
2025-05-11 20:11:12,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [59.304646, 156.32307, 64.15324, 59.924232, 122.47241, 157.47865, 120.605125, 159.02504, 52.57871, 81.68625]
2025-05-11 20:11:12,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 84.0, 173.0, 174.0, 171.0, 84.0, 67.0, 269.0, 164.0, 189.0]
2025-05-11 20:11:12,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (103.36) for latency ExtremeClogL1U23
2025-05-11 20:11:12,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:11:12,298 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:11:12,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 17 minutes, 41 seconds)
2025-05-11 20:14:01,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:14:03,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 92.06113 ± 97.405
2025-05-11 20:14:03,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [19.804688, 46.49723, 259.78146, 6.2098346, 302.2001, 61.47983, 47.682785, 78.8441, 72.96334, 25.148027]
2025-05-11 20:14:03,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [33.0, 95.0, 171.0, 152.0, 205.0, 99.0, 49.0, 124.0, 200.0, 191.0]
2025-05-11 20:14:03,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 27 minutes, 1 second)
2025-05-11 20:16:59,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:17:05,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 390.15549 ± 415.114
2025-05-11 20:17:05,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [996.4515, 58.141006, 289.8473, 960.1679, 514.14124, -4.3021646, 985.0491, 90.699646, -4.562014, 15.92141]
2025-05-11 20:17:05,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 64.0, 190.0, 1000.0, 321.0, 34.0, 1000.0, 134.0, 91.0, 100.0]
2025-05-11 20:17:05,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (390.16) for latency ExtremeClogL1U23
2025-05-11 20:17:05,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:17:05,314 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:17:05,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 34 minutes, 23 seconds)
2025-05-11 20:19:45,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:19:49,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 399.57251 ± 349.758
2025-05-11 20:19:49,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [34.27808, 367.09296, 993.6555, 109.2591, 317.9323, 507.21603, 18.550684, 79.55237, 571.91766, 996.2704]
2025-05-11 20:19:49,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [50.0, 211.0, 1000.0, 100.0, 163.0, 297.0, 37.0, 168.0, 419.0, 1000.0]
2025-05-11 20:19:49,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (399.57) for latency ExtremeClogL1U23
2025-05-11 20:19:49,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:19:49,982 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:19:49,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 29 minutes, 32 seconds)
2025-05-11 20:22:38,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:22:40,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 153.25899 ± 180.322
2025-05-11 20:22:40,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [3.1197846, 78.06879, 50.01951, 1.3413597, 18.446138, 66.779396, 49.625534, 346.98352, 425.76666, 492.43927]
2025-05-11 20:22:40,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [143.0, 116.0, 62.0, 132.0, 40.0, 157.0, 70.0, 196.0, 243.0, 290.0]
2025-05-11 20:22:40,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 27 minutes, 17 seconds)
2025-05-11 20:25:31,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:25:34,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 315.75885 ± 286.353
2025-05-11 20:25:34,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [300.96982, 98.299286, 178.86719, 26.922497, 83.9671, 544.0073, 1002.07184, 59.290707, 452.12003, 411.0726]
2025-05-11 20:25:34,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [181.0, 161.0, 92.0, 85.0, 128.0, 249.0, 1000.0, 200.0, 204.0, 331.0]
2025-05-11 20:25:34,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 30 minutes, 17 seconds)
2025-05-11 20:28:16,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:28:20,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 318.47421 ± 330.575
2025-05-11 20:28:20,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [134.835, 80.23191, 245.81549, 14.268668, 83.069725, 329.6958, 833.2794, 1046.6569, 87.057014, 329.832]
2025-05-11 20:28:20,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [228.0, 121.0, 201.0, 25.0, 81.0, 212.0, 515.0, 1000.0, 160.0, 227.0]
2025-05-11 20:28:20,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 25 minutes, 44 seconds)
2025-05-11 20:31:09,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:31:11,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 80.63298 ± 71.860
2025-05-11 20:31:11,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [194.6587, 33.416374, 20.098131, 63.737553, 55.233295, 84.45257, 20.945549, 231.9018, 93.306526, 8.579312]
2025-05-11 20:31:11,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 66.0, 69.0, 196.0, 168.0, 185.0, 34.0, 124.0, 167.0, 39.0]
2025-05-11 20:31:11,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 19 minutes, 27 seconds)
2025-05-11 20:33:58,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:34:00,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 247.35722 ± 147.946
2025-05-11 20:34:00,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [232.34213, 260.9745, 445.1388, 69.58862, 384.06894, 129.44513, 118.83874, 357.02246, 445.97104, 30.181828]
2025-05-11 20:34:00,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 245.0, 365.0, 104.0, 222.0, 96.0, 120.0, 210.0, 284.0, 47.0]
2025-05-11 20:34:00,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 18 minutes, 1 second)
2025-05-11 20:36:45,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:36:48,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 412.73413 ± 155.942
2025-05-11 20:36:48,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [556.3742, 550.1012, 265.23462, 13.081047, 388.42352, 434.09518, 477.6968, 436.44144, 496.14392, 509.74973]
2025-05-11 20:36:48,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [270.0, 257.0, 139.0, 29.0, 224.0, 182.0, 209.0, 215.0, 241.0, 235.0]
2025-05-11 20:36:48,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (412.73) for latency ExtremeClogL1U23
2025-05-11 20:36:48,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:36:48,214 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:36:48,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 14 minutes, 24 seconds)
2025-05-11 20:39:46,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:39:49,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 447.49023 ± 318.333
2025-05-11 20:39:49,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [652.633, 17.302517, 224.09833, 404.73215, 550.3927, 811.5502, 41.502453, 1006.52747, 153.1803, 612.98315]
2025-05-11 20:39:49,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [320.0, 29.0, 206.0, 220.0, 374.0, 369.0, 64.0, 592.0, 180.0, 382.0]
2025-05-11 20:39:49,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (447.49) for latency ExtremeClogL1U23
2025-05-11 20:39:49,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:39:49,813 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:39:49,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 13 minutes, 36 seconds)
2025-05-11 20:42:41,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:42:47,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 632.77014 ± 456.399
2025-05-11 20:42:47,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [509.62973, 573.3331, 528.3225, 1250.7708, 1584.4519, 179.36195, 25.592836, 718.14996, 217.3446, 740.74396]
2025-05-11 20:42:47,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [208.0, 308.0, 402.0, 610.0, 897.0, 172.0, 39.0, 286.0, 310.0, 666.0]
2025-05-11 20:42:47,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (632.77) for latency ExtremeClogL1U23
2025-05-11 20:42:47,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 20:42:47,175 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 20:42:47,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 14 minutes, 16 seconds)
2025-05-11 20:45:33,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:45:36,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 422.23676 ± 105.635
2025-05-11 20:45:36,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [562.13495, 368.10025, 245.5379, 432.21924, 404.0182, 307.14447, 473.16742, 329.06647, 552.9953, 547.9832]
2025-05-11 20:45:36,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [267.0, 232.0, 258.0, 263.0, 414.0, 174.0, 265.0, 183.0, 332.0, 326.0]
2025-05-11 20:45:36,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 10 minutes, 56 seconds)
2025-05-11 20:48:28,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:48:32,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 458.72745 ± 348.524
2025-05-11 20:48:32,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [856.13635, 357.86783, 498.067, 184.67563, 814.5711, 304.9505, 336.18442, -5.097698, 101.54158, 1138.3776]
2025-05-11 20:48:32,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [482.0, 205.0, 231.0, 98.0, 422.0, 202.0, 180.0, 62.0, 173.0, 693.0]
2025-05-11 20:48:32,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 9 minutes, 55 seconds)
2025-05-11 20:51:24,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:51:27,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 428.26337 ± 315.927
2025-05-11 20:51:27,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [898.49994, 814.402, 332.33884, 76.83291, 649.76666, 731.2523, 93.52515, 496.63925, 7.989695, 181.38647]
2025-05-11 20:51:27,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [473.0, 396.0, 151.0, 151.0, 391.0, 437.0, 175.0, 298.0, 26.0, 132.0]
2025-05-11 20:51:27,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 9 minutes, 11 seconds)
2025-05-11 20:54:24,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:54:28,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 631.60425 ± 293.924
2025-05-11 20:54:28,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [545.97455, 711.17377, 674.2716, 755.94257, 1232.076, 77.243324, 424.1072, 887.72235, 394.79733, 612.7339]
2025-05-11 20:54:28,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [291.0, 392.0, 291.0, 378.0, 653.0, 131.0, 187.0, 382.0, 251.0, 296.0]
2025-05-11 20:54:28,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 5 minutes, 59 seconds)
2025-05-11 20:57:21,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 20:57:25,250 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 418.44785 ± 293.714
2025-05-11 20:57:25,250 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [12.135917, 410.46887, 741.6159, 402.67978, 275.86844, 549.7142, 420.92212, 59.24871, 1051.8097, 260.015]
2025-05-11 20:57:25,250 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 205.0, 409.0, 250.0, 146.0, 278.0, 216.0, 58.0, 980.0, 159.0]
2025-05-11 20:57:25,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 2 minutes, 55 seconds)
2025-05-11 21:00:19,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:00:24,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 650.99042 ± 394.491
2025-05-11 21:00:24,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1568.4692, 707.0576, 399.42566, 634.02924, 913.3229, 22.880016, 881.70435, 449.22632, 573.3977, 360.39148]
2025-05-11 21:00:24,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 355.0, 211.0, 356.0, 513.0, 38.0, 491.0, 259.0, 322.0, 217.0]
2025-05-11 21:00:24,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (650.99) for latency ExtremeClogL1U23
2025-05-11 21:00:24,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:00:24,622 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 21:00:24,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 2 minutes, 42 seconds)
2025-05-11 21:03:06,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:03:10,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 597.76685 ± 386.857
2025-05-11 21:03:10,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [722.52466, 636.1381, 1347.6797, 112.12067, 581.3847, 510.97137, 1099.0171, 3.5199142, 627.77, 336.5422]
2025-05-11 21:03:10,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [360.0, 342.0, 622.0, 151.0, 287.0, 257.0, 529.0, 19.0, 344.0, 179.0]
2025-05-11 21:03:10,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 57 minutes, 9 seconds)
2025-05-11 21:05:58,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:06:00,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 313.48859 ± 244.402
2025-05-11 21:06:00,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [685.3936, 36.52667, 158.16808, 145.8585, 21.74643, 46.888123, 453.46185, 567.4279, 589.7264, 429.68826]
2025-05-11 21:06:00,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [319.0, 59.0, 115.0, 112.0, 43.0, 58.0, 233.0, 320.0, 311.0, 239.0]
2025-05-11 21:06:00,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 52 minutes, 50 seconds)
2025-05-11 21:08:47,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:08:50,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 351.34470 ± 245.397
2025-05-11 21:08:50,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [427.00522, 331.9395, 225.83466, 488.7032, 992.28436, 364.13828, 47.732296, 209.12442, 215.91888, 210.76642]
2025-05-11 21:08:50,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [240.0, 166.0, 129.0, 204.0, 525.0, 184.0, 78.0, 135.0, 140.0, 114.0]
2025-05-11 21:08:50,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 46 minutes, 57 seconds)
2025-05-11 21:11:34,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:11:37,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 428.39908 ± 322.651
2025-05-11 21:11:37,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [245.77065, 1182.032, 68.10904, 358.51678, 204.18938, 194.31656, 431.16812, 606.26733, 778.73114, 214.88963]
2025-05-11 21:11:37,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 723.0, 129.0, 173.0, 117.0, 117.0, 282.0, 240.0, 383.0, 136.0]
2025-05-11 21:11:37,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 41 minutes, 32 seconds)
2025-05-11 21:14:21,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:14:25,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 643.92480 ± 385.239
2025-05-11 21:14:25,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1363.6907, 446.99503, 586.0336, 307.639, 688.24896, 601.6206, 180.61723, 1268.9382, 215.76213, 779.70276]
2025-05-11 21:14:25,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [571.0, 288.0, 331.0, 181.0, 327.0, 292.0, 156.0, 536.0, 134.0, 339.0]
2025-05-11 21:14:25,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 35 minutes, 49 seconds)
2025-05-11 21:17:18,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:17:21,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 510.47443 ± 283.688
2025-05-11 21:17:21,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [763.8845, 72.3966, 344.8443, 415.0008, 324.70474, 574.7937, 422.34863, 271.89703, 889.5185, 1025.3553]
2025-05-11 21:17:21,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [361.0, 124.0, 200.0, 203.0, 181.0, 304.0, 223.0, 176.0, 407.0, 501.0]
2025-05-11 21:17:21,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 35 minutes, 28 seconds)
2025-05-11 21:20:05,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:20:08,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 350.68930 ± 146.014
2025-05-11 21:20:08,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [491.06177, 344.6952, 304.40482, 292.1478, 195.44177, 373.39725, 124.71666, 659.39935, 269.42673, 452.20154]
2025-05-11 21:20:08,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [309.0, 201.0, 176.0, 240.0, 130.0, 278.0, 127.0, 276.0, 173.0, 214.0]
2025-05-11 21:20:08,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 31 minutes, 50 seconds)
2025-05-11 21:22:53,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:22:55,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 445.83847 ± 247.187
2025-05-11 21:22:55,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1016.8002, 444.97635, 417.8553, 236.50084, 632.76294, 282.3322, 144.19156, 187.75404, 562.15063, 533.0607]
2025-05-11 21:22:55,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [499.0, 218.0, 231.0, 126.0, 303.0, 162.0, 121.0, 140.0, 297.0, 288.0]
2025-05-11 21:22:55,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 28 minutes, 36 seconds)
2025-05-11 21:25:45,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:25:48,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 504.80331 ± 286.998
2025-05-11 21:25:48,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [18.97429, 181.23047, 596.07465, 826.6689, 672.41895, 129.92754, 854.0048, 647.4357, 724.7694, 396.52838]
2025-05-11 21:25:48,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [88.0, 225.0, 298.0, 378.0, 336.0, 198.0, 423.0, 335.0, 360.0, 201.0]
2025-05-11 21:25:48,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 27 minutes, 9 seconds)
2025-05-11 21:28:54,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:28:57,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 334.34723 ± 157.587
2025-05-11 21:28:57,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [453.69043, 263.25623, 372.1584, 249.35281, 34.381844, 443.42923, 450.10577, 250.70978, 206.2158, 620.17175]
2025-05-11 21:28:57,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 173.0, 220.0, 132.0, 45.0, 261.0, 222.0, 153.0, 138.0, 274.0]
2025-05-11 21:28:57,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 29 minutes, 11 seconds)
2025-05-11 21:32:26,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:32:29,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 505.95648 ± 224.445
2025-05-11 21:32:29,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [393.40082, 894.6322, 537.6664, 482.99512, 328.42874, 163.59697, 708.53613, 221.75418, 569.89056, 758.6642]
2025-05-11 21:32:29,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [192.0, 449.0, 266.0, 256.0, 179.0, 210.0, 324.0, 117.0, 323.0, 322.0]
2025-05-11 21:32:29,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 35 minutes, 2 seconds)
2025-05-11 21:35:21,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:35:26,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 813.86292 ± 523.553
2025-05-11 21:35:26,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1218.3494, 562.678, 716.3012, 689.02606, 850.69055, 469.65662, 431.7323, 1189.3202, 3.1801097, 2007.695]
2025-05-11 21:35:26,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [661.0, 274.0, 305.0, 293.0, 408.0, 219.0, 215.0, 521.0, 48.0, 1000.0]
2025-05-11 21:35:26,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (813.86) for latency ExtremeClogL1U23
2025-05-11 21:35:26,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:35:26,578 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 21:35:26,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 34 minutes, 17 seconds)
2025-05-11 21:38:11,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:38:16,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 814.49231 ± 400.207
2025-05-11 21:38:16,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [923.5167, 355.51624, 1377.942, 160.93439, 1152.1455, 486.36215, 1273.4242, 1159.571, 545.95264, 709.5583]
2025-05-11 21:38:16,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [439.0, 241.0, 678.0, 158.0, 574.0, 225.0, 494.0, 561.0, 292.0, 355.0]
2025-05-11 21:38:16,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (814.49) for latency ExtremeClogL1U23
2025-05-11 21:38:16,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:38:16,629 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 21:38:16,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 31 minutes, 46 seconds)
2025-05-11 21:41:26,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:41:29,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 535.53186 ± 360.768
2025-05-11 21:41:29,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [190.96617, 392.06918, 699.7598, 438.00604, 351.50943, 1322.1317, 0.2235068, 449.33633, 552.87115, 958.44507]
2025-05-11 21:41:29,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [133.0, 184.0, 311.0, 247.0, 306.0, 575.0, 14.0, 201.0, 363.0, 524.0]
2025-05-11 21:41:29,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 33 minutes, 13 seconds)
2025-05-11 21:44:10,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:44:15,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 868.21204 ± 389.008
2025-05-11 21:44:15,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1281.2817, 604.5523, 1053.5681, 94.97783, 918.9286, 803.7818, 952.5744, 1555.5869, 935.91614, 480.95218]
2025-05-11 21:44:15,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [689.0, 243.0, 466.0, 126.0, 468.0, 391.0, 459.0, 777.0, 441.0, 200.0]
2025-05-11 21:44:15,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (868.21) for latency ExtremeClogL1U23
2025-05-11 21:44:15,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:44:15,392 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 21:44:15,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 25 minutes, 5 seconds)
2025-05-11 21:47:00,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:47:05,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 600.86542 ± 318.715
2025-05-11 21:47:05,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [996.28613, 898.0937, 627.0037, 45.03846, 372.65332, 548.1771, 706.54016, 819.912, 93.692215, 901.2566]
2025-05-11 21:47:05,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 474.0, 238.0, 59.0, 212.0, 248.0, 341.0, 405.0, 161.0, 462.0]
2025-05-11 21:47:05,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 12 minutes, 32 seconds)
2025-05-11 21:49:47,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:49:52,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 934.05988 ± 586.457
2025-05-11 21:49:52,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [851.29816, 1069.8376, 27.89273, 40.338978, 987.4597, 1299.1692, 1180.7112, 661.00085, 2161.4014, 1061.4895]
2025-05-11 21:49:52,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [341.0, 479.0, 51.0, 111.0, 460.0, 572.0, 467.0, 325.0, 875.0, 473.0]
2025-05-11 21:49:52,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (934.06) for latency ExtremeClogL1U23
2025-05-11 21:49:52,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 21:49:52,103 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 21:49:52,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 7 minutes, 31 seconds)
2025-05-11 21:52:39,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:52:41,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 387.64499 ± 442.569
2025-05-11 21:52:41,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [863.70496, 960.96857, 904.6572, 26.451677, 15.999199, 50.004337, 983.51666, -2.5636141, 33.626823, 40.084316]
2025-05-11 21:52:41,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [459.0, 369.0, 478.0, 48.0, 29.0, 46.0, 489.0, 24.0, 47.0, 60.0]
2025-05-11 21:52:41,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 4 minutes, 32 seconds)
2025-05-11 21:55:25,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:55:30,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 719.35925 ± 100.037
2025-05-11 21:55:30,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [714.6867, 592.02484, 563.7272, 712.5498, 639.13043, 792.1225, 856.0927, 722.68884, 889.82263, 710.7471]
2025-05-11 21:55:30,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [339.0, 341.0, 263.0, 321.0, 330.0, 347.0, 412.0, 395.0, 463.0, 312.0]
2025-05-11 21:55:30,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 56 minutes, 33 seconds)
2025-05-11 21:58:14,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 21:58:19,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 928.08252 ± 468.397
2025-05-11 21:58:19,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1075.8357, 18.624697, 1154.8784, 997.3231, 1303.5483, 469.02322, 1383.3032, 1039.5177, 320.51984, 1518.2509]
2025-05-11 21:58:19,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [476.0, 28.0, 562.0, 447.0, 569.0, 210.0, 606.0, 417.0, 284.0, 719.0]
2025-05-11 21:58:19,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 54 minutes, 27 seconds)
2025-05-11 22:01:06,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:01:14,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1189.60876 ± 760.846
2025-05-11 22:01:14,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1702.8978, 1434.7399, 2060.8687, 2128.1418, 271.5799, 1203.0048, 461.6329, 2046.1204, 531.2693, 55.83217]
2025-05-11 22:01:14,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [721.0, 725.0, 1000.0, 1000.0, 147.0, 517.0, 242.0, 1000.0, 231.0, 85.0]
2025-05-11 22:01:14,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1189.61) for latency ExtremeClogL1U23
2025-05-11 22:01:14,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:01:14,181 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:01:14,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 52 minutes, 38 seconds)
2025-05-11 22:04:06,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:04:12,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1100.30444 ± 754.411
2025-05-11 22:04:12,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [26.649054, 1644.7522, 1693.2319, 297.81268, 634.61456, 1760.333, 22.031307, 1036.0562, 1896.707, 1990.857]
2025-05-11 22:04:12,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [33.0, 643.0, 723.0, 262.0, 291.0, 1000.0, 34.0, 442.0, 819.0, 815.0]
2025-05-11 22:04:12,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 52 minutes, 8 seconds)
2025-05-11 22:07:00,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:07:05,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 853.43176 ± 747.263
2025-05-11 22:07:05,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1948.2097, 1086.9706, 1547.9474, 294.25723, 25.586987, 918.3093, 2096.177, 309.68735, 37.98407, 269.18753]
2025-05-11 22:07:05,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [756.0, 556.0, 765.0, 228.0, 33.0, 454.0, 1000.0, 190.0, 52.0, 192.0]
2025-05-11 22:07:05,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 49 minutes, 55 seconds)
2025-05-11 22:09:50,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:09:57,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1146.54028 ± 651.403
2025-05-11 22:09:57,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2527.57, 862.74457, 2050.6475, 1088.7523, 245.17662, 463.52795, 1091.2648, 1193.0668, 1182.441, 760.21094]
2025-05-11 22:09:57,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 342.0, 873.0, 436.0, 132.0, 253.0, 504.0, 535.0, 491.0, 313.0]
2025-05-11 22:09:57,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 47 minutes, 40 seconds)
2025-05-11 22:12:40,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:12:46,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 981.23163 ± 702.422
2025-05-11 22:12:46,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [545.9922, -1.1435678, 1574.3528, 623.8346, 1084.7635, 2038.8363, 1358.0098, 1961.6473, 17.643343, 608.38]
2025-05-11 22:12:46,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [262.0, 25.0, 637.0, 313.0, 548.0, 940.0, 610.0, 780.0, 27.0, 312.0]
2025-05-11 22:12:46,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 44 minutes, 38 seconds)
2025-05-11 22:15:40,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:15:44,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 670.25330 ± 617.389
2025-05-11 22:15:44,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [924.6005, 662.7085, 749.8653, 17.022165, 2204.5916, 298.03983, 1102.38, 451.61768, 281.19772, 10.509945]
2025-05-11 22:15:44,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [407.0, 331.0, 329.0, 38.0, 952.0, 181.0, 494.0, 225.0, 158.0, 22.0]
2025-05-11 22:15:44,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 42 minutes, 29 seconds)
2025-05-11 22:18:24,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:18:32,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1240.04712 ± 786.878
2025-05-11 22:18:32,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1095.4493, 939.6359, 2148.2239, 927.07324, 555.1011, 2259.0046, 2339.3743, 411.48193, 5.1178975, 1720.0096]
2025-05-11 22:18:32,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [581.0, 462.0, 1000.0, 514.0, 274.0, 1000.0, 1000.0, 229.0, 15.0, 810.0]
2025-05-11 22:18:32,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1240.05) for latency ExtremeClogL1U23
2025-05-11 22:18:32,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:18:32,476 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:18:32,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 37 minutes, 36 seconds)
2025-05-11 22:21:25,884 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:21:34,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1410.46167 ± 592.652
2025-05-11 22:21:34,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2090.6387, 669.9155, 875.58435, 2166.6997, 880.2439, 1437.7477, 2205.1838, 1690.9266, 1481.0919, 606.58417]
2025-05-11 22:21:34,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [831.0, 336.0, 397.0, 963.0, 395.0, 597.0, 959.0, 732.0, 681.0, 396.0]
2025-05-11 22:21:34,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1410.46) for latency ExtremeClogL1U23
2025-05-11 22:21:34,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:21:34,311 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:21:34,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 36 minutes, 20 seconds)
2025-05-11 22:24:18,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:24:26,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1426.39673 ± 663.549
2025-05-11 22:24:26,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1257.3458, 3.3996975, 2365.1824, 1387.8093, 2403.3535, 1754.0602, 1512.2817, 837.6028, 1221.8608, 1521.07]
2025-05-11 22:24:26,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [553.0, 26.0, 1000.0, 627.0, 1000.0, 697.0, 602.0, 329.0, 562.0, 646.0]
2025-05-11 22:24:26,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1426.40) for latency ExtremeClogL1U23
2025-05-11 22:24:26,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:24:26,991 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:24:27,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 33 minutes, 37 seconds)
2025-05-11 22:27:18,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:27:26,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1431.44067 ± 747.402
2025-05-11 22:27:26,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1358.623, 2700.213, 520.4806, 689.85736, 1994.4386, 1058.297, 1010.7758, 566.6992, 2154.0645, 2260.958]
2025-05-11 22:27:26,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [600.0, 1000.0, 209.0, 260.0, 921.0, 492.0, 500.0, 344.0, 1000.0, 1000.0]
2025-05-11 22:27:26,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1431.44) for latency ExtremeClogL1U23
2025-05-11 22:27:26,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:27:26,711 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:27:26,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 32 minutes, 38 seconds)
2025-05-11 22:30:06,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:30:13,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1238.60791 ± 1047.881
2025-05-11 22:30:13,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2306.4207, 1969.0676, 2263.9978, 2112.5322, 244.60564, 2679.797, 307.5423, 29.110785, 26.358208, 446.64633]
2025-05-11 22:30:13,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 745.0, 1000.0, 971.0, 130.0, 1000.0, 193.0, 56.0, 38.0, 221.0]
2025-05-11 22:30:13,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 27 minutes, 41 seconds)
2025-05-11 22:32:59,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:33:08,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1315.78052 ± 922.094
2025-05-11 22:33:08,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1754.5024, 2021.7302, 2227.9224, 107.34838, 2255.6182, -4.4886928, 21.908216, 1945.842, 780.1685, 2047.2537]
2025-05-11 22:33:08,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [755.0, 911.0, 1000.0, 120.0, 1000.0, 56.0, 33.0, 1000.0, 551.0, 1000.0]
2025-05-11 22:33:08,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 26 minutes, 3 seconds)
2025-05-11 22:36:01,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:36:07,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1119.76050 ± 683.425
2025-05-11 22:36:07,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1192.143, 613.85754, 31.742947, 2087.0442, 1142.605, 1015.44354, 244.97856, 2253.0403, 1589.4276, 1027.3213]
2025-05-11 22:36:07,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [515.0, 234.0, 41.0, 894.0, 493.0, 383.0, 161.0, 919.0, 652.0, 375.0]
2025-05-11 22:36:07,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 22 minutes, 34 seconds)
2025-05-11 22:38:53,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:39:03,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1320.35278 ± 897.327
2025-05-11 22:39:03,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2358.6436, 150.90492, 2118.3743, 2380.3574, 1658.848, 793.5302, 2125.2107, 52.405754, 225.87682, 1339.3762]
2025-05-11 22:39:03,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 168.0, 914.0, 1000.0, 640.0, 378.0, 877.0, 63.0, 286.0, 663.0]
2025-05-11 22:39:03,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 20 minutes, 13 seconds)
2025-05-11 22:42:08,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:42:16,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1464.20386 ± 948.840
2025-05-11 22:42:16,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2639.718, 2714.0334, 287.88498, 1390.7207, 810.9495, 2057.1167, 2803.6553, 821.5056, 411.79648, 704.65784]
2025-05-11 22:42:16,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 212.0, 584.0, 292.0, 1000.0, 1000.0, 350.0, 194.0, 255.0]
2025-05-11 22:42:16,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1464.20) for latency ExtremeClogL1U23
2025-05-11 22:42:16,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:42:16,428 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:42:16,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 19 minutes, 23 seconds)
2025-05-11 22:45:25,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:45:33,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1180.10779 ± 849.346
2025-05-11 22:45:33,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2299.3657, 2213.0864, 1383.2966, 2036.8918, 30.881159, 667.1025, 1921.2311, 629.95276, 20.266418, 599.0033]
2025-05-11 22:45:33,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 913.0, 567.0, 728.0, 36.0, 393.0, 1000.0, 282.0, 38.0, 274.0]
2025-05-11 22:45:33,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 21 minutes, 8 seconds)
2025-05-11 22:48:25,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:48:34,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1591.51831 ± 868.586
2025-05-11 22:48:34,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2723.4277, 2542.3794, 1944.9669, 1672.0098, 1017.97955, 820.4646, 35.81544, 1207.5251, 1126.2079, 2824.4075]
2025-05-11 22:48:34,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 869.0, 1000.0, 643.0, 530.0, 359.0, 56.0, 465.0, 445.0, 1000.0]
2025-05-11 22:48:34,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1591.52) for latency ExtremeClogL1U23
2025-05-11 22:48:34,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:48:34,262 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:48:34,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 18 minutes, 49 seconds)
2025-05-11 22:51:32,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:51:38,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1132.95691 ± 782.892
2025-05-11 22:51:38,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1888.5417, 507.64795, 1479.9462, 2637.4795, 599.8801, 2044.1978, 720.3695, 848.0208, 434.98422, 168.50073]
2025-05-11 22:51:38,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [790.0, 225.0, 532.0, 1000.0, 228.0, 798.0, 362.0, 401.0, 212.0, 115.0]
2025-05-11 22:51:38,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 16 minutes, 37 seconds)
2025-05-11 22:54:32,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:54:37,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 954.12073 ± 823.664
2025-05-11 22:54:37,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [282.65732, 911.73834, 1185.3768, 580.07947, 13.932795, 359.35928, 2201.534, 40.129745, 2425.3865, 1541.0125]
2025-05-11 22:54:37,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 416.0, 510.0, 245.0, 27.0, 182.0, 879.0, 49.0, 1000.0, 580.0]
2025-05-11 22:54:37,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 13 minutes, 57 seconds)
2025-05-11 22:57:18,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 22:57:27,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1627.17920 ± 854.769
2025-05-11 22:57:27,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1081.697, -21.347595, 2344.7148, 1892.7853, 2501.9202, 675.18475, 1554.3639, 2545.785, 1111.0262, 2585.6624]
2025-05-11 22:57:27,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [473.0, 78.0, 1000.0, 805.0, 1000.0, 337.0, 581.0, 1000.0, 482.0, 1000.0]
2025-05-11 22:57:27,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1627.18) for latency ExtremeClogL1U23
2025-05-11 22:57:27,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 22:57:27,923 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 22:57:27,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 7 minutes, 36 seconds)
2025-05-11 23:00:25,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:00:33,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1407.90698 ± 921.661
2025-05-11 23:00:33,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1240.5471, 2507.5215, 523.7842, 2623.4219, 1068.2103, 2579.137, 52.01276, 211.68993, 1374.3419, 1898.4036]
2025-05-11 23:00:33,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [453.0, 1000.0, 276.0, 1000.0, 409.0, 1000.0, 63.0, 121.0, 549.0, 720.0]
2025-05-11 23:00:33,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 2 minutes, 56 seconds)
2025-05-11 23:03:15,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:03:19,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 805.07483 ± 364.242
2025-05-11 23:03:19,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [951.7584, 566.7055, 643.3451, 635.6437, 854.15704, 363.42297, 1394.5548, 1525.3556, 496.34103, 619.465]
2025-05-11 23:03:19,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [368.0, 281.0, 249.0, 278.0, 325.0, 180.0, 551.0, 628.0, 249.0, 245.0]
2025-05-11 23:03:19,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 58 minutes, 4 seconds)
2025-05-11 23:06:15,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:06:23,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1298.20081 ± 919.883
2025-05-11 23:06:23,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2442.0625, 975.608, 903.06757, 428.31296, 1435.8348, 2.8444753, 165.89648, 2461.7332, 1549.0614, 2617.587]
2025-05-11 23:06:23,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 451.0, 383.0, 206.0, 613.0, 12.0, 117.0, 1000.0, 676.0, 1000.0]
2025-05-11 23:06:23,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 54 minutes, 57 seconds)
2025-05-11 23:09:07,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:09:14,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1336.14673 ± 868.733
2025-05-11 23:09:14,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [972.89307, 696.8089, 2620.9363, 831.22766, 2021.2692, 10.544457, 2584.3606, 517.9532, 1027.0344, 2078.4387]
2025-05-11 23:09:14,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [383.0, 289.0, 987.0, 307.0, 815.0, 21.0, 1000.0, 252.0, 401.0, 740.0]
2025-05-11 23:09:14,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 51 minutes, 5 seconds)
2025-05-11 23:11:54,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:12:01,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1329.81628 ± 879.138
2025-05-11 23:12:01,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1632.5929, 422.88123, 2644.4622, 1206.0864, 2750.2676, 199.25935, 1184.0944, 1582.8296, 61.358917, 1614.3314]
2025-05-11 23:12:01,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [582.0, 220.0, 958.0, 474.0, 1000.0, 106.0, 503.0, 583.0, 109.0, 772.0]
2025-05-11 23:12:01,885 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 47 minutes, 47 seconds)
2025-05-11 23:14:55,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:15:03,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1448.99976 ± 957.784
2025-05-11 23:15:03,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2612.251, 1956.2521, 2062.1692, 1359.2793, 1427.3025, 2.3103878, 232.40472, 2163.072, 98.374695, 2576.5825]
2025-05-11 23:15:03,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 759.0, 757.0, 501.0, 540.0, 15.0, 400.0, 788.0, 169.0, 1000.0]
2025-05-11 23:15:03,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 44 minutes, 24 seconds)
2025-05-11 23:17:44,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:17:53,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1749.17737 ± 940.226
2025-05-11 23:17:53,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2717.7642, 2594.663, 851.3659, 1540.7448, 2564.999, 23.348864, 675.74896, 2492.9072, 2665.8298, 1364.4004]
2025-05-11 23:17:53,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [972.0, 1000.0, 395.0, 619.0, 1000.0, 39.0, 308.0, 1000.0, 1000.0, 538.0]
2025-05-11 23:17:53,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1749.18) for latency ExtremeClogL1U23
2025-05-11 23:17:53,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-11 23:17:53,368 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 23:17:53,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 41 minutes, 54 seconds)
2025-05-11 23:20:46,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:20:51,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1087.40649 ± 635.843
2025-05-11 23:20:51,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [718.62415, 964.69135, 917.3264, 2709.0806, 1125.1746, 745.37665, 81.182175, 1011.4332, 1247.7208, 1353.4562]
2025-05-11 23:20:51,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [322.0, 370.0, 369.0, 938.0, 405.0, 328.0, 145.0, 427.0, 500.0, 529.0]
2025-05-11 23:20:51,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 38 minutes, 25 seconds)
2025-05-11 23:23:47,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:23:56,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1673.47131 ± 825.489
2025-05-11 23:23:56,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [799.1653, 1188.4963, 1067.8866, 2653.3694, 1109.6316, 2560.5786, 2627.0735, 581.8656, 2766.932, 1379.7152]
2025-05-11 23:23:56,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [268.0, 425.0, 449.0, 1000.0, 421.0, 1000.0, 1000.0, 275.0, 1000.0, 470.0]
2025-05-11 23:23:56,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 36 minutes, 55 seconds)
2025-05-11 23:26:34,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:26:40,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1170.97107 ± 784.775
2025-05-11 23:26:40,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2470.2334, 865.20764, 1628.0071, 1007.02905, 687.29297, 21.622501, 237.86263, 2475.5715, 1076.4286, 1240.4559]
2025-05-11 23:26:40,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 371.0, 563.0, 362.0, 302.0, 32.0, 170.0, 866.0, 419.0, 478.0]
2025-05-11 23:26:40,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 33 minutes, 44 seconds)
2025-05-11 23:29:36,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:29:44,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1495.16101 ± 1010.242
2025-05-11 23:29:44,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [684.99036, 289.7814, 1440.8264, 2662.8826, 2763.127, 2796.7273, 2093.4312, 59.44287, 503.7306, 1656.6707]
2025-05-11 23:29:44,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [301.0, 168.0, 529.0, 1000.0, 1000.0, 1000.0, 813.0, 74.0, 229.0, 604.0]
2025-05-11 23:29:44,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 31 minutes)
2025-05-11 23:32:35,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:32:42,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1179.34998 ± 788.698
2025-05-11 23:32:42,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1829.0476, 2237.901, 333.5937, 1130.2911, 2652.189, 472.0755, 732.3166, 662.5682, 1434.9972, 308.521]
2025-05-11 23:32:42,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [718.0, 810.0, 191.0, 460.0, 1000.0, 213.0, 311.0, 301.0, 534.0, 205.0]
2025-05-11 23:32:42,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 28 minutes, 52 seconds)
2025-05-11 23:35:31,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:35:34,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 505.83478 ± 381.409
2025-05-11 23:35:34,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1202.9325, 109.24043, 752.86945, 123.264114, 763.98157, 123.0855, 827.7265, 74.22657, 318.39972, 762.621]
2025-05-11 23:35:34,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [467.0, 173.0, 346.0, 173.0, 306.0, 178.0, 356.0, 90.0, 145.0, 350.0]
2025-05-11 23:35:34,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 25 minutes, 20 seconds)
2025-05-11 23:38:22,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:38:28,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1174.05969 ± 1016.457
2025-05-11 23:38:28,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [660.05, 157.71988, 178.66475, 147.09708, 1632.2736, 2567.4177, 2549.2263, 2618.046, 271.05545, 959.0446]
2025-05-11 23:38:28,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [341.0, 182.0, 95.0, 160.0, 615.0, 1000.0, 1000.0, 989.0, 153.0, 375.0]
2025-05-11 23:38:28,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 21 minutes, 25 seconds)
2025-05-11 23:41:05,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:41:13,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1680.83496 ± 849.461
2025-05-11 23:41:13,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2697.3906, 2883.3262, 1471.8812, 873.6811, 1441.542, 2403.8403, 2676.8076, 834.2481, 579.6198, 946.0124]
2025-05-11 23:41:13,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 545.0, 387.0, 535.0, 939.0, 1000.0, 319.0, 250.0, 411.0]
2025-05-11 23:41:13,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 18 minutes, 34 seconds)
2025-05-11 23:43:56,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:44:01,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 946.66248 ± 610.784
2025-05-11 23:44:01,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [754.78625, 1661.6649, 919.863, 451.6008, 937.17224, 888.5037, 352.47742, 22.389404, 1254.289, 2223.8782]
2025-05-11 23:44:01,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [304.0, 707.0, 374.0, 229.0, 401.0, 357.0, 203.0, 39.0, 545.0, 877.0]
2025-05-11 23:44:01,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 14 minutes, 16 seconds)
2025-05-11 23:46:54,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:47:01,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1139.65698 ± 842.706
2025-05-11 23:47:01,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1196.3654, 2610.4966, 1682.4371, 1630.3815, 525.8816, 1020.52496, 2215.435, 208.63556, 291.67108, 14.739865]
2025-05-11 23:47:01,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [571.0, 1000.0, 641.0, 663.0, 206.0, 479.0, 991.0, 113.0, 163.0, 27.0]
2025-05-11 23:47:01,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 11 minutes, 35 seconds)
2025-05-11 23:49:46,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:49:52,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1151.05957 ± 814.084
2025-05-11 23:49:52,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1693.2781, 370.62952, 1006.7032, 2520.1633, 994.71765, -11.889547, 1039.8035, 2394.7363, 1286.5928, 215.85988]
2025-05-11 23:49:52,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [609.0, 212.0, 510.0, 949.0, 395.0, 47.0, 430.0, 897.0, 605.0, 117.0]
2025-05-11 23:49:52,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 8 minutes, 40 seconds)
2025-05-11 23:52:31,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:52:39,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1603.28503 ± 799.107
2025-05-11 23:52:39,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2458.8535, 1636.178, 443.0687, 1361.1399, 2823.805, 2655.8323, 934.22906, 907.70386, 1972.186, 839.8542]
2025-05-11 23:52:39,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [957.0, 570.0, 211.0, 517.0, 1000.0, 1000.0, 335.0, 368.0, 777.0, 333.0]
2025-05-11 23:52:39,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 5 minutes, 14 seconds)
2025-05-11 23:55:28,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:55:34,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1189.24854 ± 961.501
2025-05-11 23:55:34,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [284.34464, 588.56525, 1692.4099, 1706.7401, 2628.5588, 250.28679, 1918.3232, 2542.8647, 254.4263, 25.965033]
2025-05-11 23:55:34,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [181.0, 234.0, 672.0, 670.0, 1000.0, 146.0, 760.0, 1000.0, 154.0, 48.0]
2025-05-11 23:55:34,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 3 minutes, 8 seconds)
2025-05-11 23:58:14,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 23:58:22,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1648.51404 ± 934.890
2025-05-11 23:58:22,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1890.5388, 967.3682, 2769.085, 199.75072, 46.51322, 2280.5972, 2499.817, 2387.8848, 1167.2642, 2276.3213]
2025-05-11 23:58:22,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [649.0, 366.0, 1000.0, 217.0, 69.0, 1000.0, 1000.0, 840.0, 443.0, 863.0]
2025-05-11 23:58:22,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 17 seconds)
2025-05-12 00:01:06,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:01:12,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1291.45605 ± 826.769
2025-05-12 00:01:12,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [205.78468, 892.4836, 2530.063, 1364.6133, 1872.385, 867.82007, 1775.3284, 359.4656, 2579.4597, 467.15695]
2025-05-12 00:01:12,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [106.0, 336.0, 785.0, 505.0, 642.0, 412.0, 605.0, 152.0, 944.0, 231.0]
2025-05-12 00:01:12,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 56 minutes, 45 seconds)
2025-05-12 00:04:09,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:04:16,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1314.50952 ± 716.958
2025-05-12 00:04:16,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1708.0767, 360.96686, 1600.9861, 1812.1283, 400.01523, 732.3272, 2746.2363, 1446.1165, 1649.6367, 688.606]
2025-05-12 00:04:16,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [601.0, 182.0, 612.0, 697.0, 170.0, 305.0, 1000.0, 513.0, 630.0, 266.0]
2025-05-12 00:04:16,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 54 minutes, 40 seconds)
2025-05-12 00:07:03,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:07:11,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1667.37329 ± 757.089
2025-05-12 00:07:11,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2896.99, 1699.7301, 416.44153, 1240.6263, 2879.3115, 1970.347, 1227.9172, 830.19714, 1728.0254, 1784.1462]
2025-05-12 00:07:11,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 597.0, 176.0, 483.0, 1000.0, 728.0, 424.0, 314.0, 661.0, 696.0]
2025-05-12 00:07:11,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 52 minutes, 18 seconds)
2025-05-12 00:09:43,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:09:49,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1292.28162 ± 613.435
2025-05-12 00:09:49,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1510.223, 2126.4424, 645.25934, 905.64636, 1968.6239, 2308.968, 1249.7137, 876.9463, 650.4775, 680.5159]
2025-05-12 00:09:49,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [521.0, 808.0, 268.0, 403.0, 690.0, 822.0, 548.0, 357.0, 318.0, 258.0]
2025-05-12 00:09:49,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 48 minutes, 28 seconds)
2025-05-12 00:12:35,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:12:41,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1039.90222 ± 858.550
2025-05-12 00:12:41,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1085.5234, 563.929, 2567.6672, 1271.9788, 2546.5369, 837.3004, 426.51318, 1053.0424, 26.070503, 20.459904]
2025-05-12 00:12:41,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [435.0, 277.0, 1000.0, 543.0, 1000.0, 294.0, 216.0, 461.0, 41.0, 31.0]
2025-05-12 00:12:41,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 45 minutes, 46 seconds)
2025-05-12 00:15:34,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:15:42,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1587.34827 ± 1004.461
2025-05-12 00:15:42,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2592.4937, 2649.044, 1630.0913, 29.49562, 1082.874, 746.1024, 34.435036, 2667.134, 1920.3107, 2521.5024]
2025-05-12 00:15:42,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [950.0, 1000.0, 601.0, 44.0, 442.0, 345.0, 50.0, 1000.0, 723.0, 1000.0]
2025-05-12 00:15:42,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 43 minutes, 30 seconds)
2025-05-12 00:18:17,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:18:24,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1488.90649 ± 822.514
2025-05-12 00:18:24,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [715.99365, 1047.8535, 605.01965, 2734.2634, 2062.058, 624.60284, 715.65643, 2176.679, 1495.4563, 2711.4812]
2025-05-12 00:18:24,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [307.0, 378.0, 243.0, 1000.0, 731.0, 273.0, 335.0, 824.0, 689.0, 882.0]
2025-05-12 00:18:24,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 39 minutes, 35 seconds)
2025-05-12 00:21:11,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:21:20,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1850.87830 ± 979.036
2025-05-12 00:21:20,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [964.1666, 2727.9868, 210.17256, 1221.1184, 2769.9153, 2736.4385, 595.32837, 2701.8953, 1732.1587, 2849.603]
2025-05-12 00:21:20,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [382.0, 1000.0, 113.0, 447.0, 1000.0, 1000.0, 218.0, 1000.0, 627.0, 963.0]
2025-05-12 00:21:20,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (1850.88) for latency ExtremeClogL1U23
2025-05-12 00:21:20,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 00:21:20,472 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-12 00:21:20,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 36 minutes, 47 seconds)
2025-05-12 00:24:18,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:24:25,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1543.84827 ± 897.675
2025-05-12 00:24:25,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2117.4521, 2423.4775, 836.2921, 2750.6082, 2706.552, 869.86993, 1843.125, 790.70013, 20.606606, 1079.7985]
2025-05-12 00:24:25,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [832.0, 842.0, 298.0, 1000.0, 1000.0, 357.0, 598.0, 314.0, 32.0, 499.0]
2025-05-12 00:24:25,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 35 minutes, 2 seconds)
2025-05-12 00:27:06,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:27:11,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 819.35663 ± 669.471
2025-05-12 00:27:11,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [21.494469, 1807.093, -27.785301, 1145.1892, 526.3936, 175.9627, 1884.1582, 1317.0658, 425.86798, 918.12646]
2025-05-12 00:27:11,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 655.0, 96.0, 523.0, 223.0, 105.0, 623.0, 498.0, 168.0, 421.0]
2025-05-12 00:27:11,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 31 minutes, 54 seconds)
2025-05-12 00:30:05,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:30:11,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1157.02686 ± 780.118
2025-05-12 00:30:11,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2441.0303, 2580.5776, 477.27402, 562.3247, 1771.0941, 370.52255, 548.07404, 1060.3809, 699.132, 1059.8583]
2025-05-12 00:30:11,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 252.0, 243.0, 632.0, 169.0, 210.0, 386.0, 338.0, 424.0]
2025-05-12 00:30:11,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 28 minutes, 56 seconds)
2025-05-12 00:32:55,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:33:02,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1527.71143 ± 1003.821
2025-05-12 00:33:02,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [865.16943, 2635.1807, 454.05338, 1382.0771, 2887.688, 2667.0793, 448.43073, 435.35437, 2654.2642, 847.8172]
2025-05-12 00:33:02,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [341.0, 926.0, 187.0, 521.0, 996.0, 1000.0, 192.0, 196.0, 1000.0, 338.0]
2025-05-12 00:33:02,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 26 minutes, 20 seconds)
2025-05-12 00:35:54,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:36:00,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 911.78693 ± 819.505
2025-05-12 00:36:00,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1065.8351, 44.87246, 683.513, 2611.1072, 129.0172, 106.020805, 746.9274, 2185.5554, 613.8235, 931.197]
2025-05-12 00:36:00,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 75.0, 271.0, 1000.0, 178.0, 167.0, 330.0, 811.0, 291.0, 400.0]
2025-05-12 00:36:00,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 23 minutes, 28 seconds)
2025-05-12 00:38:54,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:39:00,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1184.53833 ± 1042.009
2025-05-12 00:39:00,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2687.4885, 530.10126, 2700.2463, 2692.9033, 972.6447, 1174.7439, 118.4018, 39.09257, 279.61386, 650.14655]
2025-05-12 00:39:00,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 263.0, 1000.0, 1000.0, 428.0, 484.0, 113.0, 55.0, 170.0, 280.0]
2025-05-12 00:39:00,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 20 minutes, 24 seconds)
2025-05-12 00:41:51,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:42:03,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 2161.67700 ± 842.907
2025-05-12 00:42:03,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2804.2766, 2743.477, 1262.7251, 2733.2249, 2021.5596, 2736.4497, 2593.3862, 2022.0283, 56.64188, 2642.9976]
2025-05-12 00:42:03,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 480.0, 1000.0, 1000.0, 1000.0, 976.0, 756.0, 90.0, 1000.0]
2025-05-12 00:42:03,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1226 [INFO]: New best (2161.68) for latency ExtremeClogL1U23
2025-05-12 00:42:03,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1229 [INFO]: saving network
2025-05-12 00:42:03,802 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-12 00:42:03,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 17 minutes, 51 seconds)
2025-05-12 00:44:42,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:44:51,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1576.88538 ± 999.517
2025-05-12 00:44:51,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [362.19565, 1382.9248, 1884.7135, 119.89922, 1782.7004, 26.758417, 2602.7485, 2495.529, 2524.1074, 2587.2764]
2025-05-12 00:44:51,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [224.0, 550.0, 712.0, 136.0, 659.0, 42.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-12 00:44:51,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 40 seconds)
2025-05-12 00:47:39,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:47:48,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1923.33813 ± 901.815
2025-05-12 00:47:48,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1367.0503, 2922.5303, 2830.16, 631.81757, 2593.5488, 2777.0322, 564.3899, 1895.0458, 1022.78064, 2629.0254]
2025-05-12 00:47:48,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [492.0, 917.0, 1000.0, 253.0, 898.0, 968.0, 267.0, 659.0, 379.0, 1000.0]
2025-05-12 00:47:48,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 48 seconds)
2025-05-12 00:50:32,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:50:41,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1749.33435 ± 902.338
2025-05-12 00:50:41,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2786.9714, 1125.1512, 884.45306, 1334.7428, 2844.757, 1664.0852, 2731.265, 1151.1643, 242.9227, 2727.8313]
2025-05-12 00:50:41,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 491.0, 345.0, 526.0, 1000.0, 593.0, 1000.0, 394.0, 130.0, 1000.0]
2025-05-12 00:50:41,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 48 seconds)
2025-05-12 00:53:30,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:53:38,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1701.61462 ± 758.917
2025-05-12 00:53:38,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2660.0134, 2629.1543, 706.8614, 2632.309, 1751.7994, 1381.6537, 1996.652, 1476.5638, 358.2208, 1422.9182]
2025-05-12 00:53:38,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [896.0, 1000.0, 343.0, 1000.0, 721.0, 556.0, 748.0, 592.0, 159.0, 515.0]
2025-05-12 00:53:39,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 51 seconds)
2025-05-12 00:56:23,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:56:32,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1707.54175 ± 891.637
2025-05-12 00:56:32,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [1742.75, 1763.4102, 1774.0176, 2554.0125, 265.9004, 2526.633, 192.2102, 1008.91046, 2545.3252, 2702.2478]
2025-05-12 00:56:32,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [688.0, 633.0, 640.0, 1000.0, 160.0, 878.0, 153.0, 469.0, 1000.0, 1000.0]
2025-05-12 00:56:32,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 53 seconds)
2025-05-12 00:59:18,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-12 00:59:27,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1221 [DEBUG]: Total Reward: 1634.15674 ± 886.245
2025-05-12 00:59:27,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1222 [DEBUG]: All rewards: [2501.5056, 443.19537, 2319.1318, 517.2582, 1388.586, 2614.488, 2513.837, 712.1006, 2478.0098, 853.455]
2025-05-12 00:59:27,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 188.0, 935.0, 257.0, 621.0, 1000.0, 1000.0, 345.0, 1000.0, 364.0]
2025-05-12 00:59:27,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1251 [DEBUG]: Training session finished
