2025-05-11 03:19:48,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2
2025-05-11 03:19:48,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2
2025-05-11 03:19:48,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7c4f04bcc3d0>}
2025-05-11 03:19:48,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1111 [DEBUG]: using device: cpu
2025-05-11 03:19:48,049 baseline-sac-noisy-humanoid:77 [WARNING]: args.memorize_actions != args.horizon: 2 != 24
2025-05-11 03:19:48,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-11 03:19:48,084 baseline-sac-noisy-humanoid:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=410, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-11 03:19:48,084 baseline-sac-noisy-humanoid:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=427, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 03:19:48,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-11 03:19:48,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-11 03:23:22,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:23:23,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 225.59029 ± 30.918
2025-05-11 03:23:23,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [199.54823, 219.52179, 207.04875, 229.5232, 214.0019, 227.48172, 209.97311, 314.1957, 224.88953, 209.71893]
2025-05-11 03:23:23,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 43.0, 41.0, 45.0, 42.0, 45.0, 41.0, 61.0, 44.0, 41.0]
2025-05-11 03:23:23,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (225.59) for latency ExtremeClogL1U23
2025-05-11 03:23:23,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:23:23,851 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 03:23:23,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 54 minutes, 54 seconds)
2025-05-11 03:27:30,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:27:31,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 257.21518 ± 94.558
2025-05-11 03:27:31,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [212.93759, 234.8569, 226.16095, 335.17502, 196.43759, 227.58134, 416.65393, 256.16516, 389.21964, 76.96365]
2025-05-11 03:27:31,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [43.0, 47.0, 45.0, 67.0, 41.0, 47.0, 86.0, 51.0, 87.0, 16.0]
2025-05-11 03:27:31,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (257.22) for latency ExtremeClogL1U23
2025-05-11 03:27:31,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:27:31,120 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 03:27:31,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 17 minutes, 35 seconds)
2025-05-11 03:31:37,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:31:38,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 269.96484 ± 100.481
2025-05-11 03:31:38,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [346.9002, 243.08743, 390.33188, 413.389, 339.4324, 328.54446, 132.40518, 181.94466, 183.48227, 140.131]
2025-05-11 03:31:38,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 51.0, 77.0, 82.0, 68.0, 71.0, 28.0, 35.0, 36.0, 27.0]
2025-05-11 03:31:38,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (269.96) for latency ExtremeClogL1U23
2025-05-11 03:31:38,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:31:38,982 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 03:31:38,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 22 minutes, 43 seconds)
2025-05-11 03:35:50,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:35:52,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 387.34491 ± 150.299
2025-05-11 03:35:52,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [657.8333, 461.346, 387.12177, 508.97922, 266.3398, 307.86627, 460.5516, 73.24014, 315.15933, 435.0115]
2025-05-11 03:35:52,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 89.0, 86.0, 101.0, 57.0, 60.0, 85.0, 15.0, 67.0, 82.0]
2025-05-11 03:35:52,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (387.34) for latency ExtremeClogL1U23
2025-05-11 03:35:52,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:35:52,231 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 03:35:52,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 25 minutes, 24 seconds)
2025-05-11 03:40:07,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:40:09,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 406.05078 ± 136.808
2025-05-11 03:40:09,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [373.81192, 456.9051, 252.75467, 730.80865, 474.28226, 363.80862, 409.45688, 186.68933, 406.60864, 405.3816]
2025-05-11 03:40:09,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 87.0, 47.0, 139.0, 90.0, 75.0, 80.0, 36.0, 78.0, 85.0]
2025-05-11 03:40:09,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (406.05) for latency ExtremeClogL1U23
2025-05-11 03:40:09,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:40:09,206 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 03:40:09,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 26 minutes, 29 seconds)
2025-05-11 03:44:22,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:44:24,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 371.20474 ± 90.102
2025-05-11 03:44:24,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [430.8201, 366.99503, 375.65146, 176.26886, 516.08624, 426.12225, 441.91354, 367.5607, 336.05908, 274.57037]
2025-05-11 03:44:24,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 81.0, 72.0, 34.0, 99.0, 80.0, 93.0, 72.0, 69.0, 53.0]
2025-05-11 03:44:24,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 34 minutes, 59 seconds)
2025-05-11 03:48:37,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:48:38,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 432.57431 ± 98.725
2025-05-11 03:48:38,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [313.28925, 475.31308, 403.0693, 477.6745, 531.9873, 258.34247, 470.77148, 469.14902, 592.7069, 333.43994]
2025-05-11 03:48:38,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 103.0, 74.0, 91.0, 103.0, 51.0, 88.0, 89.0, 118.0, 68.0]
2025-05-11 03:48:38,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (432.57) for latency ExtremeClogL1U23
2025-05-11 03:48:38,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:48:38,995 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 03:48:39,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 33 minutes, 2 seconds)
2025-05-11 03:52:55,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:52:56,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 455.64853 ± 109.281
2025-05-11 03:52:56,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [435.82776, 599.6663, 493.5682, 557.6499, 466.47083, 381.7918, 469.51624, 188.08359, 544.037, 419.87363]
2025-05-11 03:52:56,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 111.0, 97.0, 104.0, 85.0, 72.0, 87.0, 38.0, 101.0, 76.0]
2025-05-11 03:52:56,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (455.65) for latency ExtremeClogL1U23
2025-05-11 03:52:56,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:52:56,831 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 03:52:56,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 31 minutes, 52 seconds)
2025-05-11 03:57:15,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:57:17,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 348.11298 ± 75.400
2025-05-11 03:57:17,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [216.7269, 416.29932, 340.29272, 376.5882, 366.7448, 450.33826, 398.707, 376.3746, 331.0168, 208.04095]
2025-05-11 03:57:17,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [50.0, 89.0, 65.0, 70.0, 69.0, 93.0, 74.0, 70.0, 66.0, 43.0]
2025-05-11 03:57:17,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 29 minutes, 43 seconds)
2025-05-11 04:01:35,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:01:37,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 399.60934 ± 81.604
2025-05-11 04:01:37,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [375.41205, 584.91693, 458.35602, 365.6381, 352.49567, 296.8719, 418.16025, 347.85126, 323.15964, 473.23172]
2025-05-11 04:01:37,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 108.0, 97.0, 68.0, 67.0, 57.0, 85.0, 65.0, 60.0, 86.0]
2025-05-11 04:01:37,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 26 minutes, 23 seconds)
2025-05-11 04:05:53,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:05:54,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 398.46503 ± 83.163
2025-05-11 04:05:54,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [393.40356, 548.9627, 474.39883, 421.37415, 434.34604, 446.88403, 388.18936, 329.9533, 256.75354, 290.38458]
2025-05-11 04:05:54,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 110.0, 102.0, 76.0, 82.0, 81.0, 77.0, 59.0, 53.0, 57.0]
2025-05-11 04:05:54,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 22 minutes, 48 seconds)
2025-05-11 04:10:12,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:10:13,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 345.45325 ± 20.368
2025-05-11 04:10:13,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [322.3183, 317.60947, 357.11206, 344.4192, 342.80368, 337.5576, 387.16428, 345.8835, 370.4882, 329.1761]
2025-05-11 04:10:13,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 59.0, 66.0, 63.0, 63.0, 62.0, 72.0, 62.0, 68.0, 61.0]
2025-05-11 04:10:13,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 19 minutes, 47 seconds)
2025-05-11 04:14:36,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:14:38,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 369.33591 ± 92.533
2025-05-11 04:14:38,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [380.18152, 400.5208, 438.49533, 379.69406, 356.4352, 480.663, 326.91162, 409.63544, 401.7095, 119.11249]
2025-05-11 04:14:38,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 74.0, 78.0, 71.0, 66.0, 90.0, 63.0, 77.0, 82.0, 22.0]
2025-05-11 04:14:38,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 17 minutes, 25 seconds)
2025-05-11 04:19:02,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:19:03,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 359.90945 ± 100.710
2025-05-11 04:19:03,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [355.14682, 578.5758, 338.40652, 407.0944, 411.0048, 340.58084, 149.17856, 308.78802, 375.74496, 334.5737]
2025-05-11 04:19:03,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 117.0, 62.0, 75.0, 76.0, 65.0, 31.0, 58.0, 72.0, 62.0]
2025-05-11 04:19:03,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 14 minutes, 30 seconds)
2025-05-11 04:23:26,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:23:28,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 411.93100 ± 166.004
2025-05-11 04:23:28,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [73.28543, 328.86212, 537.1064, 748.02094, 365.24417, 335.16147, 518.3367, 427.0672, 443.2904, 342.93506]
2025-05-11 04:23:28,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [15.0, 69.0, 101.0, 148.0, 68.0, 67.0, 106.0, 79.0, 91.0, 64.0]
2025-05-11 04:23:28,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 11 minutes, 30 seconds)
2025-05-11 04:27:39,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:27:41,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 392.14508 ± 199.214
2025-05-11 04:27:41,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [353.33356, 73.44308, 454.8093, 156.4868, 391.1819, 606.75854, 656.4708, 344.3847, 677.96484, 206.61754]
2025-05-11 04:27:41,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 15.0, 84.0, 30.0, 75.0, 113.0, 131.0, 66.0, 143.0, 40.0]
2025-05-11 04:27:41,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 5 minutes, 49 seconds)
2025-05-11 04:31:56,772 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:31:58,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 383.07620 ± 77.315
2025-05-11 04:31:58,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [407.8669, 380.1843, 267.58212, 384.4372, 446.4659, 349.11282, 372.8586, 337.9443, 316.6174, 567.6923]
2025-05-11 04:31:58,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 70.0, 50.0, 70.0, 83.0, 63.0, 70.0, 62.0, 59.0, 107.0]
2025-05-11 04:31:58,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 55 seconds)
2025-05-11 04:36:19,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:36:20,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 354.08099 ± 106.701
2025-05-11 04:36:20,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [445.49533, 454.20105, 426.76962, 436.10248, 361.04782, 403.27457, 366.25424, 341.12076, 149.72484, 156.81937]
2025-05-11 04:36:20,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 84.0, 78.0, 79.0, 71.0, 78.0, 69.0, 64.0, 29.0, 30.0]
2025-05-11 04:36:20,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 55 minutes, 57 seconds)
2025-05-11 04:40:34,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:40:36,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 447.83734 ± 138.981
2025-05-11 04:40:36,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [375.52594, 432.42618, 198.11328, 619.7042, 395.34332, 738.15814, 493.79138, 400.83676, 442.99228, 381.482]
2025-05-11 04:40:36,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 81.0, 38.0, 118.0, 73.0, 146.0, 103.0, 73.0, 83.0, 72.0]
2025-05-11 04:40:36,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 49 minutes, 4 seconds)
2025-05-11 04:44:50,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:44:52,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 478.60962 ± 115.592
2025-05-11 04:44:52,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [752.0102, 579.0892, 503.79495, 379.97256, 436.5248, 432.60187, 321.8273, 383.98465, 487.62683, 508.66336]
2025-05-11 04:44:52,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 117.0, 94.0, 72.0, 92.0, 84.0, 59.0, 72.0, 96.0, 95.0]
2025-05-11 04:44:52,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (478.61) for latency ExtremeClogL1U23
2025-05-11 04:44:52,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:44:52,856 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 04:44:52,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 42 minutes, 31 seconds)
2025-05-11 04:49:05,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:49:07,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 610.08221 ± 232.709
2025-05-11 04:49:07,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [157.0207, 439.97797, 468.60477, 745.76715, 712.64984, 1014.03815, 574.2273, 430.76077, 730.0189, 827.7561]
2025-05-11 04:49:07,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 82.0, 102.0, 147.0, 134.0, 191.0, 117.0, 81.0, 142.0, 163.0]
2025-05-11 04:49:07,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (610.08) for latency ExtremeClogL1U23
2025-05-11 04:49:07,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:49:07,926 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 04:49:07,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 38 minutes, 48 seconds)
2025-05-11 04:53:25,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:53:27,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 536.35071 ± 127.195
2025-05-11 04:53:27,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [659.2282, 434.99646, 506.47195, 415.53656, 579.19995, 386.4835, 531.0233, 390.0552, 730.4899, 730.0217]
2025-05-11 04:53:27,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 81.0, 95.0, 76.0, 109.0, 72.0, 105.0, 74.0, 151.0, 137.0]
2025-05-11 04:53:27,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 35 minutes, 18 seconds)
2025-05-11 04:57:44,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 04:57:46,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 444.58041 ± 141.738
2025-05-11 04:57:46,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [515.9258, 385.6867, 383.99597, 425.75555, 695.10504, 423.4544, 621.5432, 144.06267, 462.0326, 388.24225]
2025-05-11 04:57:46,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 70.0, 71.0, 79.0, 132.0, 76.0, 119.0, 30.0, 88.0, 74.0]
2025-05-11 04:57:46,772 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 30 minutes, 7 seconds)
2025-05-11 05:02:13,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:02:15,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 418.02451 ± 223.629
2025-05-11 05:02:15,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [496.01642, 538.48096, 365.46487, 365.06152, 648.4828, 796.31573, 76.88499, 181.90843, 134.61563, 577.01355]
2025-05-11 05:02:15,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 106.0, 70.0, 81.0, 123.0, 159.0, 16.0, 35.0, 26.0, 109.0]
2025-05-11 05:02:15,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 29 minutes, 5 seconds)
2025-05-11 05:06:39,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:06:41,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 458.12924 ± 116.215
2025-05-11 05:06:41,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [545.8323, 399.0063, 393.11453, 171.22856, 628.7257, 498.4892, 532.07855, 457.55225, 491.26175, 464.0035]
2025-05-11 05:06:41,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 74.0, 74.0, 33.0, 127.0, 92.0, 97.0, 85.0, 93.0, 86.0]
2025-05-11 05:06:41,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 27 minutes, 7 seconds)
2025-05-11 05:10:58,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:11:01,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 591.34900 ± 232.756
2025-05-11 05:11:01,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [500.6687, 390.3705, 595.9181, 1240.3438, 502.9992, 418.55164, 498.14825, 537.9182, 510.2046, 718.3672]
2025-05-11 05:11:01,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 80.0, 111.0, 262.0, 103.0, 85.0, 107.0, 101.0, 98.0, 138.0]
2025-05-11 05:11:01,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 24 minutes, 3 seconds)
2025-05-11 05:15:17,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:15:19,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 468.65640 ± 131.295
2025-05-11 05:15:19,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [495.83597, 151.4315, 430.18808, 588.6043, 561.4173, 513.859, 536.09424, 381.8287, 394.56976, 632.7347]
2025-05-11 05:15:19,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 29.0, 79.0, 116.0, 113.0, 98.0, 102.0, 72.0, 80.0, 121.0]
2025-05-11 05:15:19,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 19 minutes, 6 seconds)
2025-05-11 05:19:38,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:19:40,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 515.11816 ± 56.482
2025-05-11 05:19:40,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [421.12363, 515.91486, 518.6267, 452.29352, 576.2385, 561.39496, 599.3655, 457.90268, 487.4538, 560.8677]
2025-05-11 05:19:40,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 99.0, 97.0, 85.0, 107.0, 104.0, 128.0, 86.0, 96.0, 102.0]
2025-05-11 05:19:40,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 15 minutes, 15 seconds)
2025-05-11 05:24:01,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:24:04,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 513.26294 ± 220.937
2025-05-11 05:24:04,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [591.02216, 1055.1615, 162.19, 427.00266, 528.209, 417.84705, 472.8716, 666.73267, 413.7519, 397.84113]
2025-05-11 05:24:04,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 217.0, 31.0, 80.0, 98.0, 78.0, 97.0, 124.0, 76.0, 77.0]
2025-05-11 05:24:04,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 9 minutes, 43 seconds)
2025-05-11 05:28:25,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:28:27,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 519.51300 ± 95.953
2025-05-11 05:28:27,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [716.28217, 477.08755, 520.46747, 473.00287, 575.4139, 514.50275, 383.62466, 393.1074, 513.6886, 627.9529]
2025-05-11 05:28:27,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 88.0, 96.0, 87.0, 106.0, 97.0, 70.0, 72.0, 96.0, 118.0]
2025-05-11 05:28:27,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 4 minutes, 48 seconds)
2025-05-11 05:32:39,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:32:41,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 583.95837 ± 179.636
2025-05-11 05:32:41,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [441.40622, 676.9737, 581.9003, 614.1454, 605.09204, 948.2933, 720.18475, 551.03375, 226.26202, 474.2919]
2025-05-11 05:32:41,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 137.0, 119.0, 134.0, 113.0, 184.0, 153.0, 111.0, 44.0, 86.0]
2025-05-11 05:32:41,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 58 minutes, 59 seconds)
2025-05-11 05:36:54,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:36:56,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 522.31335 ± 137.064
2025-05-11 05:36:56,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [483.9749, 494.3326, 617.49426, 564.8693, 580.1185, 557.3827, 572.4386, 509.83844, 150.13031, 692.5544]
2025-05-11 05:36:56,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 93.0, 126.0, 110.0, 127.0, 104.0, 112.0, 112.0, 29.0, 131.0]
2025-05-11 05:36:56,357 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 53 minutes, 59 seconds)
2025-05-11 05:41:08,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:41:10,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 540.69452 ± 120.859
2025-05-11 05:41:10,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [572.01514, 836.1022, 641.58527, 492.90823, 464.85815, 505.06036, 374.27954, 442.7567, 561.54834, 515.83136]
2025-05-11 05:41:10,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 171.0, 130.0, 92.0, 84.0, 95.0, 73.0, 81.0, 110.0, 93.0]
2025-05-11 05:41:10,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 48 minutes, 11 seconds)
2025-05-11 05:45:25,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:45:27,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 569.50220 ± 64.716
2025-05-11 05:45:27,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [581.3463, 543.9881, 642.1577, 552.8075, 528.99896, 547.45966, 478.48062, 488.87207, 675.62463, 655.28595]
2025-05-11 05:45:27,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 119.0, 119.0, 115.0, 98.0, 114.0, 86.0, 92.0, 126.0, 121.0]
2025-05-11 05:45:27,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 42 minutes, 25 seconds)
2025-05-11 05:49:47,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:49:50,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 581.08881 ± 227.558
2025-05-11 05:49:50,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [794.38385, 746.54724, 78.06713, 443.92233, 547.56775, 594.157, 361.53613, 745.97076, 890.4186, 608.31726]
2025-05-11 05:49:50,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 144.0, 16.0, 81.0, 111.0, 110.0, 69.0, 140.0, 168.0, 115.0]
2025-05-11 05:49:50,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 37 minutes, 53 seconds)
2025-05-11 05:54:04,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:54:06,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 557.99524 ± 194.671
2025-05-11 05:54:06,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [395.4782, 749.79694, 688.0417, 665.215, 145.15842, 492.77243, 744.7289, 340.256, 657.0521, 701.45294]
2025-05-11 05:54:06,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 142.0, 130.0, 126.0, 28.0, 91.0, 143.0, 60.0, 130.0, 138.0]
2025-05-11 05:54:06,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 34 minutes, 7 seconds)
2025-05-11 05:58:26,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 05:58:28,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 453.31073 ± 167.555
2025-05-11 05:58:28,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [494.58282, 752.48816, 570.28296, 431.98395, 453.92517, 524.6779, 171.24463, 156.61392, 478.2054, 499.10242]
2025-05-11 05:58:28,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 155.0, 105.0, 79.0, 83.0, 95.0, 33.0, 30.0, 89.0, 91.0]
2025-05-11 05:58:28,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 31 minutes, 24 seconds)
2025-05-11 06:02:40,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:02:42,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 545.00397 ± 171.441
2025-05-11 06:02:42,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [490.7575, 662.15717, 619.3755, 166.2918, 590.1489, 487.34015, 687.0484, 327.71277, 683.5672, 735.64026]
2025-05-11 06:02:42,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 126.0, 122.0, 32.0, 126.0, 89.0, 129.0, 74.0, 133.0, 142.0]
2025-05-11 06:02:42,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 27 minutes)
2025-05-11 06:06:58,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:07:00,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 642.84265 ± 139.889
2025-05-11 06:07:00,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [580.75836, 632.67456, 368.59253, 730.36523, 741.9864, 503.21112, 912.719, 651.30237, 596.8274, 709.98956]
2025-05-11 06:07:00,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 119.0, 71.0, 139.0, 145.0, 92.0, 176.0, 129.0, 112.0, 135.0]
2025-05-11 06:07:00,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (642.84) for latency ExtremeClogL1U23
2025-05-11 06:07:00,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 06:07:00,794 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 06:07:00,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 22 minutes, 53 seconds)
2025-05-11 06:11:20,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:11:23,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 524.73718 ± 162.194
2025-05-11 06:11:23,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [480.5552, 431.2794, 492.466, 812.21094, 541.006, 543.26855, 746.0707, 180.24463, 511.40894, 508.86087]
2025-05-11 06:11:23,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 85.0, 91.0, 152.0, 101.0, 101.0, 150.0, 35.0, 96.0, 94.0]
2025-05-11 06:11:23,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 18 minutes, 34 seconds)
2025-05-11 06:15:46,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:15:49,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 636.31287 ± 152.907
2025-05-11 06:15:49,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [765.4527, 490.87744, 379.12292, 568.2447, 916.148, 594.43585, 769.65216, 567.2335, 760.40717, 551.5538]
2025-05-11 06:15:49,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 92.0, 70.0, 106.0, 176.0, 110.0, 150.0, 106.0, 148.0, 102.0]
2025-05-11 06:15:49,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 16 minutes, 12 seconds)
2025-05-11 06:20:07,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:20:09,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 646.94788 ± 236.499
2025-05-11 06:20:09,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [504.4938, 970.19507, 523.58636, 393.36526, 672.14215, 533.39166, 1169.2697, 386.90775, 638.1156, 678.0119]
2025-05-11 06:20:09,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 190.0, 97.0, 71.0, 125.0, 100.0, 228.0, 75.0, 119.0, 126.0]
2025-05-11 06:20:09,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (646.95) for latency ExtremeClogL1U23
2025-05-11 06:20:09,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 06:20:09,750 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 06:20:09,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 11 minutes, 31 seconds)
2025-05-11 06:24:27,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:24:30,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 582.28137 ± 69.448
2025-05-11 06:24:30,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [504.16837, 537.7008, 633.7944, 636.1933, 485.82352, 619.0367, 527.64185, 632.1526, 710.07983, 536.22174]
2025-05-11 06:24:30,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 99.0, 116.0, 121.0, 97.0, 118.0, 113.0, 121.0, 135.0, 100.0]
2025-05-11 06:24:30,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 8 minutes, 23 seconds)
2025-05-11 06:28:50,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:28:53,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 724.60614 ± 138.701
2025-05-11 06:28:53,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [585.53235, 736.9507, 546.2497, 737.9783, 671.3481, 758.6307, 767.6388, 656.48175, 700.3393, 1084.9117]
2025-05-11 06:28:53,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 142.0, 101.0, 146.0, 129.0, 143.0, 146.0, 123.0, 133.0, 210.0]
2025-05-11 06:28:53,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (724.61) for latency ExtremeClogL1U23
2025-05-11 06:28:53,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 06:28:53,079 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 06:28:53,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 4 minutes, 57 seconds)
2025-05-11 06:33:10,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:33:13,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 621.81665 ± 163.503
2025-05-11 06:33:13,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [773.53625, 823.14087, 434.23752, 461.96143, 470.79578, 572.6389, 844.4476, 717.33307, 720.498, 399.57712]
2025-05-11 06:33:13,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 169.0, 91.0, 96.0, 99.0, 106.0, 179.0, 136.0, 152.0, 84.0]
2025-05-11 06:33:13,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 12 seconds)
2025-05-11 06:37:31,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:37:33,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 596.27185 ± 162.949
2025-05-11 06:37:33,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [489.46228, 519.2946, 640.7651, 544.27045, 788.8385, 859.0141, 310.74142, 630.38495, 755.23145, 424.71515]
2025-05-11 06:37:33,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 95.0, 121.0, 99.0, 152.0, 163.0, 59.0, 115.0, 140.0, 77.0]
2025-05-11 06:37:33,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 54 minutes, 45 seconds)
2025-05-11 06:41:50,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:41:53,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 558.12024 ± 129.208
2025-05-11 06:41:53,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [606.7744, 539.0055, 675.5369, 319.3775, 638.4457, 494.99097, 726.56793, 588.7622, 344.33212, 647.40894]
2025-05-11 06:41:53,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 109.0, 127.0, 59.0, 123.0, 105.0, 138.0, 109.0, 74.0, 139.0]
2025-05-11 06:41:53,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 50 minutes, 18 seconds)
2025-05-11 06:46:09,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:46:11,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 428.74512 ± 228.269
2025-05-11 06:46:11,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [152.45053, 155.70312, 130.17937, 177.36345, 606.5782, 713.7654, 541.50653, 614.95074, 579.39746, 615.55634]
2025-05-11 06:46:11,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 30.0, 25.0, 34.0, 111.0, 133.0, 101.0, 120.0, 107.0, 115.0]
2025-05-11 06:46:11,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 45 minutes, 31 seconds)
2025-05-11 06:50:29,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:50:32,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 657.53479 ± 121.172
2025-05-11 06:50:32,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [733.8418, 657.72784, 478.90222, 818.78705, 596.68896, 800.84357, 541.4035, 496.69284, 653.361, 797.0995]
2025-05-11 06:50:32,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 122.0, 87.0, 154.0, 110.0, 151.0, 100.0, 91.0, 128.0, 153.0]
2025-05-11 06:50:32,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 40 minutes, 51 seconds)
2025-05-11 06:54:58,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:55:00,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 537.13892 ± 53.207
2025-05-11 06:55:00,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [550.6642, 536.7651, 535.48157, 535.71716, 542.2884, 569.4042, 536.88025, 394.5102, 617.7074, 551.9709]
2025-05-11 06:55:00,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 101.0, 98.0, 97.0, 100.0, 106.0, 99.0, 70.0, 114.0, 100.0]
2025-05-11 06:55:00,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 37 minutes, 52 seconds)
2025-05-11 06:59:26,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 06:59:28,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 604.77106 ± 114.894
2025-05-11 06:59:28,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [600.5742, 602.0858, 852.48645, 667.622, 400.41544, 701.44543, 582.6891, 597.24786, 545.78564, 497.35852]
2025-05-11 06:59:28,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 115.0, 154.0, 127.0, 74.0, 131.0, 107.0, 110.0, 116.0, 91.0]
2025-05-11 06:59:28,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 34 minutes, 47 seconds)
2025-05-11 07:03:40,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:03:43,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 643.90729 ± 173.121
2025-05-11 07:03:43,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [979.5408, 626.1266, 788.9208, 622.95685, 543.5687, 571.57196, 839.24426, 647.403, 376.24915, 443.4909]
2025-05-11 07:03:43,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 129.0, 148.0, 125.0, 99.0, 107.0, 153.0, 115.0, 72.0, 83.0]
2025-05-11 07:03:43,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 29 minutes, 38 seconds)
2025-05-11 07:08:01,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:08:03,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 524.13239 ± 84.618
2025-05-11 07:08:03,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [641.5754, 611.5378, 360.12543, 499.01273, 513.92255, 418.04315, 471.11893, 592.13806, 576.5039, 557.34564]
2025-05-11 07:08:03,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 111.0, 67.0, 91.0, 94.0, 77.0, 87.0, 107.0, 122.0, 102.0]
2025-05-11 07:08:03,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 25 minutes, 40 seconds)
2025-05-11 07:12:27,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:12:30,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 702.90668 ± 175.984
2025-05-11 07:12:30,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [449.188, 579.1648, 1071.4042, 731.3752, 666.78235, 894.2191, 575.2243, 807.4093, 718.8271, 535.47266]
2025-05-11 07:12:30,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 106.0, 198.0, 135.0, 124.0, 171.0, 124.0, 156.0, 132.0, 100.0]
2025-05-11 07:12:30,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 22 minutes, 7 seconds)
2025-05-11 07:16:51,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:16:54,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 516.95764 ± 229.924
2025-05-11 07:16:54,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [547.2677, 139.98813, 762.39575, 859.8146, 209.07907, 361.2124, 690.00775, 591.396, 341.4569, 666.95734]
2025-05-11 07:16:54,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 27.0, 148.0, 160.0, 43.0, 68.0, 133.0, 115.0, 63.0, 132.0]
2025-05-11 07:16:54,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 17 minutes, 1 second)
2025-05-11 07:21:09,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:21:12,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 649.56012 ± 327.973
2025-05-11 07:21:12,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [711.20123, 640.6388, 583.4946, 691.7195, 497.63977, 942.0514, 1380.0195, 129.92905, 258.03033, 660.87683]
2025-05-11 07:21:12,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 122.0, 108.0, 129.0, 91.0, 180.0, 270.0, 25.0, 48.0, 123.0]
2025-05-11 07:21:12,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 11 minutes, 15 seconds)
2025-05-11 07:25:32,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:25:34,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 537.39832 ± 179.213
2025-05-11 07:25:34,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [431.25015, 712.1247, 548.1543, 539.9933, 616.469, 140.33418, 338.24753, 594.7281, 704.0234, 748.65894]
2025-05-11 07:25:34,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 132.0, 100.0, 100.0, 114.0, 27.0, 65.0, 110.0, 133.0, 139.0]
2025-05-11 07:25:34,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 7 minutes, 54 seconds)
2025-05-11 07:29:53,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:29:55,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 543.56702 ± 242.438
2025-05-11 07:29:55,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [157.98705, 540.37964, 386.85684, 895.7099, 139.50673, 778.2752, 800.76556, 549.3937, 570.86725, 615.92816]
2025-05-11 07:29:55,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 101.0, 70.0, 179.0, 27.0, 164.0, 171.0, 103.0, 122.0, 114.0]
2025-05-11 07:29:55,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 3 minutes, 37 seconds)
2025-05-11 07:34:12,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:34:14,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 575.33459 ± 173.204
2025-05-11 07:34:14,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [791.7738, 628.4236, 697.8716, 602.1218, 684.4669, 414.73267, 517.54877, 536.2656, 718.85223, 161.28896]
2025-05-11 07:34:14,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 116.0, 128.0, 115.0, 149.0, 78.0, 98.0, 98.0, 133.0, 31.0]
2025-05-11 07:34:14,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 58 minutes, 14 seconds)
2025-05-11 07:38:27,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:38:30,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 666.51025 ± 164.833
2025-05-11 07:38:30,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [696.38916, 768.6777, 574.362, 638.38086, 329.50015, 842.683, 624.28577, 585.4475, 628.00183, 977.374]
2025-05-11 07:38:30,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 144.0, 103.0, 119.0, 60.0, 159.0, 117.0, 107.0, 124.0, 186.0]
2025-05-11 07:38:30,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 52 minutes, 51 seconds)
2025-05-11 07:42:46,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:42:49,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 724.78485 ± 274.247
2025-05-11 07:42:49,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [1102.1055, 683.0416, 564.20074, 514.79913, 212.97972, 851.1929, 501.42, 958.7787, 740.4605, 1118.8695]
2025-05-11 07:42:49,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 130.0, 109.0, 96.0, 42.0, 160.0, 96.0, 177.0, 142.0, 206.0]
2025-05-11 07:42:49,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (724.78) for latency ExtremeClogL1U23
2025-05-11 07:42:49,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 07:42:49,943 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 07:42:49,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 48 minutes, 38 seconds)
2025-05-11 07:47:03,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:47:06,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 763.22424 ± 175.191
2025-05-11 07:47:06,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [925.19214, 1002.16516, 784.0103, 951.687, 561.6799, 801.0895, 763.6796, 799.63763, 404.39334, 638.70807]
2025-05-11 07:47:06,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 185.0, 147.0, 192.0, 120.0, 165.0, 156.0, 147.0, 84.0, 118.0]
2025-05-11 07:47:06,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (763.22) for latency ExtremeClogL1U23
2025-05-11 07:47:06,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 07:47:06,914 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 07:47:06,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 43 minutes, 41 seconds)
2025-05-11 07:51:18,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:51:21,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 668.59949 ± 100.998
2025-05-11 07:51:21,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [790.86316, 644.7414, 557.4441, 581.51935, 854.1462, 582.08527, 748.0534, 716.38965, 541.9496, 668.8027]
2025-05-11 07:51:21,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 121.0, 108.0, 108.0, 162.0, 111.0, 136.0, 136.0, 103.0, 124.0]
2025-05-11 07:51:21,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 38 minutes, 32 seconds)
2025-05-11 07:55:31,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:55:34,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 694.22693 ± 202.096
2025-05-11 07:55:34,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [173.01302, 789.227, 675.09503, 944.96136, 866.71063, 635.311, 603.3981, 786.2135, 809.6017, 658.7383]
2025-05-11 07:55:34,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 152.0, 127.0, 172.0, 176.0, 122.0, 114.0, 145.0, 150.0, 122.0]
2025-05-11 07:55:34,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 33 minutes, 36 seconds)
2025-05-11 07:59:50,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 07:59:54,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 834.42102 ± 198.938
2025-05-11 07:59:54,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [553.06976, 1134.2922, 670.8633, 959.0482, 1075.495, 932.47266, 594.13513, 654.1215, 988.3933, 782.3193]
2025-05-11 07:59:54,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 218.0, 124.0, 179.0, 201.0, 174.0, 112.0, 117.0, 184.0, 148.0]
2025-05-11 07:59:54,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (834.42) for latency ExtremeClogL1U23
2025-05-11 07:59:54,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 07:59:54,286 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 07:59:54,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 29 minutes, 46 seconds)
2025-05-11 08:04:10,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:04:13,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 749.13422 ± 207.934
2025-05-11 08:04:13,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [693.1045, 351.98404, 670.1065, 891.3106, 701.03125, 1099.7253, 1010.5652, 753.02374, 793.7503, 526.74054]
2025-05-11 08:04:13,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 68.0, 120.0, 159.0, 141.0, 205.0, 183.0, 139.0, 151.0, 97.0]
2025-05-11 08:04:13,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 25 minutes, 29 seconds)
2025-05-11 08:08:36,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:08:39,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 741.35870 ± 222.270
2025-05-11 08:08:39,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [635.23834, 1125.2557, 348.98212, 998.37524, 716.0345, 568.93945, 725.4388, 999.9619, 660.5889, 634.7718]
2025-05-11 08:08:39,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 205.0, 65.0, 185.0, 134.0, 105.0, 146.0, 201.0, 121.0, 115.0]
2025-05-11 08:08:39,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 22 minutes, 10 seconds)
2025-05-11 08:12:59,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:13:02,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 673.17877 ± 153.321
2025-05-11 08:13:02,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [721.4391, 629.6909, 613.35205, 530.8552, 679.2837, 732.5357, 564.1197, 416.2061, 894.29236, 950.0129]
2025-05-11 08:13:02,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 122.0, 114.0, 99.0, 126.0, 155.0, 123.0, 76.0, 168.0, 195.0]
2025-05-11 08:13:02,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 18 minutes, 50 seconds)
2025-05-11 08:17:23,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:17:28,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1052.47095 ± 571.073
2025-05-11 08:17:28,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [600.95844, 1104.9181, 911.5286, 1212.1445, 1086.8665, 671.40204, 2143.908, 171.52817, 693.4004, 1928.0553]
2025-05-11 08:17:28,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 204.0, 169.0, 224.0, 204.0, 131.0, 430.0, 33.0, 131.0, 368.0]
2025-05-11 08:17:28,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (1052.47) for latency ExtremeClogL1U23
2025-05-11 08:17:28,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 08:17:28,263 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-humanoid/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 08:17:28,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 15 minutes, 43 seconds)
2025-05-11 08:21:48,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:21:51,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 614.02576 ± 244.465
2025-05-11 08:21:51,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [754.2781, 371.34995, 673.1023, 590.0572, 740.71045, 891.4782, 669.7349, 928.3525, 73.18658, 448.00745]
2025-05-11 08:21:51,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 78.0, 122.0, 125.0, 146.0, 181.0, 131.0, 179.0, 15.0, 83.0]
2025-05-11 08:21:51,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 11 minutes, 43 seconds)
2025-05-11 08:26:18,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:26:21,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 703.57434 ± 235.023
2025-05-11 08:26:21,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [583.57684, 1046.6576, 729.97363, 733.7486, 590.5617, 922.3002, 570.7926, 542.0079, 264.26358, 1051.8606]
2025-05-11 08:26:21,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 209.0, 148.0, 137.0, 113.0, 180.0, 106.0, 100.0, 59.0, 196.0]
2025-05-11 08:26:21,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 8 minutes, 21 seconds)
2025-05-11 08:30:40,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:30:44,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 717.01917 ± 342.097
2025-05-11 08:30:44,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [1473.4338, 461.97165, 808.8924, 640.0161, 324.4579, 853.56915, 161.06165, 842.6438, 859.64594, 744.49915]
2025-05-11 08:30:44,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [283.0, 92.0, 148.0, 118.0, 58.0, 161.0, 31.0, 162.0, 172.0, 145.0]
2025-05-11 08:30:44,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 3 minutes, 38 seconds)
2025-05-11 08:35:09,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:35:12,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 741.94464 ± 412.681
2025-05-11 08:35:12,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [436.4912, 904.8411, 582.9878, 722.9847, 762.6635, 875.3435, 167.08484, 1465.5251, 165.45164, 1336.0726]
2025-05-11 08:35:12,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 183.0, 109.0, 132.0, 151.0, 171.0, 32.0, 291.0, 32.0, 262.0]
2025-05-11 08:35:12,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 59 minutes, 40 seconds)
2025-05-11 08:39:27,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:39:31,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 906.16992 ± 314.199
2025-05-11 08:39:31,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [738.10315, 791.01324, 588.2403, 892.2108, 1682.3995, 1122.2183, 657.4312, 785.4605, 658.6436, 1145.9785]
2025-05-11 08:39:31,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 145.0, 111.0, 168.0, 307.0, 207.0, 131.0, 154.0, 125.0, 222.0]
2025-05-11 08:39:31,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 54 minutes, 40 seconds)
2025-05-11 08:43:48,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:43:51,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 711.23621 ± 105.136
2025-05-11 08:43:51,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [721.8173, 784.8916, 566.7759, 797.43335, 782.7384, 546.2221, 571.67377, 772.9865, 711.44135, 856.3812]
2025-05-11 08:43:51,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 142.0, 107.0, 146.0, 151.0, 101.0, 106.0, 141.0, 136.0, 159.0]
2025-05-11 08:43:51,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 50 minutes, 1 second)
2025-05-11 08:48:17,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:48:20,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 713.11981 ± 362.234
2025-05-11 08:48:20,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [167.97061, 162.49197, 593.4147, 1231.749, 899.8785, 546.9001, 723.7249, 1290.6436, 876.38824, 638.0368]
2025-05-11 08:48:20,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 31.0, 109.0, 239.0, 170.0, 110.0, 137.0, 233.0, 169.0, 139.0]
2025-05-11 08:48:20,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 45 minutes, 30 seconds)
2025-05-11 08:52:36,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:52:39,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 703.37756 ± 293.170
2025-05-11 08:52:39,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [747.5241, 186.00891, 811.10345, 641.08264, 796.0828, 952.2873, 1060.3938, 187.61887, 625.3729, 1026.3007]
2025-05-11 08:52:39,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 36.0, 150.0, 132.0, 155.0, 180.0, 197.0, 36.0, 115.0, 191.0]
2025-05-11 08:52:39,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 40 minutes, 49 seconds)
2025-05-11 08:56:55,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 08:56:59,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 891.28094 ± 260.756
2025-05-11 08:56:59,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [886.0388, 938.38605, 1494.926, 765.9105, 480.52036, 833.6131, 698.49536, 845.3909, 799.3113, 1170.2181]
2025-05-11 08:56:59,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 176.0, 281.0, 142.0, 90.0, 183.0, 132.0, 150.0, 156.0, 221.0]
2025-05-11 08:56:59,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 35 minutes, 49 seconds)
2025-05-11 09:01:19,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:01:22,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 701.86029 ± 160.081
2025-05-11 09:01:22,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [732.84717, 741.56824, 565.7625, 977.59296, 357.23123, 827.0214, 638.73553, 781.4989, 782.9776, 613.3676]
2025-05-11 09:01:22,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 147.0, 102.0, 183.0, 63.0, 157.0, 135.0, 151.0, 147.0, 112.0]
2025-05-11 09:01:22,884 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 31 minutes, 48 seconds)
2025-05-11 09:05:41,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:05:44,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 639.15320 ± 377.529
2025-05-11 09:05:44,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [702.7854, 335.87613, 145.70615, 338.4391, 510.79605, 971.5655, 1354.8922, 358.12106, 1152.1669, 521.18384]
2025-05-11 09:05:44,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 63.0, 28.0, 63.0, 111.0, 178.0, 257.0, 65.0, 222.0, 97.0]
2025-05-11 09:05:44,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 27 minutes, 30 seconds)
2025-05-11 09:10:08,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:10:11,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 754.51404 ± 419.090
2025-05-11 09:10:11,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [589.29846, 708.58307, 213.67197, 1558.4434, 697.61176, 697.8402, 423.8181, 515.18146, 1526.1753, 614.5166]
2025-05-11 09:10:11,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 132.0, 41.0, 293.0, 145.0, 132.0, 81.0, 95.0, 287.0, 115.0]
2025-05-11 09:10:11,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 23 minutes, 4 seconds)
2025-05-11 09:14:23,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:14:26,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 554.99738 ± 178.571
2025-05-11 09:14:26,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [529.5124, 655.2292, 129.59428, 656.8432, 513.1634, 596.9809, 742.96515, 356.28723, 734.011, 635.38684]
2025-05-11 09:14:26,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 124.0, 25.0, 125.0, 94.0, 112.0, 142.0, 65.0, 134.0, 115.0]
2025-05-11 09:14:26,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 18 minutes, 24 seconds)
2025-05-11 09:18:54,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:18:57,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 748.52161 ± 376.949
2025-05-11 09:18:57,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [810.75507, 716.5891, 794.5637, 1608.6617, 104.33259, 398.65723, 744.5686, 886.7686, 949.531, 470.7883]
2025-05-11 09:18:57,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 137.0, 154.0, 305.0, 22.0, 81.0, 140.0, 171.0, 178.0, 102.0]
2025-05-11 09:18:57,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 14 minutes, 43 seconds)
2025-05-11 09:23:13,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:23:15,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 589.97278 ± 285.201
2025-05-11 09:23:15,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [632.7726, 948.35223, 506.02567, 277.26834, 847.18036, 112.41711, 202.26799, 881.45123, 738.21466, 753.7772]
2025-05-11 09:23:15,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 178.0, 101.0, 53.0, 159.0, 22.0, 39.0, 163.0, 140.0, 145.0]
2025-05-11 09:23:15,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 10 minutes)
2025-05-11 09:27:29,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:27:32,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 831.13318 ± 249.772
2025-05-11 09:27:32,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [789.36774, 445.75986, 1079.1317, 493.4411, 926.7371, 809.33966, 780.73206, 1161.9636, 1198.1216, 626.7376]
2025-05-11 09:27:32,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 84.0, 210.0, 92.0, 166.0, 147.0, 150.0, 226.0, 227.0, 119.0]
2025-05-11 09:27:32,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 5 minutes, 24 seconds)
2025-05-11 09:31:41,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:31:44,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 637.02606 ± 158.316
2025-05-11 09:31:44,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [557.7167, 749.9778, 860.56323, 427.52625, 655.85254, 899.10236, 653.9186, 495.09134, 652.6209, 417.89062]
2025-05-11 09:31:44,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 147.0, 158.0, 81.0, 127.0, 173.0, 127.0, 93.0, 124.0, 76.0]
2025-05-11 09:31:44,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 19 seconds)
2025-05-11 09:35:58,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:36:03,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 986.16437 ± 484.496
2025-05-11 09:36:03,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [727.3261, 966.2295, 2102.0916, 1050.7935, 330.4165, 560.4732, 1315.3218, 790.5781, 640.68427, 1377.7295]
2025-05-11 09:36:03,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 182.0, 408.0, 194.0, 62.0, 101.0, 251.0, 151.0, 122.0, 260.0]
2025-05-11 09:36:03,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 56 minutes, 12 seconds)
2025-05-11 09:40:32,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:40:36,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 778.83276 ± 356.554
2025-05-11 09:40:36,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [1164.1895, 1334.9171, 847.3876, 1060.9832, 358.46826, 1039.8633, 721.0871, 485.42987, 599.55505, 176.44658]
2025-05-11 09:40:36,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [228.0, 256.0, 182.0, 191.0, 66.0, 194.0, 130.0, 90.0, 109.0, 34.0]
2025-05-11 09:40:36,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 51 minutes, 56 seconds)
2025-05-11 09:44:55,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:44:59,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 832.57874 ± 390.507
2025-05-11 09:44:59,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [316.00418, 958.05835, 633.89526, 160.4294, 613.77435, 1080.0085, 830.33246, 987.83075, 1503.834, 1241.6194]
2025-05-11 09:44:59,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 185.0, 138.0, 31.0, 112.0, 207.0, 174.0, 192.0, 289.0, 234.0]
2025-05-11 09:44:59,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 47 minutes, 48 seconds)
2025-05-11 09:49:13,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:49:16,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 679.56421 ± 443.389
2025-05-11 09:49:16,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [161.83514, 461.7444, 699.47546, 534.9616, 1125.7811, 150.29282, 1038.7048, 1616.2822, 704.94464, 301.61954]
2025-05-11 09:49:16,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 88.0, 134.0, 113.0, 213.0, 29.0, 200.0, 311.0, 131.0, 57.0]
2025-05-11 09:49:16,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 43 minutes, 27 seconds)
2025-05-11 09:53:37,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:53:41,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 853.42511 ± 321.155
2025-05-11 09:53:41,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [762.96185, 423.59988, 749.0172, 680.13446, 1088.6654, 756.63837, 1670.2231, 724.0551, 995.2878, 683.6682]
2025-05-11 09:53:41,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 78.0, 140.0, 125.0, 206.0, 143.0, 320.0, 147.0, 185.0, 125.0]
2025-05-11 09:53:41,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 39 minutes, 30 seconds)
2025-05-11 09:57:59,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 09:58:04,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 984.43213 ± 410.730
2025-05-11 09:58:04,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [1413.8667, 1290.3997, 324.9032, 693.3612, 688.0515, 1101.5741, 996.11847, 1018.01556, 1761.2903, 556.7403]
2025-05-11 09:58:04,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [260.0, 252.0, 59.0, 129.0, 134.0, 230.0, 196.0, 199.0, 343.0, 112.0]
2025-05-11 09:58:04,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 35 minutes, 13 seconds)
2025-05-11 10:02:27,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:02:29,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 599.08820 ± 335.381
2025-05-11 10:02:29,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [553.5709, 631.8733, 882.3342, 407.57132, 955.43195, 184.3786, 261.37363, 245.90962, 586.0989, 1282.3396]
2025-05-11 10:02:29,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 113.0, 169.0, 77.0, 182.0, 36.0, 51.0, 48.0, 120.0, 236.0]
2025-05-11 10:02:29,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 30 minutes, 39 seconds)
2025-05-11 10:06:47,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:06:50,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 813.48651 ± 147.114
2025-05-11 10:06:50,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [890.95575, 554.5499, 907.63995, 893.0444, 749.021, 952.9341, 1009.0182, 556.1131, 817.57477, 804.0139]
2025-05-11 10:06:50,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [176.0, 101.0, 177.0, 163.0, 141.0, 171.0, 190.0, 100.0, 156.0, 155.0]
2025-05-11 10:06:50,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 26 minutes, 13 seconds)
2025-05-11 10:11:12,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:11:15,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 827.86230 ± 313.189
2025-05-11 10:11:15,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [1437.5858, 1061.1036, 973.2826, 552.79535, 407.9653, 645.6837, 852.84186, 384.58023, 1026.8287, 935.956]
2025-05-11 10:11:15,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [269.0, 202.0, 173.0, 106.0, 74.0, 120.0, 160.0, 74.0, 186.0, 179.0]
2025-05-11 10:11:15,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 21 minutes, 59 seconds)
2025-05-11 10:15:31,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:15:34,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 673.91937 ± 338.366
2025-05-11 10:15:34,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [610.5903, 155.3129, 700.1936, 605.339, 1206.5739, 824.9928, 650.83405, 833.9607, 68.46327, 1082.9335]
2025-05-11 10:15:34,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 30.0, 136.0, 113.0, 227.0, 154.0, 129.0, 156.0, 14.0, 210.0]
2025-05-11 10:15:34,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 17 minutes, 30 seconds)
2025-05-11 10:19:54,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:19:58,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 953.52527 ± 363.945
2025-05-11 10:19:58,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [551.5224, 494.26022, 1156.1373, 1227.605, 843.3854, 1782.4045, 1154.736, 754.5348, 781.7677, 788.899]
2025-05-11 10:19:58,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 101.0, 216.0, 237.0, 154.0, 332.0, 214.0, 144.0, 146.0, 143.0]
2025-05-11 10:19:58,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 8 seconds)
2025-05-11 10:24:12,521 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:24:15,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 763.87030 ± 228.744
2025-05-11 10:24:15,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [739.68555, 934.1059, 127.20044, 819.1984, 791.5783, 770.69495, 834.0981, 716.68286, 898.15454, 1007.30383]
2025-05-11 10:24:15,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 178.0, 27.0, 160.0, 150.0, 149.0, 154.0, 139.0, 169.0, 184.0]
2025-05-11 10:24:15,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 42 seconds)
2025-05-11 10:28:31,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:28:35,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 853.26086 ± 443.826
2025-05-11 10:28:35,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [993.5834, 1840.9406, 538.3695, 438.08176, 920.7266, 831.94617, 1294.999, 807.3139, 705.48035, 161.16711]
2025-05-11 10:28:35,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 339.0, 98.0, 82.0, 174.0, 169.0, 246.0, 159.0, 141.0, 31.0]
2025-05-11 10:28:35,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 20 seconds)
2025-05-11 10:32:55,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:33:00,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1030.71716 ± 457.068
2025-05-11 10:33:00,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [974.092, 436.08075, 456.76285, 1452.3265, 1074.9055, 868.0397, 1117.6332, 1269.8812, 637.9466, 2019.5038]
2025-05-11 10:33:00,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [188.0, 80.0, 94.0, 265.0, 210.0, 167.0, 209.0, 263.0, 123.0, 377.0]
2025-05-11 10:33:00,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1251 [DEBUG]: Training session finished
