2025-05-07 16:48:19,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4
2025-05-07 16:48:19,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4
2025-05-07 16:48:19,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7ddf16bc6f10>}
2025-05-07 16:48:19,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1009 [DEBUG]: using device: cpu
2025-05-07 16:48:19,662 baseline-sac-noisy-humanoid:77 [WARNING]: args.memorize_actions != args.horizon: 4 != 32
2025-05-07 16:48:19,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1031 [INFO]: Creating new trainer
2025-05-07 16:48:19,700 baseline-sac-noisy-humanoid:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=444, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-07 16:48:19,700 baseline-sac-noisy-humanoid:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=461, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-07 16:48:20,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1092 [DEBUG]: Starting training session...
2025-05-07 16:48:20,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 1/100
2025-05-07 16:52:03,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:52:04,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 289.47769 ± 21.925
2025-05-07 16:52:04,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [265.8693, 283.75873, 269.25195, 314.29178, 262.29636, 291.5209, 299.4521, 269.772, 309.6728, 328.89084]
2025-05-07 16:52:04,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [48.0, 52.0, 50.0, 60.0, 52.0, 56.0, 55.0, 53.0, 58.0, 62.0]
2025-05-07 16:52:04,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (289.48) for latency ExtremeSparseL4U32
2025-05-07 16:52:04,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 16:52:04,882 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 16:52:04,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 10 minutes, 39 seconds)
2025-05-07 16:56:09,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:56:11,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 342.75568 ± 99.847
2025-05-07 16:56:11,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [337.24728, 237.25906, 428.83197, 459.59686, 256.92767, 255.14308, 523.0373, 374.12473, 350.08072, 205.30792]
2025-05-07 16:56:11,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [65.0, 51.0, 80.0, 102.0, 59.0, 55.0, 111.0, 72.0, 69.0, 45.0]
2025-05-07 16:56:11,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (342.76) for latency ExtremeSparseL4U32
2025-05-07 16:56:11,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 16:56:11,274 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 16:56:11,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 24 minutes, 40 seconds)
2025-05-07 17:00:17,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:00:18,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 331.07953 ± 45.296
2025-05-07 17:00:18,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [251.07379, 320.6205, 287.86688, 323.7513, 330.09985, 413.34622, 319.53714, 363.78238, 391.41266, 309.30457]
2025-05-07 17:00:18,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [47.0, 69.0, 54.0, 60.0, 63.0, 78.0, 58.0, 67.0, 73.0, 57.0]
2025-05-07 17:00:18,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 27 minutes, 13 seconds)
2025-05-07 17:04:22,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:04:23,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 351.26553 ± 101.596
2025-05-07 17:04:23,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [322.71912, 430.4563, 318.03265, 348.80038, 202.63121, 312.45206, 426.17825, 586.88025, 262.62247, 301.8827]
2025-05-07 17:04:23,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [61.0, 87.0, 59.0, 69.0, 40.0, 59.0, 76.0, 122.0, 51.0, 56.0]
2025-05-07 17:04:23,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (351.27) for latency ExtremeSparseL4U32
2025-05-07 17:04:23,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 17:04:23,767 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 17:04:23,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 25 minutes, 24 seconds)
2025-05-07 17:08:37,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:08:38,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 293.99347 ± 59.248
2025-05-07 17:08:38,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [218.22266, 383.4533, 296.12286, 360.0898, 269.35406, 362.18024, 327.216, 202.53746, 261.25647, 259.50177]
2025-05-07 17:08:38,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [42.0, 74.0, 56.0, 68.0, 50.0, 72.0, 61.0, 39.0, 50.0, 49.0]
2025-05-07 17:08:38,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 25 minutes, 45 seconds)
2025-05-07 17:12:42,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:12:43,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 399.27374 ± 62.221
2025-05-07 17:12:43,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [389.32858, 462.09674, 504.47025, 384.67615, 352.04584, 294.91348, 424.07635, 313.88547, 432.68906, 434.55576]
2025-05-07 17:12:43,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [79.0, 90.0, 92.0, 76.0, 65.0, 65.0, 80.0, 59.0, 84.0, 79.0]
2025-05-07 17:12:43,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (399.27) for latency ExtremeSparseL4U32
2025-05-07 17:12:43,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 17:12:43,878 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 17:12:43,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 28 minutes, 13 seconds)
2025-05-07 17:16:46,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:16:47,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 365.34991 ± 84.869
2025-05-07 17:16:47,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [347.07135, 384.15314, 467.74118, 429.3724, 467.4722, 326.2164, 230.78607, 347.18762, 434.86636, 218.63234]
2025-05-07 17:16:47,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [67.0, 83.0, 93.0, 80.0, 92.0, 72.0, 44.0, 64.0, 84.0, 46.0]
2025-05-07 17:16:47,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 23 minutes, 19 seconds)
2025-05-07 17:21:01,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:21:03,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 382.95380 ± 59.687
2025-05-07 17:21:03,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [374.49228, 380.92746, 459.8939, 413.57016, 421.2317, 450.91248, 284.5073, 378.4981, 395.42834, 270.0764]
2025-05-07 17:21:03,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [68.0, 73.0, 88.0, 84.0, 80.0, 86.0, 54.0, 68.0, 84.0, 52.0]
2025-05-07 17:21:03,521 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 21 minutes, 42 seconds)
2025-05-07 17:25:17,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:25:19,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 315.88531 ± 142.529
2025-05-07 17:25:19,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [343.86478, 403.0973, 180.6666, 171.45572, 191.86467, 151.25816, 497.76044, 594.93756, 279.19562, 344.75235]
2025-05-07 17:25:19,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [63.0, 86.0, 35.0, 34.0, 37.0, 29.0, 95.0, 117.0, 59.0, 66.0]
2025-05-07 17:25:19,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 20 minutes, 48 seconds)
2025-05-07 17:29:34,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:29:36,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 428.68759 ± 112.264
2025-05-07 17:29:36,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [340.21796, 335.3278, 542.2217, 363.1428, 667.88525, 310.58875, 534.7509, 392.82236, 340.98953, 458.92874]
2025-05-07 17:29:36,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [64.0, 62.0, 114.0, 70.0, 129.0, 61.0, 100.0, 74.0, 64.0, 86.0]
2025-05-07 17:29:36,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (428.69) for latency ExtremeSparseL4U32
2025-05-07 17:29:36,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 17:29:36,653 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 17:29:36,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 17 minutes, 28 seconds)
2025-05-07 17:33:52,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:33:53,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 382.44254 ± 87.461
2025-05-07 17:33:53,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [328.6223, 489.67743, 336.24396, 388.00433, 212.64569, 441.16904, 404.48285, 534.85333, 321.57166, 367.1549]
2025-05-07 17:33:53,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [64.0, 96.0, 62.0, 72.0, 45.0, 100.0, 88.0, 102.0, 71.0, 70.0]
2025-05-07 17:33:53,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 16 minutes, 42 seconds)
2025-05-07 17:38:08,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:38:09,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 334.31763 ± 144.286
2025-05-07 17:38:09,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [641.09656, 176.20872, 145.72466, 253.07538, 336.77264, 356.7226, 177.98828, 413.06094, 403.39127, 439.13504]
2025-05-07 17:38:09,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [126.0, 34.0, 28.0, 47.0, 63.0, 67.0, 34.0, 76.0, 79.0, 82.0]
2025-05-07 17:38:09,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 16 minutes, 2 seconds)
2025-05-07 17:42:23,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:42:25,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 392.38898 ± 169.371
2025-05-07 17:42:25,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [682.5671, 595.49963, 475.0331, 256.90692, 526.7933, 291.99356, 365.00223, 172.05804, 407.6192, 150.41673]
2025-05-07 17:42:25,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [132.0, 115.0, 110.0, 57.0, 105.0, 67.0, 69.0, 33.0, 89.0, 31.0]
2025-05-07 17:42:25,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 11 minutes, 45 seconds)
2025-05-07 17:46:41,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:46:42,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 368.12674 ± 134.912
2025-05-07 17:46:42,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [316.7527, 422.85294, 286.15067, 287.9982, 343.17267, 741.94965, 340.911, 254.43756, 406.74442, 280.2977]
2025-05-07 17:46:42,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [59.0, 79.0, 55.0, 56.0, 65.0, 144.0, 63.0, 48.0, 77.0, 53.0]
2025-05-07 17:46:42,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 7 minutes, 54 seconds)
2025-05-07 17:50:58,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:50:59,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 418.23178 ± 144.912
2025-05-07 17:50:59,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [512.554, 145.02405, 554.54456, 639.43896, 351.561, 448.1046, 262.6589, 368.16907, 338.6435, 561.6192]
2025-05-07 17:50:59,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [96.0, 28.0, 105.0, 136.0, 70.0, 86.0, 52.0, 70.0, 62.0, 106.0]
2025-05-07 17:50:59,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 3 minutes, 34 seconds)
2025-05-07 17:55:17,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:55:18,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 350.52316 ± 152.438
2025-05-07 17:55:18,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [146.60495, 346.49368, 325.20895, 755.5022, 347.01334, 372.84314, 396.5471, 325.87122, 259.38208, 229.76506]
2025-05-07 17:55:18,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [28.0, 65.0, 71.0, 145.0, 63.0, 81.0, 73.0, 59.0, 55.0, 45.0]
2025-05-07 17:55:18,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 59 minutes, 50 seconds)
2025-05-07 17:59:37,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:59:39,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 381.03534 ± 128.014
2025-05-07 17:59:39,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [563.26184, 286.35117, 395.48602, 213.78476, 164.1587, 287.53513, 450.91776, 484.84824, 452.88562, 511.12396]
2025-05-07 17:59:39,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [110.0, 62.0, 72.0, 44.0, 32.0, 66.0, 85.0, 91.0, 100.0, 95.0]
2025-05-07 17:59:39,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 56 minutes, 42 seconds)
2025-05-07 18:03:58,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:03:59,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 358.22467 ± 84.032
2025-05-07 18:03:59,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [339.98257, 346.96436, 351.48364, 528.15656, 304.38565, 362.0614, 185.0136, 340.41177, 378.43985, 445.34747]
2025-05-07 18:03:59,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [63.0, 64.0, 65.0, 109.0, 56.0, 79.0, 36.0, 62.0, 69.0, 82.0]
2025-05-07 18:03:59,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 53 minutes, 46 seconds)
2025-05-07 18:08:16,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:08:18,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 379.27316 ± 143.128
2025-05-07 18:08:18,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [394.24463, 529.2638, 410.11703, 489.2972, 193.33696, 459.03842, 322.68622, 225.36293, 159.75258, 609.6319]
2025-05-07 18:08:18,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [72.0, 118.0, 76.0, 109.0, 39.0, 87.0, 59.0, 45.0, 31.0, 117.0]
2025-05-07 18:08:18,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 49 minutes, 52 seconds)
2025-05-07 18:12:39,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:12:41,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 503.96860 ± 102.720
2025-05-07 18:12:41,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [713.66534, 485.02054, 504.18268, 457.17368, 355.99826, 518.85144, 533.3118, 380.80954, 638.6731, 451.9994]
2025-05-07 18:12:41,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [132.0, 93.0, 99.0, 88.0, 65.0, 118.0, 98.0, 71.0, 137.0, 94.0]
2025-05-07 18:12:41,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (503.97) for latency ExtremeSparseL4U32
2025-05-07 18:12:41,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 18:12:41,857 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 18:12:41,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 47 minutes, 11 seconds)
2025-05-07 18:17:03,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:17:05,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 452.23111 ± 95.685
2025-05-07 18:17:05,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [524.6508, 615.3481, 561.0623, 460.62497, 376.1051, 296.9731, 510.32812, 436.9089, 390.23425, 350.07538]
2025-05-07 18:17:05,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [115.0, 114.0, 114.0, 99.0, 72.0, 66.0, 107.0, 92.0, 72.0, 66.0]
2025-05-07 18:17:05,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 43 minutes, 59 seconds)
2025-05-07 18:21:16,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:21:18,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 441.26489 ± 94.095
2025-05-07 18:21:18,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [442.10645, 335.00885, 504.0602, 503.61346, 410.81293, 371.51398, 575.8768, 274.78714, 421.67618, 573.1928]
2025-05-07 18:21:18,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [84.0, 67.0, 97.0, 93.0, 82.0, 70.0, 111.0, 54.0, 80.0, 110.0]
2025-05-07 18:21:18,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 37 minutes, 52 seconds)
2025-05-07 18:25:29,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:25:31,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 467.78546 ± 119.225
2025-05-07 18:25:31,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [594.6717, 378.9823, 493.33044, 480.84344, 559.9867, 342.39496, 388.94437, 391.3503, 330.27438, 717.07574]
2025-05-07 18:25:31,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [110.0, 80.0, 96.0, 103.0, 108.0, 78.0, 72.0, 72.0, 60.0, 137.0]
2025-05-07 18:25:31,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 31 minutes, 31 seconds)
2025-05-07 18:29:46,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:29:48,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 488.39023 ± 148.846
2025-05-07 18:29:48,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [481.83594, 545.6696, 543.85016, 465.76227, 787.1602, 164.96541, 447.10825, 593.2347, 410.16565, 444.15048]
2025-05-07 18:29:48,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [97.0, 120.0, 104.0, 89.0, 167.0, 33.0, 94.0, 110.0, 78.0, 98.0]
2025-05-07 18:29:48,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 26 minutes, 56 seconds)
2025-05-07 18:34:15,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:34:17,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 469.84750 ± 76.671
2025-05-07 18:34:17,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [349.30695, 465.27182, 419.02612, 526.0969, 493.17896, 550.0333, 512.67163, 587.4472, 449.89886, 345.54327]
2025-05-07 18:34:17,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [64.0, 97.0, 79.0, 96.0, 93.0, 103.0, 95.0, 110.0, 87.0, 62.0]
2025-05-07 18:34:17,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 24 minutes, 1 second)
2025-05-07 18:38:38,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:38:40,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 561.76794 ± 220.693
2025-05-07 18:38:40,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [525.6512, 466.42114, 370.2502, 326.22583, 469.25037, 547.0182, 674.5916, 523.53625, 551.77295, 1162.9614]
2025-05-07 18:38:40,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [98.0, 87.0, 67.0, 73.0, 87.0, 121.0, 136.0, 95.0, 105.0, 243.0]
2025-05-07 18:38:40,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (561.77) for latency ExtremeSparseL4U32
2025-05-07 18:38:40,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 18:38:40,589 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 18:38:40,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 19 minutes, 32 seconds)
2025-05-07 18:42:51,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:42:53,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 520.30457 ± 159.168
2025-05-07 18:42:53,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [426.54825, 471.21286, 479.06277, 921.24335, 604.5881, 407.8224, 300.23486, 545.4584, 597.0986, 449.77634]
2025-05-07 18:42:53,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [78.0, 89.0, 95.0, 193.0, 113.0, 78.0, 54.0, 103.0, 116.0, 85.0]
2025-05-07 18:42:54,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 15 minutes, 13 seconds)
2025-05-07 18:47:17,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:47:19,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 476.91962 ± 72.426
2025-05-07 18:47:19,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [447.3064, 442.74548, 477.8729, 352.742, 599.83734, 466.01987, 496.198, 601.92535, 416.16858, 468.3804]
2025-05-07 18:47:19,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [84.0, 84.0, 90.0, 76.0, 120.0, 91.0, 93.0, 118.0, 85.0, 88.0]
2025-05-07 18:47:19,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 13 minutes, 54 seconds)
2025-05-07 18:51:43,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:51:45,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 485.29327 ± 96.890
2025-05-07 18:51:45,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [607.30237, 681.54895, 331.07397, 502.1285, 460.1629, 405.32428, 448.09845, 470.4064, 534.6568, 412.23004]
2025-05-07 18:51:45,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [115.0, 128.0, 64.0, 103.0, 83.0, 83.0, 84.0, 89.0, 102.0, 79.0]
2025-05-07 18:51:45,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 11 minutes, 38 seconds)
2025-05-07 18:56:08,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:56:10,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 512.60156 ± 146.714
2025-05-07 18:56:10,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [467.4967, 500.0511, 513.3537, 840.62134, 350.40222, 381.184, 731.1068, 474.26038, 444.6713, 422.86792]
2025-05-07 18:56:10,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [93.0, 97.0, 96.0, 161.0, 72.0, 76.0, 142.0, 93.0, 86.0, 81.0]
2025-05-07 18:56:10,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 6 minutes, 17 seconds)
2025-05-07 19:00:31,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:00:32,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 368.08621 ± 137.153
2025-05-07 19:00:32,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [467.23404, 455.26288, 447.28088, 473.05188, 185.50012, 378.57236, 378.28055, 563.5003, 170.86903, 161.30971]
2025-05-07 19:00:32,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [87.0, 87.0, 94.0, 102.0, 36.0, 71.0, 75.0, 113.0, 33.0, 31.0]
2025-05-07 19:00:32,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 1 minute, 49 seconds)
2025-05-07 19:04:56,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:04:59,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 475.31366 ± 105.058
2025-05-07 19:04:59,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [532.05566, 335.03702, 372.47772, 682.56085, 453.26236, 621.2801, 382.22995, 489.324, 418.62, 466.28915]
2025-05-07 19:04:59,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [98.0, 62.0, 71.0, 127.0, 85.0, 121.0, 71.0, 98.0, 77.0, 90.0]
2025-05-07 19:04:59,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 21 seconds)
2025-05-07 19:09:21,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:09:23,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 385.44675 ± 158.177
2025-05-07 19:09:23,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [574.1834, 557.14764, 350.39157, 437.62997, 449.2118, 164.55898, 151.57462, 176.6289, 445.13562, 548.0052]
2025-05-07 19:09:23,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [106.0, 106.0, 65.0, 83.0, 88.0, 32.0, 29.0, 34.0, 88.0, 106.0]
2025-05-07 19:09:23,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 55 minutes, 46 seconds)
2025-05-07 19:13:45,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:13:47,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 417.79282 ± 98.186
2025-05-07 19:13:47,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [610.22644, 328.22998, 510.26212, 360.0067, 298.48175, 350.60388, 518.0923, 453.40662, 321.37915, 427.23953]
2025-05-07 19:13:47,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [115.0, 66.0, 99.0, 72.0, 57.0, 70.0, 97.0, 87.0, 63.0, 79.0]
2025-05-07 19:13:47,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 50 minutes, 47 seconds)
2025-05-07 19:18:12,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:18:15,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 562.06873 ± 130.283
2025-05-07 19:18:15,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [282.69952, 444.82602, 547.38086, 756.7406, 557.94147, 534.7433, 575.97784, 523.2927, 690.7655, 706.3188]
2025-05-07 19:18:15,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [49.0, 95.0, 102.0, 156.0, 103.0, 100.0, 105.0, 95.0, 135.0, 133.0]
2025-05-07 19:18:15,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (562.07) for latency ExtremeSparseL4U32
2025-05-07 19:18:15,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 19:18:15,078 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 19:18:15,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 46 minutes, 57 seconds)
2025-05-07 19:22:35,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:22:37,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 397.38397 ± 196.929
2025-05-07 19:22:37,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [258.8974, 671.04407, 355.4839, 586.598, 633.31573, 135.00017, 504.44638, 499.88004, 138.91562, 190.25847]
2025-05-07 19:22:37,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [51.0, 130.0, 67.0, 118.0, 130.0, 26.0, 94.0, 94.0, 27.0, 37.0]
2025-05-07 19:22:37,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 42 minutes, 34 seconds)
2025-05-07 19:27:02,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:27:05,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 563.84290 ± 104.277
2025-05-07 19:27:05,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [653.114, 663.2923, 477.48758, 492.4242, 460.89285, 392.06906, 735.1997, 643.84344, 525.2487, 594.8567]
2025-05-07 19:27:05,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [125.0, 131.0, 89.0, 93.0, 86.0, 72.0, 156.0, 132.0, 99.0, 114.0]
2025-05-07 19:27:05,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (563.84) for latency ExtremeSparseL4U32
2025-05-07 19:27:05,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 19:27:05,455 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 19:27:05,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 38 minutes, 32 seconds)
2025-05-07 19:31:25,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:31:27,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 436.38486 ± 150.582
2025-05-07 19:31:27,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [580.2117, 476.17734, 476.24222, 614.7919, 522.6632, 386.97824, 167.33018, 150.08397, 497.9806, 491.38913]
2025-05-07 19:31:27,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [109.0, 87.0, 89.0, 125.0, 96.0, 71.0, 32.0, 29.0, 92.0, 92.0]
2025-05-07 19:31:27,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 33 minutes, 35 seconds)
2025-05-07 19:35:49,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:35:51,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 471.09760 ± 74.602
2025-05-07 19:35:51,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [327.90674, 504.6332, 517.6894, 422.5923, 400.05374, 492.07535, 567.49023, 472.60287, 423.3101, 582.62213]
2025-05-07 19:35:51,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [63.0, 98.0, 100.0, 82.0, 73.0, 92.0, 105.0, 92.0, 83.0, 111.0]
2025-05-07 19:35:51,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 29 minutes, 11 seconds)
2025-05-07 19:40:10,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:40:11,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 384.14212 ± 216.170
2025-05-07 19:40:11,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [528.5253, 452.50223, 192.67126, 171.53658, 175.68077, 128.9879, 334.06723, 808.89136, 403.53738, 645.02106]
2025-05-07 19:40:11,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [99.0, 84.0, 37.0, 33.0, 34.0, 25.0, 64.0, 173.0, 76.0, 118.0]
2025-05-07 19:40:11,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 23 minutes, 20 seconds)
2025-05-07 19:44:33,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:44:36,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 558.23022 ± 110.285
2025-05-07 19:44:36,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [494.10745, 518.3286, 543.6321, 547.26514, 590.15314, 710.6447, 436.812, 409.7196, 540.53345, 791.1058]
2025-05-07 19:44:36,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [98.0, 105.0, 101.0, 100.0, 114.0, 138.0, 82.0, 78.0, 103.0, 152.0]
2025-05-07 19:44:36,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 19 minutes, 20 seconds)
2025-05-07 19:48:54,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:48:56,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 448.30362 ± 87.619
2025-05-07 19:48:56,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [407.0133, 521.694, 507.91208, 349.95615, 429.41595, 597.98236, 521.04736, 429.13556, 281.2025, 437.67697]
2025-05-07 19:48:56,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [80.0, 96.0, 94.0, 66.0, 87.0, 116.0, 104.0, 84.0, 52.0, 87.0]
2025-05-07 19:48:56,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 13 minutes, 31 seconds)
2025-05-07 19:53:13,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:53:14,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 421.94952 ± 123.708
2025-05-07 19:53:14,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [284.57495, 138.9782, 614.2762, 450.17636, 412.53793, 530.028, 425.41437, 434.06717, 483.68457, 445.75742]
2025-05-07 19:53:14,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [49.0, 27.0, 118.0, 83.0, 74.0, 100.0, 80.0, 82.0, 91.0, 77.0]
2025-05-07 19:53:14,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 8 minutes, 24 seconds)
2025-05-07 19:57:30,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:57:33,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 559.17755 ± 148.093
2025-05-07 19:57:33,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [504.1097, 636.19867, 748.075, 365.33786, 720.8407, 491.29727, 499.49356, 381.37683, 800.21674, 444.82928]
2025-05-07 19:57:33,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [98.0, 125.0, 145.0, 67.0, 136.0, 91.0, 96.0, 71.0, 166.0, 87.0]
2025-05-07 19:57:33,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 3 minutes, 2 seconds)
2025-05-07 20:01:43,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:01:45,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 495.63519 ± 172.379
2025-05-07 20:01:45,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [516.5018, 175.54497, 154.80864, 540.6822, 649.03687, 627.64233, 531.3419, 598.0473, 515.6582, 647.0881]
2025-05-07 20:01:45,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [97.0, 34.0, 30.0, 110.0, 123.0, 119.0, 105.0, 124.0, 99.0, 123.0]
2025-05-07 20:01:45,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 57 minutes, 16 seconds)
2025-05-07 20:05:59,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:06:02,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 488.84164 ± 217.349
2025-05-07 20:06:02,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [161.5115, 459.06677, 187.54465, 209.29874, 645.67487, 588.6923, 672.3413, 507.2165, 805.89075, 651.17914]
2025-05-07 20:06:02,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [31.0, 98.0, 36.0, 41.0, 132.0, 117.0, 140.0, 100.0, 166.0, 132.0]
2025-05-07 20:06:02,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 51 minutes, 29 seconds)
2025-05-07 20:10:14,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:10:16,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 566.75250 ± 241.864
2025-05-07 20:10:16,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [546.0488, 416.48764, 575.3947, 352.7618, 626.18, 518.665, 529.7042, 345.50598, 1242.8788, 513.89777]
2025-05-07 20:10:16,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [102.0, 79.0, 111.0, 66.0, 122.0, 103.0, 100.0, 71.0, 241.0, 97.0]
2025-05-07 20:10:16,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (566.75) for latency ExtremeSparseL4U32
2025-05-07 20:10:16,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 20:10:16,681 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 20:10:16,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 46 minutes, 6 seconds)
2025-05-07 20:14:27,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:14:29,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 538.37952 ± 110.054
2025-05-07 20:14:29,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [458.67905, 593.19324, 489.061, 655.2589, 432.9101, 541.712, 352.4818, 714.2649, 481.938, 664.29626]
2025-05-07 20:14:29,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [89.0, 116.0, 96.0, 130.0, 88.0, 106.0, 62.0, 147.0, 98.0, 129.0]
2025-05-07 20:14:29,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 40 minutes, 58 seconds)
2025-05-07 20:18:45,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:18:47,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 603.58008 ± 76.567
2025-05-07 20:18:47,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [634.57654, 657.7384, 573.57587, 724.8264, 628.2195, 530.8143, 546.053, 458.92978, 696.4961, 584.57104]
2025-05-07 20:18:47,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [127.0, 127.0, 107.0, 137.0, 116.0, 101.0, 107.0, 93.0, 136.0, 108.0]
2025-05-07 20:18:47,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (603.58) for latency ExtremeSparseL4U32
2025-05-07 20:18:47,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 20:18:47,906 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 20:18:47,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 36 minutes, 39 seconds)
2025-05-07 20:23:00,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:23:03,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 522.60504 ± 76.870
2025-05-07 20:23:03,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [568.3965, 645.3533, 351.87186, 484.88672, 549.33356, 500.51395, 547.10236, 536.5526, 589.3994, 452.64017]
2025-05-07 20:23:03,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [105.0, 123.0, 68.0, 104.0, 103.0, 112.0, 111.0, 103.0, 121.0, 93.0]
2025-05-07 20:23:03,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 32 minutes, 50 seconds)
2025-05-07 20:27:13,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:27:16,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 611.65149 ± 127.218
2025-05-07 20:27:16,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [766.1793, 390.935, 507.5948, 431.45078, 602.8603, 692.0943, 757.1197, 635.4602, 590.0018, 742.81854]
2025-05-07 20:27:16,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [146.0, 72.0, 97.0, 83.0, 121.0, 131.0, 146.0, 132.0, 106.0, 145.0]
2025-05-07 20:27:16,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (611.65) for latency ExtremeSparseL4U32
2025-05-07 20:27:16,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 20:27:16,096 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 20:27:16,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 28 minutes, 4 seconds)
2025-05-07 20:31:32,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:31:35,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 553.35095 ± 170.654
2025-05-07 20:31:35,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [425.6196, 408.2361, 350.43832, 463.395, 548.85614, 523.75226, 921.562, 549.4692, 813.1682, 529.01276]
2025-05-07 20:31:35,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [78.0, 88.0, 66.0, 90.0, 106.0, 104.0, 194.0, 112.0, 162.0, 106.0]
2025-05-07 20:31:35,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 24 minutes, 35 seconds)
2025-05-07 20:35:43,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:35:46,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 552.05701 ± 149.342
2025-05-07 20:35:46,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [590.631, 679.7086, 653.70905, 174.78519, 630.55743, 482.2335, 584.6247, 501.62155, 487.03717, 735.66144]
2025-05-07 20:35:46,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [114.0, 132.0, 125.0, 34.0, 130.0, 94.0, 117.0, 96.0, 88.0, 157.0]
2025-05-07 20:35:46,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 19 minutes, 57 seconds)
2025-05-07 20:40:02,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:40:04,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 483.97568 ± 135.405
2025-05-07 20:40:04,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [191.48665, 638.35535, 344.71356, 553.27466, 460.3078, 497.6493, 447.23666, 557.1981, 461.1419, 688.3926]
2025-05-07 20:40:04,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [37.0, 121.0, 65.0, 107.0, 95.0, 96.0, 87.0, 116.0, 94.0, 129.0]
2025-05-07 20:40:04,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 15 minutes, 44 seconds)
2025-05-07 20:44:22,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:44:25,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 566.66400 ± 75.448
2025-05-07 20:44:25,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [437.87, 586.8811, 635.24646, 518.24896, 541.9725, 556.93335, 710.38324, 624.58026, 577.56256, 476.9617]
2025-05-07 20:44:25,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [81.0, 119.0, 116.0, 97.0, 108.0, 105.0, 141.0, 118.0, 109.0, 92.0]
2025-05-07 20:44:25,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 12 minutes, 19 seconds)
2025-05-07 20:48:37,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:48:40,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 627.95587 ± 166.150
2025-05-07 20:48:40,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [617.4914, 596.65765, 685.23694, 383.85703, 447.57394, 795.4774, 1000.40204, 636.8208, 596.2658, 519.77527]
2025-05-07 20:48:40,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [126.0, 118.0, 143.0, 79.0, 83.0, 160.0, 200.0, 117.0, 124.0, 113.0]
2025-05-07 20:48:40,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (627.96) for latency ExtremeSparseL4U32
2025-05-07 20:48:40,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 20:48:40,857 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 20:48:40,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 8 minutes, 25 seconds)
2025-05-07 20:52:58,297 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:53:00,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 461.10361 ± 214.699
2025-05-07 20:53:00,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [825.9329, 550.9844, 195.0483, 181.24683, 472.4713, 478.25446, 577.56647, 145.77315, 699.2246, 484.53363]
2025-05-07 20:53:00,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [161.0, 108.0, 35.0, 35.0, 88.0, 96.0, 112.0, 28.0, 147.0, 95.0]
2025-05-07 20:53:00,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 4 minutes, 10 seconds)
2025-05-07 20:57:15,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:57:18,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 637.28430 ± 127.843
2025-05-07 20:57:18,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [505.17255, 572.43304, 488.12225, 535.70575, 654.633, 888.54407, 768.9954, 776.70654, 640.03485, 542.496]
2025-05-07 20:57:18,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [100.0, 107.0, 94.0, 108.0, 125.0, 167.0, 151.0, 154.0, 128.0, 108.0]
2025-05-07 20:57:18,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (637.28) for latency ExtremeSparseL4U32
2025-05-07 20:57:18,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 20:57:18,101 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 20:57:18,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 53 seconds)
2025-05-07 21:01:37,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:01:40,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 613.45380 ± 137.391
2025-05-07 21:01:40,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [690.778, 694.93256, 732.72986, 480.86453, 442.3208, 505.19727, 916.9952, 525.7681, 563.84717, 581.10486]
2025-05-07 21:01:40,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [137.0, 133.0, 143.0, 98.0, 91.0, 100.0, 188.0, 99.0, 109.0, 115.0]
2025-05-07 21:01:40,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 57 minutes, 10 seconds)
2025-05-07 21:05:57,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:06:00,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 572.80408 ± 171.064
2025-05-07 21:06:00,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [762.5839, 403.63086, 628.4567, 509.33252, 573.9375, 169.2112, 731.7283, 653.11383, 734.4963, 561.54956]
2025-05-07 21:06:00,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [146.0, 91.0, 126.0, 108.0, 111.0, 33.0, 148.0, 122.0, 143.0, 112.0]
2025-05-07 21:06:00,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 52 minutes, 43 seconds)
2025-05-07 21:10:19,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:10:21,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 451.49445 ± 255.314
2025-05-07 21:10:21,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [513.6257, 536.7527, 828.88495, 513.3082, 174.77084, 174.69653, 624.3013, 156.49794, 164.06882, 828.0374]
2025-05-07 21:10:21,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [101.0, 103.0, 173.0, 109.0, 34.0, 34.0, 132.0, 30.0, 32.0, 151.0]
2025-05-07 21:10:21,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 49 minutes, 3 seconds)
2025-05-07 21:14:41,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:14:43,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 525.06989 ± 146.643
2025-05-07 21:14:43,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [156.26788, 752.71704, 559.70337, 530.2116, 547.80206, 483.969, 527.0478, 663.6535, 561.2818, 468.04498]
2025-05-07 21:14:43,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [30.0, 153.0, 107.0, 98.0, 115.0, 98.0, 99.0, 131.0, 112.0, 93.0]
2025-05-07 21:14:43,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 45 minutes, 6 seconds)
2025-05-07 21:19:10,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:19:13,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 584.17786 ± 78.224
2025-05-07 21:19:13,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [570.68005, 761.17114, 518.747, 512.6181, 511.34363, 590.08795, 622.44086, 512.4859, 568.1465, 674.0573]
2025-05-07 21:19:13,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [109.0, 154.0, 96.0, 97.0, 101.0, 111.0, 123.0, 99.0, 113.0, 141.0]
2025-05-07 21:19:13,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 42 minutes, 11 seconds)
2025-05-07 21:24:13,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:24:15,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 584.50830 ± 124.695
2025-05-07 21:24:15,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [674.089, 492.092, 573.7829, 677.2367, 399.8597, 393.98236, 532.58, 596.3527, 736.7544, 768.3532]
2025-05-07 21:24:15,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [131.0, 94.0, 120.0, 130.0, 76.0, 83.0, 102.0, 117.0, 141.0, 157.0]
2025-05-07 21:24:15,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 42 minutes, 35 seconds)
2025-05-07 21:28:36,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:28:39,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 571.22711 ± 128.367
2025-05-07 21:28:39,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [910.88336, 524.27856, 559.9259, 538.3718, 550.4571, 440.48642, 504.12372, 445.38785, 661.26337, 577.09344]
2025-05-07 21:28:39,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [178.0, 98.0, 106.0, 107.0, 101.0, 79.0, 92.0, 87.0, 125.0, 106.0]
2025-05-07 21:28:39,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 38 minutes, 30 seconds)
2025-05-07 21:32:54,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:32:56,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 483.16083 ± 228.562
2025-05-07 21:32:56,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [136.12585, 180.0596, 527.265, 807.5211, 591.4506, 223.78825, 413.92557, 703.1674, 748.6384, 499.66672]
2025-05-07 21:32:56,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [26.0, 35.0, 103.0, 171.0, 109.0, 44.0, 77.0, 137.0, 144.0, 100.0]
2025-05-07 21:32:56,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 33 minutes, 35 seconds)
2025-05-07 21:37:16,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:37:18,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 457.83246 ± 201.239
2025-05-07 21:37:18,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [524.0527, 136.04204, 549.3445, 640.2241, 212.35426, 181.74698, 386.61584, 681.2049, 667.43854, 599.30096]
2025-05-07 21:37:18,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [96.0, 26.0, 112.0, 132.0, 42.0, 35.0, 71.0, 131.0, 141.0, 114.0]
2025-05-07 21:37:18,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 29 minutes, 2 seconds)
2025-05-07 21:41:33,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:41:36,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 595.12842 ± 122.503
2025-05-07 21:41:36,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [933.92426, 622.78296, 560.5071, 572.3142, 480.12604, 497.1693, 644.48987, 528.8019, 558.6973, 552.47144]
2025-05-07 21:41:36,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [188.0, 117.0, 113.0, 110.0, 96.0, 105.0, 124.0, 104.0, 114.0, 104.0]
2025-05-07 21:41:36,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 23 minutes, 19 seconds)
2025-05-07 21:45:53,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:45:55,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 483.10596 ± 122.556
2025-05-07 21:45:55,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [458.4968, 471.8086, 559.6194, 474.73105, 474.4026, 156.04266, 606.8725, 613.74194, 469.56033, 545.78326]
2025-05-07 21:45:55,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [85.0, 92.0, 112.0, 95.0, 92.0, 30.0, 119.0, 115.0, 92.0, 101.0]
2025-05-07 21:45:55,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 14 minutes, 19 seconds)
2025-05-07 21:50:11,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:50:13,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 441.40851 ± 162.901
2025-05-07 21:50:13,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [510.9008, 544.6386, 100.24012, 167.28366, 564.4538, 557.53156, 389.39734, 467.89658, 563.2, 548.5423]
2025-05-07 21:50:13,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [103.0, 101.0, 23.0, 32.0, 119.0, 113.0, 73.0, 94.0, 122.0, 107.0]
2025-05-07 21:50:13,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 9 minutes, 25 seconds)
2025-05-07 21:54:31,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:54:33,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 440.85611 ± 225.515
2025-05-07 21:54:33,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [160.49884, 785.00085, 753.75006, 482.37912, 478.59637, 644.6995, 299.63272, 161.75673, 175.48566, 466.76138]
2025-05-07 21:54:33,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [31.0, 159.0, 153.0, 99.0, 93.0, 119.0, 55.0, 31.0, 34.0, 86.0]
2025-05-07 21:54:33,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 5 minutes, 19 seconds)
2025-05-07 21:58:41,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:58:43,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 536.02893 ± 104.578
2025-05-07 21:58:43,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [491.6494, 592.254, 483.34207, 451.56238, 740.1715, 667.79803, 616.0347, 461.05844, 431.0754, 425.34363]
2025-05-07 21:58:43,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [94.0, 122.0, 91.0, 98.0, 144.0, 131.0, 119.0, 87.0, 87.0, 84.0]
2025-05-07 21:58:43,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 59 minutes, 56 seconds)
2025-05-07 22:02:58,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:03:02,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 664.24182 ± 264.585
2025-05-07 22:03:02,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [1061.2743, 585.07886, 395.72995, 465.55652, 195.89386, 638.62823, 991.5336, 628.8956, 708.61273, 971.21454]
2025-05-07 22:03:02,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [213.0, 111.0, 72.0, 90.0, 38.0, 119.0, 202.0, 134.0, 134.0, 194.0]
2025-05-07 22:03:02,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (664.24) for latency ExtremeSparseL4U32
2025-05-07 22:03:02,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 22:03:02,195 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 22:03:02,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 55 minutes, 41 seconds)
2025-05-07 22:07:15,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:07:18,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 553.92639 ± 278.501
2025-05-07 22:07:18,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [732.07904, 712.7748, 150.9416, 181.27823, 166.7377, 441.14703, 665.4965, 906.44855, 843.4196, 738.94086]
2025-05-07 22:07:18,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [138.0, 136.0, 29.0, 35.0, 32.0, 85.0, 130.0, 185.0, 171.0, 156.0]
2025-05-07 22:07:18,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 51 minutes, 9 seconds)
2025-05-07 22:11:29,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:11:31,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 543.56213 ± 251.914
2025-05-07 22:11:31,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [596.9861, 181.0063, 708.6718, 414.73438, 474.5616, 741.28815, 544.30304, 151.10106, 1051.2789, 571.6899]
2025-05-07 22:11:31,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [111.0, 37.0, 134.0, 90.0, 92.0, 155.0, 101.0, 29.0, 201.0, 106.0]
2025-05-07 22:11:31,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 46 minutes, 29 seconds)
2025-05-07 22:15:43,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:15:45,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 542.47668 ± 191.873
2025-05-07 22:15:45,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [442.2445, 627.09393, 634.05316, 509.19022, 605.6116, 965.8559, 503.39368, 404.51407, 560.8915, 171.91791]
2025-05-07 22:15:45,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [78.0, 119.0, 120.0, 98.0, 118.0, 194.0, 95.0, 75.0, 105.0, 33.0]
2025-05-07 22:15:45,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 41 minutes, 47 seconds)
2025-05-07 22:19:52,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:19:55,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 599.77283 ± 241.465
2025-05-07 22:19:55,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [114.817604, 825.42914, 761.34625, 769.5334, 328.56448, 446.3762, 578.0547, 974.807, 643.0919, 555.7078]
2025-05-07 22:19:55,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [22.0, 161.0, 147.0, 151.0, 65.0, 99.0, 114.0, 189.0, 128.0, 110.0]
2025-05-07 22:19:55,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 37 minutes, 28 seconds)
2025-05-07 22:24:05,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:24:08,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 584.87689 ± 175.586
2025-05-07 22:24:08,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [681.6049, 713.98724, 175.2264, 529.06244, 651.6892, 754.08673, 462.25177, 451.22552, 790.9319, 638.7038]
2025-05-07 22:24:08,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [131.0, 135.0, 34.0, 104.0, 126.0, 155.0, 93.0, 85.0, 147.0, 122.0]
2025-05-07 22:24:08,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 32 minutes, 52 seconds)
2025-05-07 22:28:20,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:28:22,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 508.79965 ± 220.316
2025-05-07 22:28:22,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [157.20668, 240.53914, 145.45753, 610.72064, 738.04724, 714.192, 666.04614, 604.14307, 608.903, 602.74115]
2025-05-07 22:28:22,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [30.0, 47.0, 28.0, 118.0, 143.0, 144.0, 128.0, 114.0, 114.0, 117.0]
2025-05-07 22:28:22,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 28 minutes, 29 seconds)
2025-05-07 22:32:34,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:32:37,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 584.78778 ± 210.719
2025-05-07 22:32:37,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [461.00156, 1030.2744, 550.34985, 655.39465, 425.11713, 179.7922, 526.59467, 732.7781, 678.0892, 608.48566]
2025-05-07 22:32:37,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [89.0, 192.0, 110.0, 129.0, 82.0, 35.0, 96.0, 137.0, 144.0, 121.0]
2025-05-07 22:32:37,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 24 minutes, 23 seconds)
2025-05-07 22:36:49,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:36:52,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 708.45972 ± 122.138
2025-05-07 22:36:52,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [1026.6844, 716.1239, 676.88983, 623.4713, 649.16034, 666.6172, 673.9229, 705.0087, 549.4719, 797.24677]
2025-05-07 22:36:52,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [201.0, 135.0, 133.0, 117.0, 123.0, 127.0, 136.0, 143.0, 108.0, 154.0]
2025-05-07 22:36:52,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (708.46) for latency ExtremeSparseL4U32
2025-05-07 22:36:52,998 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 22:36:53,001 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 22:36:53,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 20 minutes, 16 seconds)
2025-05-07 22:41:07,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:41:09,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 578.26868 ± 153.711
2025-05-07 22:41:09,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [575.2081, 528.2054, 666.3484, 571.9043, 525.36414, 829.7621, 647.7988, 598.03796, 648.72577, 191.33185]
2025-05-07 22:41:09,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [108.0, 105.0, 134.0, 104.0, 100.0, 170.0, 134.0, 112.0, 120.0, 37.0]
2025-05-07 22:41:09,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 16 minutes, 28 seconds)
2025-05-07 22:45:21,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:45:23,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 483.15536 ± 270.395
2025-05-07 22:45:23,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [587.8694, 610.10583, 765.0809, 327.98044, 171.81665, 651.4578, 238.5746, 1015.0436, 287.33212, 176.29227]
2025-05-07 22:45:23,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [115.0, 117.0, 144.0, 59.0, 33.0, 122.0, 43.0, 198.0, 51.0, 34.0]
2025-05-07 22:45:23,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 12 minutes, 13 seconds)
2025-05-07 22:49:33,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:49:36,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 576.41187 ± 88.198
2025-05-07 22:49:36,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [551.8208, 598.4955, 613.68616, 628.1623, 666.4965, 670.4809, 462.5392, 546.45996, 642.90326, 383.07434]
2025-05-07 22:49:36,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [97.0, 118.0, 116.0, 124.0, 124.0, 121.0, 92.0, 100.0, 122.0, 70.0]
2025-05-07 22:49:36,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 7 minutes, 56 seconds)
2025-05-07 22:53:49,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:53:52,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 614.38898 ± 189.843
2025-05-07 22:53:52,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [695.0631, 1087.7667, 566.60297, 291.8778, 581.95245, 537.62964, 561.4024, 515.2078, 631.86096, 674.52594]
2025-05-07 22:53:52,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [134.0, 202.0, 107.0, 58.0, 120.0, 104.0, 109.0, 101.0, 121.0, 135.0]
2025-05-07 22:53:52,025 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 3 minutes, 43 seconds)
2025-05-07 22:58:06,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:58:08,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 520.49677 ± 154.134
2025-05-07 22:58:08,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [162.32664, 616.1649, 466.48044, 614.6115, 531.60974, 411.9788, 795.54913, 542.5881, 537.3256, 526.3331]
2025-05-07 22:58:08,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [31.0, 121.0, 93.0, 118.0, 104.0, 76.0, 146.0, 111.0, 108.0, 96.0]
2025-05-07 22:58:08,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 87/100 (estimated time remaining: 59 minutes, 32 seconds)
2025-05-07 23:02:25,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:02:28,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 655.87805 ± 139.605
2025-05-07 23:02:28,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [526.35583, 933.6396, 583.49146, 630.5744, 576.29504, 898.54663, 691.9727, 525.80725, 643.4146, 548.68286]
2025-05-07 23:02:28,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [98.0, 176.0, 120.0, 123.0, 121.0, 170.0, 140.0, 100.0, 132.0, 99.0]
2025-05-07 23:02:28,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 88/100 (estimated time remaining: 55 minutes, 25 seconds)
2025-05-07 23:06:35,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:06:38,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 522.94373 ± 270.415
2025-05-07 23:06:38,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [549.604, 922.29333, 725.89716, 801.9209, 592.5914, 693.5296, 155.55223, 492.54962, 129.80315, 165.69604]
2025-05-07 23:06:38,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [104.0, 193.0, 135.0, 153.0, 122.0, 138.0, 30.0, 95.0, 25.0, 32.0]
2025-05-07 23:06:38,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 89/100 (estimated time remaining: 50 minutes, 59 seconds)
2025-05-07 23:10:47,357 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:10:49,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 519.73267 ± 142.965
2025-05-07 23:10:49,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [634.0543, 334.7519, 503.4917, 655.84314, 616.63586, 582.52155, 181.94025, 561.11505, 615.1409, 511.83136]
2025-05-07 23:10:49,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [116.0, 75.0, 96.0, 124.0, 120.0, 116.0, 37.0, 111.0, 116.0, 101.0]
2025-05-07 23:10:49,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 90/100 (estimated time remaining: 46 minutes, 40 seconds)
2025-05-07 23:15:08,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:15:11,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 569.85492 ± 152.346
2025-05-07 23:15:11,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [632.48315, 452.60843, 756.165, 642.42786, 344.26797, 427.59967, 408.18335, 712.5961, 519.41785, 802.7999]
2025-05-07 23:15:11,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [120.0, 83.0, 161.0, 122.0, 65.0, 78.0, 86.0, 143.0, 97.0, 168.0]
2025-05-07 23:15:11,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 91/100 (estimated time remaining: 42 minutes, 38 seconds)
2025-05-07 23:19:18,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:19:21,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 559.34119 ± 98.559
2025-05-07 23:19:21,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [671.98535, 534.7362, 533.56036, 612.47546, 573.80835, 775.5437, 446.0276, 462.70535, 522.6082, 459.96127]
2025-05-07 23:19:21,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [127.0, 103.0, 98.0, 117.0, 111.0, 150.0, 94.0, 94.0, 106.0, 96.0]
2025-05-07 23:19:21,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 92/100 (estimated time remaining: 38 minutes, 10 seconds)
2025-05-07 23:23:38,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:23:40,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 477.92221 ± 258.104
2025-05-07 23:23:40,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [312.38177, 988.1475, 392.4723, 663.66235, 563.47784, 607.88116, 659.4768, 129.81703, 365.68588, 96.21919]
2025-05-07 23:23:40,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [55.0, 185.0, 76.0, 130.0, 108.0, 113.0, 128.0, 25.0, 77.0, 19.0]
2025-05-07 23:23:40,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 93/100 (estimated time remaining: 33 minutes, 54 seconds)
2025-05-07 23:27:46,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:27:49,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 575.77655 ± 226.170
2025-05-07 23:27:49,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [1155.3477, 438.89273, 548.2201, 357.6978, 428.78152, 713.6168, 472.8643, 680.94183, 599.5179, 361.8848]
2025-05-07 23:27:49,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [241.0, 79.0, 111.0, 66.0, 80.0, 140.0, 96.0, 129.0, 110.0, 64.0]
2025-05-07 23:27:49,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 94/100 (estimated time remaining: 29 minutes, 39 seconds)
2025-05-07 23:32:03,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:32:06,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 634.77167 ± 174.634
2025-05-07 23:32:06,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [633.79926, 611.5199, 564.23895, 664.9108, 615.0742, 1093.596, 722.85315, 422.0509, 468.526, 551.1481]
2025-05-07 23:32:06,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [124.0, 115.0, 121.0, 127.0, 126.0, 229.0, 139.0, 78.0, 86.0, 112.0]
2025-05-07 23:32:06,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 95/100 (estimated time remaining: 25 minutes, 32 seconds)
2025-05-07 23:36:21,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:36:24,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 663.13885 ± 92.136
2025-05-07 23:36:24,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [878.6596, 724.98126, 615.7841, 612.81586, 755.21344, 679.8283, 585.3185, 595.3115, 580.95514, 602.5211]
2025-05-07 23:36:24,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [169.0, 135.0, 129.0, 115.0, 149.0, 133.0, 114.0, 121.0, 112.0, 115.0]
2025-05-07 23:36:24,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 96/100 (estimated time remaining: 21 minutes, 13 seconds)
2025-05-07 23:40:39,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:40:41,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 528.38611 ± 169.515
2025-05-07 23:40:41,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [570.0193, 492.9487, 324.11862, 191.00081, 426.4078, 600.26764, 705.4337, 617.9691, 557.0259, 798.6693]
2025-05-07 23:40:41,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [114.0, 91.0, 57.0, 37.0, 79.0, 116.0, 132.0, 120.0, 104.0, 162.0]
2025-05-07 23:40:41,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 97/100 (estimated time remaining: 17 minutes, 4 seconds)
2025-05-07 23:44:57,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:44:59,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 509.33286 ± 179.512
2025-05-07 23:44:59,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [542.312, 602.78735, 613.95496, 180.45, 688.8477, 591.6421, 586.8462, 135.867, 583.1033, 567.51807]
2025-05-07 23:44:59,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [100.0, 115.0, 117.0, 35.0, 137.0, 112.0, 107.0, 26.0, 110.0, 107.0]
2025-05-07 23:44:59,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 98/100 (estimated time remaining: 12 minutes, 47 seconds)
2025-05-07 23:49:06,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:49:08,884 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 541.07733 ± 189.672
2025-05-07 23:49:08,884 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [523.243, 665.32715, 652.0494, 647.92944, 757.34973, 154.17441, 211.5872, 613.8024, 640.7425, 544.5676]
2025-05-07 23:49:08,884 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [97.0, 133.0, 134.0, 122.0, 143.0, 30.0, 41.0, 128.0, 122.0, 100.0]
2025-05-07 23:49:08,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 31 seconds)
2025-05-07 23:53:13,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:53:15,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 545.49420 ± 146.486
2025-05-07 23:53:15,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [728.05444, 540.95325, 277.3534, 503.9211, 447.6445, 607.0725, 676.1364, 576.2069, 353.60156, 743.9977]
2025-05-07 23:53:15,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [152.0, 102.0, 51.0, 95.0, 83.0, 120.0, 132.0, 109.0, 64.0, 149.0]
2025-05-07 23:53:15,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 13 seconds)
2025-05-07 23:57:16,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:57:19,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 556.63202 ± 175.915
2025-05-07 23:57:19,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [307.48438, 170.82478, 625.9718, 529.4526, 644.8751, 689.8908, 674.5227, 787.29694, 574.4576, 561.54395]
2025-05-07 23:57:19,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [54.0, 33.0, 133.0, 108.0, 123.0, 131.0, 129.0, 153.0, 105.0, 111.0]
2025-05-07 23:57:19,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1149 [DEBUG]: Training session finished
