2025-05-07 15:48:17,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac
2025-05-07 15:48:17,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac
2025-05-07 15:48:17,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x70726a5c5f10>}
2025-05-07 15:48:17,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1009 [DEBUG]: using device: cpu
2025-05-07 15:48:17,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1031 [INFO]: Creating new trainer
2025-05-07 15:48:17,509 baseline-sac-noisy-humanoid:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=376, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-07 15:48:17,509 baseline-sac-noisy-humanoid:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-07 15:48:17,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1092 [DEBUG]: Starting training session...
2025-05-07 15:48:17,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 1/100
2025-05-07 15:51:32,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:51:32,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 165.86424 ± 32.259
2025-05-07 15:51:32,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [152.67207, 147.68741, 237.04323, 147.89038, 138.29669, 167.53175, 145.3409, 148.00235, 154.61888, 219.55867]
2025-05-07 15:51:32,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [32.0, 32.0, 48.0, 31.0, 30.0, 35.0, 31.0, 31.0, 33.0, 43.0]
2025-05-07 15:51:32,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (165.86) for latency ExtremeSparseL4U32
2025-05-07 15:51:32,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 15:51:32,786 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 15:51:32,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 21 minutes, 41 seconds)
2025-05-07 15:55:08,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:55:09,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 235.08130 ± 55.765
2025-05-07 15:55:09,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [341.4365, 272.02448, 275.90286, 238.41277, 140.74075, 174.79819, 208.60324, 266.26868, 246.57143, 186.05383]
2025-05-07 15:55:09,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [67.0, 59.0, 58.0, 49.0, 27.0, 34.0, 44.0, 52.0, 53.0, 38.0]
2025-05-07 15:55:09,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (235.08) for latency ExtremeSparseL4U32
2025-05-07 15:55:09,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 15:55:09,131 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 15:55:09,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 35 minutes, 53 seconds)
2025-05-07 15:58:45,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:58:46,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 224.60684 ± 60.223
2025-05-07 15:58:46,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [223.37788, 362.07428, 193.5235, 251.16348, 257.97873, 265.63553, 181.13818, 202.78072, 136.5859, 171.81012]
2025-05-07 15:58:46,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [45.0, 71.0, 40.0, 49.0, 50.0, 53.0, 39.0, 40.0, 28.0, 33.0]
2025-05-07 15:58:46,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 38 minutes, 54 seconds)
2025-05-07 16:02:24,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:02:25,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 396.86249 ± 74.841
2025-05-07 16:02:25,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [413.06137, 407.01202, 461.18976, 564.35614, 349.1864, 346.47186, 433.36548, 300.29883, 385.4456, 308.23734]
2025-05-07 16:02:25,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [79.0, 77.0, 84.0, 106.0, 74.0, 68.0, 84.0, 56.0, 73.0, 67.0]
2025-05-07 16:02:25,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (396.86) for latency ExtremeSparseL4U32
2025-05-07 16:02:25,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 16:02:25,568 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 16:02:25,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 39 minutes, 5 seconds)
2025-05-07 16:06:03,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:06:04,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 294.38986 ± 110.631
2025-05-07 16:06:04,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [556.25366, 292.2144, 288.84296, 295.0341, 266.8144, 288.4705, 206.13982, 151.12259, 188.54654, 410.45963]
2025-05-07 16:06:04,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [117.0, 63.0, 63.0, 60.0, 60.0, 62.0, 41.0, 29.0, 37.0, 81.0]
2025-05-07 16:06:04,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 37 minutes, 40 seconds)
2025-05-07 16:09:41,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:09:42,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 398.12790 ± 107.422
2025-05-07 16:09:42,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [353.89844, 375.31573, 365.8712, 373.2944, 461.6868, 344.847, 264.36127, 689.37115, 395.21002, 357.4231]
2025-05-07 16:09:42,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [64.0, 68.0, 69.0, 72.0, 90.0, 64.0, 49.0, 128.0, 75.0, 70.0]
2025-05-07 16:09:42,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (398.13) for latency ExtremeSparseL4U32
2025-05-07 16:09:42,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 16:09:42,626 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 16:09:42,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 41 minutes, 29 seconds)
2025-05-07 16:13:20,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:13:21,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 369.25165 ± 77.374
2025-05-07 16:13:21,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [334.86548, 252.77946, 409.62698, 451.76355, 457.4267, 341.13867, 334.15823, 303.1573, 504.12057, 303.47964]
2025-05-07 16:13:21,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [65.0, 52.0, 76.0, 88.0, 90.0, 65.0, 69.0, 61.0, 109.0, 57.0]
2025-05-07 16:13:21,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 38 minutes, 46 seconds)
2025-05-07 16:17:00,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:17:01,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 329.62814 ± 86.974
2025-05-07 16:17:01,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [235.4712, 297.05875, 309.54254, 365.13733, 207.9706, 360.8023, 526.96545, 294.97116, 409.904, 288.45804]
2025-05-07 16:17:01,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [45.0, 58.0, 62.0, 81.0, 42.0, 81.0, 104.0, 58.0, 77.0, 58.0]
2025-05-07 16:17:01,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 35 minutes, 40 seconds)
2025-05-07 16:20:40,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:20:41,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 256.46765 ± 100.860
2025-05-07 16:20:41,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [227.98576, 426.2136, 85.544754, 223.14462, 330.94858, 383.95544, 300.2826, 230.17249, 127.62997, 228.79874]
2025-05-07 16:20:41,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [44.0, 82.0, 18.0, 44.0, 63.0, 77.0, 58.0, 44.0, 25.0, 43.0]
2025-05-07 16:20:41,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 32 minutes, 21 seconds)
2025-05-07 16:24:22,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:24:23,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 371.16617 ± 63.556
2025-05-07 16:24:23,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [322.45963, 401.8706, 299.6585, 382.59088, 424.786, 266.8714, 372.23492, 496.03882, 407.82053, 337.33038]
2025-05-07 16:24:23,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [61.0, 86.0, 60.0, 77.0, 81.0, 53.0, 69.0, 103.0, 76.0, 64.0]
2025-05-07 16:24:23,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 29 minutes, 49 seconds)
2025-05-07 16:28:02,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:28:03,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 313.03796 ± 79.442
2025-05-07 16:28:03,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [457.01807, 370.55524, 259.455, 407.88333, 249.34465, 308.22668, 367.75626, 276.64062, 205.25833, 228.24152]
2025-05-07 16:28:03,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [100.0, 75.0, 58.0, 75.0, 49.0, 61.0, 73.0, 55.0, 45.0, 48.0]
2025-05-07 16:28:03,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 26 minutes, 38 seconds)
2025-05-07 16:31:43,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:31:44,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 331.69986 ± 97.783
2025-05-07 16:31:44,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [570.07495, 273.8185, 425.595, 292.3105, 304.715, 365.6675, 344.3353, 238.93056, 224.99321, 276.55807]
2025-05-07 16:31:44,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [111.0, 55.0, 84.0, 56.0, 67.0, 72.0, 73.0, 48.0, 47.0, 60.0]
2025-05-07 16:31:44,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 23 minutes, 31 seconds)
2025-05-07 16:35:25,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:35:26,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 348.31082 ± 93.134
2025-05-07 16:35:26,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [299.5704, 518.9614, 506.36755, 308.8416, 256.57452, 260.20935, 334.29074, 369.3135, 245.20653, 383.77252]
2025-05-07 16:35:26,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [56.0, 101.0, 97.0, 70.0, 58.0, 58.0, 75.0, 77.0, 50.0, 72.0]
2025-05-07 16:35:26,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 20 minutes, 37 seconds)
2025-05-07 16:39:09,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:39:10,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 278.96765 ± 81.731
2025-05-07 16:39:10,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [250.12527, 322.88318, 156.98131, 176.3673, 348.27255, 254.99757, 321.8833, 359.07126, 189.66373, 409.43094]
2025-05-07 16:39:10,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [47.0, 65.0, 30.0, 34.0, 67.0, 51.0, 61.0, 70.0, 42.0, 77.0]
2025-05-07 16:39:10,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 17 minutes, 54 seconds)
2025-05-07 16:42:54,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:42:55,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 304.25055 ± 58.845
2025-05-07 16:42:55,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [297.242, 230.83812, 258.42868, 216.96828, 369.31073, 308.43628, 323.84567, 425.657, 312.68427, 299.0943]
2025-05-07 16:42:55,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [56.0, 46.0, 52.0, 46.0, 75.0, 62.0, 58.0, 80.0, 63.0, 57.0]
2025-05-07 16:42:55,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 15 minutes, 7 seconds)
2025-05-07 16:46:39,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:46:40,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 325.04999 ± 62.488
2025-05-07 16:46:40,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [242.3323, 351.84363, 468.5601, 349.7033, 332.69577, 294.0774, 274.57812, 323.17502, 361.75635, 251.77757]
2025-05-07 16:46:40,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [45.0, 66.0, 88.0, 65.0, 68.0, 56.0, 52.0, 60.0, 80.0, 49.0]
2025-05-07 16:46:40,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 12 minutes, 46 seconds)
2025-05-07 16:50:25,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:50:26,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 315.70526 ± 130.417
2025-05-07 16:50:26,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [172.17204, 186.8752, 339.35284, 397.082, 200.0052, 150.67633, 268.03622, 460.47284, 502.70026, 479.68]
2025-05-07 16:50:26,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [33.0, 36.0, 64.0, 78.0, 40.0, 29.0, 50.0, 86.0, 96.0, 100.0]
2025-05-07 16:50:26,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 10 minutes, 24 seconds)
2025-05-07 16:54:14,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:54:15,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 334.07260 ± 97.631
2025-05-07 16:54:15,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [400.53506, 421.8311, 486.68228, 279.12128, 286.77594, 313.77267, 114.05187, 297.0815, 340.27786, 400.59665]
2025-05-07 16:54:15,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [75.0, 77.0, 89.0, 52.0, 58.0, 60.0, 22.0, 58.0, 73.0, 74.0]
2025-05-07 16:54:15,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 8 minutes, 33 seconds)
2025-05-07 16:58:02,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:58:03,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 280.21246 ± 64.235
2025-05-07 16:58:03,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [244.63754, 236.55742, 325.14838, 191.89235, 212.08823, 273.54648, 275.6086, 354.7672, 273.64368, 414.23474]
2025-05-07 16:58:03,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [44.0, 47.0, 62.0, 39.0, 40.0, 51.0, 53.0, 64.0, 51.0, 77.0]
2025-05-07 16:58:03,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 5 minutes, 59 seconds)
2025-05-07 17:01:49,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:01:50,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 328.35199 ± 74.187
2025-05-07 17:01:50,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [220.50554, 271.87238, 310.27234, 297.69772, 413.6622, 472.456, 268.02725, 278.7752, 360.4019, 389.84943]
2025-05-07 17:01:50,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [42.0, 52.0, 56.0, 56.0, 79.0, 86.0, 51.0, 52.0, 66.0, 73.0]
2025-05-07 17:01:50,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 2 minutes, 32 seconds)
2025-05-07 17:05:37,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:05:37,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 277.93127 ± 53.649
2025-05-07 17:05:37,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [257.27002, 216.01604, 315.01328, 364.69717, 261.32904, 229.99129, 239.03104, 350.37036, 216.9817, 328.6128]
2025-05-07 17:05:37,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [49.0, 41.0, 58.0, 66.0, 49.0, 45.0, 45.0, 64.0, 43.0, 61.0]
2025-05-07 17:05:37,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 59 minutes, 27 seconds)
2025-05-07 17:09:26,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:09:28,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 352.65198 ± 89.825
2025-05-07 17:09:28,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [312.10144, 420.0273, 290.06903, 370.744, 496.47882, 283.06992, 251.67694, 487.26843, 386.09824, 228.9856]
2025-05-07 17:09:28,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [57.0, 77.0, 57.0, 69.0, 93.0, 55.0, 56.0, 96.0, 75.0, 44.0]
2025-05-07 17:09:28,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 56 minutes, 43 seconds)
2025-05-07 17:13:17,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:13:18,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 339.11591 ± 98.877
2025-05-07 17:13:18,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [342.38162, 160.42342, 434.0806, 372.83646, 284.08215, 372.96405, 268.42075, 372.5394, 250.56361, 532.8668]
2025-05-07 17:13:18,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [63.0, 31.0, 78.0, 73.0, 53.0, 69.0, 50.0, 69.0, 54.0, 104.0]
2025-05-07 17:13:18,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 53 minutes, 16 seconds)
2025-05-07 17:17:09,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:17:10,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 308.68860 ± 74.187
2025-05-07 17:17:10,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [209.03714, 371.8625, 371.36865, 262.90814, 246.53194, 233.67162, 377.22736, 231.64178, 416.91904, 365.71768]
2025-05-07 17:17:10,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [40.0, 72.0, 72.0, 51.0, 52.0, 47.0, 72.0, 43.0, 79.0, 66.0]
2025-05-07 17:17:10,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 50 minutes, 41 seconds)
2025-05-07 17:21:00,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:21:01,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 312.40930 ± 112.076
2025-05-07 17:21:01,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [101.73408, 417.02652, 471.17328, 381.44525, 213.58759, 243.84254, 227.52766, 347.78424, 279.80203, 440.16998]
2025-05-07 17:21:01,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [20.0, 78.0, 98.0, 74.0, 42.0, 49.0, 44.0, 64.0, 52.0, 83.0]
2025-05-07 17:21:01,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 47 minutes, 43 seconds)
2025-05-07 17:24:49,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:24:51,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 374.84695 ± 63.320
2025-05-07 17:24:51,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [253.05893, 399.0577, 353.96512, 516.2932, 380.53833, 345.6422, 384.0051, 406.6736, 327.8622, 381.37277]
2025-05-07 17:24:51,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [46.0, 87.0, 65.0, 92.0, 69.0, 74.0, 81.0, 74.0, 67.0, 78.0]
2025-05-07 17:24:51,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 44 minutes, 29 seconds)
2025-05-07 17:28:41,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:28:42,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 342.96371 ± 125.634
2025-05-07 17:28:42,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [258.61185, 87.09457, 385.19388, 448.50388, 459.6506, 263.24008, 561.56696, 303.23782, 368.16272, 294.37494]
2025-05-07 17:28:42,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [48.0, 19.0, 69.0, 86.0, 84.0, 51.0, 105.0, 56.0, 75.0, 61.0]
2025-05-07 17:28:42,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 40 minutes, 57 seconds)
2025-05-07 17:32:28,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:32:29,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 277.40341 ± 91.760
2025-05-07 17:32:29,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [252.24638, 331.0137, 302.97296, 292.05576, 452.07312, 280.46277, 211.38153, 130.39244, 154.00296, 367.4327]
2025-05-07 17:32:29,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [46.0, 61.0, 53.0, 56.0, 91.0, 57.0, 42.0, 25.0, 30.0, 68.0]
2025-05-07 17:32:29,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 36 minutes, 14 seconds)
2025-05-07 17:36:15,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:36:16,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 372.75531 ± 39.588
2025-05-07 17:36:16,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [381.37503, 353.1909, 335.91873, 376.5034, 400.04602, 439.90677, 413.7072, 353.56348, 381.0569, 292.2847]
2025-05-07 17:36:16,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [67.0, 66.0, 65.0, 71.0, 76.0, 84.0, 87.0, 68.0, 80.0, 59.0]
2025-05-07 17:36:16,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 31 minutes, 7 seconds)
2025-05-07 17:40:04,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:40:05,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 320.51889 ± 108.661
2025-05-07 17:40:05,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [204.46301, 325.12952, 313.07843, 386.54407, 260.91495, 298.33493, 295.06125, 220.82417, 611.3339, 289.5046]
2025-05-07 17:40:05,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [39.0, 67.0, 58.0, 88.0, 55.0, 56.0, 54.0, 42.0, 113.0, 61.0]
2025-05-07 17:40:05,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 26 minutes, 57 seconds)
2025-05-07 17:43:51,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:43:52,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 457.36920 ± 110.662
2025-05-07 17:43:52,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [268.78873, 416.31665, 456.70508, 641.90155, 529.7001, 327.67554, 582.9641, 528.52966, 366.46863, 454.64178]
2025-05-07 17:43:52,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [47.0, 83.0, 85.0, 144.0, 104.0, 67.0, 116.0, 97.0, 74.0, 96.0]
2025-05-07 17:43:52,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (457.37) for latency ExtremeSparseL4U32
2025-05-07 17:43:52,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 17:43:52,852 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 17:43:52,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 22 minutes, 33 seconds)
2025-05-07 17:47:40,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:47:41,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 326.80914 ± 103.442
2025-05-07 17:47:41,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [217.65654, 426.2098, 519.2459, 327.51962, 352.4804, 191.16972, 230.57903, 249.24731, 443.3451, 310.63773]
2025-05-07 17:47:41,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [39.0, 79.0, 92.0, 60.0, 66.0, 37.0, 44.0, 46.0, 82.0, 58.0]
2025-05-07 17:47:41,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 18 minutes, 11 seconds)
2025-05-07 17:51:27,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:51:28,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 297.84213 ± 73.141
2025-05-07 17:51:28,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [151.46843, 374.3399, 344.38864, 188.88849, 379.89197, 338.115, 333.701, 252.75917, 320.17865, 294.6902]
2025-05-07 17:51:28,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 71.0, 65.0, 37.0, 73.0, 65.0, 61.0, 51.0, 60.0, 55.0]
2025-05-07 17:51:28,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 14 minutes, 23 seconds)
2025-05-07 17:55:15,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:55:16,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 266.95178 ± 72.054
2025-05-07 17:55:16,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [153.0956, 146.3735, 269.55795, 300.089, 289.86783, 330.6993, 340.97678, 190.78905, 345.54114, 302.52762]
2025-05-07 17:55:16,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [30.0, 28.0, 49.0, 58.0, 54.0, 62.0, 64.0, 37.0, 64.0, 56.0]
2025-05-07 17:55:16,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 10 minutes, 45 seconds)
2025-05-07 17:59:04,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:59:06,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 349.27045 ± 80.608
2025-05-07 17:59:06,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [301.1743, 345.1839, 441.4497, 199.25562, 444.52075, 345.3462, 373.11163, 450.8189, 350.61987, 241.22363]
2025-05-07 17:59:06,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [55.0, 62.0, 81.0, 39.0, 83.0, 63.0, 67.0, 94.0, 64.0, 46.0]
2025-05-07 17:59:06,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 7 minutes, 12 seconds)
2025-05-07 18:02:54,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:02:55,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 337.63074 ± 80.610
2025-05-07 18:02:55,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [328.13715, 502.53754, 386.75397, 324.30835, 226.82541, 369.04126, 301.30423, 223.99251, 298.05157, 415.35556]
2025-05-07 18:02:55,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [59.0, 97.0, 86.0, 62.0, 44.0, 71.0, 68.0, 44.0, 62.0, 95.0]
2025-05-07 18:02:55,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 3 minutes, 42 seconds)
2025-05-07 18:06:42,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:06:43,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 394.94919 ± 121.549
2025-05-07 18:06:43,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [431.94235, 276.79675, 390.4349, 292.6635, 473.80652, 380.45813, 690.23553, 233.53069, 356.64874, 422.97495]
2025-05-07 18:06:43,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [78.0, 52.0, 76.0, 55.0, 89.0, 72.0, 148.0, 45.0, 66.0, 82.0]
2025-05-07 18:06:43,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 59 minutes, 51 seconds)
2025-05-07 18:10:31,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:10:32,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 406.49048 ± 60.267
2025-05-07 18:10:32,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [378.38586, 446.81335, 512.70245, 362.40378, 497.90057, 332.6321, 432.48462, 380.84613, 385.68073, 335.0551]
2025-05-07 18:10:32,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [70.0, 82.0, 94.0, 68.0, 95.0, 61.0, 79.0, 69.0, 73.0, 61.0]
2025-05-07 18:10:32,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 56 minutes, 31 seconds)
2025-05-07 18:14:23,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:14:24,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 357.31793 ± 82.176
2025-05-07 18:14:24,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [340.07635, 325.4851, 564.1962, 372.84634, 285.67993, 304.62628, 378.41663, 405.23294, 245.5594, 351.06024]
2025-05-07 18:14:24,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [62.0, 74.0, 116.0, 78.0, 66.0, 57.0, 87.0, 76.0, 53.0, 77.0]
2025-05-07 18:14:24,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 53 minutes, 26 seconds)
2025-05-07 18:18:11,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:18:12,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 314.44958 ± 177.319
2025-05-07 18:18:12,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [266.01752, 309.777, 415.94675, 378.5701, 172.85904, 155.66347, 140.81068, 183.69186, 355.67645, 765.4829]
2025-05-07 18:18:12,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [57.0, 62.0, 77.0, 71.0, 33.0, 30.0, 27.0, 36.0, 67.0, 170.0]
2025-05-07 18:18:12,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 49 minutes, 15 seconds)
2025-05-07 18:22:02,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:22:03,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 325.05740 ± 111.205
2025-05-07 18:22:03,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [288.86212, 202.46257, 279.82596, 383.19, 519.93866, 486.73355, 341.11322, 156.05356, 356.59595, 235.79839]
2025-05-07 18:22:03,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [55.0, 45.0, 66.0, 80.0, 101.0, 100.0, 64.0, 30.0, 67.0, 54.0]
2025-05-07 18:22:03,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 45 minutes, 47 seconds)
2025-05-07 18:25:53,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:25:54,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 325.82086 ± 169.794
2025-05-07 18:25:54,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [156.02533, 327.06577, 559.4268, 130.93356, 681.0505, 335.68845, 254.75923, 355.2298, 139.65222, 318.37695]
2025-05-07 18:25:54,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [30.0, 75.0, 106.0, 25.0, 131.0, 75.0, 58.0, 66.0, 27.0, 66.0]
2025-05-07 18:25:54,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 42 minutes, 31 seconds)
2025-05-07 18:29:45,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:29:46,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 316.79239 ± 107.079
2025-05-07 18:29:46,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [229.30017, 224.61563, 400.05045, 299.17636, 354.98862, 454.82916, 184.98856, 155.55737, 451.46494, 412.95267]
2025-05-07 18:29:46,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [44.0, 48.0, 73.0, 56.0, 65.0, 85.0, 40.0, 30.0, 88.0, 77.0]
2025-05-07 18:29:46,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 39 minutes, 15 seconds)
2025-05-07 18:33:39,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:33:41,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 354.56705 ± 78.836
2025-05-07 18:33:41,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [346.43594, 300.2237, 391.11472, 213.7621, 247.44652, 373.14502, 413.3765, 339.16592, 462.8742, 458.12573]
2025-05-07 18:33:41,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [73.0, 55.0, 76.0, 41.0, 47.0, 68.0, 76.0, 63.0, 88.0, 86.0]
2025-05-07 18:33:41,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 35 minutes, 53 seconds)
2025-05-07 18:37:31,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:37:32,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 319.07092 ± 62.779
2025-05-07 18:37:32,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [436.9834, 220.10857, 303.24176, 256.44113, 399.3216, 296.8475, 287.4103, 283.9114, 348.9401, 357.50333]
2025-05-07 18:37:32,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [95.0, 42.0, 57.0, 47.0, 73.0, 54.0, 53.0, 57.0, 65.0, 65.0]
2025-05-07 18:37:32,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 32 minutes, 38 seconds)
2025-05-07 18:41:22,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:41:23,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 368.06415 ± 69.992
2025-05-07 18:41:23,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [324.69608, 377.0184, 366.93173, 401.4845, 367.65628, 373.24722, 338.6287, 217.58783, 514.4079, 398.98267]
2025-05-07 18:41:23,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [71.0, 70.0, 70.0, 77.0, 68.0, 72.0, 67.0, 46.0, 103.0, 79.0]
2025-05-07 18:41:23,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 28 minutes, 49 seconds)
2025-05-07 18:45:14,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:45:15,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 398.30548 ± 124.743
2025-05-07 18:45:15,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [454.51318, 372.077, 194.8234, 330.0491, 414.45187, 352.14578, 378.73273, 330.67947, 708.0196, 447.56238]
2025-05-07 18:45:15,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [91.0, 68.0, 41.0, 60.0, 75.0, 64.0, 77.0, 62.0, 144.0, 82.0]
2025-05-07 18:45:15,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 25 minutes, 2 seconds)
2025-05-07 18:49:06,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:49:07,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 403.87097 ± 91.266
2025-05-07 18:49:07,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [328.20956, 519.8206, 404.71158, 409.02402, 332.54776, 620.44073, 368.41452, 375.54575, 309.2532, 370.74222]
2025-05-07 18:49:07,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [66.0, 109.0, 76.0, 76.0, 62.0, 122.0, 69.0, 74.0, 67.0, 70.0]
2025-05-07 18:49:07,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 21 minutes, 13 seconds)
2025-05-07 18:52:58,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:52:59,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 319.98346 ± 91.182
2025-05-07 18:52:59,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [263.65097, 398.1369, 427.98428, 284.1271, 140.31824, 403.6317, 332.6625, 394.25775, 198.94577, 356.1192]
2025-05-07 18:52:59,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [54.0, 75.0, 82.0, 52.0, 27.0, 75.0, 62.0, 79.0, 42.0, 76.0]
2025-05-07 18:52:59,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 16 minutes, 56 seconds)
2025-05-07 18:56:48,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:56:49,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 327.15570 ± 50.051
2025-05-07 18:56:49,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [353.7324, 420.4686, 365.65097, 307.5193, 298.39804, 286.94937, 337.38953, 377.8057, 269.54483, 254.09782]
2025-05-07 18:56:49,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [70.0, 76.0, 68.0, 56.0, 54.0, 55.0, 64.0, 70.0, 55.0, 51.0]
2025-05-07 18:56:49,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 12 minutes, 55 seconds)
2025-05-07 19:00:37,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:00:38,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 330.63931 ± 31.730
2025-05-07 19:00:38,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [330.83133, 352.72675, 348.08655, 302.0112, 342.07874, 319.9802, 366.64804, 368.02637, 257.33234, 318.67166]
2025-05-07 19:00:38,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [59.0, 65.0, 65.0, 56.0, 64.0, 59.0, 68.0, 68.0, 52.0, 58.0]
2025-05-07 19:00:38,923 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 8 minutes, 42 seconds)
2025-05-07 19:04:27,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:04:28,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 376.25195 ± 63.179
2025-05-07 19:04:28,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [399.76727, 430.78278, 310.00705, 432.12006, 375.97998, 397.21964, 482.2348, 265.06723, 361.0106, 308.33035]
2025-05-07 19:04:28,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [76.0, 92.0, 65.0, 79.0, 86.0, 77.0, 93.0, 58.0, 80.0, 61.0]
2025-05-07 19:04:28,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 4 minutes, 32 seconds)
2025-05-07 19:08:18,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:08:20,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 336.70477 ± 50.340
2025-05-07 19:08:20,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [313.6263, 380.97116, 333.68347, 309.0677, 320.80795, 396.25693, 305.43073, 242.12216, 428.0976, 336.984]
2025-05-07 19:08:20,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [59.0, 81.0, 59.0, 56.0, 57.0, 72.0, 56.0, 51.0, 81.0, 60.0]
2025-05-07 19:08:20,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 30 seconds)
2025-05-07 19:12:08,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:12:10,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 353.32291 ± 159.224
2025-05-07 19:12:10,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [456.3981, 691.18414, 369.04285, 83.43434, 365.71735, 415.23593, 398.77054, 140.92744, 300.49744, 312.02112]
2025-05-07 19:12:10,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [87.0, 137.0, 75.0, 17.0, 79.0, 75.0, 77.0, 27.0, 69.0, 68.0]
2025-05-07 19:12:10,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 56 minutes, 26 seconds)
2025-05-07 19:15:59,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:16:00,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 396.06601 ± 31.514
2025-05-07 19:16:00,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [377.7207, 436.9241, 426.69128, 443.05005, 371.22052, 357.19977, 382.8053, 362.78372, 426.7952, 375.46936]
2025-05-07 19:16:00,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [68.0, 83.0, 78.0, 87.0, 67.0, 71.0, 70.0, 67.0, 79.0, 71.0]
2025-05-07 19:16:00,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 52 minutes, 36 seconds)
2025-05-07 19:19:49,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:19:50,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 355.78500 ± 50.282
2025-05-07 19:19:50,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [382.56833, 347.083, 313.58228, 372.9917, 320.48428, 273.30743, 322.06442, 362.58087, 402.55383, 460.63382]
2025-05-07 19:19:50,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [69.0, 65.0, 57.0, 72.0, 58.0, 52.0, 59.0, 66.0, 76.0, 88.0]
2025-05-07 19:19:50,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 48 minutes, 55 seconds)
2025-05-07 19:23:41,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:23:42,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 336.29153 ± 88.990
2025-05-07 19:23:42,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [408.08246, 448.22833, 237.96844, 340.486, 394.9646, 421.9954, 295.05484, 202.11421, 207.97276, 406.04834]
2025-05-07 19:23:42,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [74.0, 86.0, 50.0, 63.0, 74.0, 78.0, 54.0, 39.0, 43.0, 75.0]
2025-05-07 19:23:42,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 45 minutes, 20 seconds)
2025-05-07 19:27:30,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:27:32,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 360.32300 ± 67.747
2025-05-07 19:27:32,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [386.35663, 370.12082, 357.71942, 282.75266, 331.92453, 456.61142, 397.63315, 461.03262, 232.45447, 326.62457]
2025-05-07 19:27:32,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [76.0, 68.0, 64.0, 51.0, 61.0, 85.0, 75.0, 83.0, 46.0, 60.0]
2025-05-07 19:27:32,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 41 minutes, 15 seconds)
2025-05-07 19:31:19,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:31:21,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 366.39282 ± 94.263
2025-05-07 19:31:21,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [358.68018, 355.72638, 474.6122, 389.01767, 427.92328, 193.69038, 198.71, 362.12466, 443.11707, 460.32678]
2025-05-07 19:31:21,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [66.0, 66.0, 90.0, 72.0, 83.0, 38.0, 39.0, 77.0, 82.0, 86.0]
2025-05-07 19:31:21,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 37 minutes, 18 seconds)
2025-05-07 19:35:09,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:35:10,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 369.07132 ± 98.639
2025-05-07 19:35:10,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [255.42075, 284.6333, 375.12732, 260.80896, 467.13486, 594.57434, 390.93015, 303.65543, 373.61038, 384.8176]
2025-05-07 19:35:10,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [47.0, 55.0, 80.0, 49.0, 90.0, 120.0, 72.0, 60.0, 69.0, 75.0]
2025-05-07 19:35:10,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 33 minutes, 20 seconds)
2025-05-07 19:38:59,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:39:01,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 371.87296 ± 134.276
2025-05-07 19:39:01,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [440.6428, 510.1571, 303.18198, 107.14915, 328.58597, 606.9766, 370.64563, 318.24823, 470.42474, 262.71692]
2025-05-07 19:39:01,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [93.0, 98.0, 64.0, 21.0, 66.0, 124.0, 81.0, 63.0, 102.0, 59.0]
2025-05-07 19:39:01,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 29 minutes, 36 seconds)
2025-05-07 19:42:50,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:42:52,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 394.77353 ± 66.294
2025-05-07 19:42:52,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [520.5905, 399.52704, 477.93213, 378.8252, 263.3245, 389.44037, 366.43246, 388.99857, 416.13095, 346.53375]
2025-05-07 19:42:52,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [99.0, 74.0, 94.0, 68.0, 55.0, 75.0, 72.0, 73.0, 77.0, 63.0]
2025-05-07 19:42:52,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 25 minutes, 38 seconds)
2025-05-07 19:46:41,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:46:43,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 374.59625 ± 76.868
2025-05-07 19:46:43,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [398.7793, 347.43085, 437.79398, 211.92226, 393.00354, 364.01364, 433.78827, 266.2281, 421.3191, 471.6834]
2025-05-07 19:46:43,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [70.0, 68.0, 83.0, 40.0, 77.0, 67.0, 81.0, 50.0, 90.0, 93.0]
2025-05-07 19:46:43,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 21 minutes, 57 seconds)
2025-05-07 19:50:31,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:50:32,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 296.27014 ± 67.077
2025-05-07 19:50:32,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [394.8698, 225.42107, 240.37558, 219.35023, 291.92902, 351.68036, 307.46198, 324.99023, 395.70197, 210.92146]
2025-05-07 19:50:32,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [71.0, 42.0, 45.0, 40.0, 57.0, 64.0, 56.0, 59.0, 72.0, 42.0]
2025-05-07 19:50:32,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 18 minutes, 11 seconds)
2025-05-07 19:54:23,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:54:24,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 321.02728 ± 112.717
2025-05-07 19:54:24,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [192.73323, 179.42358, 384.68692, 101.3488, 325.27228, 398.92566, 392.7654, 450.96698, 394.93613, 389.2137]
2025-05-07 19:54:24,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [37.0, 35.0, 69.0, 20.0, 59.0, 72.0, 71.0, 91.0, 73.0, 70.0]
2025-05-07 19:54:24,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 14 minutes, 36 seconds)
2025-05-07 19:58:12,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 19:58:13,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 357.79514 ± 35.844
2025-05-07 19:58:13,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [403.15958, 379.52655, 355.33432, 309.00928, 370.35947, 289.80505, 373.8139, 323.15247, 387.50385, 386.28687]
2025-05-07 19:58:13,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [72.0, 68.0, 64.0, 57.0, 67.0, 53.0, 68.0, 58.0, 71.0, 71.0]
2025-05-07 19:58:13,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 10 minutes, 36 seconds)
2025-05-07 20:02:03,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:02:05,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 399.29050 ± 76.529
2025-05-07 20:02:05,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [502.16223, 426.25797, 443.711, 261.30933, 404.21133, 475.3417, 407.29224, 296.10284, 319.4048, 457.11157]
2025-05-07 20:02:05,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [103.0, 78.0, 81.0, 48.0, 77.0, 87.0, 88.0, 55.0, 64.0, 87.0]
2025-05-07 20:02:05,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 6 minutes, 49 seconds)
2025-05-07 20:05:55,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:05:56,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 352.27469 ± 55.260
2025-05-07 20:05:56,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [465.22034, 320.10336, 423.1412, 342.9869, 320.8197, 267.71793, 368.48825, 340.84906, 372.90268, 300.51755]
2025-05-07 20:05:56,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [104.0, 59.0, 81.0, 64.0, 71.0, 49.0, 70.0, 76.0, 82.0, 58.0]
2025-05-07 20:05:56,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 3 minutes, 1 second)
2025-05-07 20:09:46,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:09:48,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 426.25616 ± 119.887
2025-05-07 20:09:48,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [442.23813, 302.036, 529.33954, 230.248, 390.23767, 297.0594, 659.77313, 485.78955, 451.86084, 473.97888]
2025-05-07 20:09:48,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [95.0, 55.0, 102.0, 43.0, 72.0, 54.0, 124.0, 97.0, 84.0, 88.0]
2025-05-07 20:09:48,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 59 minutes, 23 seconds)
2025-05-07 20:13:39,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:13:40,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 387.74670 ± 80.094
2025-05-07 20:13:40,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [428.20508, 377.81403, 390.45538, 364.62155, 517.40546, 375.7805, 309.5413, 526.8441, 264.6618, 322.13788]
2025-05-07 20:13:40,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [80.0, 69.0, 73.0, 65.0, 101.0, 69.0, 60.0, 99.0, 55.0, 69.0]
2025-05-07 20:13:40,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 55 minutes, 37 seconds)
2025-05-07 20:17:28,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:17:29,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 338.56473 ± 120.187
2025-05-07 20:17:29,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [260.57193, 256.07233, 303.11392, 248.6955, 477.7322, 313.67453, 181.92738, 473.57474, 578.01294, 292.27148]
2025-05-07 20:17:29,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [49.0, 56.0, 55.0, 46.0, 90.0, 58.0, 35.0, 89.0, 114.0, 52.0]
2025-05-07 20:17:29,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 51 minutes, 43 seconds)
2025-05-07 20:21:20,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:21:21,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 367.88614 ± 148.374
2025-05-07 20:21:21,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [582.79584, 419.9891, 462.9847, 311.23712, 183.02716, 511.42856, 272.8868, 81.20803, 371.41486, 481.88947]
2025-05-07 20:21:21,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [112.0, 75.0, 89.0, 57.0, 36.0, 102.0, 57.0, 17.0, 75.0, 89.0]
2025-05-07 20:21:21,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 47 minutes, 55 seconds)
2025-05-07 20:25:10,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:25:11,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 307.23724 ± 191.474
2025-05-07 20:25:11,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [151.11809, 158.34827, 134.9243, 161.5354, 678.85724, 288.4671, 620.7542, 176.67854, 274.79663, 426.89252]
2025-05-07 20:25:11,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 31.0, 26.0, 31.0, 135.0, 54.0, 116.0, 34.0, 55.0, 79.0]
2025-05-07 20:25:11,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 43 minutes, 56 seconds)
2025-05-07 20:29:01,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:29:02,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 386.19666 ± 44.035
2025-05-07 20:29:02,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [354.99295, 375.92365, 454.39734, 309.35443, 363.7945, 417.8629, 366.53464, 354.89902, 450.4074, 413.79956]
2025-05-07 20:29:02,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [63.0, 71.0, 85.0, 57.0, 66.0, 79.0, 69.0, 73.0, 85.0, 79.0]
2025-05-07 20:29:02,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 40 minutes, 3 seconds)
2025-05-07 20:32:53,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:32:54,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 393.45758 ± 31.293
2025-05-07 20:32:54,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [332.53232, 410.67972, 354.19882, 417.72437, 414.36517, 405.25018, 424.76108, 428.9395, 371.22794, 374.89655]
2025-05-07 20:32:54,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [61.0, 75.0, 66.0, 76.0, 83.0, 81.0, 80.0, 81.0, 78.0, 69.0]
2025-05-07 20:32:54,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 36 minutes, 10 seconds)
2025-05-07 20:36:45,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:36:47,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 369.15445 ± 51.452
2025-05-07 20:36:47,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [410.3742, 375.40268, 358.1, 263.5841, 394.84637, 390.82034, 335.1693, 329.76376, 366.2302, 467.25336]
2025-05-07 20:36:47,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [84.0, 77.0, 80.0, 57.0, 78.0, 74.0, 62.0, 73.0, 75.0, 89.0]
2025-05-07 20:36:47,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 32 minutes, 35 seconds)
2025-05-07 20:40:36,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:40:38,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 351.93207 ± 85.283
2025-05-07 20:40:38,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [393.63782, 430.79733, 363.32776, 276.20398, 364.9842, 309.11658, 333.527, 171.37411, 370.76376, 505.58817]
2025-05-07 20:40:38,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [85.0, 92.0, 80.0, 54.0, 66.0, 68.0, 69.0, 33.0, 81.0, 95.0]
2025-05-07 20:40:38,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 28 minutes, 40 seconds)
2025-05-07 20:44:32,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:44:33,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 331.88278 ± 96.974
2025-05-07 20:44:33,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [293.63266, 396.9264, 367.7362, 297.9544, 387.99826, 482.1194, 223.29268, 340.21158, 124.4056, 404.55087]
2025-05-07 20:44:33,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [53.0, 72.0, 67.0, 55.0, 69.0, 90.0, 43.0, 65.0, 24.0, 77.0]
2025-05-07 20:44:33,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 25 minutes, 12 seconds)
2025-05-07 20:48:23,025 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:48:24,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 408.19852 ± 119.094
2025-05-07 20:48:24,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [334.74136, 397.7384, 396.4954, 265.87784, 539.2635, 390.85602, 621.63495, 230.15015, 545.4986, 359.72894]
2025-05-07 20:48:24,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [69.0, 84.0, 72.0, 56.0, 100.0, 76.0, 124.0, 48.0, 115.0, 66.0]
2025-05-07 20:48:24,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 21 minutes, 19 seconds)
2025-05-07 20:52:15,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:52:16,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 397.29953 ± 137.421
2025-05-07 20:52:16,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [318.95566, 711.88043, 288.8558, 326.67944, 411.21594, 289.394, 318.15747, 309.18954, 600.11316, 398.55405]
2025-05-07 20:52:16,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [57.0, 151.0, 52.0, 60.0, 74.0, 59.0, 58.0, 56.0, 134.0, 83.0]
2025-05-07 20:52:16,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 17 minutes, 28 seconds)
2025-05-07 20:56:07,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 20:56:08,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 354.84943 ± 57.834
2025-05-07 20:56:08,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [366.86212, 364.07358, 405.9903, 270.28513, 293.33813, 436.0149, 324.53024, 441.27045, 286.50992, 359.61935]
2025-05-07 20:56:08,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [65.0, 67.0, 76.0, 50.0, 59.0, 94.0, 75.0, 82.0, 65.0, 81.0]
2025-05-07 20:56:08,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 13 minutes, 33 seconds)
2025-05-07 20:59:58,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:00:00,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 467.61456 ± 97.586
2025-05-07 21:00:00,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [379.3511, 523.02026, 287.06693, 444.38452, 540.6785, 458.86624, 504.9563, 399.41962, 471.39386, 667.00836]
2025-05-07 21:00:00,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [72.0, 111.0, 63.0, 80.0, 98.0, 94.0, 93.0, 87.0, 88.0, 130.0]
2025-05-07 21:00:00,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (467.61) for latency ExtremeSparseL4U32
2025-05-07 21:00:00,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 21:00:00,583 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 21:00:00,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 9 minutes, 44 seconds)
2025-05-07 21:03:51,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:03:52,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 403.64166 ± 79.393
2025-05-07 21:03:52,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [409.80887, 564.6648, 262.52835, 412.26816, 468.4588, 429.0273, 293.94595, 389.7295, 404.54617, 401.43893]
2025-05-07 21:03:52,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [75.0, 105.0, 48.0, 74.0, 84.0, 77.0, 52.0, 72.0, 72.0, 73.0]
2025-05-07 21:03:52,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 5 minutes, 42 seconds)
2025-05-07 21:07:41,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:07:43,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 406.40704 ± 286.612
2025-05-07 21:07:43,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [524.4441, 1173.9902, 145.49077, 130.38173, 162.51122, 408.3574, 374.10187, 325.8954, 350.0775, 468.8198]
2025-05-07 21:07:43,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [98.0, 234.0, 28.0, 25.0, 31.0, 87.0, 68.0, 71.0, 81.0, 96.0]
2025-05-07 21:07:43,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 1 minute, 48 seconds)
2025-05-07 21:11:32,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:11:34,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 436.29770 ± 144.328
2025-05-07 21:11:34,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [412.33514, 461.45535, 452.16187, 768.45197, 322.7187, 617.6728, 355.63275, 261.09097, 372.12323, 339.334]
2025-05-07 21:11:34,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [78.0, 86.0, 85.0, 150.0, 60.0, 118.0, 65.0, 56.0, 69.0, 67.0]
2025-05-07 21:11:34,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 86/100 (estimated time remaining: 57 minutes, 52 seconds)
2025-05-07 21:15:24,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:15:25,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 339.24518 ± 58.155
2025-05-07 21:15:25,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [275.5555, 389.8626, 264.818, 417.39468, 286.38223, 445.65732, 310.0827, 316.90463, 348.40042, 337.39362]
2025-05-07 21:15:25,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [50.0, 90.0, 48.0, 77.0, 53.0, 90.0, 56.0, 57.0, 69.0, 64.0]
2025-05-07 21:15:25,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 87/100 (estimated time remaining: 53 minutes, 59 seconds)
2025-05-07 21:19:12,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:19:14,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 414.74115 ± 131.807
2025-05-07 21:19:14,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [352.4025, 496.54584, 280.14383, 562.73114, 455.0938, 608.1216, 141.16718, 495.1958, 371.0523, 384.9577]
2025-05-07 21:19:14,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [64.0, 98.0, 51.0, 118.0, 84.0, 121.0, 27.0, 91.0, 68.0, 72.0]
2025-05-07 21:19:14,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 88/100 (estimated time remaining: 50 minutes)
2025-05-07 21:23:03,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:23:04,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 383.41699 ± 67.182
2025-05-07 21:23:04,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [359.2306, 431.25754, 463.37253, 353.8505, 264.5381, 482.28812, 456.7776, 340.80652, 328.90536, 353.1432]
2025-05-07 21:23:04,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [63.0, 85.0, 85.0, 78.0, 48.0, 89.0, 81.0, 61.0, 59.0, 63.0]
2025-05-07 21:23:04,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 89/100 (estimated time remaining: 46 minutes, 4 seconds)
2025-05-07 21:26:52,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:26:54,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 441.54184 ± 146.745
2025-05-07 21:26:54,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [412.36514, 349.79037, 655.9284, 574.66797, 303.03537, 454.28165, 620.3459, 554.39545, 219.65233, 270.9561]
2025-05-07 21:26:54,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [76.0, 77.0, 124.0, 108.0, 65.0, 89.0, 115.0, 105.0, 49.0, 63.0]
2025-05-07 21:26:54,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 90/100 (estimated time remaining: 42 minutes, 12 seconds)
2025-05-07 21:30:44,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:30:46,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 390.63458 ± 75.838
2025-05-07 21:30:46,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [396.95987, 431.0975, 485.40588, 314.45657, 452.02557, 444.56302, 266.10336, 433.53156, 418.946, 263.25644]
2025-05-07 21:30:46,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [74.0, 81.0, 93.0, 62.0, 84.0, 101.0, 60.0, 80.0, 78.0, 54.0]
2025-05-07 21:30:46,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 91/100 (estimated time remaining: 38 minutes, 24 seconds)
2025-05-07 21:34:36,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:34:38,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 442.45990 ± 107.850
2025-05-07 21:34:38,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [634.8374, 564.1933, 344.8831, 431.02332, 356.0939, 511.03116, 519.77277, 448.6041, 308.57242, 305.58798]
2025-05-07 21:34:38,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [119.0, 122.0, 61.0, 88.0, 64.0, 97.0, 97.0, 84.0, 59.0, 56.0]
2025-05-07 21:34:38,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 92/100 (estimated time remaining: 34 minutes, 34 seconds)
2025-05-07 21:38:27,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:38:29,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 353.04547 ± 120.751
2025-05-07 21:38:29,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [465.5172, 456.01074, 390.34473, 369.12454, 447.26364, 145.48236, 151.01183, 245.50902, 478.53473, 381.65564]
2025-05-07 21:38:29,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [86.0, 85.0, 77.0, 68.0, 88.0, 28.0, 29.0, 49.0, 93.0, 69.0]
2025-05-07 21:38:29,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 93/100 (estimated time remaining: 30 minutes, 47 seconds)
2025-05-07 21:42:17,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:42:19,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 381.76044 ± 91.319
2025-05-07 21:42:19,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [357.41232, 467.37155, 399.93332, 473.05737, 363.74405, 491.38144, 369.05518, 409.9852, 329.15042, 156.51367]
2025-05-07 21:42:19,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [64.0, 87.0, 73.0, 90.0, 68.0, 89.0, 70.0, 77.0, 62.0, 30.0]
2025-05-07 21:42:19,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 94/100 (estimated time remaining: 26 minutes, 56 seconds)
2025-05-07 21:46:08,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:46:10,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 357.16949 ± 109.380
2025-05-07 21:46:10,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [477.35394, 370.72452, 363.49548, 150.65784, 146.15715, 419.85373, 392.98038, 407.49323, 391.64905, 451.3296]
2025-05-07 21:46:10,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [89.0, 70.0, 67.0, 29.0, 28.0, 83.0, 76.0, 76.0, 87.0, 85.0]
2025-05-07 21:46:10,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 95/100 (estimated time remaining: 23 minutes, 6 seconds)
2025-05-07 21:50:00,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:50:02,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 390.53857 ± 123.976
2025-05-07 21:50:02,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [392.97305, 430.50375, 156.35938, 348.33432, 635.3636, 553.3016, 332.24194, 326.0112, 355.31772, 374.97934]
2025-05-07 21:50:02,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [70.0, 80.0, 30.0, 73.0, 126.0, 114.0, 60.0, 66.0, 82.0, 68.0]
2025-05-07 21:50:02,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 96/100 (estimated time remaining: 19 minutes, 16 seconds)
2025-05-07 21:53:52,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:53:54,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 494.42657 ± 156.011
2025-05-07 21:53:54,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [690.88214, 310.20242, 375.88272, 860.3319, 491.18454, 504.82498, 461.2955, 426.14066, 452.84094, 370.67957]
2025-05-07 21:53:54,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [148.0, 70.0, 68.0, 176.0, 92.0, 93.0, 84.0, 82.0, 88.0, 67.0]
2025-05-07 21:53:54,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1124 [INFO]: New best (494.43) for latency ExtremeSparseL4U32
2025-05-07 21:53:54,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1127 [INFO]: saving network
2025-05-07 21:53:54,506 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-humanoid/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 21:53:54,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 97/100 (estimated time remaining: 15 minutes, 25 seconds)
2025-05-07 21:57:44,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 21:57:45,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 336.97720 ± 59.950
2025-05-07 21:57:45,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [213.25037, 366.05203, 356.22794, 314.9048, 324.65408, 445.95706, 345.37686, 280.0738, 327.17276, 396.10236]
2025-05-07 21:57:45,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [41.0, 67.0, 66.0, 59.0, 59.0, 82.0, 62.0, 53.0, 60.0, 77.0]
2025-05-07 21:57:45,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 33 seconds)
2025-05-07 22:01:37,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:01:38,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 355.65582 ± 141.318
2025-05-07 22:01:38,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [506.63797, 390.67386, 135.02596, 165.98766, 329.301, 392.98334, 337.3359, 645.61, 288.84427, 364.1582]
2025-05-07 22:01:38,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [95.0, 72.0, 26.0, 32.0, 60.0, 73.0, 65.0, 124.0, 58.0, 64.0]
2025-05-07 22:01:38,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 43 seconds)
2025-05-07 22:05:29,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:05:30,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 357.90298 ± 128.168
2025-05-07 22:05:30,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [521.23517, 244.67064, 520.71844, 160.78076, 399.6738, 159.92003, 351.63638, 342.53912, 388.6333, 489.22226]
2025-05-07 22:05:30,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [107.0, 54.0, 110.0, 31.0, 77.0, 31.0, 68.0, 70.0, 72.0, 92.0]
2025-05-07 22:05:30,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1097 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 52 seconds)
2025-05-07 22:09:22,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:09:23,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1119 [DEBUG]: Total Reward: 284.34503 ± 104.884
2025-05-07 22:09:23,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1120 [DEBUG]: All rewards: [294.0693, 170.58928, 135.80275, 440.97543, 430.9404, 333.49402, 350.21628, 249.62376, 297.44992, 140.28941]
2025-05-07 22:09:23,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [53.0, 33.0, 26.0, 82.0, 82.0, 65.0, 67.0, 46.0, 61.0, 27.0]
2025-05-07 22:09:23,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1149 [DEBUG]: Training session finished
