2026-01-22 23:52:42,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-humanoid/DatasetOffice-sac-aug-mem1
2026-01-22 23:52:42,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-humanoid/DatasetOffice-sac-aug-mem1
2026-01-22 23:52:42,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x147197265c10>}
2026-01-22 23:52:42,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1159 [DEBUG]: using device: cuda
2026-01-22 23:52:42,199 baseline-sac-noisy-humanoid:77 [WARNING]: args.memorize_actions != args.horizon: 1 != 32
2026-01-22 23:52:42,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1181 [INFO]: Creating new trainer
2026-01-22 23:52:42,362 baseline-sac-noisy-humanoid:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=393, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2026-01-22 23:52:42,362 baseline-sac-noisy-humanoid:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=410, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:52:43,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1242 [DEBUG]: Starting training session...
2026-01-22 23:52:43,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 1/100
2026-01-22 23:54:09,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:54:10,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 257.12375 ± 14.700
2026-01-22 23:54:10,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [239.95717, 269.6514, 242.85591, 269.04138, 245.1799, 273.77045, 281.80856, 239.74323, 257.7732, 251.45659]
2026-01-22 23:54:10,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [43.0, 48.0, 44.0, 48.0, 44.0, 49.0, 50.0, 43.0, 47.0, 45.0]
2026-01-22 23:54:10,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (257.12) for latency DatasetOffice
2026-01-22 23:54:10,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 23 minutes, 44 seconds)
2026-01-22 23:55:45,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:46,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 427.87091 ± 96.671
2026-01-22 23:55:46,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [352.56204, 336.06512, 364.65594, 618.9954, 380.06836, 457.8835, 358.34448, 577.1296, 352.19293, 480.8116]
2026-01-22 23:55:46,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [71.0, 68.0, 65.0, 114.0, 79.0, 88.0, 71.0, 122.0, 68.0, 94.0]
2026-01-22 23:55:46,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (427.87) for latency DatasetOffice
2026-01-22 23:55:46,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 29 minutes, 7 seconds)
2026-01-22 23:57:20,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:21,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 359.69012 ± 56.270
2026-01-22 23:57:21,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [271.1016, 479.7995, 369.08273, 279.86368, 382.5066, 366.15643, 339.04202, 393.00665, 378.48483, 337.85727]
2026-01-22 23:57:21,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [54.0, 95.0, 73.0, 58.0, 81.0, 73.0, 69.0, 79.0, 83.0, 69.0]
2026-01-22 23:57:21,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 29 minutes, 47 seconds)
2026-01-22 23:58:56,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:57,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 417.40283 ± 89.620
2026-01-22 23:58:57,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [419.19128, 480.15012, 347.54123, 355.1555, 652.8225, 357.9625, 405.545, 413.057, 417.42776, 325.17535]
2026-01-22 23:58:57,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [78.0, 96.0, 77.0, 72.0, 125.0, 80.0, 74.0, 77.0, 92.0, 73.0]
2026-01-22 23:58:57,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 29 minutes, 33 seconds)
2026-01-23 00:00:32,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:33,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 446.93677 ± 131.727
2026-01-23 00:00:33,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [442.37604, 352.77393, 473.80966, 488.5078, 439.47757, 299.39822, 406.68887, 315.05365, 795.1637, 456.11832]
2026-01-23 00:00:33,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [83.0, 76.0, 101.0, 92.0, 84.0, 59.0, 90.0, 60.0, 158.0, 86.0]
2026-01-23 00:00:33,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (446.94) for latency DatasetOffice
2026-01-23 00:00:33,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 28 minutes, 54 seconds)
2026-01-23 00:02:08,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:09,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 465.20142 ± 133.457
2026-01-23 00:02:09,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [418.28625, 478.6187, 769.7977, 631.4022, 320.75217, 391.3744, 337.61832, 417.72977, 374.25198, 512.18243]
2026-01-23 00:02:09,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [77.0, 88.0, 150.0, 128.0, 64.0, 81.0, 62.0, 79.0, 84.0, 112.0]
2026-01-23 00:02:09,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (465.20) for latency DatasetOffice
2026-01-23 00:02:09,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 30 minutes, 9 seconds)
2026-01-23 00:03:44,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:45,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 455.92194 ± 68.154
2026-01-23 00:03:45,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [454.84604, 458.11435, 392.73035, 441.65298, 564.416, 409.3702, 440.68912, 399.65643, 395.86203, 601.8815]
2026-01-23 00:03:45,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [88.0, 88.0, 74.0, 83.0, 113.0, 76.0, 81.0, 75.0, 74.0, 116.0]
2026-01-23 00:03:45,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 28 minutes, 45 seconds)
2026-01-23 00:05:21,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:22,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 508.43115 ± 87.699
2026-01-23 00:05:22,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [408.48297, 651.54474, 438.8278, 491.1974, 575.74817, 563.51056, 339.21133, 571.82605, 540.8449, 503.11765]
2026-01-23 00:05:22,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [78.0, 121.0, 82.0, 92.0, 104.0, 104.0, 65.0, 107.0, 102.0, 105.0]
2026-01-23 00:05:22,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (508.43) for latency DatasetOffice
2026-01-23 00:05:22,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 27 minutes, 39 seconds)
2026-01-23 00:06:58,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:06:59,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 494.68719 ± 90.516
2026-01-23 00:06:59,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [601.6747, 498.98666, 661.40985, 369.42578, 399.531, 497.8904, 514.4059, 383.93094, 460.80722, 558.8093]
2026-01-23 00:06:59,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [114.0, 107.0, 142.0, 76.0, 75.0, 94.0, 98.0, 81.0, 86.0, 115.0]
2026-01-23 00:06:59,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 26 minutes, 13 seconds)
2026-01-23 00:08:35,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:36,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 543.12689 ± 190.796
2026-01-23 00:08:36,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [407.13028, 527.26306, 483.96146, 356.97064, 579.02466, 326.62088, 504.4992, 1001.0222, 755.12946, 489.64746]
2026-01-23 00:08:36,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [77.0, 100.0, 93.0, 69.0, 105.0, 63.0, 105.0, 188.0, 146.0, 96.0]
2026-01-23 00:08:36,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (543.13) for latency DatasetOffice
2026-01-23 00:08:36,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 24 minutes, 47 seconds)
2026-01-23 00:10:11,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:12,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 452.61597 ± 81.265
2026-01-23 00:10:12,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [461.3314, 450.46268, 456.80206, 461.96268, 379.88663, 665.7883, 405.52252, 340.40808, 464.5186, 439.47693]
2026-01-23 00:10:12,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [85.0, 84.0, 83.0, 85.0, 72.0, 127.0, 74.0, 63.0, 88.0, 80.0]
2026-01-23 00:10:12,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 23 minutes, 13 seconds)
2026-01-23 00:11:48,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:49,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 614.40271 ± 150.159
2026-01-23 00:11:49,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [615.83307, 656.164, 416.58987, 448.198, 545.70856, 707.2866, 532.1115, 666.3605, 978.76654, 577.0084]
2026-01-23 00:11:49,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [116.0, 141.0, 79.0, 84.0, 106.0, 140.0, 102.0, 126.0, 186.0, 117.0]
2026-01-23 00:11:49,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (614.40) for latency DatasetOffice
2026-01-23 00:11:49,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 21 minutes, 58 seconds)
2026-01-23 00:13:25,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:13:26,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 482.87973 ± 101.334
2026-01-23 00:13:26,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [379.27997, 442.67642, 431.18027, 391.55588, 465.13028, 743.30383, 449.29007, 567.3647, 517.5836, 441.43195]
2026-01-23 00:13:26,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [70.0, 85.0, 79.0, 72.0, 87.0, 141.0, 82.0, 107.0, 101.0, 82.0]
2026-01-23 00:13:26,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 20 minutes, 18 seconds)
2026-01-23 00:15:03,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:04,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 571.16272 ± 74.956
2026-01-23 00:15:04,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [707.29865, 654.49365, 483.96222, 549.5593, 481.4793, 466.36316, 577.7347, 572.1627, 591.13086, 627.4423]
2026-01-23 00:15:04,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [136.0, 123.0, 101.0, 107.0, 106.0, 85.0, 124.0, 110.0, 112.0, 119.0]
2026-01-23 00:15:04,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 19 minutes, 2 seconds)
2026-01-23 00:16:39,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:41,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 616.99567 ± 189.129
2026-01-23 00:16:41,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [634.80804, 588.959, 584.0911, 569.4455, 1018.7747, 863.6256, 652.1289, 425.16647, 337.2643, 495.69278]
2026-01-23 00:16:41,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [119.0, 113.0, 111.0, 116.0, 196.0, 170.0, 119.0, 97.0, 78.0, 108.0]
2026-01-23 00:16:41,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (617.00) for latency DatasetOffice
2026-01-23 00:16:41,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 17 minutes, 25 seconds)
2026-01-23 00:18:17,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:18,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 560.72498 ± 106.266
2026-01-23 00:18:18,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [495.82907, 595.0386, 526.26654, 493.25385, 444.98044, 781.6081, 737.72833, 488.9329, 514.90857, 528.70374]
2026-01-23 00:18:18,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [91.0, 123.0, 97.0, 91.0, 92.0, 151.0, 137.0, 91.0, 95.0, 99.0]
2026-01-23 00:18:18,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 16 minutes, 4 seconds)
2026-01-23 00:19:53,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:19:54,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 604.12311 ± 205.836
2026-01-23 00:19:54,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [798.5495, 940.0501, 871.23846, 501.54236, 379.33746, 484.1071, 544.8461, 289.8061, 724.609, 507.14523]
2026-01-23 00:19:54,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [156.0, 185.0, 179.0, 99.0, 75.0, 92.0, 105.0, 56.0, 138.0, 111.0]
2026-01-23 00:19:54,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 14 minutes, 8 seconds)
2026-01-23 00:21:30,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:31,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 551.84106 ± 114.909
2026-01-23 00:21:31,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [397.285, 573.5188, 594.50507, 454.70273, 534.8251, 478.36072, 651.32904, 820.0067, 457.47058, 556.40674]
2026-01-23 00:21:31,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [82.0, 109.0, 125.0, 96.0, 118.0, 107.0, 123.0, 157.0, 90.0, 117.0]
2026-01-23 00:21:31,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 12 minutes, 36 seconds)
2026-01-23 00:23:07,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:09,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 599.45087 ± 170.515
2026-01-23 00:23:09,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [606.3229, 775.12555, 698.96, 493.49924, 356.40067, 764.8904, 428.53696, 505.5867, 906.67664, 458.51]
2026-01-23 00:23:09,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [130.0, 146.0, 135.0, 105.0, 74.0, 149.0, 82.0, 94.0, 177.0, 91.0]
2026-01-23 00:23:09,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 10 minutes, 57 seconds)
2026-01-23 00:24:45,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:24:46,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 721.46185 ± 224.233
2026-01-23 00:24:46,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [567.9302, 769.2159, 579.06433, 635.8842, 1002.31415, 1042.2471, 437.59662, 1029.7227, 418.39343, 732.2501]
2026-01-23 00:24:46,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [112.0, 150.0, 110.0, 119.0, 188.0, 210.0, 92.0, 193.0, 92.0, 137.0]
2026-01-23 00:24:46,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (721.46) for latency DatasetOffice
2026-01-23 00:24:46,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 9 minutes, 26 seconds)
2026-01-23 00:26:22,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:24,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 617.64587 ± 247.500
2026-01-23 00:26:24,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [916.764, 885.345, 523.116, 626.8213, 310.21606, 475.28528, 445.25955, 421.09763, 1104.4016, 468.1526]
2026-01-23 00:26:24,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [170.0, 160.0, 108.0, 128.0, 60.0, 90.0, 82.0, 86.0, 214.0, 89.0]
2026-01-23 00:26:24,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 7 minutes, 52 seconds)
2026-01-23 00:28:00,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:01,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 585.48206 ± 107.774
2026-01-23 00:28:01,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [544.0971, 714.14923, 770.4875, 443.8276, 558.93256, 563.1589, 637.3711, 445.69446, 488.21298, 688.88916]
2026-01-23 00:28:01,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [101.0, 130.0, 144.0, 81.0, 102.0, 103.0, 121.0, 91.0, 90.0, 132.0]
2026-01-23 00:28:01,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 6 minutes, 31 seconds)
2026-01-23 00:29:38,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:29:39,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 568.09833 ± 179.631
2026-01-23 00:29:39,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [994.6688, 360.9757, 651.38464, 627.61444, 540.4236, 461.41824, 712.3392, 423.75876, 515.8466, 392.5539]
2026-01-23 00:29:39,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [177.0, 64.0, 121.0, 118.0, 112.0, 97.0, 148.0, 78.0, 109.0, 80.0]
2026-01-23 00:29:39,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 5 minutes, 8 seconds)
2026-01-23 00:31:15,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:16,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 669.98364 ± 167.350
2026-01-23 00:31:16,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [611.1734, 570.68976, 755.983, 608.85474, 607.23535, 474.6885, 988.963, 922.83044, 705.31366, 454.1052]
2026-01-23 00:31:16,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [115.0, 104.0, 142.0, 113.0, 111.0, 91.0, 174.0, 179.0, 130.0, 86.0]
2026-01-23 00:31:16,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 3 minutes, 27 seconds)
2026-01-23 00:32:52,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:32:54,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 703.37756 ± 206.302
2026-01-23 00:32:54,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [439.36786, 682.60645, 639.91876, 1060.1841, 573.353, 556.48645, 777.79846, 458.4492, 1031.9473, 813.6642]
2026-01-23 00:32:54,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [86.0, 141.0, 127.0, 217.0, 116.0, 113.0, 144.0, 86.0, 202.0, 157.0]
2026-01-23 00:32:54,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 1 minute, 55 seconds)
2026-01-23 00:34:30,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:34:32,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 660.53430 ± 108.599
2026-01-23 00:34:32,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [677.5102, 591.2229, 447.05896, 567.1401, 674.44366, 790.4421, 665.16156, 800.6296, 794.19336, 597.54095]
2026-01-23 00:34:32,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [137.0, 108.0, 81.0, 102.0, 124.0, 146.0, 134.0, 150.0, 150.0, 111.0]
2026-01-23 00:34:32,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 25 seconds)
2026-01-23 00:36:10,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:12,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 750.24432 ± 149.781
2026-01-23 00:36:12,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [564.30786, 778.296, 647.6754, 587.07245, 1047.5132, 680.63556, 809.1429, 689.2422, 723.1445, 975.4136]
2026-01-23 00:36:12,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [113.0, 148.0, 123.0, 109.0, 195.0, 128.0, 149.0, 126.0, 134.0, 189.0]
2026-01-23 00:36:12,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (750.24) for latency DatasetOffice
2026-01-23 00:36:12,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 59 minutes, 25 seconds)
2026-01-23 00:37:45,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:37:47,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 582.60809 ± 178.710
2026-01-23 00:37:47,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [603.236, 517.5225, 602.8594, 340.2148, 947.8245, 451.65756, 834.9419, 566.28876, 581.60095, 379.93387]
2026-01-23 00:37:47,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [112.0, 109.0, 119.0, 62.0, 176.0, 96.0, 159.0, 105.0, 122.0, 74.0]
2026-01-23 00:37:47,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 57 minutes, 6 seconds)
2026-01-23 00:39:25,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:39:27,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 897.75067 ± 381.794
2026-01-23 00:39:27,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [524.85803, 1391.3802, 883.7539, 1417.0, 545.5109, 728.4861, 474.54996, 1126.7872, 1399.1143, 486.06668]
2026-01-23 00:39:27,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [105.0, 277.0, 179.0, 277.0, 120.0, 135.0, 92.0, 222.0, 270.0, 104.0]
2026-01-23 00:39:27,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (897.75) for latency DatasetOffice
2026-01-23 00:39:27,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 56 minutes, 6 seconds)
2026-01-23 00:41:03,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:05,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 834.17169 ± 221.823
2026-01-23 00:41:05,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1003.4943, 1257.9319, 1037.4984, 545.19055, 929.426, 864.46423, 737.4382, 574.76056, 826.13965, 565.3729]
2026-01-23 00:41:05,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [194.0, 251.0, 196.0, 100.0, 181.0, 166.0, 139.0, 126.0, 158.0, 118.0]
2026-01-23 00:41:05,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 54 minutes, 32 seconds)
2026-01-23 00:42:40,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:42,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 740.55579 ± 220.546
2026-01-23 00:42:42,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [619.4209, 880.0038, 758.7572, 617.9116, 1033.3829, 842.94293, 450.33163, 584.57336, 1142.8236, 475.41022]
2026-01-23 00:42:42,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [130.0, 164.0, 140.0, 116.0, 197.0, 159.0, 94.0, 107.0, 213.0, 95.0]
2026-01-23 00:42:42,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 52 minutes, 42 seconds)
2026-01-23 00:44:19,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:22,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1141.18579 ± 743.530
2026-01-23 00:44:22,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [725.89935, 717.19617, 2903.449, 2241.6853, 971.12585, 706.2397, 1101.5072, 722.37976, 703.59406, 618.7826]
2026-01-23 00:44:22,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [135.0, 132.0, 557.0, 416.0, 182.0, 135.0, 208.0, 137.0, 132.0, 120.0]
2026-01-23 00:44:22,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1141.19) for latency DatasetOffice
2026-01-23 00:44:22,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 51 minutes, 8 seconds)
2026-01-23 00:45:58,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:46:00,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1076.58215 ± 334.218
2026-01-23 00:46:00,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [854.6282, 1488.7758, 854.1763, 875.9473, 1445.7081, 1428.6351, 1528.02, 904.0752, 592.93176, 792.92395]
2026-01-23 00:46:00,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [157.0, 282.0, 162.0, 172.0, 268.0, 270.0, 293.0, 172.0, 107.0, 145.0]
2026-01-23 00:46:00,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 50 minutes, 13 seconds)
2026-01-23 00:47:38,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:47:40,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 744.29401 ± 247.177
2026-01-23 00:47:40,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [552.2704, 572.5259, 1314.1588, 658.8379, 612.3934, 947.5691, 1004.5805, 524.9801, 554.2099, 701.414]
2026-01-23 00:47:40,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [115.0, 125.0, 254.0, 135.0, 121.0, 188.0, 218.0, 99.0, 119.0, 136.0]
2026-01-23 00:47:40,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 48 minutes, 26 seconds)
2026-01-23 00:49:15,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:17,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 749.57086 ± 316.241
2026-01-23 00:49:17,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [641.1008, 589.4296, 1485.1193, 636.022, 858.25977, 600.71716, 770.8927, 963.7324, 773.48, 176.95511]
2026-01-23 00:49:17,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [133.0, 117.0, 283.0, 118.0, 162.0, 115.0, 144.0, 183.0, 155.0, 34.0]
2026-01-23 00:49:17,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 46 minutes, 37 seconds)
2026-01-23 00:50:54,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:56,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 779.19910 ± 229.228
2026-01-23 00:50:56,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [784.52295, 972.64404, 536.92035, 523.6589, 791.04755, 583.52026, 603.92865, 813.1213, 1313.6583, 868.9694]
2026-01-23 00:50:56,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [147.0, 185.0, 102.0, 101.0, 147.0, 125.0, 116.0, 160.0, 246.0, 167.0]
2026-01-23 00:50:56,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 45 minutes, 26 seconds)
2026-01-23 00:52:32,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:35,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 945.25323 ± 870.056
2026-01-23 00:52:35,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3354.9697, 1039.585, 704.09515, 1100.0708, 616.38, 811.78986, 225.47011, 1121.7507, 236.61072, 241.8102]
2026-01-23 00:52:35,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [691.0, 191.0, 132.0, 208.0, 119.0, 154.0, 43.0, 214.0, 45.0, 46.0]
2026-01-23 00:52:35,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 43 minutes, 29 seconds)
2026-01-23 00:54:12,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:15,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 999.50018 ± 291.592
2026-01-23 00:54:15,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [542.97925, 615.0374, 784.7673, 1578.5211, 1228.09, 1001.88184, 1018.2208, 906.3223, 1152.1178, 1167.0646]
2026-01-23 00:54:15,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [104.0, 116.0, 148.0, 300.0, 241.0, 190.0, 194.0, 174.0, 224.0, 223.0]
2026-01-23 00:54:15,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 42 minutes, 7 seconds)
2026-01-23 00:55:51,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:55:53,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 963.39667 ± 300.893
2026-01-23 00:55:53,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [799.337, 509.68253, 565.1143, 935.8811, 951.1256, 1600.7971, 1160.8473, 1024.845, 883.0312, 1203.3058]
2026-01-23 00:55:53,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [146.0, 100.0, 107.0, 182.0, 177.0, 303.0, 229.0, 186.0, 164.0, 229.0]
2026-01-23 00:55:53,297 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 40 minutes, 14 seconds)
2026-01-23 00:57:32,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:36,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1373.64160 ± 607.856
2026-01-23 00:57:36,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1063.3009, 1994.5333, 1277.0084, 1190.6881, 989.77466, 651.1834, 2277.2754, 578.5783, 2397.8408, 1316.2335]
2026-01-23 00:57:36,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [203.0, 399.0, 261.0, 228.0, 191.0, 124.0, 443.0, 107.0, 463.0, 251.0]
2026-01-23 00:57:36,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1373.64) for latency DatasetOffice
2026-01-23 00:57:36,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 39 minutes, 43 seconds)
2026-01-23 00:59:09,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:13,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1517.80371 ± 793.321
2026-01-23 00:59:13,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1147.7982, 995.573, 550.8088, 2439.9614, 855.1187, 2000.1926, 1057.1968, 822.6538, 2382.6057, 2926.1282]
2026-01-23 00:59:13,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [218.0, 187.0, 103.0, 465.0, 174.0, 385.0, 206.0, 158.0, 458.0, 566.0]
2026-01-23 00:59:13,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1517.80) for latency DatasetOffice
2026-01-23 00:59:13,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 37 minutes, 47 seconds)
2026-01-23 01:00:51,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:00:56,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2040.88416 ± 825.055
2026-01-23 01:00:56,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2539.1836, 2908.8848, 1687.4425, 2062.3796, 3708.0105, 1209.5626, 1824.6155, 1840.3274, 2044.4761, 583.95886]
2026-01-23 01:00:56,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [486.0, 571.0, 323.0, 402.0, 713.0, 230.0, 351.0, 350.0, 394.0, 108.0]
2026-01-23 01:00:56,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (2040.88) for latency DatasetOffice
2026-01-23 01:00:56,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 36 minutes, 57 seconds)
2026-01-23 01:02:34,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:38,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1593.86487 ± 961.700
2026-01-23 01:02:38,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1334.1418, 2162.0698, 764.45166, 1629.0187, 1582.2407, 1148.4773, 1525.7667, 885.0353, 4183.5933, 723.8537]
2026-01-23 01:02:38,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [259.0, 416.0, 143.0, 316.0, 303.0, 220.0, 309.0, 167.0, 807.0, 137.0]
2026-01-23 01:02:38,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 35 minutes, 33 seconds)
2026-01-23 01:04:15,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:20,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1819.78552 ± 655.717
2026-01-23 01:04:20,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2894.0137, 1955.2462, 2416.0706, 1024.8109, 1014.731, 1934.5466, 1598.3315, 1399.482, 2755.322, 1205.3027]
2026-01-23 01:04:20,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [572.0, 371.0, 472.0, 202.0, 194.0, 383.0, 306.0, 263.0, 533.0, 231.0]
2026-01-23 01:04:20,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 34 minutes, 39 seconds)
2026-01-23 01:05:56,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:05:59,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1222.28528 ± 552.286
2026-01-23 01:05:59,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [849.959, 1737.5707, 855.4285, 866.6451, 951.8897, 1641.3859, 1827.047, 704.5601, 2243.6982, 544.66754]
2026-01-23 01:05:59,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [162.0, 327.0, 161.0, 168.0, 184.0, 324.0, 351.0, 132.0, 427.0, 104.0]
2026-01-23 01:05:59,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 32 minutes, 14 seconds)
2026-01-23 01:07:37,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:43,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2317.49707 ± 1421.114
2026-01-23 01:07:43,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [4688.3877, 1625.871, 1767.4238, 785.5094, 2638.1409, 2828.9548, 1347.6992, 1006.22314, 1412.0226, 5074.739]
2026-01-23 01:07:43,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [925.0, 314.0, 340.0, 147.0, 502.0, 542.0, 256.0, 189.0, 270.0, 1000.0]
2026-01-23 01:07:43,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (2317.50) for latency DatasetOffice
2026-01-23 01:07:43,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 31 minutes, 47 seconds)
2026-01-23 01:09:21,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:26,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2057.61914 ± 1190.556
2026-01-23 01:09:26,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [440.47202, 2305.2097, 2040.178, 2147.0999, 5057.7554, 1836.7501, 770.1814, 1494.9635, 2687.8494, 1795.7324]
2026-01-23 01:09:26,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [82.0, 447.0, 407.0, 431.0, 1000.0, 349.0, 140.0, 316.0, 531.0, 341.0]
2026-01-23 01:09:26,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 5 seconds)
2026-01-23 01:11:02,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:09,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2642.22363 ± 1439.694
2026-01-23 01:11:09,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3001.5505, 1131.8806, 1122.9374, 2574.6384, 1209.7662, 1433.4615, 5146.316, 5113.679, 2968.682, 2719.3245]
2026-01-23 01:11:09,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [569.0, 218.0, 221.0, 492.0, 250.0, 282.0, 1000.0, 1000.0, 584.0, 512.0]
2026-01-23 01:11:09,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (2642.22) for latency DatasetOffice
2026-01-23 01:11:09,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 28 minutes, 42 seconds)
2026-01-23 01:12:45,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:50,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2194.73096 ± 1291.814
2026-01-23 01:12:50,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3329.1584, 5142.564, 1204.9584, 237.66498, 1214.0996, 1976.3735, 1489.3889, 2378.1096, 2795.3564, 2179.639]
2026-01-23 01:12:50,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [652.0, 1000.0, 230.0, 47.0, 246.0, 374.0, 285.0, 459.0, 551.0, 429.0]
2026-01-23 01:12:50,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 26 minutes, 47 seconds)
2026-01-23 01:14:33,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:37,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1650.78540 ± 1566.633
2026-01-23 01:14:37,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3018.6042, 209.07103, 704.44696, 180.86505, 277.27765, 1017.44543, 3443.6252, 1189.5436, 5079.179, 1387.7954]
2026-01-23 01:14:37,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [588.0, 42.0, 141.0, 35.0, 54.0, 208.0, 665.0, 237.0, 1000.0, 278.0]
2026-01-23 01:14:37,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 26 minutes, 25 seconds)
2026-01-23 01:16:10,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:18,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3386.04834 ± 1290.584
2026-01-23 01:16:18,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2253.183, 3131.0237, 5184.7656, 2498.582, 4263.9067, 2710.813, 2713.1255, 1221.2197, 4745.4077, 5138.4575]
2026-01-23 01:16:18,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [444.0, 602.0, 1000.0, 496.0, 830.0, 521.0, 535.0, 233.0, 914.0, 1000.0]
2026-01-23 01:16:18,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (3386.05) for latency DatasetOffice
2026-01-23 01:16:18,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 24 minutes, 8 seconds)
2026-01-23 01:17:52,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:56,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1796.03784 ± 1005.865
2026-01-23 01:17:56,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [454.28827, 1457.2239, 2275.841, 739.1024, 905.90955, 2048.3438, 2061.6787, 2614.786, 1356.5665, 4046.6387]
2026-01-23 01:17:56,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [89.0, 274.0, 438.0, 137.0, 187.0, 388.0, 397.0, 503.0, 269.0, 785.0]
2026-01-23 01:17:56,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 21 minutes, 34 seconds)
2026-01-23 01:19:36,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:41,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2179.27588 ± 1072.614
2026-01-23 01:19:41,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1444.8005, 2598.837, 3046.9382, 1767.5991, 1295.1383, 1948.5912, 3593.611, 987.7323, 922.24243, 4187.2686]
2026-01-23 01:19:41,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [275.0, 513.0, 593.0, 336.0, 259.0, 374.0, 703.0, 202.0, 174.0, 799.0]
2026-01-23 01:19:41,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 20 minutes, 11 seconds)
2026-01-23 01:21:14,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:20,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2474.30029 ± 1522.049
2026-01-23 01:21:20,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1067.3024, 419.85236, 2563.7441, 5041.3926, 5104.78, 3280.1267, 1455.3647, 1657.7528, 2699.8357, 1452.8524]
2026-01-23 01:21:20,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [208.0, 75.0, 504.0, 975.0, 1000.0, 642.0, 276.0, 327.0, 527.0, 274.0]
2026-01-23 01:21:20,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 18 minutes, 9 seconds)
2026-01-23 01:23:05,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:16,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4098.16650 ± 1527.930
2026-01-23 01:23:16,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5110.4067, 5000.14, 5082.945, 5072.673, 5134.718, 1113.8744, 1658.9674, 5064.3135, 2726.8184, 5016.8047]
2026-01-23 01:23:16,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 218.0, 327.0, 1000.0, 532.0, 977.0]
2026-01-23 01:23:16,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (4098.17) for latency DatasetOffice
2026-01-23 01:23:16,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 17 minutes, 51 seconds)
2026-01-23 01:24:53,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:03,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3591.52002 ± 1668.483
2026-01-23 01:25:03,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3142.76, 5044.3213, 3172.4084, 2923.6958, 529.6964, 4985.5796, 928.99243, 5056.9893, 5013.659, 5117.0977]
2026-01-23 01:25:03,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [612.0, 1000.0, 627.0, 579.0, 94.0, 1000.0, 189.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:25:03,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 16 minutes, 54 seconds)
2026-01-23 01:26:34,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:36,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1057.40222 ± 1477.177
2026-01-23 01:26:36,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [201.45291, 1897.5697, 365.88644, 591.1305, 145.88817, 134.85384, 166.08293, 150.8302, 1904.2745, 5016.053]
2026-01-23 01:26:36,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [39.0, 372.0, 72.0, 119.0, 28.0, 26.0, 32.0, 29.0, 368.0, 1000.0]
2026-01-23 01:26:36,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 14 minutes, 33 seconds)
2026-01-23 01:28:14,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:25,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4025.51489 ± 1339.142
2026-01-23 01:28:25,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5125.6533, 5073.459, 5125.465, 2862.6956, 5140.1133, 3820.96, 1813.082, 4470.9697, 1689.3585, 5133.392]
2026-01-23 01:28:25,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 544.0, 1000.0, 759.0, 357.0, 874.0, 334.0, 1000.0]
2026-01-23 01:28:25,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 13 minutes, 20 seconds)
2026-01-23 01:29:58,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:06,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3086.02881 ± 1826.530
2026-01-23 01:30:06,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [783.7622, 5103.351, 1649.5092, 5101.744, 2128.9954, 504.13928, 1704.959, 3718.588, 5043.3496, 5121.89]
2026-01-23 01:30:06,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [150.0, 1000.0, 336.0, 1000.0, 419.0, 105.0, 337.0, 736.0, 1000.0, 1000.0]
2026-01-23 01:30:06,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 11 minutes, 55 seconds)
2026-01-23 01:31:46,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:53,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2471.16309 ± 2210.303
2026-01-23 01:31:53,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [4135.8433, 287.2607, 326.85492, 72.99579, 665.4365, 169.68428, 5105.3604, 5120.527, 3742.631, 5085.0376]
2026-01-23 01:31:53,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [807.0, 53.0, 74.0, 15.0, 128.0, 33.0, 1000.0, 1000.0, 731.0, 1000.0]
2026-01-23 01:31:53,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 8 minutes, 53 seconds)
2026-01-23 01:33:34,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:46,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4207.14746 ± 1384.465
2026-01-23 01:33:46,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3021.4976, 5117.249, 4920.807, 4964.81, 5137.2437, 5083.989, 4972.3467, 899.2827, 5113.5156, 2840.7322]
2026-01-23 01:33:46,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [597.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 180.0, 1000.0, 561.0]
2026-01-23 01:33:46,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (4207.15) for latency DatasetOffice
2026-01-23 01:33:46,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 7 minutes, 58 seconds)
2026-01-23 01:35:20,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:32,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4198.60400 ± 1435.977
2026-01-23 01:35:32,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2554.3516, 5150.3843, 1283.1134, 5100.484, 5139.3486, 5132.272, 5092.636, 2326.6987, 5096.449, 5110.3]
2026-01-23 01:35:32,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [505.0, 1000.0, 269.0, 1000.0, 1000.0, 1000.0, 1000.0, 464.0, 1000.0, 1000.0]
2026-01-23 01:35:32,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 7 minutes, 48 seconds)
2026-01-23 01:37:08,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:12,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1494.72485 ± 1668.009
2026-01-23 01:37:12,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5040.7686, 3484.09, 140.24869, 3036.8823, 145.59912, 145.61728, 1453.3485, 261.11783, 1057.3519, 182.2249]
2026-01-23 01:37:12,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 695.0, 27.0, 606.0, 28.0, 28.0, 292.0, 53.0, 203.0, 35.0]
2026-01-23 01:37:12,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 5 minutes, 1 second)
2026-01-23 01:38:45,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:57,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4437.26660 ± 1315.243
2026-01-23 01:38:57,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5021.756, 5042.069, 4997.691, 5079.213, 5071.173, 947.27954, 5067.961, 2997.4307, 5048.395, 5099.7007]
2026-01-23 01:38:57,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 196.0, 1000.0, 586.0, 1000.0, 1000.0]
2026-01-23 01:38:57,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (4437.27) for latency DatasetOffice
2026-01-23 01:38:57,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 3 minutes, 40 seconds)
2026-01-23 01:40:37,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:48,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3958.40039 ± 1258.223
2026-01-23 01:40:48,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2629.1787, 4043.9224, 5127.615, 3116.0403, 2322.9106, 5143.487, 5109.0913, 5100.188, 1939.2808, 5052.2925]
2026-01-23 01:40:48,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [516.0, 814.0, 1000.0, 606.0, 450.0, 1000.0, 1000.0, 1000.0, 387.0, 1000.0]
2026-01-23 01:40:48,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 2 minutes, 26 seconds)
2026-01-23 01:42:23,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:28,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2073.61304 ± 2257.340
2026-01-23 01:42:28,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [152.36996, 466.27667, 276.4289, 307.44418, 156.7595, 160.44377, 5150.568, 5072.111, 3866.6118, 5127.117]
2026-01-23 01:42:28,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [29.0, 87.0, 54.0, 63.0, 30.0, 31.0, 1000.0, 1000.0, 757.0, 1000.0]
2026-01-23 01:42:28,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 67/100 (estimated time remaining: 59 minutes, 15 seconds)
2026-01-23 01:44:02,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:15,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4751.54541 ± 1138.212
2026-01-23 01:44:15,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5166.424, 1338.158, 5113.0864, 5122.4956, 5092.2417, 5095.476, 5188.965, 5110.1206, 5122.73, 5165.754]
2026-01-23 01:44:15,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 255.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:44:15,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (4751.55) for latency DatasetOffice
2026-01-23 01:44:15,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 68/100 (estimated time remaining: 57 minutes, 36 seconds)
2026-01-23 01:45:53,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:04,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4289.10986 ± 1565.518
2026-01-23 01:46:04,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5165.676, 5151.941, 5214.5396, 5156.357, 5218.8496, 3251.229, 170.51003, 5122.1685, 5163.062, 3276.7644]
2026-01-23 01:46:04,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 621.0, 33.0, 1000.0, 1000.0, 638.0]
2026-01-23 01:46:05,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 69/100 (estimated time remaining: 56 minutes, 46 seconds)
2026-01-23 01:47:44,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:47,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1057.49780 ± 1298.068
2026-01-23 01:47:47,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3858.586, 3193.3293, 156.57123, 408.06946, 1476.3518, 327.96844, 607.4023, 182.50285, 184.52855, 179.66885]
2026-01-23 01:47:47,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [729.0, 610.0, 30.0, 81.0, 287.0, 61.0, 117.0, 35.0, 36.0, 35.0]
2026-01-23 01:47:47,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 44 seconds)
2026-01-23 01:49:19,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:31,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4443.29736 ± 1216.418
2026-01-23 01:49:31,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5035.7783, 5049.4355, 5047.9287, 5032.838, 5049.4766, 5115.7686, 1748.4312, 2298.4158, 5036.941, 5017.9614]
2026-01-23 01:49:31,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 353.0, 452.0, 1000.0, 1000.0]
2026-01-23 01:49:31,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 71/100 (estimated time remaining: 52 minutes, 17 seconds)
2026-01-23 01:51:12,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:22,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3961.42114 ± 1319.995
2026-01-23 01:51:22,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2018.1323, 5192.917, 5168.7783, 2399.7087, 2568.1196, 5158.616, 5153.048, 4145.4233, 2601.0017, 5208.4663]
2026-01-23 01:51:22,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [398.0, 1000.0, 1000.0, 455.0, 501.0, 1000.0, 1000.0, 795.0, 500.0, 1000.0]
2026-01-23 01:51:22,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 72/100 (estimated time remaining: 51 minutes, 34 seconds)
2026-01-23 01:52:56,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:00,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1601.91528 ± 1913.918
2026-01-23 01:53:00,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [311.20288, 264.09848, 222.2083, 403.17157, 176.8421, 164.62315, 2389.8396, 5156.0356, 5129.7354, 1801.3967]
2026-01-23 01:53:00,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [58.0, 57.0, 45.0, 86.0, 34.0, 32.0, 463.0, 1000.0, 1000.0, 341.0]
2026-01-23 01:53:00,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 58 seconds)
2026-01-23 01:54:41,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:54,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4750.60596 ± 970.372
2026-01-23 01:54:54,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1897.2244, 5123.619, 5140.671, 5151.171, 5110.03, 5128.1836, 5151.0674, 5133.4014, 4496.4688, 5174.2227]
2026-01-23 01:54:54,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [371.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 881.0, 1000.0]
2026-01-23 01:54:54,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 74/100 (estimated time remaining: 47 minutes, 39 seconds)
2026-01-23 01:56:25,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:35,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3954.33325 ± 1725.656
2026-01-23 01:56:35,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5212.571, 983.4043, 5135.3916, 5170.929, 5159.6475, 5119.43, 5219.9243, 3361.7434, 3636.2349, 544.0575]
2026-01-23 01:56:35,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 209.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 656.0, 699.0, 115.0]
2026-01-23 01:56:35,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 47 seconds)
2026-01-23 01:58:15,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:21,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1979.72461 ± 1896.080
2026-01-23 01:58:21,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5101.1514, 5113.092, 3015.9253, 3192.4668, 146.08818, 1711.9387, 500.27335, 648.43994, 165.31781, 202.55243]
2026-01-23 01:58:21,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 588.0, 631.0, 28.0, 345.0, 91.0, 136.0, 32.0, 39.0]
2026-01-23 01:58:21,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 76/100 (estimated time remaining: 44 minutes, 6 seconds)
2026-01-23 02:00:04,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:17,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4745.51855 ± 780.578
2026-01-23 02:00:17,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5152.3403, 5137.755, 5136.571, 5121.72, 5127.112, 5141.568, 3269.4758, 5106.9043, 5158.457, 3103.2834]
2026-01-23 02:00:17,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 641.0, 1000.0, 1000.0, 610.0]
2026-01-23 02:00:17,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 77/100 (estimated time remaining: 42 minutes, 46 seconds)
2026-01-23 02:01:49,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:02,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4729.76416 ± 905.314
2026-01-23 02:02:02,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5105.062, 5105.107, 4427.895, 5097.575, 5112.7725, 5104.7734, 2081.023, 5088.738, 5082.7285, 5091.9697]
2026-01-23 02:02:02,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 876.0, 1000.0, 1000.0, 1000.0, 401.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:02:02,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 78/100 (estimated time remaining: 41 minutes, 30 seconds)
2026-01-23 02:03:41,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:53,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4642.96289 ± 1385.791
2026-01-23 02:03:53,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5123.434, 5110.045, 5077.6123, 5099.661, 485.95004, 5114.9863, 5067.2856, 5123.5503, 5104.586, 5122.5186]
2026-01-23 02:03:53,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 95.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:03:53,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 79/100 (estimated time remaining: 39 minutes, 33 seconds)
2026-01-23 02:05:31,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:45,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5137.39990 ± 28.882
2026-01-23 02:05:45,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5168.751, 5143.644, 5166.456, 5112.6504, 5153.646, 5116.7656, 5100.6025, 5122.1514, 5186.3877, 5102.948]
2026-01-23 02:05:45,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:05:45,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (5137.40) for latency DatasetOffice
2026-01-23 02:05:45,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 80/100 (estimated time remaining: 38 minutes, 28 seconds)
2026-01-23 02:07:24,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:32,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2641.29346 ± 2162.810
2026-01-23 02:07:32,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5080.366, 5068.5396, 1680.3528, 5081.8535, 5102.448, 3098.6902, 161.174, 823.1615, 124.50645, 191.84055]
2026-01-23 02:07:32,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 332.0, 1000.0, 1000.0, 598.0, 31.0, 159.0, 24.0, 37.0]
2026-01-23 02:07:32,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 81/100 (estimated time remaining: 36 minutes, 44 seconds)
2026-01-23 02:09:09,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:09:21,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4329.45410 ± 1023.638
2026-01-23 02:09:21,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3615.5977, 5159.153, 5123.9414, 3203.712, 4154.2925, 2013.9725, 5112.842, 4660.792, 5119.2275, 5131.014]
2026-01-23 02:09:21,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [710.0, 1000.0, 1000.0, 636.0, 821.0, 396.0, 1000.0, 923.0, 1000.0, 1000.0]
2026-01-23 02:09:21,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 82/100 (estimated time remaining: 34 minutes, 26 seconds)
2026-01-23 02:11:00,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:12,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4489.51709 ± 1378.906
2026-01-23 02:11:12,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5163.2393, 5199.2534, 5182.699, 5142.438, 5102.1743, 5192.2227, 5205.667, 2187.5837, 5189.2974, 1330.596]
2026-01-23 02:11:12,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 419.0, 1000.0, 257.0]
2026-01-23 02:11:12,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 83/100 (estimated time remaining: 33 minutes, 2 seconds)
2026-01-23 02:12:42,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:51,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3524.80420 ± 2126.606
2026-01-23 02:12:51,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2686.6924, 5210.6953, 5161.2686, 5246.5605, 5128.63, 5218.4673, 5148.6567, 795.97986, 179.31348, 471.7768]
2026-01-23 02:12:51,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [518.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 162.0, 37.0, 92.0]
2026-01-23 02:12:51,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes, 27 seconds)
2026-01-23 02:14:36,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:46,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4216.51465 ± 1541.737
2026-01-23 02:14:46,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [4813.2686, 3308.2744, 5281.049, 1156.2483, 5168.439, 1546.7855, 5178.188, 5216.9673, 5286.3257, 5209.603]
2026-01-23 02:14:46,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [922.0, 636.0, 1000.0, 234.0, 1000.0, 299.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:14:46,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 53 seconds)
2026-01-23 02:16:20,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:32,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4649.39404 ± 1363.412
2026-01-23 02:16:32,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5109.4663, 5121.0747, 5107.3447, 5093.389, 559.4802, 5070.044, 5097.4556, 5098.7715, 5138.9463, 5097.9697]
2026-01-23 02:16:32,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 118.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:16:32,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 1 second)
2026-01-23 02:18:13,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:21,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2942.68945 ± 1904.590
2026-01-23 02:18:21,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5074.417, 5071.926, 4376.179, 5053.659, 3408.124, 172.14665, 3161.8455, 177.34448, 2071.6377, 859.61707]
2026-01-23 02:18:21,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 858.0, 1000.0, 676.0, 34.0, 616.0, 34.0, 402.0, 165.0]
2026-01-23 02:18:21,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 11 seconds)
2026-01-23 02:19:55,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:08,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4719.97705 ± 1277.017
2026-01-23 02:20:08,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5175.017, 5169.9854, 5131.6616, 5179.338, 5128.1045, 5173.1675, 5120.694, 5188.037, 890.90424, 5042.8647]
2026-01-23 02:20:08,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 178.0, 1000.0]
2026-01-23 02:20:08,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 12 seconds)
2026-01-23 02:21:49,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:22:01,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4188.61230 ± 1414.740
2026-01-23 02:22:01,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5086.487, 5104.6865, 2702.2834, 1756.3914, 5108.0146, 5168.047, 5058.2534, 1726.149, 5124.011, 5051.7993]
2026-01-23 02:22:01,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 539.0, 362.0, 1000.0, 1000.0, 1000.0, 342.0, 1000.0, 1000.0]
2026-01-23 02:22:01,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 59 seconds)
2026-01-23 02:23:40,250 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:53,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4960.34082 ± 542.324
2026-01-23 02:23:53,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5150.006, 5107.001, 5196.5957, 5174.3813, 5092.976, 3335.7378, 5153.5605, 5116.607, 5142.8716, 5133.6685]
2026-01-23 02:23:53,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 647.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:23:53,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 3 seconds)
2026-01-23 02:25:31,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:45,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5150.12012 ± 33.212
2026-01-23 02:25:45,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5156.034, 5125.4785, 5130.6196, 5156.6836, 5212.6865, 5192.1104, 5166.7437, 5103.864, 5150.1157, 5106.865]
2026-01-23 02:25:45,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:25:45,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (5150.12) for latency DatasetOffice
2026-01-23 02:25:45,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 26 seconds)
2026-01-23 02:27:19,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:27:28,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3161.01904 ± 2028.934
2026-01-23 02:27:28,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5021.8965, 5032.5664, 5009.512, 2111.114, 5003.2095, 5036.0645, 3146.7883, 290.46286, 751.1739, 207.40187]
2026-01-23 02:27:28,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 416.0, 1000.0, 1000.0, 616.0, 57.0, 146.0, 40.0]
2026-01-23 02:27:28,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 25 seconds)
2026-01-23 02:29:05,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:17,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4311.24512 ± 1592.933
2026-01-23 02:29:17,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5099.821, 5116.807, 5051.8335, 5082.9834, 2089.7734, 5096.741, 5017.8037, 354.02087, 5112.6235, 5090.0396]
2026-01-23 02:29:17,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 425.0, 1000.0, 1000.0, 71.0, 1000.0, 1000.0]
2026-01-23 02:29:17,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 38 seconds)
2026-01-23 02:30:57,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:10,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4819.35791 ± 836.714
2026-01-23 02:31:10,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5072.2344, 5102.576, 5120.0024, 5120.108, 5085.1597, 5092.556, 5088.1113, 2309.5715, 5103.5356, 5099.7207]
2026-01-23 02:31:10,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 448.0, 1000.0, 1000.0]
2026-01-23 02:31:10,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 48 seconds)
2026-01-23 02:32:42,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:32:50,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2959.93677 ± 2257.009
2026-01-23 02:32:50,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3424.4395, 5051.327, 5088.928, 5138.588, 5116.767, 4748.333, 251.68552, 217.02019, 357.6331, 204.64877]
2026-01-23 02:32:50,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [682.0, 1000.0, 1000.0, 1000.0, 1000.0, 930.0, 48.0, 42.0, 74.0, 41.0]
2026-01-23 02:32:50,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 43 seconds)
2026-01-23 02:34:35,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:47,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4454.25586 ± 1354.776
2026-01-23 02:34:47,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5086.0137, 2867.0208, 5069.225, 912.60626, 5120.6763, 5101.848, 5098.102, 5125.3667, 5063.4185, 5098.28]
2026-01-23 02:34:47,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 569.0, 1000.0, 181.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:34:47,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 1 second)
2026-01-23 02:36:24,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:37,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4703.03271 ± 1161.793
2026-01-23 02:36:37,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5091.534, 5126.7505, 5103.1597, 5081.022, 5082.579, 5063.086, 5092.6704, 5079.946, 5091.603, 1217.9764]
2026-01-23 02:36:37,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 240.0]
2026-01-23 02:36:37,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 18 seconds)
2026-01-23 02:38:07,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:38:15,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2920.39233 ± 2030.399
2026-01-23 02:38:15,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5197.32, 5260.6865, 5247.6855, 5221.423, 2091.7322, 398.49655, 2857.004, 187.4538, 1886.9801, 855.14215]
2026-01-23 02:38:15,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 401.0, 77.0, 551.0, 36.0, 365.0, 157.0]
2026-01-23 02:38:15,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 23 seconds)
2026-01-23 02:39:59,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:40:12,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4700.63916 ± 999.115
2026-01-23 02:40:12,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5120.9893, 5085.57, 5136.596, 5074.2466, 5085.5444, 5131.9424, 5105.8716, 4360.395, 5125.7183, 1779.5156]
2026-01-23 02:40:12,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 863.0, 1000.0, 345.0]
2026-01-23 02:40:12,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 36 seconds)
2026-01-23 02:41:45,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:41:57,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4773.83740 ± 1131.294
2026-01-23 02:41:57,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5167.649, 5183.038, 5177.6553, 5121.0566, 5208.2627, 5169.771, 1386.747, 5164.067, 5211.207, 4948.9204]
2026-01-23 02:41:57,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 267.0, 1000.0, 1000.0, 951.0]
2026-01-23 02:41:58,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 49 seconds)
2026-01-23 02:43:37,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:43:51,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 5074.89355 ± 35.676
2026-01-23 02:43:51,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5063.754, 5137.1465, 5056.7817, 5070.931, 4991.4907, 5100.288, 5083.917, 5071.3164, 5072.292, 5101.0146]
2026-01-23 02:43:51,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:43:51,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1299 [DEBUG]: Training session finished
