2026-01-23 00:03:39,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-humanoid/DatasetOffice-sac-aug-mem2
2026-01-23 00:03:39,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-humanoid/DatasetOffice-sac-aug-mem2
2026-01-23 00:03:39,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x152df5aed5d0>}
2026-01-23 00:03:39,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1159 [DEBUG]: using device: cuda
2026-01-23 00:03:39,847 baseline-sac-noisy-humanoid:77 [WARNING]: args.memorize_actions != args.horizon: 2 != 32
2026-01-23 00:03:39,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1181 [INFO]: Creating new trainer
2026-01-23 00:03:40,013 baseline-sac-noisy-humanoid:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=410, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2026-01-23 00:03:40,013 baseline-sac-noisy-humanoid:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=427, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 00:03:41,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1242 [DEBUG]: Starting training session...
2026-01-23 00:03:41,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 1/100
2026-01-23 00:05:11,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:11,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 100.73488 ± 9.894
2026-01-23 00:05:11,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [101.58518, 119.84013, 95.12234, 89.009, 100.02148, 95.627365, 114.60738, 94.86142, 107.80123, 88.87326]
2026-01-23 00:05:11,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [20.0, 23.0, 19.0, 18.0, 20.0, 19.0, 22.0, 19.0, 21.0, 18.0]
2026-01-23 00:05:11,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (100.73) for latency DatasetOffice
2026-01-23 00:05:11,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 29 minutes, 59 seconds)
2026-01-23 00:06:51,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:06:51,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 314.33957 ± 42.047
2026-01-23 00:06:51,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [283.00006, 305.47595, 288.42755, 253.69481, 320.16083, 347.63385, 308.68604, 391.40723, 372.5765, 272.33292]
2026-01-23 00:06:51,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [53.0, 60.0, 55.0, 48.0, 63.0, 66.0, 59.0, 73.0, 73.0, 52.0]
2026-01-23 00:06:51,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (314.34) for latency DatasetOffice
2026-01-23 00:06:51,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 35 minutes, 48 seconds)
2026-01-23 00:08:30,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:31,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 400.86316 ± 92.579
2026-01-23 00:08:31,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [334.75613, 368.0484, 293.66843, 466.6778, 331.23798, 489.01978, 583.35095, 358.37686, 482.7078, 300.7877]
2026-01-23 00:08:31,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [62.0, 73.0, 63.0, 94.0, 73.0, 106.0, 119.0, 72.0, 90.0, 65.0]
2026-01-23 00:08:31,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (400.86) for latency DatasetOffice
2026-01-23 00:08:31,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 36 minutes, 29 seconds)
2026-01-23 00:10:10,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:11,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 362.64716 ± 53.541
2026-01-23 00:10:11,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [490.25183, 332.2011, 378.6062, 369.033, 325.62143, 339.32306, 330.05917, 298.86575, 422.62796, 339.88235]
2026-01-23 00:10:11,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [98.0, 74.0, 74.0, 72.0, 62.0, 65.0, 62.0, 56.0, 80.0, 65.0]
2026-01-23 00:10:11,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 36 minutes, 11 seconds)
2026-01-23 00:11:51,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:52,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 423.28375 ± 75.708
2026-01-23 00:11:52,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [367.0699, 395.5809, 378.3767, 513.45374, 386.66254, 355.42685, 336.05457, 562.45935, 413.37497, 524.378]
2026-01-23 00:11:52,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [73.0, 88.0, 85.0, 110.0, 81.0, 78.0, 73.0, 105.0, 80.0, 109.0]
2026-01-23 00:11:52,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (423.28) for latency DatasetOffice
2026-01-23 00:11:52,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 35 minutes, 32 seconds)
2026-01-23 00:13:31,250 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:13:32,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 452.41913 ± 105.262
2026-01-23 00:13:32,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [451.8725, 360.30206, 439.81607, 378.8594, 395.64374, 417.272, 494.1318, 741.6355, 472.77264, 371.88568]
2026-01-23 00:13:32,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [84.0, 67.0, 84.0, 86.0, 78.0, 95.0, 102.0, 156.0, 99.0, 69.0]
2026-01-23 00:13:32,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (452.42) for latency DatasetOffice
2026-01-23 00:13:32,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 36 minutes, 48 seconds)
2026-01-23 00:15:11,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:12,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 336.83868 ± 68.143
2026-01-23 00:15:12,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [334.14658, 297.58383, 266.2868, 437.92166, 294.4408, 473.37494, 350.38727, 321.32587, 243.70454, 349.21445]
2026-01-23 00:15:12,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [63.0, 63.0, 58.0, 94.0, 64.0, 95.0, 70.0, 67.0, 53.0, 72.0]
2026-01-23 00:15:12,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 35 minutes, 15 seconds)
2026-01-23 00:16:52,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:52,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 380.84006 ± 74.916
2026-01-23 00:16:52,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [409.86502, 374.3829, 440.17007, 345.9989, 317.7614, 406.5267, 548.10406, 254.69612, 373.99408, 336.90143]
2026-01-23 00:16:52,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [79.0, 72.0, 83.0, 65.0, 61.0, 79.0, 120.0, 49.0, 70.0, 63.0]
2026-01-23 00:16:52,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 33 minutes, 48 seconds)
2026-01-23 00:18:32,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:33,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 411.97266 ± 67.836
2026-01-23 00:18:33,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [448.43787, 394.20093, 418.0474, 498.64636, 435.2417, 490.33035, 389.77777, 352.8857, 439.55045, 252.60834]
2026-01-23 00:18:33,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [86.0, 76.0, 89.0, 92.0, 82.0, 105.0, 72.0, 64.0, 84.0, 48.0]
2026-01-23 00:18:33,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 32 minutes, 9 seconds)
2026-01-23 00:20:13,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:14,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 480.67032 ± 206.723
2026-01-23 00:20:14,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [463.1682, 479.5144, 365.78915, 912.23676, 364.0253, 431.58206, 327.20163, 287.937, 840.6012, 334.64737]
2026-01-23 00:20:14,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [88.0, 88.0, 69.0, 184.0, 70.0, 79.0, 61.0, 53.0, 167.0, 75.0]
2026-01-23 00:20:14,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (480.67) for latency DatasetOffice
2026-01-23 00:20:14,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 30 minutes, 35 seconds)
2026-01-23 00:21:54,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:55,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 429.80908 ± 69.294
2026-01-23 00:21:55,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [404.71396, 514.0456, 380.91666, 458.03262, 424.9765, 392.81158, 589.85803, 388.47867, 401.5743, 342.68323]
2026-01-23 00:21:55,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [75.0, 109.0, 70.0, 85.0, 77.0, 74.0, 115.0, 70.0, 76.0, 63.0]
2026-01-23 00:21:55,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 29 minutes, 8 seconds)
2026-01-23 00:23:34,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:35,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 466.00894 ± 117.089
2026-01-23 00:23:35,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [501.6998, 476.89026, 384.71887, 421.81265, 570.37573, 582.8253, 192.45818, 487.2096, 417.9285, 624.17053]
2026-01-23 00:23:35,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [92.0, 89.0, 82.0, 79.0, 107.0, 107.0, 37.0, 93.0, 91.0, 120.0]
2026-01-23 00:23:35,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 27 minutes, 23 seconds)
2026-01-23 00:25:15,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:16,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 500.41797 ± 98.115
2026-01-23 00:25:16,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [347.93695, 635.0727, 398.6718, 389.4829, 603.7395, 441.0589, 470.97617, 580.63324, 540.85004, 595.75806]
2026-01-23 00:25:16,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [65.0, 119.0, 78.0, 76.0, 121.0, 83.0, 102.0, 121.0, 108.0, 122.0]
2026-01-23 00:25:16,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (500.42) for latency DatasetOffice
2026-01-23 00:25:16,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 25 minutes, 59 seconds)
2026-01-23 00:26:56,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:58,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 525.15344 ± 140.284
2026-01-23 00:26:58,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [372.34277, 436.16678, 517.97107, 625.1073, 739.3026, 239.6621, 595.02057, 629.8376, 473.1317, 622.9923]
2026-01-23 00:26:58,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [68.0, 93.0, 112.0, 137.0, 145.0, 51.0, 130.0, 118.0, 89.0, 133.0]
2026-01-23 00:26:58,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (525.15) for latency DatasetOffice
2026-01-23 00:26:58,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 24 minutes, 44 seconds)
2026-01-23 00:28:37,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:38,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 490.78937 ± 103.495
2026-01-23 00:28:38,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [482.6957, 440.80252, 451.3898, 518.8973, 451.77536, 675.4932, 562.7167, 328.2897, 629.9895, 365.8435]
2026-01-23 00:28:38,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [88.0, 78.0, 85.0, 98.0, 93.0, 142.0, 119.0, 71.0, 117.0, 82.0]
2026-01-23 00:28:38,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 22 minutes, 56 seconds)
2026-01-23 00:30:18,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:30:19,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 438.64371 ± 82.406
2026-01-23 00:30:19,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [379.78076, 506.88837, 570.1541, 381.37915, 402.36798, 337.58514, 526.87805, 352.27182, 390.08936, 539.04236]
2026-01-23 00:30:19,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [69.0, 108.0, 107.0, 70.0, 88.0, 72.0, 99.0, 67.0, 86.0, 101.0]
2026-01-23 00:30:19,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 21 minutes, 15 seconds)
2026-01-23 00:31:59,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:32:00,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 499.93277 ± 126.358
2026-01-23 00:32:00,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [435.7216, 746.54083, 381.801, 456.30884, 388.99396, 705.63184, 426.80133, 381.81967, 570.0379, 505.67026]
2026-01-23 00:32:00,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [81.0, 143.0, 81.0, 90.0, 75.0, 134.0, 82.0, 75.0, 120.0, 107.0]
2026-01-23 00:32:00,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 19 minutes, 51 seconds)
2026-01-23 00:33:40,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:42,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 527.93219 ± 125.515
2026-01-23 00:33:42,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [560.0246, 395.32175, 483.82217, 409.35538, 713.7398, 473.99176, 406.51004, 679.2069, 724.8038, 432.5452]
2026-01-23 00:33:42,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [107.0, 75.0, 94.0, 79.0, 140.0, 91.0, 77.0, 143.0, 146.0, 80.0]
2026-01-23 00:33:42,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (527.93) for latency DatasetOffice
2026-01-23 00:33:42,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 18 minutes, 16 seconds)
2026-01-23 00:35:22,224 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:23,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 527.43622 ± 110.614
2026-01-23 00:35:23,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [383.8697, 386.39847, 531.2993, 772.51434, 541.5768, 574.7792, 583.4008, 406.45743, 566.9741, 527.09186]
2026-01-23 00:35:23,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [72.0, 72.0, 111.0, 151.0, 102.0, 127.0, 111.0, 74.0, 123.0, 99.0]
2026-01-23 00:35:23,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 16 minutes, 28 seconds)
2026-01-23 00:37:03,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:37:05,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 520.14960 ± 131.559
2026-01-23 00:37:05,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [393.38684, 467.62415, 493.61923, 488.25012, 555.46106, 543.7468, 424.5462, 835.07635, 644.0908, 355.69443]
2026-01-23 00:37:05,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [74.0, 101.0, 102.0, 89.0, 105.0, 100.0, 86.0, 171.0, 121.0, 67.0]
2026-01-23 00:37:05,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 15 minutes, 1 second)
2026-01-23 00:38:45,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:47,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 597.40259 ± 170.200
2026-01-23 00:38:47,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [440.38382, 505.61185, 598.30707, 428.66177, 457.29144, 513.65063, 963.67145, 570.9701, 644.23474, 851.2425]
2026-01-23 00:38:47,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [97.0, 95.0, 131.0, 83.0, 97.0, 96.0, 187.0, 106.0, 120.0, 169.0]
2026-01-23 00:38:47,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (597.40) for latency DatasetOffice
2026-01-23 00:38:47,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 13 minutes, 37 seconds)
2026-01-23 00:40:26,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:27,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 446.30380 ± 82.652
2026-01-23 00:40:27,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [569.53436, 345.6622, 469.80463, 411.7842, 384.32178, 360.43365, 353.0909, 506.81964, 572.0938, 489.49283]
2026-01-23 00:40:27,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [105.0, 64.0, 98.0, 76.0, 71.0, 68.0, 66.0, 92.0, 105.0, 102.0]
2026-01-23 00:40:27,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 11 minutes, 42 seconds)
2026-01-23 00:42:08,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:09,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 503.31284 ± 95.906
2026-01-23 00:42:09,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [548.3445, 370.24622, 546.50323, 564.90497, 572.882, 655.4621, 425.30148, 560.616, 446.95877, 341.90863]
2026-01-23 00:42:09,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [105.0, 81.0, 100.0, 109.0, 122.0, 139.0, 94.0, 102.0, 95.0, 77.0]
2026-01-23 00:42:09,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 10 minutes, 14 seconds)
2026-01-23 00:43:49,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:43:50,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 601.80975 ± 108.811
2026-01-23 00:43:50,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [703.26337, 537.8302, 609.5205, 659.7225, 650.0664, 650.029, 476.5909, 393.35883, 550.91327, 786.8026]
2026-01-23 00:43:50,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [134.0, 100.0, 131.0, 133.0, 143.0, 138.0, 100.0, 86.0, 123.0, 147.0]
2026-01-23 00:43:50,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (601.81) for latency DatasetOffice
2026-01-23 00:43:50,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 8 minutes, 30 seconds)
2026-01-23 00:45:31,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:32,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 556.73651 ± 155.787
2026-01-23 00:45:32,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [478.7477, 601.36255, 338.45358, 553.9108, 957.4943, 612.24164, 467.18173, 587.7294, 440.1691, 530.0746]
2026-01-23 00:45:32,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [93.0, 110.0, 74.0, 102.0, 190.0, 112.0, 99.0, 115.0, 83.0, 103.0]
2026-01-23 00:45:32,884 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 6 minutes, 57 seconds)
2026-01-23 00:47:12,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:47:14,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 483.60458 ± 86.311
2026-01-23 00:47:14,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [375.41068, 530.70483, 463.34128, 699.446, 473.25754, 415.26636, 434.04248, 472.90613, 430.0887, 541.5815]
2026-01-23 00:47:14,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [70.0, 100.0, 87.0, 148.0, 89.0, 80.0, 80.0, 100.0, 88.0, 113.0]
2026-01-23 00:47:14,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 5 minutes, 4 seconds)
2026-01-23 00:48:55,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:56,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 591.80420 ± 227.251
2026-01-23 00:48:56,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [463.77872, 504.01144, 945.12, 452.1553, 558.16675, 508.15762, 389.34525, 444.1261, 537.7788, 1115.4016]
2026-01-23 00:48:56,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [86.0, 95.0, 182.0, 85.0, 103.0, 94.0, 74.0, 83.0, 98.0, 220.0]
2026-01-23 00:48:56,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 3 minutes, 58 seconds)
2026-01-23 00:50:36,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:37,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 502.60919 ± 117.910
2026-01-23 00:50:37,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [410.32538, 379.972, 343.96454, 682.6358, 596.1192, 408.52307, 489.85815, 695.7909, 472.56305, 546.3396]
2026-01-23 00:50:37,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [86.0, 74.0, 66.0, 143.0, 115.0, 80.0, 93.0, 134.0, 95.0, 108.0]
2026-01-23 00:50:37,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 1 minute, 49 seconds)
2026-01-23 00:52:17,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:19,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 600.80042 ± 249.435
2026-01-23 00:52:19,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [892.82666, 682.81854, 528.31024, 516.6231, 154.99777, 359.76828, 538.6647, 613.6384, 612.7751, 1107.5813]
2026-01-23 00:52:19,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [194.0, 130.0, 117.0, 97.0, 30.0, 69.0, 101.0, 120.0, 116.0, 216.0]
2026-01-23 00:52:19,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 18 seconds)
2026-01-23 00:54:00,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:01,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 639.15082 ± 217.667
2026-01-23 00:54:01,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [450.57938, 749.38275, 907.0346, 505.60526, 557.0873, 451.21555, 1113.6155, 723.9542, 501.9729, 431.06097]
2026-01-23 00:54:01,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [83.0, 142.0, 181.0, 94.0, 103.0, 83.0, 213.0, 135.0, 95.0, 82.0]
2026-01-23 00:54:01,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (639.15) for latency DatasetOffice
2026-01-23 00:54:01,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 58 minutes, 45 seconds)
2026-01-23 00:55:42,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:55:43,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 550.07062 ± 66.986
2026-01-23 00:55:43,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [550.10834, 611.86194, 554.6595, 620.2689, 496.97464, 436.33463, 460.12122, 561.5393, 660.183, 548.65485]
2026-01-23 00:55:43,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [119.0, 111.0, 120.0, 113.0, 105.0, 81.0, 84.0, 121.0, 124.0, 101.0]
2026-01-23 00:55:43,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 57 minutes, 13 seconds)
2026-01-23 00:57:22,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:23,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 488.99567 ± 105.202
2026-01-23 00:57:23,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [341.2713, 440.29892, 673.40356, 544.3969, 497.94553, 374.00366, 489.48145, 659.0534, 461.1375, 408.96375]
2026-01-23 00:57:23,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [62.0, 81.0, 129.0, 103.0, 94.0, 68.0, 89.0, 122.0, 84.0, 79.0]
2026-01-23 00:57:23,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 54 minutes, 58 seconds)
2026-01-23 00:59:04,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:05,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 533.53375 ± 110.156
2026-01-23 00:59:05,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [394.5741, 592.99194, 483.858, 549.8647, 531.0656, 539.0063, 359.7807, 604.40247, 777.5372, 502.25647]
2026-01-23 00:59:05,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [72.0, 109.0, 89.0, 102.0, 97.0, 98.0, 66.0, 130.0, 150.0, 92.0]
2026-01-23 00:59:05,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 53 minutes, 29 seconds)
2026-01-23 01:00:45,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:00:47,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 764.85706 ± 180.659
2026-01-23 01:00:47,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [626.0698, 539.2028, 642.6239, 748.59094, 613.8721, 1097.7043, 653.58356, 773.3007, 1027.3264, 926.29596]
2026-01-23 01:00:47,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [117.0, 116.0, 129.0, 154.0, 130.0, 211.0, 129.0, 147.0, 197.0, 181.0]
2026-01-23 01:00:47,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (764.86) for latency DatasetOffice
2026-01-23 01:00:47,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 51 minutes, 43 seconds)
2026-01-23 01:02:25,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:27,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 741.61359 ± 265.616
2026-01-23 01:02:27,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1125.4907, 580.0778, 658.3751, 436.4392, 636.99744, 515.1624, 715.0179, 687.92163, 714.5107, 1346.1426]
2026-01-23 01:02:27,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [222.0, 106.0, 119.0, 80.0, 117.0, 94.0, 130.0, 127.0, 152.0, 279.0]
2026-01-23 01:02:27,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 49 minutes, 35 seconds)
2026-01-23 01:04:06,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:07,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 683.36792 ± 240.393
2026-01-23 01:04:07,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [815.648, 481.89136, 609.0602, 660.47894, 407.4229, 666.6746, 724.7452, 691.1738, 1307.5565, 469.02795]
2026-01-23 01:04:07,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [164.0, 90.0, 119.0, 124.0, 74.0, 127.0, 157.0, 138.0, 259.0, 103.0]
2026-01-23 01:04:07,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 47 minutes, 32 seconds)
2026-01-23 01:05:47,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:05:48,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 511.64795 ± 135.177
2026-01-23 01:05:48,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [506.42862, 431.2961, 355.25745, 401.0639, 447.1185, 390.05807, 745.933, 762.7539, 563.2612, 513.30853]
2026-01-23 01:05:48,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [93.0, 77.0, 65.0, 72.0, 81.0, 71.0, 140.0, 141.0, 102.0, 92.0]
2026-01-23 01:05:48,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 45 minutes, 56 seconds)
2026-01-23 01:07:26,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:28,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 617.13776 ± 104.806
2026-01-23 01:07:28,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [513.0802, 672.80054, 622.9893, 787.7162, 590.44196, 561.8957, 492.03958, 817.00214, 564.53894, 548.8725]
2026-01-23 01:07:28,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [111.0, 129.0, 115.0, 145.0, 108.0, 109.0, 90.0, 174.0, 104.0, 100.0]
2026-01-23 01:07:28,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 43 minutes, 53 seconds)
2026-01-23 01:09:07,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:09,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 722.88782 ± 117.601
2026-01-23 01:09:09,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [615.76917, 874.709, 798.0826, 722.36395, 947.0027, 725.90265, 717.7884, 639.56323, 655.54333, 532.1527]
2026-01-23 01:09:09,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [114.0, 164.0, 160.0, 146.0, 184.0, 136.0, 135.0, 126.0, 123.0, 116.0]
2026-01-23 01:09:09,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 42 minutes, 6 seconds)
2026-01-23 01:10:49,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:50,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 645.30701 ± 156.057
2026-01-23 01:10:50,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [434.66452, 749.77783, 718.03595, 649.5853, 380.40683, 720.15344, 688.1011, 510.31427, 939.987, 662.04425]
2026-01-23 01:10:50,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [79.0, 138.0, 154.0, 119.0, 71.0, 137.0, 129.0, 110.0, 179.0, 141.0]
2026-01-23 01:10:50,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 40 minutes, 37 seconds)
2026-01-23 01:12:28,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:30,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 848.24835 ± 261.255
2026-01-23 01:12:30,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [744.2945, 1095.2126, 615.9061, 678.7414, 819.5796, 1525.3169, 876.4279, 638.1897, 710.512, 778.30206]
2026-01-23 01:12:30,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [138.0, 207.0, 120.0, 123.0, 157.0, 299.0, 168.0, 139.0, 133.0, 148.0]
2026-01-23 01:12:30,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (848.25) for latency DatasetOffice
2026-01-23 01:12:30,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 38 minutes, 55 seconds)
2026-01-23 01:14:09,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:11,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 768.90692 ± 296.374
2026-01-23 01:14:11,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [609.4407, 859.5589, 978.53503, 1220.9911, 553.435, 492.58472, 476.5735, 586.267, 578.40393, 1333.28]
2026-01-23 01:14:11,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [135.0, 171.0, 201.0, 228.0, 118.0, 91.0, 102.0, 123.0, 127.0, 262.0]
2026-01-23 01:14:11,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 37 minutes, 21 seconds)
2026-01-23 01:15:50,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:15:53,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 829.16357 ± 242.783
2026-01-23 01:15:53,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [626.708, 1433.0159, 685.10974, 892.2232, 861.89215, 1045.3595, 594.48004, 815.84247, 621.6286, 715.3764]
2026-01-23 01:15:53,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [126.0, 280.0, 126.0, 168.0, 165.0, 209.0, 132.0, 160.0, 115.0, 149.0]
2026-01-23 01:15:53,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 35 minutes, 55 seconds)
2026-01-23 01:17:31,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:33,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 639.76093 ± 206.352
2026-01-23 01:17:33,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [481.7977, 638.48175, 1011.01276, 1055.819, 500.4506, 481.8556, 497.4441, 653.6128, 490.30396, 586.831]
2026-01-23 01:17:33,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [88.0, 119.0, 196.0, 202.0, 105.0, 105.0, 108.0, 122.0, 108.0, 128.0]
2026-01-23 01:17:33,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 34 minutes, 6 seconds)
2026-01-23 01:19:12,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:14,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 746.95929 ± 97.012
2026-01-23 01:19:14,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [714.3215, 703.19293, 607.3912, 966.51404, 759.3532, 735.51, 836.8405, 625.70605, 756.9337, 763.82983]
2026-01-23 01:19:14,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [139.0, 136.0, 118.0, 208.0, 150.0, 137.0, 165.0, 117.0, 151.0, 150.0]
2026-01-23 01:19:14,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 32 minutes, 22 seconds)
2026-01-23 01:20:55,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:57,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 792.13141 ± 190.608
2026-01-23 01:20:57,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1026.9255, 1224.4957, 836.99176, 740.5436, 500.86816, 739.47327, 697.2185, 699.69666, 688.3531, 766.74774]
2026-01-23 01:20:57,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [205.0, 241.0, 159.0, 140.0, 109.0, 145.0, 153.0, 152.0, 129.0, 145.0]
2026-01-23 01:20:57,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 31 minutes, 12 seconds)
2026-01-23 01:22:36,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:38,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1030.31189 ± 434.469
2026-01-23 01:22:38,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [832.7896, 642.7404, 2065.8645, 1378.9626, 768.60675, 1031.765, 1000.09534, 721.43506, 1330.2544, 530.60596]
2026-01-23 01:22:38,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [174.0, 139.0, 416.0, 273.0, 165.0, 206.0, 198.0, 150.0, 282.0, 112.0]
2026-01-23 01:22:38,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1030.31) for latency DatasetOffice
2026-01-23 01:22:38,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 29 minutes, 33 seconds)
2026-01-23 01:24:17,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:19,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 694.55261 ± 285.450
2026-01-23 01:24:19,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1197.6671, 1011.90265, 626.94434, 135.75275, 901.08466, 781.76794, 431.63126, 691.5034, 592.3277, 574.94415]
2026-01-23 01:24:19,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [235.0, 215.0, 136.0, 26.0, 171.0, 159.0, 79.0, 128.0, 110.0, 105.0]
2026-01-23 01:24:19,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 27 minutes, 47 seconds)
2026-01-23 01:25:58,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:00,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 823.62189 ± 260.480
2026-01-23 01:26:00,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [776.8789, 997.4323, 600.7897, 716.9233, 710.5823, 559.64, 514.02844, 1400.2258, 867.03326, 1092.685]
2026-01-23 01:26:00,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [164.0, 184.0, 110.0, 132.0, 134.0, 102.0, 93.0, 282.0, 161.0, 217.0]
2026-01-23 01:26:00,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 26 minutes, 14 seconds)
2026-01-23 01:27:41,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:43,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 840.85144 ± 289.749
2026-01-23 01:27:43,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1477.0665, 564.831, 770.1395, 609.2942, 892.06586, 631.58594, 719.364, 672.60486, 776.46124, 1295.1016]
2026-01-23 01:27:43,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [282.0, 103.0, 146.0, 112.0, 174.0, 116.0, 132.0, 123.0, 147.0, 264.0]
2026-01-23 01:27:43,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 24 minutes, 47 seconds)
2026-01-23 01:29:21,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:23,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 763.05255 ± 220.400
2026-01-23 01:29:23,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1332.7449, 799.7967, 632.3049, 811.6586, 859.75775, 732.30084, 672.6068, 522.8268, 756.8113, 509.7168]
2026-01-23 01:29:23,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [254.0, 148.0, 119.0, 156.0, 166.0, 154.0, 145.0, 94.0, 146.0, 110.0]
2026-01-23 01:29:23,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 22 minutes, 35 seconds)
2026-01-23 01:31:04,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:06,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 856.38867 ± 406.244
2026-01-23 01:31:06,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [633.1027, 1108.7067, 831.3227, 1168.3032, 1047.7285, 1475.3365, 1224.7278, 646.2759, 128.93732, 299.4453]
2026-01-23 01:31:06,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [116.0, 212.0, 165.0, 223.0, 196.0, 280.0, 236.0, 127.0, 25.0, 58.0]
2026-01-23 01:31:06,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 21 minutes, 10 seconds)
2026-01-23 01:32:44,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:46,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 921.72638 ± 415.992
2026-01-23 01:32:46,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [689.3, 796.07654, 1852.1348, 1302.3109, 783.42114, 1325.1222, 487.76367, 459.8909, 674.61304, 846.6297]
2026-01-23 01:32:46,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [133.0, 151.0, 366.0, 256.0, 148.0, 258.0, 107.0, 81.0, 135.0, 157.0]
2026-01-23 01:32:46,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 19 minutes, 27 seconds)
2026-01-23 01:34:26,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:28,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 779.82172 ± 199.094
2026-01-23 01:34:28,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [634.30115, 721.2284, 684.1162, 525.94135, 477.7783, 814.027, 1143.4259, 976.4943, 929.70294, 891.2017]
2026-01-23 01:34:28,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [117.0, 136.0, 131.0, 97.0, 104.0, 156.0, 221.0, 184.0, 188.0, 187.0]
2026-01-23 01:34:28,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 17 minutes, 51 seconds)
2026-01-23 01:36:08,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:10,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 957.72626 ± 219.178
2026-01-23 01:36:10,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1435.3773, 847.7531, 1191.483, 810.197, 888.52313, 1028.3274, 576.6004, 888.3836, 911.22296, 999.3949]
2026-01-23 01:36:10,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [303.0, 182.0, 224.0, 155.0, 169.0, 206.0, 125.0, 174.0, 171.0, 192.0]
2026-01-23 01:36:10,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 16 minutes, 6 seconds)
2026-01-23 01:37:50,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:52,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 828.50439 ± 217.878
2026-01-23 01:37:52,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [845.4101, 905.70905, 1086.5779, 746.53345, 703.6942, 1186.725, 848.5672, 956.94916, 606.6101, 398.26727]
2026-01-23 01:37:52,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [172.0, 176.0, 212.0, 149.0, 148.0, 231.0, 167.0, 196.0, 120.0, 80.0]
2026-01-23 01:37:52,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 14 minutes, 39 seconds)
2026-01-23 01:39:30,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:32,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 979.21320 ± 231.484
2026-01-23 01:39:32,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [866.60394, 1351.9861, 1117.3245, 678.94464, 1225.9653, 1059.7373, 645.0824, 916.3071, 751.08, 1179.101]
2026-01-23 01:39:32,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [162.0, 255.0, 211.0, 133.0, 236.0, 200.0, 121.0, 174.0, 142.0, 252.0]
2026-01-23 01:39:32,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 12 minutes, 37 seconds)
2026-01-23 01:41:12,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:15,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 991.33679 ± 234.637
2026-01-23 01:41:15,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [996.3106, 1460.278, 983.8799, 682.4357, 882.89594, 1385.6581, 974.4911, 821.0969, 916.8509, 809.47095]
2026-01-23 01:41:15,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [197.0, 296.0, 194.0, 136.0, 167.0, 277.0, 184.0, 163.0, 179.0, 153.0]
2026-01-23 01:41:15,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 10 seconds)
2026-01-23 01:42:53,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:56,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 940.05127 ± 173.128
2026-01-23 01:42:56,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [617.7303, 898.0616, 827.7747, 835.8514, 876.54895, 1193.5974, 1138.383, 1065.036, 823.6507, 1123.8788]
2026-01-23 01:42:56,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [128.0, 168.0, 176.0, 167.0, 166.0, 229.0, 224.0, 210.0, 155.0, 228.0]
2026-01-23 01:42:56,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 23 seconds)
2026-01-23 01:44:36,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:38,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 804.57800 ± 209.705
2026-01-23 01:44:38,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [729.1969, 831.7144, 240.86406, 894.99, 742.2546, 920.0016, 805.47015, 945.79034, 875.55695, 1059.9407]
2026-01-23 01:44:38,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [152.0, 157.0, 46.0, 171.0, 139.0, 172.0, 153.0, 178.0, 165.0, 215.0]
2026-01-23 01:44:38,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 7 minutes, 43 seconds)
2026-01-23 01:46:20,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:22,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 763.68628 ± 181.023
2026-01-23 01:46:22,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [459.70374, 899.6631, 596.01434, 1020.837, 663.35583, 1064.0479, 635.7722, 704.7012, 808.5025, 784.2653]
2026-01-23 01:46:22,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [93.0, 171.0, 129.0, 196.0, 148.0, 203.0, 133.0, 147.0, 165.0, 170.0]
2026-01-23 01:46:22,800 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 6 minutes, 22 seconds)
2026-01-23 01:48:02,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:06,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1328.42297 ± 500.142
2026-01-23 01:48:06,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [872.04285, 1643.8485, 620.206, 2069.7588, 1313.4081, 1391.8473, 1361.1812, 514.9829, 1996.0641, 1500.89]
2026-01-23 01:48:06,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [176.0, 320.0, 115.0, 397.0, 254.0, 280.0, 262.0, 112.0, 412.0, 287.0]
2026-01-23 01:48:06,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1328.42) for latency DatasetOffice
2026-01-23 01:48:06,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 59 seconds)
2026-01-23 01:49:42,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:44,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1071.22437 ± 587.307
2026-01-23 01:49:44,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [727.27704, 1127.8544, 1365.5309, 738.91815, 1007.89935, 718.8009, 791.56647, 2706.9133, 590.89124, 936.5916]
2026-01-23 01:49:44,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [148.0, 228.0, 267.0, 138.0, 200.0, 135.0, 149.0, 546.0, 126.0, 176.0]
2026-01-23 01:49:44,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 2 minutes, 51 seconds)
2026-01-23 01:51:24,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:26,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 888.36884 ± 155.927
2026-01-23 01:51:26,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [725.77765, 698.9607, 825.5699, 1061.9474, 773.1756, 1048.6948, 997.00574, 1077.3765, 1003.55695, 671.6227]
2026-01-23 01:51:26,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [136.0, 130.0, 160.0, 196.0, 145.0, 198.0, 187.0, 204.0, 188.0, 125.0]
2026-01-23 01:51:26,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 11 seconds)
2026-01-23 01:53:05,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:07,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 919.10693 ± 298.739
2026-01-23 01:53:07,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [970.9336, 1262.3507, 531.84216, 757.68896, 1015.94775, 1212.4318, 482.6607, 1306.9335, 1091.5479, 558.73206]
2026-01-23 01:53:07,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [206.0, 240.0, 117.0, 143.0, 190.0, 228.0, 107.0, 250.0, 210.0, 112.0]
2026-01-23 01:53:07,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 25 seconds)
2026-01-23 01:54:49,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:53,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1581.27698 ± 364.692
2026-01-23 01:54:53,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1780.019, 1173.2303, 1315.1736, 1626.7278, 2160.7925, 1203.3898, 2160.4285, 1821.4706, 1334.3955, 1237.1418]
2026-01-23 01:54:53,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [336.0, 218.0, 255.0, 308.0, 428.0, 239.0, 422.0, 361.0, 258.0, 242.0]
2026-01-23 01:54:53,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1581.28) for latency DatasetOffice
2026-01-23 01:55:35,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 2 minutes, 40 seconds)
2026-01-23 01:57:12,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:15,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1039.31958 ± 231.784
2026-01-23 01:57:15,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [800.3598, 1104.5483, 1340.9469, 951.5118, 1366.7089, 1365.0328, 903.55804, 945.05426, 928.8598, 686.615]
2026-01-23 01:57:15,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [162.0, 208.0, 260.0, 200.0, 277.0, 258.0, 171.0, 200.0, 183.0, 136.0]
2026-01-23 01:57:15,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 26 seconds)
2026-01-23 01:58:57,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:00,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1218.49805 ± 714.019
2026-01-23 01:59:00,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [547.1143, 1729.9342, 691.7841, 1839.2908, 442.7929, 933.44727, 2549.4229, 709.3567, 2064.8474, 676.98975]
2026-01-23 01:59:00,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [118.0, 336.0, 142.0, 359.0, 96.0, 188.0, 498.0, 146.0, 417.0, 146.0]
2026-01-23 01:59:00,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 69/100 (estimated time remaining: 59 minutes, 15 seconds)
2026-01-23 02:00:42,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:45,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1121.61292 ± 306.269
2026-01-23 02:00:45,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1225.1215, 1540.851, 611.97296, 1200.8182, 969.0117, 1202.6123, 694.20483, 1558.1031, 904.18915, 1309.2437]
2026-01-23 02:00:45,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [232.0, 290.0, 125.0, 226.0, 186.0, 232.0, 131.0, 306.0, 183.0, 249.0]
2026-01-23 02:00:45,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 70/100 (estimated time remaining: 57 minutes, 48 seconds)
2026-01-23 02:02:23,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:26,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1105.90723 ± 447.891
2026-01-23 02:02:26,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [799.7666, 2038.3992, 549.79626, 1647.4135, 611.1263, 1153.4934, 1289.1655, 1116.2147, 1145.2821, 708.41473]
2026-01-23 02:02:26,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [172.0, 400.0, 116.0, 316.0, 134.0, 217.0, 251.0, 211.0, 225.0, 147.0]
2026-01-23 02:02:26,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 71/100 (estimated time remaining: 55 minutes, 53 seconds)
2026-01-23 02:04:09,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:12,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1194.86694 ± 382.772
2026-01-23 02:04:12,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1486.3396, 2066.1704, 1160.0806, 1165.8887, 817.6032, 1082.7136, 1394.1346, 766.52576, 714.3643, 1294.8486]
2026-01-23 02:04:12,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [298.0, 399.0, 225.0, 228.0, 164.0, 207.0, 267.0, 159.0, 149.0, 242.0]
2026-01-23 02:04:12,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 55 seconds)
2026-01-23 02:05:50,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:54,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1467.85510 ± 1306.763
2026-01-23 02:05:54,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1480.7909, 1381.2998, 4999.953, 511.10477, 189.90349, 828.01886, 1397.9048, 1084.7468, 543.70166, 2261.1274]
2026-01-23 02:05:54,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [287.0, 269.0, 1000.0, 97.0, 37.0, 150.0, 270.0, 202.0, 105.0, 452.0]
2026-01-23 02:05:54,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 24 seconds)
2026-01-23 02:07:36,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:38,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1083.84424 ± 387.466
2026-01-23 02:07:38,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [883.63574, 1688.248, 1033.8658, 1404.655, 834.0927, 984.8523, 910.167, 589.747, 1795.7523, 713.4267]
2026-01-23 02:07:38,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [172.0, 342.0, 192.0, 275.0, 164.0, 183.0, 172.0, 107.0, 352.0, 129.0]
2026-01-23 02:07:38,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 38 seconds)
2026-01-23 02:09:14,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:09:18,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1413.23267 ± 581.161
2026-01-23 02:09:18,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [591.8714, 1817.5774, 587.76636, 2037.0573, 1882.4613, 1020.0917, 998.8378, 1621.0538, 2334.8625, 1240.7478]
2026-01-23 02:09:18,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [120.0, 366.0, 116.0, 387.0, 366.0, 212.0, 213.0, 322.0, 448.0, 266.0]
2026-01-23 02:09:18,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 26 seconds)
2026-01-23 02:10:57,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:00,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1169.81128 ± 449.127
2026-01-23 02:11:00,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [927.0685, 664.5779, 1088.0673, 619.3404, 1636.877, 1092.9421, 1106.874, 977.0734, 2212.317, 1372.9761]
2026-01-23 02:11:00,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [175.0, 125.0, 223.0, 130.0, 319.0, 210.0, 210.0, 184.0, 419.0, 261.0]
2026-01-23 02:11:00,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 48 seconds)
2026-01-23 02:12:40,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:45,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1977.08459 ± 1447.697
2026-01-23 02:12:45,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1042.7106, 687.8075, 4943.349, 875.8774, 946.1736, 1932.9546, 615.677, 4167.7856, 2785.0518, 1773.4585]
2026-01-23 02:12:45,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [214.0, 144.0, 989.0, 165.0, 191.0, 369.0, 117.0, 811.0, 557.0, 363.0]
2026-01-23 02:12:45,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1977.08) for latency DatasetOffice
2026-01-23 02:12:45,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 77/100 (estimated time remaining: 41 minutes, 4 seconds)
2026-01-23 02:14:28,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:34,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2280.74072 ± 1257.742
2026-01-23 02:14:34,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1283.349, 1693.2153, 2856.6965, 2075.8467, 1168.4891, 1022.3548, 4079.7712, 4823.8813, 1097.6262, 2706.1772]
2026-01-23 02:14:34,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [252.0, 334.0, 579.0, 395.0, 220.0, 204.0, 830.0, 966.0, 219.0, 535.0]
2026-01-23 02:14:34,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (2280.74) for latency DatasetOffice
2026-01-23 02:14:34,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 52 seconds)
2026-01-23 02:16:13,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:20,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2303.88354 ± 1910.212
2026-01-23 02:16:20,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1009.8698, 1686.608, 753.2395, 5036.2183, 308.45007, 3413.4153, 451.6979, 4573.109, 5145.044, 661.18304]
2026-01-23 02:16:20,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [198.0, 325.0, 155.0, 1000.0, 62.0, 661.0, 100.0, 900.0, 1000.0, 133.0]
2026-01-23 02:16:20,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (2303.88) for latency DatasetOffice
2026-01-23 02:16:20,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 13 seconds)
2026-01-23 02:18:00,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:05,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1753.11853 ± 615.740
2026-01-23 02:18:05,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1222.7249, 2364.604, 1326.4204, 1733.7301, 2210.7256, 1404.528, 1347.1425, 978.4281, 1828.858, 3114.0232]
2026-01-23 02:18:05,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [230.0, 463.0, 250.0, 325.0, 424.0, 269.0, 271.0, 188.0, 360.0, 623.0]
2026-01-23 02:18:05,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 53 seconds)
2026-01-23 02:19:43,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:19:47,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1554.79175 ± 671.879
2026-01-23 02:19:47,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1746.1312, 995.7048, 2346.855, 2640.1406, 2005.7515, 1193.4675, 978.33295, 1619.3881, 1752.0267, 270.1194]
2026-01-23 02:19:47,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [341.0, 188.0, 469.0, 541.0, 408.0, 232.0, 187.0, 320.0, 345.0, 51.0]
2026-01-23 02:19:47,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 81/100 (estimated time remaining: 35 minutes, 7 seconds)
2026-01-23 02:21:23,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:29,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2422.64087 ± 1359.509
2026-01-23 02:21:29,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1634.4647, 1517.0988, 1436.3053, 1818.7756, 1333.429, 2769.4802, 1721.3242, 1924.6648, 5080.3887, 4990.476]
2026-01-23 02:21:29,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [336.0, 285.0, 275.0, 351.0, 250.0, 536.0, 331.0, 378.0, 1000.0, 1000.0]
2026-01-23 02:21:29,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (2422.64) for latency DatasetOffice
2026-01-23 02:21:29,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 11 seconds)
2026-01-23 02:23:08,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:14,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2393.95361 ± 1074.036
2026-01-23 02:23:14,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1542.8761, 1958.683, 3299.8132, 1512.919, 1588.176, 1846.8778, 2012.1826, 3318.572, 4999.7637, 1859.672]
2026-01-23 02:23:14,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [312.0, 367.0, 652.0, 284.0, 308.0, 352.0, 387.0, 645.0, 1000.0, 356.0]
2026-01-23 02:23:14,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 12 seconds)
2026-01-23 02:24:54,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:57,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1502.39722 ± 497.599
2026-01-23 02:24:57,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1017.3475, 1402.9991, 1822.257, 918.22754, 1489.6063, 1704.6085, 2543.7349, 751.4764, 1809.2538, 1564.4612]
2026-01-23 02:24:57,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [205.0, 272.0, 352.0, 173.0, 279.0, 319.0, 495.0, 144.0, 348.0, 310.0]
2026-01-23 02:24:57,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 20 seconds)
2026-01-23 02:26:39,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:44,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1630.83142 ± 1417.958
2026-01-23 02:26:44,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [648.6229, 4327.69, 1789.0046, 692.61707, 374.2472, 580.4168, 202.5285, 3720.8826, 2886.0828, 1086.2219]
2026-01-23 02:26:44,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [129.0, 831.0, 349.0, 137.0, 69.0, 110.0, 40.0, 728.0, 563.0, 209.0]
2026-01-23 02:26:44,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 39 seconds)
2026-01-23 02:28:27,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:33,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2438.39722 ± 967.954
2026-01-23 02:28:33,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [729.0874, 1242.9038, 3715.72, 2224.181, 3156.2383, 3104.5884, 1691.3068, 3111.8408, 3487.9639, 1920.1401]
2026-01-23 02:28:33,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [159.0, 262.0, 746.0, 427.0, 629.0, 617.0, 327.0, 633.0, 695.0, 393.0]
2026-01-23 02:28:33,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (2438.40) for latency DatasetOffice
2026-01-23 02:28:33,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 20 seconds)
2026-01-23 02:30:12,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:17,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2197.71948 ± 758.192
2026-01-23 02:30:17,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1582.0421, 1436.049, 2577.201, 1238.394, 1900.5516, 1801.8317, 3169.3071, 1963.4897, 2583.3867, 3724.944]
2026-01-23 02:30:17,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [311.0, 274.0, 500.0, 235.0, 366.0, 358.0, 608.0, 385.0, 495.0, 721.0]
2026-01-23 02:30:17,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 38 seconds)
2026-01-23 02:31:59,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:32:04,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1868.94495 ± 970.316
2026-01-23 02:32:04,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [955.9477, 1823.4481, 1639.906, 2606.4617, 4447.448, 985.636, 1409.4777, 1409.9923, 1904.3934, 1506.739]
2026-01-23 02:32:04,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [194.0, 358.0, 325.0, 494.0, 852.0, 190.0, 279.0, 287.0, 375.0, 293.0]
2026-01-23 02:32:04,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 56 seconds)
2026-01-23 02:33:40,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:44,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1634.26196 ± 627.473
2026-01-23 02:33:44,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1614.37, 1348.4807, 1763.9839, 1355.6431, 3138.998, 1264.9503, 2167.6472, 727.04736, 1801.226, 1160.273]
2026-01-23 02:33:44,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [326.0, 268.0, 332.0, 263.0, 646.0, 237.0, 411.0, 154.0, 351.0, 228.0]
2026-01-23 02:33:44,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 4 seconds)
2026-01-23 02:35:29,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:35:36,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2681.60278 ± 1496.641
2026-01-23 02:35:36,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1632.8666, 1474.1793, 1916.5134, 1875.3334, 4621.406, 1035.7013, 5049.1997, 4923.8003, 2822.5967, 1464.4296]
2026-01-23 02:35:36,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [329.0, 283.0, 373.0, 360.0, 921.0, 193.0, 1000.0, 974.0, 548.0, 283.0]
2026-01-23 02:35:36,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (2681.60) for latency DatasetOffice
2026-01-23 02:35:36,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 31 seconds)
2026-01-23 02:37:10,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:37:18,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2741.19751 ± 1173.928
2026-01-23 02:37:18,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1871.2327, 2047.2206, 3588.1797, 1602.0387, 4982.4766, 1293.6927, 2226.5845, 2903.8706, 2419.3652, 4477.3125]
2026-01-23 02:37:18,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [373.0, 392.0, 700.0, 316.0, 1000.0, 251.0, 460.0, 585.0, 470.0, 860.0]
2026-01-23 02:37:18,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (2741.20) for latency DatasetOffice
2026-01-23 02:37:18,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 28 seconds)
2026-01-23 02:38:58,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:00,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1015.20667 ± 1035.200
2026-01-23 02:39:00,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2657.167, 2867.567, 1840.5645, 1333.8529, 84.32868, 359.87994, 615.0969, 80.595314, 188.60007, 124.413994]
2026-01-23 02:39:00,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [514.0, 558.0, 368.0, 258.0, 18.0, 71.0, 112.0, 17.0, 37.0, 24.0]
2026-01-23 02:39:00,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 41 seconds)
2026-01-23 02:40:38,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:40:48,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3249.17627 ± 1045.456
2026-01-23 02:40:48,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2874.0923, 3086.2786, 1764.8636, 4207.678, 3541.0017, 2939.5835, 4784.09, 2566.7874, 1845.1758, 4882.2134]
2026-01-23 02:40:48,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [594.0, 608.0, 372.0, 868.0, 732.0, 599.0, 1000.0, 530.0, 389.0, 1000.0]
2026-01-23 02:40:48,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (3249.18) for latency DatasetOffice
2026-01-23 02:40:48,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 58 seconds)
2026-01-23 02:42:32,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:36,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1750.86499 ± 874.197
2026-01-23 02:42:36,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [937.7183, 1021.86475, 1775.7605, 1882.2751, 3852.9534, 1382.1554, 2544.4526, 1044.9193, 2105.1611, 961.3895]
2026-01-23 02:42:36,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [184.0, 213.0, 354.0, 366.0, 743.0, 260.0, 502.0, 196.0, 409.0, 199.0]
2026-01-23 02:42:36,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 24 seconds)
2026-01-23 02:44:17,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:44:25,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2815.61768 ± 1280.759
2026-01-23 02:44:25,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3115.2114, 4124.0757, 1237.2391, 3592.564, 2489.9185, 2899.9768, 3018.5752, 4854.414, 2688.013, 136.18921]
2026-01-23 02:44:25,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [649.0, 797.0, 250.0, 745.0, 513.0, 557.0, 576.0, 1000.0, 529.0, 27.0]
2026-01-23 02:44:25,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 34 seconds)
2026-01-23 02:46:00,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:08,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2981.66382 ± 1144.570
2026-01-23 02:46:08,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2346.4204, 1731.5862, 2122.1467, 4967.206, 2540.9092, 4410.9697, 2627.2395, 4602.763, 1819.347, 2648.0493]
2026-01-23 02:46:08,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [458.0, 353.0, 438.0, 1000.0, 519.0, 895.0, 542.0, 943.0, 354.0, 528.0]
2026-01-23 02:46:08,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 50 seconds)
2026-01-23 02:47:45,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:47:55,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3723.39722 ± 1344.340
2026-01-23 02:47:55,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1851.9274, 4574.1626, 2287.146, 2193.9824, 4902.6885, 4610.0947, 2035.2605, 4831.321, 4795.32, 5152.071]
2026-01-23 02:47:55,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [359.0, 952.0, 447.0, 455.0, 1000.0, 954.0, 405.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:47:55,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (3723.40) for latency DatasetOffice
2026-01-23 02:47:55,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 8 seconds)
2026-01-23 02:49:29,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:49:43,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 4714.06592 ± 578.497
2026-01-23 02:49:43,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [4951.6665, 4974.4873, 4916.922, 2992.9507, 4932.2383, 4901.881, 4723.4917, 4875.136, 5011.9575, 4859.929]
2026-01-23 02:49:43,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 589.0, 1000.0, 1000.0, 953.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:49:43,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (4714.07) for latency DatasetOffice
2026-01-23 02:49:43,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 21 seconds)
2026-01-23 02:51:30,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:51:39,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3363.55225 ± 1273.030
2026-01-23 02:51:39,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3939.4875, 4848.749, 4811.521, 3281.817, 2633.5369, 2827.595, 3056.1785, 647.9818, 2589.1008, 4999.5557]
2026-01-23 02:51:39,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [767.0, 1000.0, 977.0, 642.0, 539.0, 586.0, 585.0, 127.0, 531.0, 1000.0]
2026-01-23 02:51:39,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 37 seconds)
2026-01-23 02:53:18,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:53:27,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 3288.88428 ± 1259.410
2026-01-23 02:53:27,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2769.3708, 1862.6461, 1738.3589, 5037.812, 2337.1875, 3923.0203, 1872.626, 4479.917, 5042.085, 3825.8186]
2026-01-23 02:53:27,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [551.0, 374.0, 340.0, 1000.0, 462.0, 761.0, 393.0, 896.0, 1000.0, 769.0]
2026-01-23 02:53:27,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 48 seconds)
2026-01-23 02:54:58,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:55:04,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2322.73022 ± 1942.249
2026-01-23 02:55:04,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [5115.8735, 2716.114, 413.76773, 1204.6504, 5057.069, 2164.4995, 5040.877, 714.5701, 433.75693, 366.1236]
2026-01-23 02:55:04,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 514.0, 79.0, 227.0, 1000.0, 422.0, 1000.0, 130.0, 86.0, 75.0]
2026-01-23 02:55:04,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1299 [DEBUG]: Training session finished
