2026-01-22 23:54:16,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-humanoid/DatasetOffice-sac-aug-mem5 
2026-01-22 23:54:16,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-humanoid/DatasetOffice-sac-aug-mem5 
2026-01-22 23:54:16,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14fa32213c90>}
2026-01-22 23:54:16,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1159 [DEBUG]: using device: cuda
2026-01-22 23:54:16,952 baseline-sac-noisy-humanoid:77 [WARNING]: args.memorize_actions != args.horizon: 5 != 32
2026-01-22 23:54:17,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1181 [INFO]: Creating new trainer
2026-01-22 23:54:17,118 baseline-sac-noisy-humanoid:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=461, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2026-01-22 23:54:17,119 baseline-sac-noisy-humanoid:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=478, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:54:18,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1242 [DEBUG]: Starting training session...
2026-01-22 23:54:18,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 1/100
2026-01-22 23:55:49,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:50,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 300.58029 ± 14.410
2026-01-22 23:55:50,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [292.24133, 300.81207, 292.1221, 297.72937, 302.88364, 334.07785, 291.1861, 281.16852, 318.14316, 295.43903]
2026-01-22 23:55:50,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [53.0, 55.0, 53.0, 54.0, 55.0, 61.0, 53.0, 51.0, 58.0, 54.0]
2026-01-22 23:55:50,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (300.58) for latency DatasetOffice
2026-01-22 23:55:50,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 32 minutes, 27 seconds)
2026-01-22 23:57:29,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:29,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 121.79021 ± 8.746
2026-01-22 23:57:29,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [112.579285, 109.944565, 116.15757, 121.75207, 128.14732, 119.63276, 121.189865, 126.90557, 119.088844, 142.5043]
2026-01-22 23:57:29,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [25.0, 24.0, 26.0, 27.0, 28.0, 27.0, 27.0, 28.0, 26.0, 31.0]
2026-01-22 23:57:29,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 36 minutes, 30 seconds)
2026-01-22 23:59:08,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:59:08,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 339.54538 ± 67.338
2026-01-22 23:59:08,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [321.615, 319.38104, 226.86073, 423.79413, 327.35526, 402.12363, 388.38727, 230.4749, 419.90753, 335.55444]
2026-01-22 23:59:08,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [59.0, 59.0, 43.0, 82.0, 69.0, 75.0, 73.0, 45.0, 79.0, 62.0]
2026-01-22 23:59:08,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (339.55) for latency DatasetOffice
2026-01-22 23:59:08,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 36 minutes, 42 seconds)
2026-01-23 00:00:48,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:48,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 275.80661 ± 27.433
2026-01-23 00:00:48,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [310.41464, 265.7418, 258.8419, 258.29797, 270.04398, 322.745, 314.15363, 241.6873, 249.94875, 266.19107]
2026-01-23 00:00:48,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [55.0, 49.0, 47.0, 47.0, 50.0, 59.0, 57.0, 44.0, 46.0, 49.0]
2026-01-23 00:00:48,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 36 minutes, 14 seconds)
2026-01-23 00:02:28,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:29,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 455.35162 ± 121.760
2026-01-23 00:02:29,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [751.54266, 331.9259, 446.797, 394.70285, 425.628, 488.2912, 540.4355, 507.02924, 335.3384, 331.82516]
2026-01-23 00:02:29,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [157.0, 67.0, 85.0, 77.0, 81.0, 105.0, 106.0, 111.0, 74.0, 73.0]
2026-01-23 00:02:29,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (455.35) for latency DatasetOffice
2026-01-23 00:02:29,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 35 minutes, 33 seconds)
2026-01-23 00:04:08,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:04:09,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 354.60117 ± 64.452
2026-01-23 00:04:09,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [312.16025, 388.19897, 377.82983, 279.47116, 319.1814, 519.2495, 339.94473, 300.57233, 376.2482, 333.15503]
2026-01-23 00:04:09,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [57.0, 72.0, 70.0, 54.0, 59.0, 100.0, 63.0, 55.0, 71.0, 61.0]
2026-01-23 00:04:09,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 36 minutes, 17 seconds)
2026-01-23 00:05:49,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:49,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 368.10602 ± 72.561
2026-01-23 00:05:49,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [282.0153, 429.4132, 369.89746, 477.00507, 435.81955, 428.46368, 252.41669, 332.80426, 385.47925, 287.74585]
2026-01-23 00:05:49,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [52.0, 81.0, 69.0, 91.0, 81.0, 79.0, 48.0, 62.0, 86.0, 53.0]
2026-01-23 00:05:49,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 35 minutes, 1 second)
2026-01-23 00:07:28,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:07:29,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 451.50626 ± 73.069
2026-01-23 00:07:29,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [463.68466, 513.75464, 548.72217, 425.46176, 476.26215, 357.45374, 447.65463, 412.44446, 552.4601, 317.16418]
2026-01-23 00:07:29,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [87.0, 99.0, 109.0, 87.0, 95.0, 75.0, 83.0, 76.0, 104.0, 59.0]
2026-01-23 00:07:29,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 33 minutes, 36 seconds)
2026-01-23 00:09:09,923 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:09:11,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 461.50937 ± 165.857
2026-01-23 00:09:11,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [385.50668, 504.26932, 403.82864, 428.16495, 423.91815, 939.6949, 354.1157, 337.60696, 454.12082, 383.86786]
2026-01-23 00:09:11,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [70.0, 106.0, 78.0, 82.0, 79.0, 187.0, 65.0, 71.0, 86.0, 79.0]
2026-01-23 00:09:11,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (461.51) for latency DatasetOffice
2026-01-23 00:09:11,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 32 minutes, 21 seconds)
2026-01-23 00:10:50,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:51,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 423.53687 ± 74.998
2026-01-23 00:10:51,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [423.17133, 366.51917, 414.8828, 564.3088, 425.64606, 300.4383, 453.59702, 348.54626, 408.87402, 529.38477]
2026-01-23 00:10:51,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [80.0, 71.0, 79.0, 106.0, 79.0, 64.0, 85.0, 65.0, 89.0, 99.0]
2026-01-23 00:10:51,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 30 minutes, 39 seconds)
2026-01-23 00:12:30,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:31,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 391.15326 ± 107.151
2026-01-23 00:12:31,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [459.4965, 537.2849, 347.84515, 284.9585, 270.22513, 396.70947, 461.47607, 191.46838, 479.5447, 482.5239]
2026-01-23 00:12:31,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [87.0, 99.0, 65.0, 56.0, 54.0, 77.0, 88.0, 39.0, 90.0, 90.0]
2026-01-23 00:12:31,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 28 minutes, 56 seconds)
2026-01-23 00:14:11,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:14:13,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 475.58041 ± 101.869
2026-01-23 00:14:13,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [574.4771, 580.42303, 416.4164, 557.6454, 348.16385, 582.03766, 436.8823, 569.87335, 326.56668, 363.3182]
2026-01-23 00:14:13,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [124.0, 119.0, 81.0, 112.0, 77.0, 121.0, 85.0, 110.0, 68.0, 77.0]
2026-01-23 00:14:13,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (475.58) for latency DatasetOffice
2026-01-23 00:14:13,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 27 minutes, 36 seconds)
2026-01-23 00:15:51,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:52,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 449.02557 ± 117.743
2026-01-23 00:15:52,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [317.05585, 250.89876, 490.67932, 451.54797, 416.09323, 343.82544, 503.9689, 570.6493, 473.68356, 671.85364]
2026-01-23 00:15:52,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [60.0, 49.0, 90.0, 92.0, 77.0, 77.0, 96.0, 105.0, 91.0, 130.0]
2026-01-23 00:15:52,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 25 minutes, 51 seconds)
2026-01-23 00:17:31,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:17:32,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 438.35953 ± 140.243
2026-01-23 00:17:32,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [362.54672, 443.12845, 365.29782, 820.01666, 384.6514, 416.7263, 394.6612, 533.7673, 367.90128, 294.89798]
2026-01-23 00:17:32,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [68.0, 94.0, 71.0, 173.0, 83.0, 94.0, 74.0, 105.0, 81.0, 58.0]
2026-01-23 00:17:32,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 23 minutes, 46 seconds)
2026-01-23 00:19:13,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:19:14,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 482.67725 ± 167.499
2026-01-23 00:19:14,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [505.2835, 348.55963, 930.4252, 474.82935, 352.7821, 421.45468, 380.00305, 612.43854, 397.86563, 403.13092]
2026-01-23 00:19:14,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [98.0, 71.0, 186.0, 88.0, 75.0, 91.0, 69.0, 119.0, 84.0, 84.0]
2026-01-23 00:19:14,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (482.68) for latency DatasetOffice
2026-01-23 00:19:14,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 22 minutes, 25 seconds)
2026-01-23 00:20:53,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:54,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 410.71881 ± 112.665
2026-01-23 00:20:54,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [456.82538, 219.25621, 459.52277, 639.26874, 425.06616, 521.2644, 313.0069, 357.91504, 396.9828, 318.07953]
2026-01-23 00:20:54,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [87.0, 46.0, 90.0, 122.0, 95.0, 104.0, 59.0, 81.0, 76.0, 60.0]
2026-01-23 00:20:54,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 20 minutes, 51 seconds)
2026-01-23 00:22:34,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:22:35,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 349.48187 ± 75.852
2026-01-23 00:22:35,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [447.27014, 394.23505, 345.55817, 515.0934, 302.65018, 305.25107, 286.3633, 256.58475, 312.80023, 329.0125]
2026-01-23 00:22:35,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [87.0, 76.0, 66.0, 100.0, 59.0, 60.0, 56.0, 50.0, 60.0, 63.0]
2026-01-23 00:22:35,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 18 minutes, 53 seconds)
2026-01-23 00:24:15,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:24:16,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 484.89069 ± 98.688
2026-01-23 00:24:16,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [550.51385, 488.9934, 406.38162, 534.63525, 300.39896, 541.40283, 603.0839, 478.2537, 344.39136, 600.85187]
2026-01-23 00:24:16,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [109.0, 91.0, 92.0, 106.0, 58.0, 114.0, 132.0, 108.0, 76.0, 125.0]
2026-01-23 00:24:16,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (484.89) for latency DatasetOffice
2026-01-23 00:24:16,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 17 minutes, 40 seconds)
2026-01-23 00:25:55,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:56,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 451.09732 ± 141.507
2026-01-23 00:25:56,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [277.87408, 383.0571, 457.7107, 540.9767, 801.6701, 335.1598, 377.36035, 543.83826, 420.9278, 372.39832]
2026-01-23 00:25:56,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [55.0, 75.0, 101.0, 104.0, 156.0, 68.0, 70.0, 106.0, 81.0, 68.0]
2026-01-23 00:25:56,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 16 minutes, 7 seconds)
2026-01-23 00:27:37,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:27:38,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 436.05396 ± 79.042
2026-01-23 00:27:38,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [494.3062, 325.5956, 391.79556, 389.88593, 468.62756, 602.9537, 463.0581, 342.4234, 487.4822, 394.4113]
2026-01-23 00:27:38,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [92.0, 63.0, 74.0, 71.0, 91.0, 115.0, 88.0, 63.0, 89.0, 72.0]
2026-01-23 00:27:38,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 14 minutes, 26 seconds)
2026-01-23 00:29:18,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:29:19,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 383.09860 ± 52.747
2026-01-23 00:29:19,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [490.06293, 334.85876, 340.77512, 362.5345, 372.40973, 367.8122, 366.7442, 443.8, 436.9833, 315.0053]
2026-01-23 00:29:19,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [97.0, 63.0, 69.0, 69.0, 74.0, 71.0, 77.0, 83.0, 82.0, 61.0]
2026-01-23 00:29:19,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 12 minutes, 52 seconds)
2026-01-23 00:30:58,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:30:59,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 420.52670 ± 92.290
2026-01-23 00:30:59,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [374.584, 311.52283, 422.5724, 576.274, 428.622, 328.38104, 381.30338, 466.0983, 584.3407, 331.5684]
2026-01-23 00:30:59,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [82.0, 66.0, 85.0, 111.0, 85.0, 75.0, 85.0, 93.0, 129.0, 66.0]
2026-01-23 00:30:59,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 11 minutes, 4 seconds)
2026-01-23 00:32:38,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:32:39,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 384.93823 ± 61.770
2026-01-23 00:32:39,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [383.89926, 358.66638, 312.95453, 393.18497, 330.64267, 325.49194, 441.59467, 517.5971, 441.68872, 343.66187]
2026-01-23 00:32:39,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [71.0, 65.0, 57.0, 73.0, 61.0, 61.0, 81.0, 97.0, 81.0, 63.0]
2026-01-23 00:32:39,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 9 minutes, 9 seconds)
2026-01-23 00:34:17,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:34:18,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 389.31955 ± 53.131
2026-01-23 00:34:18,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [392.4512, 377.76968, 275.18344, 415.69342, 427.91037, 458.4789, 387.38, 431.75217, 313.63452, 412.94183]
2026-01-23 00:34:18,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [74.0, 70.0, 54.0, 75.0, 81.0, 87.0, 72.0, 90.0, 64.0, 83.0]
2026-01-23 00:34:18,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 7 minutes, 12 seconds)
2026-01-23 00:35:57,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:58,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 466.05072 ± 114.072
2026-01-23 00:35:58,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [638.52716, 369.9491, 327.8784, 456.63388, 554.29315, 384.25415, 364.41, 426.57254, 679.44293, 458.5457]
2026-01-23 00:35:58,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [127.0, 69.0, 63.0, 85.0, 103.0, 73.0, 68.0, 78.0, 129.0, 85.0]
2026-01-23 00:35:58,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 5 minutes, 7 seconds)
2026-01-23 00:37:37,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:37:38,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 423.29135 ± 68.923
2026-01-23 00:37:38,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [275.19418, 396.15668, 457.50607, 423.40027, 446.97513, 480.8675, 553.372, 387.2524, 428.19785, 383.9919]
2026-01-23 00:37:38,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [58.0, 79.0, 88.0, 78.0, 83.0, 102.0, 105.0, 72.0, 82.0, 74.0]
2026-01-23 00:37:38,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 3 minutes, 11 seconds)
2026-01-23 00:39:17,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:39:18,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 452.44370 ± 84.667
2026-01-23 00:39:18,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [475.44385, 306.99045, 445.8546, 554.3958, 360.67337, 431.99014, 589.0989, 542.7683, 413.388, 403.8334]
2026-01-23 00:39:18,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [105.0, 67.0, 96.0, 106.0, 77.0, 92.0, 111.0, 101.0, 97.0, 82.0]
2026-01-23 00:39:18,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 1 minute, 32 seconds)
2026-01-23 00:40:57,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:58,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 370.68231 ± 167.951
2026-01-23 00:40:58,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [542.03284, 341.331, 459.5939, 551.90515, 603.8363, 140.59439, 130.83815, 162.7318, 351.5911, 422.3687]
2026-01-23 00:40:58,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [108.0, 63.0, 89.0, 106.0, 134.0, 27.0, 25.0, 31.0, 65.0, 81.0]
2026-01-23 00:40:58,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 59 minutes, 40 seconds)
2026-01-23 00:42:37,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:37,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 360.62680 ± 39.791
2026-01-23 00:42:37,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [372.83545, 339.08524, 329.4843, 338.87338, 310.5664, 385.42096, 367.3956, 387.1809, 452.66605, 322.7597]
2026-01-23 00:42:37,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [81.0, 62.0, 60.0, 61.0, 57.0, 69.0, 66.0, 71.0, 84.0, 58.0]
2026-01-23 00:42:37,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 58 minutes, 5 seconds)
2026-01-23 00:44:16,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:17,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 537.36615 ± 59.788
2026-01-23 00:44:17,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [545.2748, 475.75064, 547.4149, 573.74176, 551.02313, 496.19168, 405.608, 561.7751, 592.2951, 624.58624]
2026-01-23 00:44:17,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [107.0, 94.0, 105.0, 108.0, 117.0, 98.0, 76.0, 114.0, 124.0, 120.0]
2026-01-23 00:44:17,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (537.37) for latency DatasetOffice
2026-01-23 00:44:17,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 56 minutes, 22 seconds)
2026-01-23 00:45:55,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:56,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 474.47369 ± 149.000
2026-01-23 00:45:56,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [384.5105, 356.68362, 490.0546, 239.14124, 383.42682, 531.58044, 737.0387, 362.6965, 589.92865, 669.6756]
2026-01-23 00:45:56,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [85.0, 72.0, 92.0, 49.0, 73.0, 104.0, 145.0, 76.0, 114.0, 130.0]
2026-01-23 00:45:56,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 54 minutes, 37 seconds)
2026-01-23 00:47:35,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:47:36,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 450.58124 ± 55.470
2026-01-23 00:47:36,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [362.02, 518.37, 464.9844, 379.81577, 384.78812, 483.17087, 521.3389, 431.28333, 462.5188, 497.52246]
2026-01-23 00:47:36,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [68.0, 98.0, 93.0, 82.0, 75.0, 90.0, 98.0, 86.0, 99.0, 101.0]
2026-01-23 00:47:36,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 52 minutes, 49 seconds)
2026-01-23 00:49:16,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:17,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 548.93341 ± 104.291
2026-01-23 00:49:17,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [569.846, 404.79755, 429.46857, 696.587, 612.0813, 441.95697, 464.29123, 553.706, 608.59924, 707.99976]
2026-01-23 00:49:17,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [108.0, 76.0, 78.0, 131.0, 119.0, 81.0, 88.0, 103.0, 118.0, 145.0]
2026-01-23 00:49:17,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (548.93) for latency DatasetOffice
2026-01-23 00:49:17,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 51 minutes, 29 seconds)
2026-01-23 00:50:55,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:56,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 509.73300 ± 113.320
2026-01-23 00:50:56,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [520.2664, 730.6205, 614.9733, 471.5579, 539.2746, 528.96783, 583.01666, 348.8773, 360.24808, 399.5275]
2026-01-23 00:50:56,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [99.0, 155.0, 137.0, 89.0, 113.0, 108.0, 126.0, 67.0, 82.0, 87.0]
2026-01-23 00:50:56,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 49 minutes, 42 seconds)
2026-01-23 00:52:35,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:36,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 407.69562 ± 130.397
2026-01-23 00:52:36,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [763.30505, 398.36285, 289.4637, 349.591, 286.76135, 370.00626, 351.01007, 483.86768, 375.0088, 409.57947]
2026-01-23 00:52:36,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [168.0, 76.0, 57.0, 64.0, 54.0, 72.0, 67.0, 95.0, 73.0, 80.0]
2026-01-23 00:52:36,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 48 minutes, 2 seconds)
2026-01-23 00:54:16,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:17,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 488.45419 ± 105.954
2026-01-23 00:54:17,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [292.95117, 471.11642, 550.9014, 587.341, 291.4633, 569.9516, 530.9159, 595.89984, 473.3287, 520.6728]
2026-01-23 00:54:17,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [64.0, 95.0, 102.0, 114.0, 63.0, 123.0, 111.0, 115.0, 96.0, 106.0]
2026-01-23 00:54:17,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 46 minutes, 48 seconds)
2026-01-23 00:55:57,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:55:58,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 478.13495 ± 71.084
2026-01-23 00:55:58,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [513.7027, 600.1199, 508.93076, 551.5116, 483.38602, 456.35046, 477.59222, 449.1451, 328.74292, 411.86807]
2026-01-23 00:55:58,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [96.0, 127.0, 110.0, 106.0, 91.0, 90.0, 99.0, 83.0, 70.0, 87.0]
2026-01-23 00:55:58,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 45 minutes, 25 seconds)
2026-01-23 00:57:38,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:39,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 414.95453 ± 71.850
2026-01-23 00:57:39,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [403.2418, 424.395, 388.2447, 324.9936, 386.19565, 401.36823, 465.40817, 328.09592, 591.0953, 436.50732]
2026-01-23 00:57:39,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [74.0, 78.0, 73.0, 60.0, 71.0, 74.0, 86.0, 61.0, 118.0, 79.0]
2026-01-23 00:57:39,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 43 minutes, 39 seconds)
2026-01-23 00:59:18,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:19,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 407.69034 ± 64.506
2026-01-23 00:59:19,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [330.4323, 490.5208, 340.3236, 471.8429, 359.5078, 361.84506, 394.93594, 355.0189, 500.85135, 471.6248]
2026-01-23 00:59:19,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [62.0, 93.0, 66.0, 87.0, 66.0, 67.0, 74.0, 67.0, 94.0, 88.0]
2026-01-23 00:59:19,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 42 minutes, 15 seconds)
2026-01-23 01:00:59,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:00,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 381.45935 ± 73.218
2026-01-23 01:01:00,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [577.09814, 436.8232, 398.52695, 351.30496, 346.3791, 367.54633, 322.5336, 354.38806, 338.44943, 321.5436]
2026-01-23 01:01:00,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [116.0, 79.0, 74.0, 65.0, 65.0, 67.0, 61.0, 65.0, 62.0, 60.0]
2026-01-23 01:01:00,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 40 minutes, 48 seconds)
2026-01-23 01:02:39,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:41,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 504.13467 ± 126.700
2026-01-23 01:02:41,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [404.69962, 802.45654, 351.71338, 643.11945, 514.09204, 464.32788, 426.40652, 400.24966, 528.0975, 506.18396]
2026-01-23 01:02:41,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [74.0, 168.0, 67.0, 128.0, 99.0, 101.0, 81.0, 77.0, 100.0, 97.0]
2026-01-23 01:02:41,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 39 minutes, 3 seconds)
2026-01-23 01:04:21,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:22,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 565.91980 ± 121.064
2026-01-23 01:04:22,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [762.92255, 387.7599, 565.005, 511.13806, 568.6592, 704.6449, 605.5104, 354.22174, 650.1523, 549.1846]
2026-01-23 01:04:22,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [161.0, 73.0, 107.0, 95.0, 114.0, 150.0, 114.0, 68.0, 145.0, 114.0]
2026-01-23 01:04:22,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (565.92) for latency DatasetOffice
2026-01-23 01:04:22,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 37 minutes, 25 seconds)
2026-01-23 01:06:01,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:03,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 627.93604 ± 156.165
2026-01-23 01:06:03,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [546.5384, 914.8335, 542.20734, 674.29193, 474.68854, 555.5156, 515.6239, 628.0186, 502.46436, 925.1781]
2026-01-23 01:06:03,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [117.0, 179.0, 111.0, 132.0, 95.0, 116.0, 108.0, 134.0, 105.0, 182.0]
2026-01-23 01:06:03,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (627.94) for latency DatasetOffice
2026-01-23 01:06:03,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 35 minutes, 44 seconds)
2026-01-23 01:07:43,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:44,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 580.63348 ± 180.420
2026-01-23 01:07:44,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [398.80515, 507.04092, 680.89606, 1065.7642, 551.9399, 491.56754, 482.13696, 430.589, 597.71106, 599.88416]
2026-01-23 01:07:44,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [80.0, 97.0, 133.0, 210.0, 107.0, 102.0, 99.0, 83.0, 116.0, 114.0]
2026-01-23 01:07:44,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 34 minutes, 17 seconds)
2026-01-23 01:09:24,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:25,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 514.44598 ± 149.329
2026-01-23 01:09:25,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [878.5814, 516.6564, 368.338, 401.1241, 681.9305, 547.07245, 435.23825, 472.01804, 451.97205, 391.52887]
2026-01-23 01:09:25,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [173.0, 106.0, 70.0, 74.0, 130.0, 108.0, 81.0, 91.0, 95.0, 73.0]
2026-01-23 01:09:25,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 32 minutes, 38 seconds)
2026-01-23 01:11:05,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:07,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 568.06793 ± 173.758
2026-01-23 01:11:07,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [302.20676, 729.36346, 446.51993, 442.8855, 507.39185, 578.47144, 617.3221, 550.8199, 976.4254, 529.27295]
2026-01-23 01:11:07,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [64.0, 142.0, 90.0, 87.0, 106.0, 124.0, 122.0, 113.0, 202.0, 103.0]
2026-01-23 01:11:07,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 31 minutes, 4 seconds)
2026-01-23 01:12:47,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:49,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 626.90271 ± 238.585
2026-01-23 01:12:49,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [544.7581, 1222.118, 622.27856, 524.55756, 476.06842, 625.9039, 483.7615, 484.5661, 901.07544, 383.93958]
2026-01-23 01:12:49,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [102.0, 248.0, 120.0, 99.0, 89.0, 120.0, 92.0, 92.0, 192.0, 72.0]
2026-01-23 01:12:49,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 29 minutes, 34 seconds)
2026-01-23 01:14:29,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:30,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 536.93823 ± 94.598
2026-01-23 01:14:30,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [447.3491, 590.05743, 407.54706, 704.96265, 502.72003, 590.11224, 479.88446, 456.56766, 513.88324, 676.29834]
2026-01-23 01:14:30,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [84.0, 114.0, 77.0, 136.0, 101.0, 111.0, 92.0, 87.0, 103.0, 143.0]
2026-01-23 01:14:30,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 27 minutes, 56 seconds)
2026-01-23 01:16:10,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:11,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 679.17859 ± 170.356
2026-01-23 01:16:11,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [643.6331, 783.4659, 482.53726, 628.72156, 847.1627, 640.37213, 542.3629, 489.13022, 665.2196, 1069.1809]
2026-01-23 01:16:11,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [139.0, 152.0, 97.0, 120.0, 177.0, 123.0, 110.0, 92.0, 127.0, 213.0]
2026-01-23 01:16:11,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (679.18) for latency DatasetOffice
2026-01-23 01:16:11,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 26 minutes, 14 seconds)
2026-01-23 01:17:52,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:54,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 654.49377 ± 133.437
2026-01-23 01:17:54,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [762.36694, 885.6099, 539.9991, 657.5936, 832.5544, 663.3601, 540.28186, 539.7236, 672.79034, 450.65836]
2026-01-23 01:17:54,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [162.0, 188.0, 113.0, 132.0, 165.0, 126.0, 103.0, 104.0, 147.0, 93.0]
2026-01-23 01:17:54,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 24 minutes, 45 seconds)
2026-01-23 01:19:34,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:35,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 574.30353 ± 121.521
2026-01-23 01:19:35,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [390.8071, 477.35324, 745.40967, 521.66565, 485.14322, 631.8996, 652.76794, 428.98862, 696.9862, 712.01385]
2026-01-23 01:19:35,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [85.0, 94.0, 149.0, 100.0, 106.0, 122.0, 131.0, 82.0, 143.0, 138.0]
2026-01-23 01:19:35,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 3 seconds)
2026-01-23 01:21:16,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:17,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 658.69122 ± 135.510
2026-01-23 01:21:17,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [598.6949, 564.6692, 625.95746, 622.5588, 993.76184, 765.7632, 743.8238, 623.9431, 535.40924, 512.3307]
2026-01-23 01:21:17,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [112.0, 107.0, 119.0, 123.0, 194.0, 158.0, 148.0, 119.0, 100.0, 111.0]
2026-01-23 01:21:17,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 21 minutes, 19 seconds)
2026-01-23 01:22:58,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:00,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 710.84778 ± 197.725
2026-01-23 01:23:00,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [594.266, 626.44574, 549.718, 823.6887, 808.21747, 806.4738, 1177.3766, 703.9349, 592.47864, 425.87808]
2026-01-23 01:23:00,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [117.0, 119.0, 106.0, 164.0, 156.0, 156.0, 237.0, 149.0, 116.0, 85.0]
2026-01-23 01:23:00,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (710.85) for latency DatasetOffice
2026-01-23 01:23:00,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 19 minutes, 55 seconds)
2026-01-23 01:24:40,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:42,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 659.32520 ± 119.207
2026-01-23 01:24:42,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [451.81302, 604.78143, 820.1127, 598.2161, 606.85535, 844.46454, 714.69946, 781.42474, 568.73016, 602.1547]
2026-01-23 01:24:42,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [95.0, 115.0, 162.0, 114.0, 133.0, 184.0, 159.0, 169.0, 126.0, 135.0]
2026-01-23 01:24:42,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 18 minutes, 14 seconds)
2026-01-23 01:26:22,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:24,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 607.24603 ± 146.370
2026-01-23 01:26:24,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [623.3393, 668.9485, 586.53314, 501.10883, 453.71445, 949.77405, 598.9262, 444.36774, 746.35657, 499.3921]
2026-01-23 01:26:24,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [124.0, 139.0, 112.0, 102.0, 88.0, 197.0, 114.0, 96.0, 161.0, 96.0]
2026-01-23 01:26:24,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 16 minutes, 28 seconds)
2026-01-23 01:28:04,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:06,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 689.25891 ± 103.543
2026-01-23 01:28:06,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [678.3665, 793.66296, 752.2541, 553.9924, 604.10895, 517.6479, 756.04407, 868.20404, 665.5384, 702.76965]
2026-01-23 01:28:06,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [134.0, 169.0, 160.0, 105.0, 117.0, 97.0, 160.0, 188.0, 142.0, 152.0]
2026-01-23 01:28:06,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 14 minutes, 54 seconds)
2026-01-23 01:29:45,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:47,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 687.27551 ± 238.610
2026-01-23 01:29:47,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [522.5635, 597.43555, 554.22546, 482.28702, 607.685, 601.8793, 1025.6575, 573.4399, 1263.0454, 644.53674]
2026-01-23 01:29:47,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [99.0, 116.0, 106.0, 94.0, 117.0, 115.0, 210.0, 109.0, 261.0, 135.0]
2026-01-23 01:29:47,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 13 minutes, 2 seconds)
2026-01-23 01:31:27,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:29,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 656.72559 ± 118.352
2026-01-23 01:31:29,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [695.8776, 873.6523, 658.47644, 683.1026, 575.2035, 779.237, 611.9431, 399.86334, 624.1283, 665.7717]
2026-01-23 01:31:29,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [135.0, 172.0, 137.0, 133.0, 120.0, 153.0, 125.0, 75.0, 119.0, 136.0]
2026-01-23 01:31:29,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 12 seconds)
2026-01-23 01:33:09,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:11,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 684.14392 ± 150.418
2026-01-23 01:33:11,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [869.52313, 665.9348, 632.9544, 426.5783, 640.3112, 556.42773, 916.447, 624.2657, 618.6396, 890.3574]
2026-01-23 01:33:11,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [176.0, 127.0, 135.0, 80.0, 129.0, 109.0, 190.0, 120.0, 116.0, 179.0]
2026-01-23 01:33:11,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 38 seconds)
2026-01-23 01:34:51,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:53,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 573.69464 ± 183.663
2026-01-23 01:34:53,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [508.5919, 543.00977, 479.27432, 467.92297, 1055.8296, 405.4709, 483.61975, 564.16565, 755.50366, 473.55746]
2026-01-23 01:34:53,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [97.0, 104.0, 89.0, 86.0, 213.0, 76.0, 90.0, 110.0, 147.0, 92.0]
2026-01-23 01:34:53,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 7 minutes, 51 seconds)
2026-01-23 01:36:33,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:35,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 639.40027 ± 78.965
2026-01-23 01:36:35,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [560.3226, 608.0408, 518.9763, 666.5519, 594.6825, 610.5908, 674.2312, 619.22156, 792.5266, 748.8579]
2026-01-23 01:36:35,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [105.0, 125.0, 106.0, 128.0, 112.0, 125.0, 128.0, 119.0, 158.0, 150.0]
2026-01-23 01:36:35,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 6 minutes, 10 seconds)
2026-01-23 01:38:15,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:17,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 653.24353 ± 96.126
2026-01-23 01:38:17,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [544.70337, 614.01025, 643.3073, 883.0818, 670.9286, 627.0814, 624.72815, 670.87335, 733.268, 520.45294]
2026-01-23 01:38:17,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [103.0, 122.0, 131.0, 175.0, 136.0, 122.0, 123.0, 131.0, 144.0, 106.0]
2026-01-23 01:38:17,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 38 seconds)
2026-01-23 01:39:57,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:59,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 613.83264 ± 117.024
2026-01-23 01:39:59,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [593.96265, 611.0639, 708.1739, 713.1482, 473.18588, 678.7936, 535.32275, 458.93747, 847.3241, 518.4142]
2026-01-23 01:39:59,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [111.0, 118.0, 138.0, 143.0, 93.0, 133.0, 117.0, 99.0, 181.0, 102.0]
2026-01-23 01:39:59,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 2 minutes, 52 seconds)
2026-01-23 01:41:40,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:42,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 645.57288 ± 121.838
2026-01-23 01:41:42,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [625.2604, 491.6187, 484.93253, 640.7596, 702.05695, 835.01447, 525.2398, 593.2805, 720.82446, 836.7415]
2026-01-23 01:41:42,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [121.0, 101.0, 93.0, 130.0, 141.0, 165.0, 106.0, 116.0, 154.0, 170.0]
2026-01-23 01:41:42,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 14 seconds)
2026-01-23 01:43:22,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:24,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 706.93170 ± 222.207
2026-01-23 01:43:24,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [686.61316, 839.76105, 997.5319, 475.97894, 888.88873, 436.08044, 683.5039, 472.96564, 1077.7173, 510.27533]
2026-01-23 01:43:24,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [134.0, 167.0, 210.0, 92.0, 178.0, 81.0, 134.0, 94.0, 219.0, 111.0]
2026-01-23 01:43:24,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 42 seconds)
2026-01-23 01:45:05,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:06,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 649.41345 ± 130.262
2026-01-23 01:45:06,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [633.43304, 528.9642, 614.1809, 607.3338, 561.76044, 631.26196, 600.967, 534.90717, 813.0958, 968.23065]
2026-01-23 01:45:06,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [126.0, 113.0, 119.0, 120.0, 112.0, 122.0, 117.0, 104.0, 169.0, 196.0]
2026-01-23 01:45:06,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 67/100 (estimated time remaining: 57 minutes, 58 seconds)
2026-01-23 01:46:46,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:48,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 707.24725 ± 180.716
2026-01-23 01:46:48,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [484.00934, 717.98566, 679.0754, 643.6845, 636.0771, 1209.9987, 733.54266, 631.4448, 620.85535, 715.7986]
2026-01-23 01:46:48,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [92.0, 137.0, 134.0, 125.0, 129.0, 245.0, 145.0, 138.0, 122.0, 145.0]
2026-01-23 01:46:48,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 14 seconds)
2026-01-23 01:48:29,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:31,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 638.26971 ± 95.992
2026-01-23 01:48:31,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [601.124, 524.9117, 811.90625, 774.63776, 656.9419, 606.3661, 725.96454, 545.3687, 540.0558, 595.42096]
2026-01-23 01:48:31,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [115.0, 100.0, 160.0, 153.0, 130.0, 118.0, 154.0, 109.0, 110.0, 118.0]
2026-01-23 01:48:31,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 40 seconds)
2026-01-23 01:50:13,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:50:15,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 750.57135 ± 128.747
2026-01-23 01:50:15,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [642.9115, 970.6459, 867.9429, 659.64545, 827.6023, 777.2726, 657.33246, 899.52045, 572.49927, 630.3404]
2026-01-23 01:50:15,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [130.0, 202.0, 181.0, 132.0, 165.0, 164.0, 129.0, 187.0, 111.0, 128.0]
2026-01-23 01:50:15,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (750.57) for latency DatasetOffice
2026-01-23 01:50:15,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 70/100 (estimated time remaining: 53 minutes, 1 second)
2026-01-23 01:51:55,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:58,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 794.16223 ± 167.292
2026-01-23 01:51:58,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [990.06604, 757.49585, 706.9416, 676.2114, 669.70416, 1174.519, 806.9861, 752.48663, 561.8986, 845.3136]
2026-01-23 01:51:58,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [202.0, 151.0, 142.0, 134.0, 128.0, 240.0, 163.0, 148.0, 114.0, 171.0]
2026-01-23 01:51:58,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (794.16) for latency DatasetOffice
2026-01-23 01:51:58,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 20 seconds)
2026-01-23 01:53:40,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:41,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 663.22943 ± 128.536
2026-01-23 01:53:41,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [679.422, 537.4914, 597.43286, 529.2169, 869.7633, 877.7641, 641.78674, 660.5152, 489.3124, 749.58954]
2026-01-23 01:53:41,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [136.0, 103.0, 122.0, 108.0, 174.0, 182.0, 133.0, 128.0, 104.0, 147.0]
2026-01-23 01:53:41,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 47 seconds)
2026-01-23 01:55:25,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:27,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 619.56500 ± 83.956
2026-01-23 01:55:27,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [546.91644, 606.3979, 565.7293, 784.0643, 499.18204, 705.0313, 595.5566, 571.0544, 719.7643, 601.9538]
2026-01-23 01:55:27,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [107.0, 126.0, 109.0, 158.0, 102.0, 139.0, 117.0, 120.0, 143.0, 114.0]
2026-01-23 01:55:27,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 22 seconds)
2026-01-23 01:57:09,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:11,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 827.59619 ± 385.106
2026-01-23 01:57:11,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1261.2388, 582.5649, 682.91376, 1746.0371, 713.3381, 503.66788, 625.6274, 550.9032, 1063.8138, 545.85675]
2026-01-23 01:57:11,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [265.0, 113.0, 132.0, 369.0, 144.0, 98.0, 121.0, 108.0, 215.0, 105.0]
2026-01-23 01:57:11,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (827.60) for latency DatasetOffice
2026-01-23 01:57:11,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 47 seconds)
2026-01-23 01:58:52,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:54,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 741.05988 ± 149.586
2026-01-23 01:58:54,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [890.195, 533.7578, 557.9156, 781.01984, 928.42596, 756.00385, 698.91003, 977.6541, 568.4174, 718.299]
2026-01-23 01:58:54,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [179.0, 102.0, 111.0, 156.0, 188.0, 152.0, 140.0, 204.0, 110.0, 150.0]
2026-01-23 01:58:54,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 59 seconds)
2026-01-23 02:00:36,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:38,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 711.77100 ± 154.201
2026-01-23 02:00:38,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1087.2385, 805.67566, 772.4787, 511.07727, 527.92737, 731.285, 711.36273, 666.5774, 670.2903, 633.7969]
2026-01-23 02:00:38,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [216.0, 157.0, 157.0, 101.0, 106.0, 145.0, 147.0, 134.0, 133.0, 129.0]
2026-01-23 02:00:38,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 21 seconds)
2026-01-23 02:02:18,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:20,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 680.54700 ± 193.304
2026-01-23 02:02:20,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [790.17267, 500.2394, 791.1346, 463.66577, 810.3327, 583.61304, 894.7517, 406.39316, 1008.15247, 557.01404]
2026-01-23 02:02:20,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [164.0, 95.0, 154.0, 89.0, 170.0, 118.0, 180.0, 84.0, 203.0, 113.0]
2026-01-23 02:02:20,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 77/100 (estimated time remaining: 41 minutes, 28 seconds)
2026-01-23 02:04:03,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:05,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 741.71960 ± 108.440
2026-01-23 02:04:05,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [687.6872, 713.4424, 863.1741, 723.49854, 685.7182, 765.75867, 751.9264, 517.2656, 952.38074, 756.34393]
2026-01-23 02:04:05,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [138.0, 144.0, 171.0, 139.0, 137.0, 150.0, 146.0, 104.0, 187.0, 149.0]
2026-01-23 02:04:05,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 45 seconds)
2026-01-23 02:05:45,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:47,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 733.87195 ± 225.088
2026-01-23 02:05:47,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [581.5902, 641.65765, 1065.2316, 493.8119, 1050.8608, 714.8267, 612.48083, 476.95425, 1072.757, 628.5487]
2026-01-23 02:05:47,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [112.0, 121.0, 210.0, 94.0, 211.0, 142.0, 118.0, 90.0, 220.0, 126.0]
2026-01-23 02:05:47,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 48 seconds)
2026-01-23 02:07:30,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:31,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 639.27936 ± 138.711
2026-01-23 02:07:31,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [578.3426, 550.03033, 764.5494, 549.84155, 978.4691, 685.1914, 574.18585, 490.09717, 687.8274, 534.25946]
2026-01-23 02:07:31,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [111.0, 107.0, 154.0, 105.0, 199.0, 135.0, 111.0, 94.0, 136.0, 104.0]
2026-01-23 02:07:31,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 13 seconds)
2026-01-23 02:09:10,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:09:12,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 848.08008 ± 286.694
2026-01-23 02:09:12,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [583.23157, 604.7406, 949.42773, 643.6705, 1027.491, 1484.7662, 1108.0502, 612.2252, 569.9807, 897.21655]
2026-01-23 02:09:12,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [111.0, 121.0, 187.0, 120.0, 216.0, 304.0, 226.0, 118.0, 111.0, 179.0]
2026-01-23 02:09:12,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (848.08) for latency DatasetOffice
2026-01-23 02:09:12,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 17 seconds)
2026-01-23 02:10:53,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:55,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 701.90735 ± 152.262
2026-01-23 02:10:55,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [950.8543, 688.7009, 716.1321, 498.4933, 698.1201, 500.63794, 749.6589, 555.46686, 954.569, 706.4403]
2026-01-23 02:10:55,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [190.0, 135.0, 145.0, 98.0, 141.0, 102.0, 148.0, 110.0, 194.0, 138.0]
2026-01-23 02:10:55,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 35 seconds)
2026-01-23 02:12:36,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:38,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 769.70459 ± 291.139
2026-01-23 02:12:38,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1087.5149, 657.02496, 754.31934, 635.6144, 867.1744, 1485.232, 594.63184, 497.29898, 540.78094, 577.4541]
2026-01-23 02:12:38,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [222.0, 144.0, 150.0, 123.0, 184.0, 297.0, 115.0, 95.0, 105.0, 112.0]
2026-01-23 02:12:38,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 45 seconds)
2026-01-23 02:14:18,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:20,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 751.01532 ± 199.627
2026-01-23 02:14:20,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1007.7869, 553.34125, 872.04614, 735.2137, 968.86633, 550.46094, 848.9678, 494.46268, 499.45657, 979.5515]
2026-01-23 02:14:20,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [208.0, 108.0, 177.0, 142.0, 198.0, 107.0, 171.0, 94.0, 98.0, 201.0]
2026-01-23 02:14:20,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 6 seconds)
2026-01-23 02:16:02,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:04,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 578.67853 ± 82.776
2026-01-23 02:16:04,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [572.27106, 526.2139, 636.2559, 486.32178, 652.4496, 545.48474, 506.5074, 466.60535, 670.8198, 723.8559]
2026-01-23 02:16:04,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [109.0, 103.0, 125.0, 97.0, 130.0, 106.0, 99.0, 92.0, 133.0, 159.0]
2026-01-23 02:16:04,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 19 seconds)
2026-01-23 02:17:43,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:17:46,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 870.40546 ± 253.175
2026-01-23 02:17:46,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [988.7986, 669.7092, 855.0569, 526.75354, 887.67914, 604.4202, 1467.191, 777.2277, 1050.4733, 876.74506]
2026-01-23 02:17:46,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [209.0, 131.0, 172.0, 100.0, 175.0, 118.0, 306.0, 153.0, 225.0, 172.0]
2026-01-23 02:17:46,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (870.41) for latency DatasetOffice
2026-01-23 02:17:46,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 40 seconds)
2026-01-23 02:19:26,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:19:28,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 640.47174 ± 319.620
2026-01-23 02:19:28,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [637.15204, 773.3117, 943.89886, 555.00146, 1317.2357, 797.3958, 227.0549, 399.69543, 541.5535, 212.41756]
2026-01-23 02:19:28,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [124.0, 153.0, 188.0, 109.0, 274.0, 157.0, 49.0, 76.0, 104.0, 41.0]
2026-01-23 02:19:28,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 57 seconds)
2026-01-23 02:21:10,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:12,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 742.78009 ± 207.865
2026-01-23 02:21:12,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [518.46484, 584.0761, 1095.2772, 1074.6179, 685.726, 553.96234, 874.959, 535.9195, 859.28577, 645.5121]
2026-01-23 02:21:12,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [99.0, 114.0, 218.0, 214.0, 136.0, 107.0, 176.0, 100.0, 169.0, 129.0]
2026-01-23 02:21:12,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 17 seconds)
2026-01-23 02:22:53,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:22:55,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 813.14850 ± 293.762
2026-01-23 02:22:55,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [585.081, 710.5271, 540.5713, 638.4579, 671.59204, 716.94226, 1313.7976, 522.7793, 1156.8032, 1274.9332]
2026-01-23 02:22:55,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [117.0, 139.0, 106.0, 126.0, 131.0, 147.0, 275.0, 99.0, 227.0, 269.0]
2026-01-23 02:22:55,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 35 seconds)
2026-01-23 02:24:36,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:39,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 837.48907 ± 297.731
2026-01-23 02:24:39,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1207.1014, 602.67346, 722.13354, 631.1117, 687.6739, 539.4473, 1522.443, 1014.6515, 700.0683, 747.58685]
2026-01-23 02:24:39,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [251.0, 117.0, 142.0, 123.0, 141.0, 105.0, 320.0, 203.0, 138.0, 148.0]
2026-01-23 02:24:39,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 52 seconds)
2026-01-23 02:26:19,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:21,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 864.23846 ± 244.471
2026-01-23 02:26:21,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [519.04877, 557.18524, 1257.9962, 959.3405, 837.00006, 731.442, 607.36316, 944.7849, 1067.1499, 1161.0732]
2026-01-23 02:26:21,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [101.0, 107.0, 262.0, 192.0, 178.0, 143.0, 121.0, 187.0, 215.0, 246.0]
2026-01-23 02:26:21,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 11 seconds)
2026-01-23 02:28:03,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:05,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 735.72650 ± 139.301
2026-01-23 02:28:05,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [900.8909, 595.87744, 699.0686, 757.1247, 1052.0948, 678.89105, 583.11755, 667.81793, 630.61743, 791.76526]
2026-01-23 02:28:05,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [179.0, 116.0, 136.0, 149.0, 222.0, 133.0, 113.0, 130.0, 123.0, 157.0]
2026-01-23 02:28:05,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 31 seconds)
2026-01-23 02:29:47,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:50,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1034.35583 ± 322.807
2026-01-23 02:29:50,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1164.6177, 1192.0529, 794.83704, 778.07947, 854.91016, 865.3414, 792.27423, 713.1497, 1476.0758, 1712.2203]
2026-01-23 02:29:50,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [248.0, 251.0, 157.0, 152.0, 167.0, 172.0, 174.0, 140.0, 298.0, 351.0]
2026-01-23 02:29:50,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1034.36) for latency DatasetOffice
2026-01-23 02:29:50,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 48 seconds)
2026-01-23 02:31:29,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:32,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 859.17078 ± 140.537
2026-01-23 02:31:32,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [838.08356, 900.23755, 637.95856, 1038.7438, 1022.211, 766.0085, 917.21814, 1023.97394, 647.27094, 800.00226]
2026-01-23 02:31:32,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [175.0, 180.0, 135.0, 208.0, 214.0, 152.0, 178.0, 209.0, 134.0, 162.0]
2026-01-23 02:31:32,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 2 seconds)
2026-01-23 02:33:14,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:16,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 740.90784 ± 152.853
2026-01-23 02:33:16,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [730.63666, 592.1705, 703.3603, 577.86035, 853.3759, 730.59973, 980.47906, 584.94916, 635.6805, 1019.9664]
2026-01-23 02:33:16,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [146.0, 113.0, 139.0, 112.0, 179.0, 144.0, 196.0, 113.0, 127.0, 208.0]
2026-01-23 02:33:16,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 20 seconds)
2026-01-23 02:34:57,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:35:00,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 903.03973 ± 333.853
2026-01-23 02:35:00,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [511.90433, 584.5798, 1358.6533, 850.2904, 765.32886, 837.662, 1454.9318, 586.3406, 737.05853, 1343.6473]
2026-01-23 02:35:00,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [102.0, 113.0, 283.0, 170.0, 151.0, 161.0, 291.0, 110.0, 145.0, 269.0]
2026-01-23 02:35:00,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 38 seconds)
2026-01-23 02:36:39,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:42,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 852.73212 ± 173.679
2026-01-23 02:36:42,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [485.77237, 812.6505, 1159.7511, 722.73975, 1032.1301, 879.87714, 819.553, 918.6502, 759.1922, 937.00525]
2026-01-23 02:36:42,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [97.0, 170.0, 235.0, 142.0, 202.0, 177.0, 164.0, 185.0, 151.0, 185.0]
2026-01-23 02:36:42,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 53 seconds)
2026-01-23 02:38:26,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:38:28,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 858.48065 ± 276.106
2026-01-23 02:38:28,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [511.4334, 608.87915, 850.74084, 714.17975, 925.5626, 811.1456, 997.9533, 562.08826, 1463.1284, 1139.6956]
2026-01-23 02:38:28,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [96.0, 127.0, 170.0, 140.0, 189.0, 165.0, 217.0, 108.0, 294.0, 234.0]
2026-01-23 02:38:28,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 10 seconds)
2026-01-23 02:40:07,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:40:10,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 873.93909 ± 394.027
2026-01-23 02:40:10,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [634.9051, 623.4746, 709.7392, 649.8186, 1762.9448, 787.4231, 710.5094, 585.81335, 1530.5536, 744.2095]
2026-01-23 02:40:10,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [124.0, 121.0, 139.0, 127.0, 355.0, 156.0, 141.0, 114.0, 311.0, 147.0]
2026-01-23 02:40:10,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 27 seconds)
2026-01-23 02:41:51,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:41:54,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 998.41974 ± 344.037
2026-01-23 02:41:54,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [830.4341, 962.91693, 827.46155, 721.81586, 868.9824, 1656.581, 563.1104, 751.4182, 1290.6375, 1510.8394]
2026-01-23 02:41:54,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [165.0, 201.0, 167.0, 141.0, 180.0, 335.0, 110.0, 150.0, 270.0, 307.0]
2026-01-23 02:41:54,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 43 seconds)
2026-01-23 02:43:37,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:43:40,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1005.14404 ± 228.721
2026-01-23 02:43:40,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [836.27704, 1073.998, 1419.6688, 701.6519, 901.5525, 1296.7968, 787.6339, 825.22485, 1228.3975, 980.23944]
2026-01-23 02:43:40,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [172.0, 211.0, 290.0, 137.0, 181.0, 270.0, 152.0, 165.0, 241.0, 210.0]
2026-01-23 02:43:40,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1299 [DEBUG]: Training session finished
