2026-01-25 17:02:38,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-humanoid/DatasetOffice-sac
2026-01-25 17:02:38,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-humanoid/DatasetOffice-sac
2026-01-25 17:02:38,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14723556c690>}
2026-01-25 17:02:38,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1159 [DEBUG]: using device: cuda
2026-01-25 17:02:38,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1181 [INFO]: Creating new trainer
2026-01-25 17:02:38,970 baseline-sac-noisy-humanoid:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=376, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2026-01-25 17:02:38,970 baseline-sac-noisy-humanoid:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-25 17:02:39,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1242 [DEBUG]: Starting training session...
2026-01-25 17:02:39,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 1/100
2026-01-25 17:04:09,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:04:10,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 314.51175 ± 62.022
2026-01-25 17:04:10,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [373.3612, 291.83075, 267.10773, 286.11948, 277.9633, 476.326, 276.0858, 332.35904, 283.57266, 280.39145]
2026-01-25 17:04:10,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [70.0, 58.0, 53.0, 56.0, 54.0, 90.0, 54.0, 64.0, 56.0, 55.0]
2026-01-25 17:04:10,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (314.51) for latency DatasetOffice
2026-01-25 17:04:10,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 29 minutes, 6 seconds)
2026-01-25 17:05:46,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:05:47,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 354.63705 ± 97.470
2026-01-25 17:05:47,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [270.69568, 419.81686, 312.87885, 504.11145, 283.55588, 230.11647, 323.39514, 381.8314, 285.0371, 534.93164]
2026-01-25 17:05:47,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [59.0, 82.0, 68.0, 94.0, 65.0, 50.0, 72.0, 73.0, 66.0, 102.0]
2026-01-25 17:05:47,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (354.64) for latency DatasetOffice
2026-01-25 17:05:47,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 33 minutes, 24 seconds)
2026-01-25 17:07:25,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:07:26,167 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 347.17621 ± 71.528
2026-01-25 17:07:26,167 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [342.72998, 404.91446, 359.5589, 516.9244, 262.72388, 254.94292, 361.91827, 346.0544, 324.53, 297.465]
2026-01-25 17:07:26,167 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [67.0, 85.0, 75.0, 105.0, 50.0, 55.0, 73.0, 65.0, 73.0, 62.0]
2026-01-25 17:07:26,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 34 minutes, 15 seconds)
2026-01-25 17:09:02,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:09:03,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 373.40460 ± 60.207
2026-01-25 17:09:03,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [337.34174, 383.85535, 374.84488, 438.85974, 310.9478, 336.53265, 284.36334, 414.01443, 497.5168, 355.76923]
2026-01-25 17:09:03,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [71.0, 74.0, 80.0, 97.0, 70.0, 74.0, 62.0, 79.0, 102.0, 76.0]
2026-01-25 17:09:03,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (373.40) for latency DatasetOffice
2026-01-25 17:09:03,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 33 minutes, 19 seconds)
2026-01-25 17:10:40,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:10:41,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 395.60065 ± 82.042
2026-01-25 17:10:41,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [262.36047, 370.50388, 403.36328, 564.14935, 485.4559, 410.93777, 347.10464, 430.57245, 304.98322, 376.57538]
2026-01-25 17:10:41,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [54.0, 77.0, 77.0, 117.0, 95.0, 74.0, 68.0, 95.0, 61.0, 72.0]
2026-01-25 17:10:41,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (395.60) for latency DatasetOffice
2026-01-25 17:10:41,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 32 minutes, 26 seconds)
2026-01-25 17:12:18,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:12:19,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 452.45297 ± 98.112
2026-01-25 17:12:19,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [325.19455, 429.70007, 499.3662, 650.9141, 425.19806, 509.93872, 496.14, 501.45435, 289.5579, 397.06522]
2026-01-25 17:12:19,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [70.0, 87.0, 94.0, 127.0, 81.0, 108.0, 99.0, 107.0, 64.0, 76.0]
2026-01-25 17:12:19,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (452.45) for latency DatasetOffice
2026-01-25 17:12:19,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 33 minutes, 13 seconds)
2026-01-25 17:13:56,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:13:57,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 392.24567 ± 66.536
2026-01-25 17:13:57,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [469.36136, 291.167, 437.10538, 362.53494, 455.03683, 288.32025, 400.76935, 421.02344, 467.44464, 329.69354]
2026-01-25 17:13:57,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [99.0, 62.0, 84.0, 68.0, 84.0, 54.0, 76.0, 79.0, 87.0, 69.0]
2026-01-25 17:13:57,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 31 minutes, 40 seconds)
2026-01-25 17:15:34,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:15:35,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 424.76105 ± 114.928
2026-01-25 17:15:35,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [527.78217, 347.75677, 323.73914, 549.7082, 347.61792, 306.89645, 344.4542, 639.757, 522.5633, 337.3353]
2026-01-25 17:15:35,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [112.0, 74.0, 72.0, 105.0, 77.0, 58.0, 72.0, 123.0, 96.0, 74.0]
2026-01-25 17:15:35,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 30 minutes, 4 seconds)
2026-01-25 17:17:12,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:17:13,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 571.48688 ± 103.598
2026-01-25 17:17:13,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [689.0006, 715.8309, 499.68625, 679.3586, 655.9152, 575.00104, 459.04892, 491.15775, 400.74252, 549.1263]
2026-01-25 17:17:13,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [150.0, 138.0, 106.0, 139.0, 125.0, 119.0, 103.0, 93.0, 73.0, 115.0]
2026-01-25 17:17:13,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (571.49) for latency DatasetOffice
2026-01-25 17:17:13,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 28 minutes, 40 seconds)
2026-01-25 17:18:49,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:18:50,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 455.57513 ± 105.716
2026-01-25 17:18:50,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [690.3853, 344.9108, 423.27002, 346.73224, 494.4586, 425.33673, 352.82687, 394.67798, 541.0759, 542.07697]
2026-01-25 17:18:50,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [138.0, 66.0, 89.0, 73.0, 116.0, 94.0, 76.0, 76.0, 106.0, 102.0]
2026-01-25 17:18:50,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 26 minutes, 40 seconds)
2026-01-25 17:20:26,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:20:27,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 464.48785 ± 87.385
2026-01-25 17:20:27,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [536.6063, 395.94656, 490.36078, 495.87436, 390.58054, 378.02405, 631.92786, 525.20337, 323.12366, 477.23093]
2026-01-25 17:20:27,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [116.0, 88.0, 95.0, 109.0, 72.0, 87.0, 118.0, 98.0, 74.0, 92.0]
2026-01-25 17:20:27,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 24 minutes, 52 seconds)
2026-01-25 17:22:02,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:22:03,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 469.72437 ± 101.198
2026-01-25 17:22:03,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [458.91818, 406.86682, 617.2722, 652.1292, 392.8178, 411.50272, 481.04547, 318.9485, 552.9261, 404.81647]
2026-01-25 17:22:03,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [85.0, 90.0, 122.0, 126.0, 86.0, 81.0, 90.0, 59.0, 109.0, 77.0]
2026-01-25 17:22:03,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 22 minutes, 44 seconds)
2026-01-25 17:23:38,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:23:39,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 421.42123 ± 94.949
2026-01-25 17:23:39,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [409.99527, 542.27264, 404.23962, 165.06557, 425.18243, 401.8093, 453.66376, 483.68195, 449.35858, 478.94324]
2026-01-25 17:23:39,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [75.0, 112.0, 77.0, 32.0, 87.0, 76.0, 83.0, 87.0, 83.0, 85.0]
2026-01-25 17:23:39,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 20 minutes, 23 seconds)
2026-01-25 17:25:15,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:25:16,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 506.57013 ± 81.984
2026-01-25 17:25:16,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [573.5758, 560.1972, 590.2453, 442.50946, 342.28894, 638.0139, 462.71408, 480.8678, 511.90237, 463.386]
2026-01-25 17:25:16,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [127.0, 105.0, 110.0, 94.0, 64.0, 137.0, 87.0, 91.0, 96.0, 86.0]
2026-01-25 17:25:16,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 18 minutes, 25 seconds)
2026-01-25 17:26:52,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:26:53,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 616.57800 ± 142.690
2026-01-25 17:26:53,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [583.4288, 550.63446, 884.15106, 576.70966, 721.47235, 504.31366, 349.68857, 605.6945, 600.593, 789.09454]
2026-01-25 17:26:53,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [126.0, 114.0, 173.0, 112.0, 149.0, 104.0, 65.0, 115.0, 126.0, 155.0]
2026-01-25 17:26:53,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (616.58) for latency DatasetOffice
2026-01-25 17:26:53,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 16 minutes, 58 seconds)
2026-01-25 17:28:29,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:28:30,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 636.06915 ± 183.875
2026-01-25 17:28:30,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [840.39087, 437.20856, 624.1641, 868.3333, 447.9188, 626.66846, 877.50055, 725.97394, 321.73016, 590.8031]
2026-01-25 17:28:30,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [160.0, 99.0, 132.0, 163.0, 82.0, 118.0, 172.0, 138.0, 71.0, 107.0]
2026-01-25 17:28:30,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (636.07) for latency DatasetOffice
2026-01-25 17:28:30,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 15 minutes, 14 seconds)
2026-01-25 17:30:06,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:30:08,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 639.33417 ± 195.961
2026-01-25 17:30:08,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [455.5922, 624.33844, 1165.3889, 521.26733, 534.7821, 561.14355, 466.29437, 665.89014, 652.1091, 746.5357]
2026-01-25 17:30:08,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [87.0, 117.0, 245.0, 112.0, 110.0, 112.0, 99.0, 131.0, 126.0, 139.0]
2026-01-25 17:30:08,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (639.33) for latency DatasetOffice
2026-01-25 17:30:08,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 14 minutes, 6 seconds)
2026-01-25 17:31:43,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:31:44,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 538.72748 ± 121.471
2026-01-25 17:31:44,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [459.6439, 596.2831, 488.45532, 389.89154, 515.8517, 699.89325, 419.62936, 604.9411, 778.2387, 434.44666]
2026-01-25 17:31:44,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [90.0, 130.0, 90.0, 73.0, 97.0, 138.0, 78.0, 115.0, 161.0, 81.0]
2026-01-25 17:31:44,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 12 minutes, 38 seconds)
2026-01-25 17:33:20,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:33:22,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 606.59784 ± 196.802
2026-01-25 17:33:22,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [533.30273, 578.01373, 573.2705, 416.84003, 758.6133, 473.53668, 1138.349, 579.2344, 508.5287, 506.28934]
2026-01-25 17:33:22,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [104.0, 111.0, 107.0, 77.0, 145.0, 88.0, 245.0, 111.0, 110.0, 105.0]
2026-01-25 17:33:22,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 11 minutes, 14 seconds)
2026-01-25 17:34:58,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:34:59,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 635.40619 ± 120.700
2026-01-25 17:34:59,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [704.7379, 598.264, 829.2193, 567.0781, 604.6271, 500.2588, 646.04065, 611.0485, 842.7442, 450.04333]
2026-01-25 17:34:59,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [160.0, 122.0, 169.0, 107.0, 114.0, 93.0, 126.0, 123.0, 170.0, 86.0]
2026-01-25 17:34:59,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 9 minutes, 39 seconds)
2026-01-25 17:36:36,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:36:37,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 601.25891 ± 173.365
2026-01-25 17:36:37,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [575.7428, 572.1293, 609.22034, 1081.7245, 434.2355, 426.25806, 625.0424, 608.3306, 512.66846, 567.2371]
2026-01-25 17:36:37,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [108.0, 106.0, 112.0, 216.0, 81.0, 79.0, 116.0, 127.0, 95.0, 123.0]
2026-01-25 17:36:37,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 8 minutes, 10 seconds)
2026-01-25 17:38:13,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:38:14,532 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 482.74677 ± 206.801
2026-01-25 17:38:14,532 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [539.9777, 572.84595, 454.62845, 207.44757, 904.20233, 499.41687, 733.4372, 358.94116, 280.2075, 276.36255]
2026-01-25 17:38:14,532 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [101.0, 106.0, 86.0, 40.0, 188.0, 109.0, 147.0, 68.0, 68.0, 55.0]
2026-01-25 17:38:14,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 6 minutes, 23 seconds)
2026-01-25 17:39:50,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:39:52,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 619.82843 ± 119.026
2026-01-25 17:39:52,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [433.4414, 586.6935, 401.05695, 707.38544, 644.13403, 679.51154, 593.77277, 679.3608, 822.2655, 650.66187]
2026-01-25 17:39:52,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [85.0, 136.0, 92.0, 142.0, 131.0, 137.0, 120.0, 155.0, 176.0, 143.0]
2026-01-25 17:39:52,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 5 minutes, 7 seconds)
2026-01-25 17:41:28,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:41:29,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 507.13339 ± 169.364
2026-01-25 17:41:29,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [100.99923, 631.1258, 621.03564, 462.05774, 628.52, 582.91956, 713.17, 495.25162, 338.74506, 497.50916]
2026-01-25 17:41:29,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [20.0, 127.0, 119.0, 84.0, 118.0, 115.0, 134.0, 96.0, 63.0, 91.0]
2026-01-25 17:41:29,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 3 minutes, 24 seconds)
2026-01-25 17:43:05,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:43:06,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 635.54651 ± 294.702
2026-01-25 17:43:06,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [315.6195, 147.67926, 633.43097, 958.2204, 502.28323, 570.669, 1053.7896, 646.66156, 451.93808, 1075.1731]
2026-01-25 17:43:06,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [59.0, 33.0, 122.0, 187.0, 102.0, 111.0, 208.0, 135.0, 85.0, 201.0]
2026-01-25 17:43:06,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 1 minute, 43 seconds)
2026-01-25 17:44:44,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:44:45,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 627.65247 ± 310.550
2026-01-25 17:44:45,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1433.9651, 738.3231, 270.4743, 709.94916, 468.79874, 463.7548, 798.80255, 463.5378, 519.6471, 409.27206]
2026-01-25 17:44:45,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [285.0, 155.0, 51.0, 132.0, 93.0, 86.0, 163.0, 90.0, 104.0, 76.0]
2026-01-25 17:44:45,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 24 seconds)
2026-01-25 17:46:21,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:46:23,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 689.11230 ± 165.209
2026-01-25 17:46:23,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [446.97083, 1052.0409, 689.47485, 682.6357, 781.77167, 824.7318, 700.9611, 638.7597, 594.1115, 479.66486]
2026-01-25 17:46:23,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [90.0, 212.0, 144.0, 147.0, 162.0, 158.0, 131.0, 121.0, 124.0, 101.0]
2026-01-25 17:46:23,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (689.11) for latency DatasetOffice
2026-01-25 17:46:23,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 58 minutes, 51 seconds)
2026-01-25 17:47:58,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:48:00,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 926.40320 ± 254.658
2026-01-25 17:48:00,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1042.8285, 584.446, 1204.7941, 1016.1174, 552.1326, 722.2487, 872.6396, 1362.8903, 1120.1332, 785.8027]
2026-01-25 17:48:00,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [218.0, 108.0, 250.0, 196.0, 103.0, 135.0, 173.0, 290.0, 229.0, 150.0]
2026-01-25 17:48:00,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (926.40) for latency DatasetOffice
2026-01-25 17:48:00,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 57 minutes, 12 seconds)
2026-01-25 17:49:37,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:49:38,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 673.29871 ± 133.910
2026-01-25 17:49:38,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [570.7321, 530.28925, 704.04645, 761.2254, 550.6473, 555.09296, 622.9899, 773.4298, 678.06555, 986.46844]
2026-01-25 17:49:38,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [110.0, 99.0, 131.0, 138.0, 115.0, 107.0, 114.0, 169.0, 123.0, 208.0]
2026-01-25 17:49:38,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 55 minutes, 50 seconds)
2026-01-25 17:51:16,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:51:17,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 618.07501 ± 142.612
2026-01-25 17:51:17,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [761.9017, 548.8972, 555.42194, 529.5217, 625.6956, 469.32474, 772.9018, 451.0236, 914.2106, 551.8512]
2026-01-25 17:51:17,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [147.0, 106.0, 105.0, 107.0, 117.0, 102.0, 147.0, 86.0, 182.0, 104.0]
2026-01-25 17:51:17,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 54 minutes, 34 seconds)
2026-01-25 17:52:51,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:52:53,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 663.59540 ± 242.351
2026-01-25 17:52:53,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [602.33606, 543.09924, 89.30136, 722.91736, 719.76074, 938.45593, 663.5135, 564.70355, 1036.0956, 755.7707]
2026-01-25 17:52:53,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [120.0, 112.0, 19.0, 141.0, 135.0, 189.0, 132.0, 122.0, 198.0, 152.0]
2026-01-25 17:52:53,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 52 minutes, 7 seconds)
2026-01-25 17:54:30,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:54:31,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 714.40985 ± 229.748
2026-01-25 17:54:31,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [556.52954, 628.2613, 1305.3024, 850.76483, 512.96436, 582.89465, 762.2497, 567.7493, 838.27484, 539.10803]
2026-01-25 17:54:31,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [104.0, 128.0, 257.0, 160.0, 94.0, 111.0, 143.0, 103.0, 157.0, 100.0]
2026-01-25 17:54:31,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 50 minutes, 46 seconds)
2026-01-25 17:56:08,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:56:10,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 870.84778 ± 359.237
2026-01-25 17:56:10,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [731.5454, 1011.9423, 427.00983, 1488.2323, 746.6802, 1190.0132, 651.68884, 533.0696, 1397.8654, 530.4307]
2026-01-25 17:56:10,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [154.0, 214.0, 83.0, 323.0, 146.0, 231.0, 125.0, 102.0, 281.0, 103.0]
2026-01-25 17:56:10,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 49 minutes, 18 seconds)
2026-01-25 17:57:45,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:57:46,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 675.69806 ± 186.293
2026-01-25 17:57:46,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [542.4161, 560.6421, 552.5008, 666.6077, 567.85693, 978.8279, 705.8794, 497.7883, 1075.9413, 608.5201]
2026-01-25 17:57:46,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [105.0, 103.0, 123.0, 128.0, 108.0, 192.0, 133.0, 104.0, 205.0, 112.0]
2026-01-25 17:57:46,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 47 minutes, 19 seconds)
2026-01-25 17:59:23,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:59:24,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 549.55505 ± 303.984
2026-01-25 17:59:24,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [883.1916, 801.5881, 591.4775, 717.3668, 1089.0903, 399.0675, 247.8711, 311.7673, 385.1993, 68.93098]
2026-01-25 17:59:24,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [188.0, 166.0, 118.0, 151.0, 216.0, 87.0, 49.0, 60.0, 71.0, 14.0]
2026-01-25 17:59:24,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 45 minutes, 25 seconds)
2026-01-25 18:01:00,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:01:02,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 736.86267 ± 279.302
2026-01-25 18:01:02,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [549.0229, 777.8257, 687.79767, 1525.5522, 834.5478, 529.67114, 595.4521, 553.2733, 658.12085, 657.36285]
2026-01-25 18:01:02,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [104.0, 145.0, 131.0, 294.0, 175.0, 116.0, 114.0, 122.0, 124.0, 131.0]
2026-01-25 18:01:02,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 44 minutes, 22 seconds)
2026-01-25 18:02:37,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:02:39,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 794.32654 ± 185.696
2026-01-25 18:02:39,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [816.5217, 688.5231, 797.1569, 678.6494, 636.47577, 508.71793, 1230.5426, 788.8081, 921.98206, 875.8883]
2026-01-25 18:02:39,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [155.0, 132.0, 147.0, 143.0, 127.0, 104.0, 236.0, 154.0, 182.0, 166.0]
2026-01-25 18:02:39,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 42 minutes, 25 seconds)
2026-01-25 18:04:16,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:04:18,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 868.90918 ± 280.750
2026-01-25 18:04:18,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [715.3745, 1529.5841, 764.0679, 880.5505, 500.71506, 827.5405, 531.6498, 841.7144, 1034.399, 1063.4966]
2026-01-25 18:04:18,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [138.0, 304.0, 153.0, 173.0, 95.0, 167.0, 99.0, 161.0, 201.0, 195.0]
2026-01-25 18:04:18,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 40 minutes, 50 seconds)
2026-01-25 18:05:56,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:05:58,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 911.59796 ± 542.560
2026-01-25 18:05:58,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [641.7653, 1026.7277, 580.5367, 921.2323, 593.38416, 1540.4124, 669.36237, 769.8334, 2191.7275, 180.9982]
2026-01-25 18:05:58,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [136.0, 199.0, 113.0, 181.0, 116.0, 310.0, 144.0, 153.0, 438.0, 35.0]
2026-01-25 18:05:58,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 39 minutes, 57 seconds)
2026-01-25 18:07:31,800 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:07:33,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 847.97009 ± 351.317
2026-01-25 18:07:33,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1112.9624, 466.39426, 801.8935, 477.9205, 820.88544, 726.6669, 652.0352, 1746.6501, 931.09906, 743.1937]
2026-01-25 18:07:33,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [211.0, 102.0, 174.0, 91.0, 163.0, 147.0, 126.0, 341.0, 183.0, 154.0]
2026-01-25 18:07:33,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 37 minutes, 54 seconds)
2026-01-25 18:09:11,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:09:13,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 963.95593 ± 497.997
2026-01-25 18:09:13,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1429.7214, 2242.2979, 664.6086, 659.9951, 616.2188, 1202.9907, 703.967, 715.3217, 737.0403, 667.39886]
2026-01-25 18:09:13,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [288.0, 479.0, 125.0, 126.0, 116.0, 236.0, 133.0, 133.0, 137.0, 131.0]
2026-01-25 18:09:13,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (963.96) for latency DatasetOffice
2026-01-25 18:09:13,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 36 minutes, 35 seconds)
2026-01-25 18:10:48,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:10:50,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 914.17401 ± 302.050
2026-01-25 18:10:50,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [998.2535, 633.2372, 515.1849, 1086.7886, 838.3867, 739.6636, 623.4449, 878.213, 1327.3147, 1501.2532]
2026-01-25 18:10:50,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [189.0, 120.0, 97.0, 201.0, 157.0, 141.0, 118.0, 161.0, 261.0, 308.0]
2026-01-25 18:10:50,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 34 minutes, 58 seconds)
2026-01-25 18:12:25,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:12:27,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 724.27502 ± 229.550
2026-01-25 18:12:27,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [648.15405, 701.9097, 968.6335, 926.9686, 471.4207, 699.4645, 278.52072, 622.86884, 1090.6633, 834.14716]
2026-01-25 18:12:27,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [122.0, 134.0, 187.0, 177.0, 88.0, 133.0, 52.0, 120.0, 219.0, 153.0]
2026-01-25 18:12:27,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 32 minutes, 56 seconds)
2026-01-25 18:14:04,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:14:06,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 772.19861 ± 312.103
2026-01-25 18:14:06,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [884.2444, 790.06586, 710.20123, 729.28424, 550.67236, 632.4513, 542.1823, 1655.532, 655.37506, 571.9778]
2026-01-25 18:14:06,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [172.0, 155.0, 141.0, 142.0, 102.0, 128.0, 102.0, 356.0, 121.0, 105.0]
2026-01-25 18:14:06,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 31 minutes, 2 seconds)
2026-01-25 18:15:42,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:15:44,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 740.53998 ± 160.161
2026-01-25 18:15:44,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [779.91077, 636.30695, 738.2711, 991.10486, 640.03827, 1061.4666, 702.1879, 535.47974, 583.41785, 737.21545]
2026-01-25 18:15:44,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [152.0, 122.0, 139.0, 193.0, 141.0, 204.0, 136.0, 117.0, 115.0, 146.0]
2026-01-25 18:15:44,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 29 minutes, 52 seconds)
2026-01-25 18:17:19,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:17:21,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 882.91766 ± 356.537
2026-01-25 18:17:21,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1174.2642, 808.6033, 689.7267, 764.1897, 623.31793, 810.33276, 791.4282, 704.6901, 1853.3279, 609.2957]
2026-01-25 18:17:21,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [224.0, 165.0, 132.0, 152.0, 121.0, 157.0, 156.0, 138.0, 371.0, 114.0]
2026-01-25 18:17:21,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 27 minutes, 53 seconds)
2026-01-25 18:18:57,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:18:59,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 931.55652 ± 415.135
2026-01-25 18:18:59,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [662.0815, 1493.1185, 1906.7668, 940.89294, 833.98114, 577.51776, 859.2761, 821.60645, 723.1149, 497.20956]
2026-01-25 18:18:59,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [130.0, 290.0, 404.0, 183.0, 157.0, 112.0, 183.0, 160.0, 161.0, 102.0]
2026-01-25 18:18:59,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 26 minutes, 22 seconds)
2026-01-25 18:20:35,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:20:38,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1069.38281 ± 614.584
2026-01-25 18:20:38,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [929.5225, 2730.5432, 577.4188, 995.7698, 734.211, 629.29193, 821.24817, 1556.6206, 689.80023, 1029.4028]
2026-01-25 18:20:38,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [183.0, 525.0, 115.0, 203.0, 142.0, 123.0, 164.0, 295.0, 131.0, 201.0]
2026-01-25 18:20:38,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1069.38) for latency DatasetOffice
2026-01-25 18:20:38,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 25 minutes, 4 seconds)
2026-01-25 18:22:15,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:22:16,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 694.16669 ± 126.755
2026-01-25 18:22:16,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [791.31366, 564.54193, 594.6905, 619.23804, 532.3361, 806.5806, 682.99493, 717.31647, 658.86896, 973.78546]
2026-01-25 18:22:16,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [147.0, 111.0, 113.0, 117.0, 100.0, 153.0, 128.0, 142.0, 129.0, 189.0]
2026-01-25 18:22:16,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 23 minutes, 26 seconds)
2026-01-25 18:23:53,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:23:55,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 938.63330 ± 318.477
2026-01-25 18:23:55,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1413.7252, 822.2721, 1053.7537, 994.6808, 547.33075, 1368.8243, 647.1332, 558.65094, 1294.7888, 685.1724]
2026-01-25 18:23:55,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [293.0, 178.0, 221.0, 193.0, 110.0, 262.0, 141.0, 104.0, 250.0, 130.0]
2026-01-25 18:23:55,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes, 58 seconds)
2026-01-25 18:25:31,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:25:33,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 740.57904 ± 254.105
2026-01-25 18:25:33,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [802.0098, 903.5232, 687.89264, 1059.2675, 900.6731, 789.9182, 156.69682, 999.7914, 600.3972, 505.62057]
2026-01-25 18:25:33,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [152.0, 172.0, 128.0, 203.0, 176.0, 152.0, 31.0, 193.0, 115.0, 101.0]
2026-01-25 18:25:33,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 20 minutes, 14 seconds)
2026-01-25 18:27:09,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:27:12,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1000.36945 ± 370.953
2026-01-25 18:27:12,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1417.1969, 980.01465, 818.0687, 704.7468, 1864.7694, 781.49097, 995.23584, 764.40533, 1137.4783, 540.28845]
2026-01-25 18:27:12,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [275.0, 183.0, 150.0, 132.0, 372.0, 151.0, 187.0, 148.0, 219.0, 101.0]
2026-01-25 18:27:12,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 18 minutes, 49 seconds)
2026-01-25 18:28:48,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:28:50,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 758.51501 ± 204.307
2026-01-25 18:28:50,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [770.6515, 566.66626, 1088.3944, 536.727, 923.5937, 784.01935, 663.2837, 615.71423, 1099.1182, 536.98254]
2026-01-25 18:28:50,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [154.0, 113.0, 233.0, 110.0, 172.0, 147.0, 122.0, 121.0, 210.0, 113.0]
2026-01-25 18:28:50,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 17 minutes, 4 seconds)
2026-01-25 18:30:26,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:30:28,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 929.08997 ± 297.392
2026-01-25 18:30:28,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1313.5822, 1300.9153, 963.41205, 402.83096, 643.37427, 1043.0856, 847.75903, 1300.998, 797.5034, 677.43854]
2026-01-25 18:30:28,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [270.0, 248.0, 182.0, 72.0, 122.0, 203.0, 156.0, 264.0, 161.0, 125.0]
2026-01-25 18:30:28,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 15 minutes, 21 seconds)
2026-01-25 18:32:06,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:32:08,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1045.72522 ± 349.690
2026-01-25 18:32:08,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1065.019, 1072.9764, 1945.5073, 971.3525, 775.194, 795.9551, 807.4455, 1031.0669, 1319.7385, 672.9961]
2026-01-25 18:32:08,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [208.0, 209.0, 403.0, 194.0, 149.0, 152.0, 175.0, 210.0, 256.0, 131.0]
2026-01-25 18:32:08,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 13 minutes, 55 seconds)
2026-01-25 18:33:43,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:33:44,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 654.73865 ± 269.582
2026-01-25 18:33:44,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [676.6207, 997.1629, 836.37585, 615.92865, 657.27466, 590.76636, 1098.1095, 469.15637, 517.4143, 88.57696]
2026-01-25 18:33:44,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [136.0, 211.0, 157.0, 120.0, 122.0, 114.0, 218.0, 89.0, 98.0, 18.0]
2026-01-25 18:33:44,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 12 minutes, 9 seconds)
2026-01-25 18:35:20,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:35:23,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 997.23083 ± 415.594
2026-01-25 18:35:23,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1774.1698, 908.0286, 779.1953, 578.3544, 721.1832, 952.9866, 1833.2235, 770.33844, 770.86334, 883.9646]
2026-01-25 18:35:23,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [343.0, 171.0, 144.0, 114.0, 130.0, 180.0, 349.0, 145.0, 152.0, 167.0]
2026-01-25 18:35:23,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 10 minutes, 22 seconds)
2026-01-25 18:36:59,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:37:01,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 935.94855 ± 368.297
2026-01-25 18:37:01,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1674.1625, 763.69037, 981.7816, 906.9959, 562.65894, 684.4729, 701.1782, 1466.0585, 1131.757, 486.7297]
2026-01-25 18:37:01,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [320.0, 141.0, 191.0, 172.0, 117.0, 128.0, 133.0, 304.0, 219.0, 93.0]
2026-01-25 18:37:01,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 8 minutes, 49 seconds)
2026-01-25 18:38:38,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:38:40,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 795.83325 ± 348.040
2026-01-25 18:38:40,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1681.6675, 822.89636, 628.7528, 920.10583, 518.1938, 831.2121, 500.51266, 1018.87726, 437.73004, 598.384]
2026-01-25 18:38:40,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [319.0, 158.0, 126.0, 169.0, 98.0, 155.0, 104.0, 195.0, 91.0, 119.0]
2026-01-25 18:38:40,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 7 minutes, 14 seconds)
2026-01-25 18:40:16,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:40:19,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1048.33813 ± 354.780
2026-01-25 18:40:19,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [535.8805, 1173.7538, 883.2591, 675.7244, 782.7708, 795.5971, 1473.1818, 1667.3281, 1383.0886, 1112.797]
2026-01-25 18:40:19,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [101.0, 235.0, 171.0, 133.0, 148.0, 151.0, 297.0, 328.0, 268.0, 209.0]
2026-01-25 18:40:19,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 5 minutes, 23 seconds)
2026-01-25 18:41:56,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:41:58,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1125.28931 ± 334.125
2026-01-25 18:41:58,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1076.4541, 815.915, 1113.2344, 690.3231, 1375.0989, 917.9526, 1177.3313, 758.3909, 1717.0955, 1611.0975]
2026-01-25 18:41:58,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [203.0, 162.0, 208.0, 129.0, 258.0, 189.0, 223.0, 143.0, 325.0, 305.0]
2026-01-25 18:41:58,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1125.29) for latency DatasetOffice
2026-01-25 18:41:58,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 4 minutes, 11 seconds)
2026-01-25 18:43:35,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:43:38,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 992.23224 ± 268.427
2026-01-25 18:43:38,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [727.5657, 817.0611, 833.00397, 912.5328, 732.48706, 1093.7781, 756.9226, 1411.355, 1137.1836, 1500.433]
2026-01-25 18:43:38,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [150.0, 154.0, 163.0, 181.0, 137.0, 206.0, 139.0, 271.0, 221.0, 277.0]
2026-01-25 18:43:38,224 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 2 minutes, 41 seconds)
2026-01-25 18:45:13,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:45:16,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1158.28064 ± 522.204
2026-01-25 18:45:16,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [890.1115, 1077.3752, 691.81274, 736.2522, 2214.4895, 855.2646, 1166.315, 2095.4248, 753.32764, 1102.4341]
2026-01-25 18:45:16,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [178.0, 207.0, 149.0, 156.0, 419.0, 162.0, 243.0, 419.0, 138.0, 209.0]
2026-01-25 18:45:16,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1158.28) for latency DatasetOffice
2026-01-25 18:45:16,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 1 minute)
2026-01-25 18:46:53,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:46:55,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 997.28503 ± 202.332
2026-01-25 18:46:55,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1096.5768, 1056.0066, 673.92523, 819.75995, 902.282, 1080.9481, 1406.8213, 1199.2797, 902.85974, 834.3913]
2026-01-25 18:46:55,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [210.0, 208.0, 126.0, 153.0, 171.0, 199.0, 268.0, 233.0, 167.0, 155.0]
2026-01-25 18:46:55,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 65/100 (estimated time remaining: 59 minutes, 25 seconds)
2026-01-25 18:48:31,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:48:34,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1080.37048 ± 179.760
2026-01-25 18:48:34,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1036.5142, 910.53894, 949.9401, 1291.3485, 1220.1997, 942.2361, 898.029, 955.3145, 1146.439, 1453.145]
2026-01-25 18:48:34,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [192.0, 168.0, 176.0, 243.0, 234.0, 197.0, 168.0, 174.0, 214.0, 275.0]
2026-01-25 18:48:34,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 66/100 (estimated time remaining: 57 minutes, 44 seconds)
2026-01-25 18:50:10,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:50:12,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 889.03528 ± 223.466
2026-01-25 18:50:12,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [679.11224, 713.63354, 934.671, 1079.0433, 938.9629, 1241.7174, 450.35205, 1140.6565, 828.2608, 883.9427]
2026-01-25 18:50:12,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [125.0, 131.0, 180.0, 210.0, 176.0, 262.0, 100.0, 214.0, 152.0, 163.0]
2026-01-25 18:50:12,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 67/100 (estimated time remaining: 55 minutes, 56 seconds)
2026-01-25 18:51:48,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:51:51,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1280.55884 ± 590.708
2026-01-25 18:51:51,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1264.7867, 605.991, 889.4667, 1346.2891, 770.389, 2759.8782, 1452.5111, 1080.0404, 1739.7428, 896.4937]
2026-01-25 18:51:51,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [240.0, 113.0, 166.0, 264.0, 142.0, 539.0, 276.0, 204.0, 330.0, 168.0]
2026-01-25 18:51:51,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1280.56) for latency DatasetOffice
2026-01-25 18:51:51,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 68/100 (estimated time remaining: 54 minutes, 14 seconds)
2026-01-25 18:53:28,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:53:30,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 976.81512 ± 444.860
2026-01-25 18:53:30,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1880.9072, 646.0355, 713.17163, 546.1982, 629.1887, 1070.8396, 800.19073, 865.07263, 1757.8116, 858.7353]
2026-01-25 18:53:30,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [365.0, 129.0, 135.0, 105.0, 119.0, 205.0, 151.0, 163.0, 342.0, 165.0]
2026-01-25 18:53:30,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 69/100 (estimated time remaining: 52 minutes, 42 seconds)
2026-01-25 18:55:09,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:55:12,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1113.48853 ± 511.350
2026-01-25 18:55:12,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [890.5558, 1306.6897, 1282.8807, 968.2665, 773.2669, 762.5762, 2425.0352, 1363.0609, 477.51846, 885.0343]
2026-01-25 18:55:12,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [171.0, 257.0, 245.0, 181.0, 142.0, 140.0, 468.0, 257.0, 96.0, 166.0]
2026-01-25 18:55:12,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 70/100 (estimated time remaining: 51 minutes, 18 seconds)
2026-01-25 18:56:58,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:57:01,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1173.40454 ± 497.573
2026-01-25 18:57:01,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [823.7265, 1925.4803, 1633.7593, 873.6449, 740.6111, 784.68274, 916.6563, 873.72925, 2155.2947, 1006.45953]
2026-01-25 18:57:01,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [158.0, 375.0, 334.0, 175.0, 133.0, 149.0, 176.0, 163.0, 422.0, 198.0]
2026-01-25 18:57:01,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 71/100 (estimated time remaining: 50 minutes, 43 seconds)
2026-01-25 18:58:49,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:58:52,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1374.06360 ± 584.361
2026-01-25 18:58:52,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1527.0713, 2097.3948, 1158.8877, 1451.6102, 1236.1012, 1863.839, 2396.9917, 780.4765, 654.20715, 574.0564]
2026-01-25 18:58:52,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [287.0, 411.0, 235.0, 282.0, 248.0, 364.0, 476.0, 168.0, 131.0, 115.0]
2026-01-25 18:58:52,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1374.06) for latency DatasetOffice
2026-01-25 18:58:52,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 72/100 (estimated time remaining: 50 minutes, 18 seconds)
2026-01-25 19:00:33,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:00:37,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1402.21069 ± 510.128
2026-01-25 19:00:37,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1030.142, 1204.7102, 2588.4316, 2063.926, 990.2912, 1011.5289, 1084.5813, 1092.5249, 1634.9691, 1321.0018]
2026-01-25 19:00:37,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [193.0, 243.0, 555.0, 392.0, 184.0, 190.0, 216.0, 221.0, 300.0, 253.0]
2026-01-25 19:00:37,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1402.21) for latency DatasetOffice
2026-01-25 19:00:37,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 73/100 (estimated time remaining: 49 minutes, 4 seconds)
2026-01-25 19:02:13,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:02:15,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 869.71649 ± 583.781
2026-01-25 19:02:15,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [197.87305, 162.34325, 241.32278, 371.38257, 1079.8623, 1024.4064, 990.3107, 1941.2925, 1087.3925, 1600.9781]
2026-01-25 19:02:15,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [38.0, 34.0, 51.0, 77.0, 204.0, 193.0, 187.0, 382.0, 211.0, 299.0]
2026-01-25 19:02:15,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 74/100 (estimated time remaining: 47 minutes, 12 seconds)
2026-01-25 19:03:59,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:04:01,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1051.62769 ± 394.439
2026-01-25 19:04:01,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1117.1719, 1020.5194, 953.19366, 2031.8921, 728.1144, 873.47424, 1429.7054, 567.4534, 993.7896, 800.9609]
2026-01-25 19:04:01,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [233.0, 197.0, 192.0, 399.0, 137.0, 172.0, 280.0, 117.0, 196.0, 154.0]
2026-01-25 19:04:01,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 52 seconds)
2026-01-25 19:05:31,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:05:35,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1410.41089 ± 446.632
2026-01-25 19:05:35,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1620.3395, 664.12885, 1471.441, 1291.361, 1042.585, 1898.5679, 1042.6305, 2324.543, 1254.276, 1494.2362]
2026-01-25 19:05:35,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [313.0, 139.0, 290.0, 261.0, 196.0, 377.0, 205.0, 468.0, 235.0, 284.0]
2026-01-25 19:05:35,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1410.41) for latency DatasetOffice
2026-01-25 19:05:35,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 49 seconds)
2026-01-25 19:07:12,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:07:15,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1307.82495 ± 422.985
2026-01-25 19:07:15,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1607.2524, 2201.8699, 1347.6416, 855.12604, 1157.5342, 1003.5693, 943.3265, 1824.0337, 1249.3029, 888.5936]
2026-01-25 19:07:15,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [309.0, 422.0, 250.0, 163.0, 220.0, 193.0, 176.0, 349.0, 258.0, 176.0]
2026-01-25 19:07:15,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 12 seconds)
2026-01-25 19:08:51,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:08:54,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1436.60059 ± 462.692
2026-01-25 19:08:54,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [707.1906, 987.33154, 1575.482, 1533.1326, 975.9512, 1873.8304, 1536.5789, 1363.161, 2415.4238, 1397.9238]
2026-01-25 19:08:54,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [133.0, 187.0, 296.0, 284.0, 186.0, 352.0, 293.0, 258.0, 460.0, 261.0]
2026-01-25 19:08:54,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1436.60) for latency DatasetOffice
2026-01-25 19:08:54,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 8 seconds)
2026-01-25 19:10:32,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:10:35,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1180.27930 ± 316.970
2026-01-25 19:10:35,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1216.2089, 1258.746, 1327.3806, 1207.3878, 1119.0625, 846.6548, 1972.039, 1149.6268, 759.1038, 946.5835]
2026-01-25 19:10:35,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [232.0, 240.0, 257.0, 219.0, 223.0, 172.0, 371.0, 215.0, 142.0, 171.0]
2026-01-25 19:10:35,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 79/100 (estimated time remaining: 36 minutes, 40 seconds)
2026-01-25 19:12:10,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:12:13,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1407.96216 ± 711.668
2026-01-25 19:12:13,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1231.2306, 912.0149, 1166.1552, 945.0019, 630.21747, 801.4753, 3029.5334, 1222.8894, 2060.4177, 2080.686]
2026-01-25 19:12:13,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [231.0, 177.0, 215.0, 181.0, 113.0, 150.0, 604.0, 237.0, 393.0, 389.0]
2026-01-25 19:12:13,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 27 seconds)
2026-01-25 19:13:51,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:13:54,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1227.58838 ± 164.961
2026-01-25 19:13:54,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1512.4547, 1326.2692, 930.0141, 1157.0464, 1304.5181, 1286.7627, 1394.741, 1225.6858, 1060.0032, 1078.3889]
2026-01-25 19:13:54,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [296.0, 242.0, 170.0, 228.0, 244.0, 238.0, 258.0, 234.0, 200.0, 195.0]
2026-01-25 19:13:54,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 17 seconds)
2026-01-25 19:15:30,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:15:33,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1124.91968 ± 996.107
2026-01-25 19:15:33,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3393.2556, 1072.4159, 1141.4718, 277.8135, 250.95786, 378.59833, 331.70966, 488.71478, 2354.8901, 1559.3695]
2026-01-25 19:15:33,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [667.0, 207.0, 223.0, 56.0, 51.0, 75.0, 73.0, 94.0, 463.0, 318.0]
2026-01-25 19:15:33,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 34 seconds)
2026-01-25 19:17:10,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:17:14,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1837.38281 ± 1514.899
2026-01-25 19:17:14,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [824.323, 795.6549, 1108.487, 851.6633, 2483.4846, 5081.9854, 1297.6111, 4277.747, 1049.0614, 603.81055]
2026-01-25 19:17:14,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [153.0, 146.0, 211.0, 158.0, 486.0, 1000.0, 267.0, 857.0, 205.0, 131.0]
2026-01-25 19:17:14,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1837.38) for latency DatasetOffice
2026-01-25 19:17:15,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes)
2026-01-25 19:18:51,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:18:56,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1942.89612 ± 854.719
2026-01-25 19:18:56,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [3230.7344, 1842.5082, 1374.9921, 2254.0603, 3484.6116, 2550.8125, 1065.8851, 1082.5277, 1098.3503, 1444.4789]
2026-01-25 19:18:56,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [633.0, 348.0, 255.0, 436.0, 694.0, 488.0, 207.0, 215.0, 216.0, 295.0]
2026-01-25 19:18:56,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (1942.90) for latency DatasetOffice
2026-01-25 19:18:56,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 23 seconds)
2026-01-25 19:20:33,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:20:37,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1577.92517 ± 547.295
2026-01-25 19:20:37,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1791.9841, 1568.2859, 1288.2085, 2251.6753, 1572.4066, 932.11847, 2684.2808, 1724.9316, 1124.4022, 840.9595]
2026-01-25 19:20:37,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [368.0, 306.0, 243.0, 429.0, 304.0, 187.0, 531.0, 337.0, 211.0, 161.0]
2026-01-25 19:20:37,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 52 seconds)
2026-01-25 19:22:15,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:22:18,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1049.41479 ± 586.646
2026-01-25 19:22:18,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1031.2759, 804.0247, 943.7717, 785.9844, 1388.3529, 2434.0774, 873.36584, 1534.886, 246.73672, 451.6728]
2026-01-25 19:22:18,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [214.0, 166.0, 184.0, 150.0, 261.0, 468.0, 164.0, 306.0, 47.0, 89.0]
2026-01-25 19:22:18,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 10 seconds)
2026-01-25 19:23:53,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:23:57,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1474.31323 ± 506.806
2026-01-25 19:23:57,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1783.6301, 1542.4885, 1614.6715, 1107.0638, 1155.8724, 1695.6833, 1214.1569, 1490.6382, 2588.5725, 550.35516]
2026-01-25 19:23:57,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [328.0, 295.0, 301.0, 209.0, 214.0, 322.0, 231.0, 281.0, 513.0, 110.0]
2026-01-25 19:23:57,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 29 seconds)
2026-01-25 19:25:36,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:25:40,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1673.43286 ± 614.305
2026-01-25 19:25:40,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1004.38367, 1131.1234, 1049.5663, 1358.7181, 1003.5707, 2083.8298, 2671.0178, 2265.3723, 2433.3137, 1733.4329]
2026-01-25 19:25:40,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [198.0, 221.0, 205.0, 262.0, 197.0, 413.0, 517.0, 434.0, 460.0, 322.0]
2026-01-25 19:25:40,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 53 seconds)
2026-01-25 19:27:15,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:27:20,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1732.52209 ± 977.160
2026-01-25 19:27:20,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [4049.2148, 1345.1434, 2612.7356, 1303.102, 785.2963, 1637.6345, 2293.357, 796.11926, 1756.9523, 745.6638]
2026-01-25 19:27:20,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [779.0, 254.0, 490.0, 255.0, 142.0, 312.0, 436.0, 147.0, 347.0, 156.0]
2026-01-25 19:27:20,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 9 seconds)
2026-01-25 19:28:55,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:28:59,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1600.34082 ± 754.733
2026-01-25 19:28:59,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1462.7675, 1291.3492, 1787.5204, 1270.8962, 695.7296, 1381.1302, 2535.5054, 1225.8457, 3379.1545, 973.51013]
2026-01-25 19:28:59,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [299.0, 252.0, 337.0, 237.0, 129.0, 264.0, 504.0, 230.0, 649.0, 178.0]
2026-01-25 19:28:59,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 23 seconds)
2026-01-25 19:30:38,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:30:42,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1765.39624 ± 763.521
2026-01-25 19:30:42,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [865.5598, 841.94934, 1118.3896, 1894.5844, 2438.7346, 1804.6034, 1405.935, 3041.483, 2916.9521, 1325.7719]
2026-01-25 19:30:42,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [194.0, 157.0, 218.0, 345.0, 466.0, 346.0, 273.0, 590.0, 562.0, 249.0]
2026-01-25 19:30:42,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 48 seconds)
2026-01-25 19:32:19,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:32:23,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1406.57666 ± 537.149
2026-01-25 19:32:23,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1227.611, 1043.7542, 1350.1244, 1485.0853, 1915.59, 1477.6458, 381.81546, 895.8704, 2300.951, 1987.318]
2026-01-25 19:32:23,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [230.0, 193.0, 263.0, 277.0, 385.0, 278.0, 81.0, 173.0, 439.0, 375.0]
2026-01-25 19:32:23,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 10 seconds)
2026-01-25 19:33:59,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:34:03,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1605.46704 ± 818.853
2026-01-25 19:34:03,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2519.7007, 1758.6769, 365.55157, 2153.4785, 551.24445, 1426.98, 596.69073, 2421.514, 2691.778, 1569.0558]
2026-01-25 19:34:03,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [474.0, 348.0, 69.0, 432.0, 104.0, 275.0, 115.0, 476.0, 530.0, 301.0]
2026-01-25 19:34:03,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 25 seconds)
2026-01-25 19:35:40,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:35:45,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1669.68091 ± 618.882
2026-01-25 19:35:45,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2284.3127, 1706.4319, 2328.0208, 661.1129, 959.76117, 1671.3566, 923.7997, 2450.6616, 2216.892, 1494.459]
2026-01-25 19:35:45,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [486.0, 331.0, 464.0, 123.0, 180.0, 333.0, 177.0, 492.0, 464.0, 285.0]
2026-01-25 19:35:45,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 46 seconds)
2026-01-25 19:37:24,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:37:29,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2150.52881 ± 1644.661
2026-01-25 19:37:29,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [340.38422, 859.8597, 5044.432, 5022.9316, 2110.7266, 531.9463, 1211.3817, 2066.3906, 1165.9764, 3151.2603]
2026-01-25 19:37:29,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [65.0, 172.0, 1000.0, 1000.0, 406.0, 114.0, 229.0, 397.0, 228.0, 601.0]
2026-01-25 19:37:29,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1274 [INFO]: New best (2150.53) for latency DatasetOffice
2026-01-25 19:37:29,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 12 seconds)
2026-01-25 19:39:08,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:39:13,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 2003.88940 ± 1222.550
2026-01-25 19:39:13,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1544.8541, 1543.7935, 2496.6458, 956.9974, 1321.0187, 1771.512, 5156.1367, 850.8579, 1385.2562, 3011.8228]
2026-01-25 19:39:13,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [304.0, 287.0, 483.0, 181.0, 266.0, 371.0, 1000.0, 172.0, 267.0, 613.0]
2026-01-25 19:39:13,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 30 seconds)
2026-01-25 19:40:49,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:40:54,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1896.16235 ± 1378.274
2026-01-25 19:40:54,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [714.788, 4194.979, 588.4316, 1396.9948, 3907.2239, 848.5138, 3273.5215, 2436.616, 1358.7458, 241.80788]
2026-01-25 19:40:54,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [136.0, 834.0, 109.0, 284.0, 738.0, 158.0, 637.0, 470.0, 276.0, 49.0]
2026-01-25 19:40:54,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 48 seconds)
2026-01-25 19:42:32,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:42:36,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1637.67859 ± 707.463
2026-01-25 19:42:36,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2228.9148, 1831.1975, 967.4576, 1015.6169, 1029.7903, 1129.5676, 2473.711, 3077.999, 1021.7803, 1600.7518]
2026-01-25 19:42:36,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [423.0, 356.0, 192.0, 193.0, 196.0, 227.0, 477.0, 593.0, 207.0, 296.0]
2026-01-25 19:42:36,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 7 seconds)
2026-01-25 19:44:11,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:44:15,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1576.38074 ± 506.541
2026-01-25 19:44:15,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2180.7803, 1556.3204, 2257.543, 1083.598, 1150.2443, 1151.4515, 2111.8667, 706.44055, 1817.5784, 1747.9854]
2026-01-25 19:44:15,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [410.0, 295.0, 427.0, 211.0, 225.0, 218.0, 410.0, 142.0, 342.0, 340.0]
2026-01-25 19:44:15,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 24 seconds)
2026-01-25 19:45:53,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:45:58,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1993.02612 ± 1106.103
2026-01-25 19:45:58,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [1662.6259, 1462.9811, 1916.8925, 3143.5686, 1600.3165, 1455.2042, 1329.3268, 657.9174, 1908.1711, 4793.2593]
2026-01-25 19:45:58,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [325.0, 283.0, 362.0, 623.0, 320.0, 270.0, 248.0, 133.0, 377.0, 969.0]
2026-01-25 19:45:58,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 41 seconds)
2026-01-25 19:47:32,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:47:37,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1269 [DEBUG]: Total Reward: 1820.56934 ± 1242.455
2026-01-25 19:47:37,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1270 [DEBUG]: All rewards: [2251.347, 2205.639, 1509.238, 3464.7915, 1733.4177, 2182.2214, 4039.3137, 135.28354, 219.19273, 465.2477]
2026-01-25 19:47:37,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [423.0, 427.0, 283.0, 663.0, 330.0, 420.0, 845.0, 28.0, 41.0, 95.0]
2026-01-25 19:47:37,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1299 [DEBUG]: Training session finished
