2026-01-23 01:11:39,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-walker2d/DatasetOffice-sac-aug-mem1
2026-01-23 01:11:39,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-walker2d/DatasetOffice-sac-aug-mem1
2026-01-23 01:11:39,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x150363a7ea50>}
2026-01-23 01:11:39,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1159 [DEBUG]: using device: cuda
2026-01-23 01:11:39,959 baseline-sac-noisy-walker2d:77 [WARNING]: args.memorize_actions != args.horizon: 1 != 32
2026-01-23 01:11:40,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1181 [INFO]: Creating new trainer
2026-01-23 01:11:40,117 baseline-sac-noisy-walker2d:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=23, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:11:40,117 baseline-sac-noisy-walker2d:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=29, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:11:40,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1242 [DEBUG]: Starting training session...
2026-01-23 01:11:40,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 1/100
2026-01-23 01:13:00,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:00,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 8.73656 ± 4.504
2026-01-23 01:13:00,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [18.629473, 4.84256, 5.038408, 4.4613767, 6.322744, 11.845701, 5.8163195, 5.9361386, 13.107668, 11.365196]
2026-01-23 01:13:00,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [61.0, 49.0, 54.0, 48.0, 56.0, 54.0, 57.0, 52.0, 57.0, 51.0]
2026-01-23 01:13:00,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (8.74) for latency DatasetOffice
2026-01-23 01:13:00,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 11 minutes, 10 seconds)
2026-01-23 01:14:28,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:29,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 63.21041 ± 33.346
2026-01-23 01:14:29,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [72.949455, 79.75758, 34.173534, 68.51821, 68.00239, 142.5629, 73.15601, 26.125267, 45.913757, 20.945]
2026-01-23 01:14:29,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [215.0, 268.0, 41.0, 92.0, 188.0, 262.0, 99.0, 151.0, 158.0, 74.0]
2026-01-23 01:14:29,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (63.21) for latency DatasetOffice
2026-01-23 01:14:29,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 17 minutes, 46 seconds)
2026-01-23 01:16:01,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:03,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 241.53503 ± 185.983
2026-01-23 01:16:03,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [225.03064, 18.172483, 228.78094, 602.5262, 499.0264, 3.1642518, 350.5587, 100.613945, 247.68654, 139.79039]
2026-01-23 01:16:03,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [329.0, 94.0, 363.0, 592.0, 289.0, 146.0, 245.0, 81.0, 482.0, 157.0]
2026-01-23 01:16:03,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (241.54) for latency DatasetOffice
2026-01-23 01:16:03,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 21 minutes, 38 seconds)
2026-01-23 01:17:26,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:27,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 229.15288 ± 199.109
2026-01-23 01:17:27,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [-2.682702, 37.281788, 264.1636, 405.1879, 24.991747, 43.77269, 213.62593, 233.40982, 468.1106, 603.66754]
2026-01-23 01:17:27,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [7.0, 146.0, 318.0, 294.0, 29.0, 145.0, 222.0, 195.0, 329.0, 511.0]
2026-01-23 01:17:27,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 18 minutes, 48 seconds)
2026-01-23 01:18:56,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:58,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 423.39923 ± 159.582
2026-01-23 01:18:58,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [425.5847, 176.23718, 491.99045, 327.9308, 444.7517, 356.97183, 826.5511, 336.57565, 374.64795, 472.75085]
2026-01-23 01:18:58,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [306.0, 142.0, 280.0, 173.0, 289.0, 206.0, 643.0, 196.0, 188.0, 443.0]
2026-01-23 01:18:58,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (423.40) for latency DatasetOffice
2026-01-23 01:18:58,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 18 minutes, 33 seconds)
2026-01-23 01:20:26,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:28,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 317.11139 ± 183.601
2026-01-23 01:20:28,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [254.21738, 346.85867, 665.4953, 59.711613, 174.50317, 396.7016, 483.3129, 285.36407, 453.68277, 51.266075]
2026-01-23 01:20:28,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [339.0, 252.0, 449.0, 155.0, 116.0, 253.0, 301.0, 175.0, 287.0, 209.0]
2026-01-23 01:20:28,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 20 minutes, 25 seconds)
2026-01-23 01:21:52,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:53,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 18.90844 ± 24.128
2026-01-23 01:21:53,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [85.3946, -1.079023, 19.193357, 19.778662, 15.635776, 18.148981, 2.993328, 1.6699275, 27.532965, -0.18418929]
2026-01-23 01:21:53,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [146.0, 11.0, 32.0, 30.0, 23.0, 28.0, 19.0, 27.0, 32.0, 14.0]
2026-01-23 01:21:53,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 17 minutes, 31 seconds)
2026-01-23 01:23:20,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:21,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 338.72534 ± 172.026
2026-01-23 01:23:21,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [429.46872, 470.64847, 464.97034, 348.6749, 306.38354, 30.267323, 614.4311, 357.25497, 54.238976, 310.91498]
2026-01-23 01:23:21,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [182.0, 246.0, 253.0, 190.0, 150.0, 40.0, 293.0, 177.0, 55.0, 160.0]
2026-01-23 01:23:21,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 14 minutes, 22 seconds)
2026-01-23 01:24:52,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:53,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 315.78220 ± 160.065
2026-01-23 01:24:53,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [270.04532, 536.35645, 441.15088, 313.99728, 319.13208, 412.81656, 27.079857, 458.94, 338.14908, 40.154274]
2026-01-23 01:24:53,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [143.0, 212.0, 167.0, 146.0, 150.0, 159.0, 40.0, 265.0, 206.0, 52.0]
2026-01-23 01:24:53,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 15 minutes, 12 seconds)
2026-01-23 01:26:18,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:20,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 499.86969 ± 196.744
2026-01-23 01:26:20,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [552.95166, 396.65576, 1015.9171, 288.78415, 389.83517, 376.58618, 566.52563, 607.9919, 406.35315, 397.096]
2026-01-23 01:26:20,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [337.0, 193.0, 497.0, 144.0, 175.0, 167.0, 235.0, 264.0, 203.0, 164.0]
2026-01-23 01:26:20,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (499.87) for latency DatasetOffice
2026-01-23 01:26:20,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 12 minutes, 38 seconds)
2026-01-23 01:27:47,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:49,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 366.19415 ± 240.879
2026-01-23 01:27:49,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [465.6875, 526.9001, 642.98346, 136.6288, 7.882475, 50.921597, 196.00229, 714.23486, 559.28723, 361.41333]
2026-01-23 01:27:49,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [231.0, 280.0, 303.0, 176.0, 28.0, 94.0, 101.0, 307.0, 291.0, 182.0]
2026-01-23 01:27:49,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 10 minutes, 40 seconds)
2026-01-23 01:29:17,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:21,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 577.84479 ± 478.830
2026-01-23 01:29:21,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [729.8999, 1662.8196, 348.64514, 795.35504, 972.1912, 277.08127, 102.19235, 730.77954, 100.11799, 59.3656]
2026-01-23 01:29:21,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [512.0, 1000.0, 309.0, 476.0, 559.0, 383.0, 83.0, 436.0, 121.0, 80.0]
2026-01-23 01:29:21,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (577.84) for latency DatasetOffice
2026-01-23 01:29:21,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 11 minutes, 20 seconds)
2026-01-23 01:30:46,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:47,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 384.89862 ± 179.301
2026-01-23 01:30:47,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [352.42532, 393.2326, 298.43686, 331.33423, 798.5094, 464.24414, 29.839945, 380.17325, 345.25262, 455.5378]
2026-01-23 01:30:47,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [170.0, 181.0, 136.0, 185.0, 357.0, 219.0, 43.0, 177.0, 159.0, 177.0]
2026-01-23 01:30:47,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 9 minutes, 16 seconds)
2026-01-23 01:32:15,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:17,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 555.21429 ± 228.952
2026-01-23 01:32:17,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [426.58466, 1089.5647, 682.4979, 414.29016, 420.10703, 551.51373, 802.16516, 446.30664, 455.4137, 263.69916]
2026-01-23 01:32:17,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [219.0, 491.0, 301.0, 222.0, 204.0, 288.0, 290.0, 241.0, 202.0, 141.0]
2026-01-23 01:32:17,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 7 minutes, 13 seconds)
2026-01-23 01:33:53,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:56,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 609.06342 ± 665.635
2026-01-23 01:33:56,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1479.5059, 1492.3596, 1685.4625, 554.3951, 756.52155, 7.243013, 14.705112, 25.318892, 39.35908, 35.763447]
2026-01-23 01:33:56,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [794.0, 840.0, 749.0, 300.0, 419.0, 21.0, 25.0, 50.0, 60.0, 64.0]
2026-01-23 01:33:56,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (609.06) for latency DatasetOffice
2026-01-23 01:33:56,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 9 minutes, 6 seconds)
2026-01-23 01:35:18,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:25,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1788.79370 ± 707.284
2026-01-23 01:35:25,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2088.7158, 1306.7369, 1604.4185, 2302.2563, 358.77423, 2307.0112, 2330.0022, 787.4326, 2293.8513, 2508.7383]
2026-01-23 01:35:25,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [815.0, 595.0, 596.0, 859.0, 195.0, 1000.0, 1000.0, 344.0, 1000.0, 1000.0]
2026-01-23 01:35:25,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (1788.79) for latency DatasetOffice
2026-01-23 01:35:25,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 7 minutes, 46 seconds)
2026-01-23 01:36:52,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:59,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2210.54053 ± 824.090
2026-01-23 01:36:59,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1832.3011, 2629.357, 2715.9285, 2579.62, 2603.2166, 1648.9014, 2804.2312, 1.3903811, 2664.9856, 2625.4714]
2026-01-23 01:36:59,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [692.0, 1000.0, 1000.0, 1000.0, 1000.0, 614.0, 1000.0, 19.0, 1000.0, 1000.0]
2026-01-23 01:36:59,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2210.54) for latency DatasetOffice
2026-01-23 01:36:59,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 6 minutes, 56 seconds)
2026-01-23 01:38:26,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:31,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1207.25623 ± 1167.142
2026-01-23 01:38:31,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2646.4207, 1536.237, 72.66662, 9.32041, 215.68236, 90.577065, 5.5481944, 2594.517, 2330.676, 2570.917]
2026-01-23 01:38:31,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 597.0, 91.0, 34.0, 95.0, 157.0, 15.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:38:31,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 6 minutes, 46 seconds)
2026-01-23 01:40:03,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:12,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2258.85889 ± 589.765
2026-01-23 01:40:12,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2446.8354, 2482.9148, 2113.6013, 525.30164, 2520.2612, 2472.7183, 2524.2695, 2555.5286, 2502.231, 2444.9282]
2026-01-23 01:40:12,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 849.0, 190.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:40:12,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2258.86) for latency DatasetOffice
2026-01-23 01:40:12,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 8 minutes, 10 seconds)
2026-01-23 01:41:40,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:48,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2081.54907 ± 825.856
2026-01-23 01:41:48,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2530.2012, 2487.4858, 2380.0186, 969.28375, 10.112144, 2461.4883, 2505.0488, 2553.3152, 2406.2546, 2512.2808]
2026-01-23 01:41:48,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 417.0, 25.0, 1000.0, 1000.0, 968.0, 1000.0, 1000.0]
2026-01-23 01:41:48,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 5 minutes, 46 seconds)
2026-01-23 01:43:11,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:14,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 632.57117 ± 987.743
2026-01-23 01:43:14,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2455.913, 2406.3997, 1390.3468, -5.264487, 33.59945, -7.111756, 28.284702, 20.976242, -3.6673481, 6.23651]
2026-01-23 01:43:14,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 643.0, 20.0, 42.0, 12.0, 40.0, 35.0, 20.0, 23.0]
2026-01-23 01:43:14,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 3 minutes, 25 seconds)
2026-01-23 01:44:45,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:54,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2490.36328 ± 59.146
2026-01-23 01:44:54,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2461.5706, 2562.0579, 2450.2537, 2443.7212, 2476.314, 2489.2705, 2527.248, 2397.3289, 2483.1821, 2612.6858]
2026-01-23 01:44:54,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:44:54,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2490.36) for latency DatasetOffice
2026-01-23 01:44:54,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 3 minutes, 29 seconds)
2026-01-23 01:46:21,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:28,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1913.32385 ± 997.661
2026-01-23 01:46:28,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2604.5562, 2473.3118, 1475.6827, 30.768791, 4.02293, 2492.361, 2549.797, 2489.9478, 2554.1948, 2458.5935]
2026-01-23 01:46:28,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 684.0, 37.0, 15.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:46:28,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 2 minutes, 29 seconds)
2026-01-23 01:47:54,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:58,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1079.07581 ± 1122.637
2026-01-23 01:47:58,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2626.26, 63.103653, 178.37599, 2559.4224, 1955.7743, 474.65396, 2576.6897, 56.226936, 37.00843, 263.24258]
2026-01-23 01:47:58,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 166.0, 152.0, 1000.0, 688.0, 256.0, 1000.0, 112.0, 46.0, 154.0]
2026-01-23 01:47:58,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 25/100 (estimated time remaining: 1 hour, 58 minutes, 13 seconds)
2026-01-23 01:49:34,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:41,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2200.05322 ± 1000.925
2026-01-23 01:49:41,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2714.1294, 2618.0789, 2759.556, 271.9655, 130.70221, 2697.0027, 2734.2185, 2720.7495, 2741.1667, 2612.9644]
2026-01-23 01:49:41,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 176.0, 156.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:49:41,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 26/100 (estimated time remaining: 1 hour, 58 minutes, 27 seconds)
2026-01-23 01:51:10,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:16,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1974.35022 ± 844.123
2026-01-23 01:51:16,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2655.0527, 908.38776, 2663.5872, 2661.568, 974.1817, 2623.8782, 2673.8433, 2632.5022, 630.7104, 1319.7905]
2026-01-23 01:51:16,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 394.0, 1000.0, 1000.0, 412.0, 1000.0, 1000.0, 1000.0, 312.0, 524.0]
2026-01-23 01:51:16,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 59 minutes, 7 seconds)
2026-01-23 01:52:38,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:42,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1098.30090 ± 1241.354
2026-01-23 01:52:42,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2823.8203, 1280.9716, 26.509897, 56.073166, 91.31573, 33.494164, 19.193886, 3007.4448, 2861.068, 783.11743]
2026-01-23 01:52:42,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 526.0, 43.0, 66.0, 94.0, 46.0, 29.0, 1000.0, 1000.0, 311.0]
2026-01-23 01:52:42,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 53 minutes, 50 seconds)
2026-01-23 01:54:09,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:16,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2012.19080 ± 1054.268
2026-01-23 01:54:16,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2679.9607, 21.513456, 2824.8745, 1276.9219, 2736.8145, 2714.2593, 2685.286, 2392.4375, 144.44821, 2645.394]
2026-01-23 01:54:16,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 33.0, 1000.0, 494.0, 1000.0, 1000.0, 1000.0, 1000.0, 172.0, 1000.0]
2026-01-23 01:54:16,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 52 minutes, 15 seconds)
2026-01-23 01:55:51,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:58,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2097.40381 ± 1207.173
2026-01-23 01:55:58,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2916.877, 3005.0322, 228.59456, 318.52737, 2894.6138, 2753.806, 2869.8315, 220.81114, 2911.2756, 2854.67]
2026-01-23 01:55:58,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 133.0, 146.0, 1000.0, 1000.0, 1000.0, 111.0, 1000.0, 1000.0]
2026-01-23 01:55:58,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 53 minutes, 27 seconds)
2026-01-23 01:57:23,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:29,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1605.69995 ± 1151.251
2026-01-23 01:57:29,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2971.623, 2919.1492, 2101.0386, 2993.348, 2106.8584, 5.324827, 1681.0912, 989.05676, 256.45767, 33.050335]
2026-01-23 01:57:29,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 747.0, 1000.0, 698.0, 42.0, 653.0, 381.0, 157.0, 45.0]
2026-01-23 01:57:29,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 49 minutes, 1 second)
2026-01-23 01:58:53,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:02,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2664.36572 ± 33.330
2026-01-23 01:59:02,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2669.5142, 2703.44, 2644.8347, 2620.5015, 2608.8462, 2666.8118, 2645.852, 2703.6414, 2711.9128, 2668.3037]
2026-01-23 01:59:02,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:59:02,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2664.37) for latency DatasetOffice
2026-01-23 01:59:02,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 47 minutes, 10 seconds)
2026-01-23 02:00:38,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:45,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2293.62158 ± 942.137
2026-01-23 02:00:45,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2813.4229, 2877.5046, 2703.217, 2638.637, 2347.488, 980.14716, 2902.2664, 2950.4573, -0.5834526, 2723.6602]
2026-01-23 02:00:45,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 860.0, 382.0, 977.0, 1000.0, 11.0, 1000.0]
2026-01-23 02:00:45,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 49 minutes, 27 seconds)
2026-01-23 02:02:19,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:28,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2462.67969 ± 657.147
2026-01-23 02:02:28,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2641.9138, 496.72775, 2581.4077, 2682.5535, 2657.6692, 2699.5535, 2729.1853, 2685.5159, 2777.427, 2674.8425]
2026-01-23 02:02:28,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 246.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:02:28,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 49 minutes, 46 seconds)
2026-01-23 02:03:54,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:02,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2577.02539 ± 500.297
2026-01-23 02:04:02,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2201.6982, 2781.5576, 2951.2021, 2823.5125, 2797.764, 2860.7058, 2465.6965, 2813.9692, 2855.2036, 1218.9457]
2026-01-23 02:04:02,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [736.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 833.0, 1000.0, 1000.0, 441.0]
2026-01-23 02:04:02,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 46 minutes, 26 seconds)
2026-01-23 02:05:27,510 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:33,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1910.59692 ± 1241.655
2026-01-23 02:05:33,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2938.3489, 2929.871, 2934.2725, 2920.1104, 68.36847, 408.04544, 2800.0723, 2926.6624, 207.71478, 972.5048]
2026-01-23 02:05:33,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 54.0, 196.0, 1000.0, 1000.0, 105.0, 369.0]
2026-01-23 02:05:33,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 44 minutes, 57 seconds)
2026-01-23 02:07:00,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:09,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2780.60229 ± 193.836
2026-01-23 02:07:09,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2488.247, 2927.8494, 2891.5889, 2887.5715, 2955.7258, 2484.6096, 2485.4636, 2887.4922, 2905.27, 2892.208]
2026-01-23 02:07:09,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [854.0, 1000.0, 1000.0, 1000.0, 1000.0, 874.0, 841.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:07:09,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2780.60) for latency DatasetOffice
2026-01-23 02:07:09,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 43 minutes, 42 seconds)
2026-01-23 02:08:34,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:40,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2120.97070 ± 816.501
2026-01-23 02:08:40,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1222.9797, 1069.6946, 1135.7991, 3159.545, 2938.1182, 2954.4302, 2290.4155, 3076.924, 1598.1074, 1763.6924]
2026-01-23 02:08:40,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [386.0, 345.0, 357.0, 1000.0, 996.0, 932.0, 838.0, 1000.0, 562.0, 595.0]
2026-01-23 02:08:40,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 39 minutes, 44 seconds)
2026-01-23 02:10:02,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:08,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1943.74438 ± 1319.590
2026-01-23 02:10:08,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2990.757, 387.39178, 3027.7773, 314.40387, 2989.4006, 3092.9294, 228.26532, 3022.6294, 2999.0664, 384.82266]
2026-01-23 02:10:08,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 185.0, 1000.0, 152.0, 1000.0, 1000.0, 107.0, 1000.0, 1000.0, 192.0]
2026-01-23 02:10:08,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 35 minutes, 7 seconds)
2026-01-23 02:11:31,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:37,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2082.49658 ± 1079.507
2026-01-23 02:11:37,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [365.27966, 2988.2769, 1209.6942, 2544.8293, 2970.9658, 3026.4792, 1613.0383, 2971.4583, 209.73015, 2925.214]
2026-01-23 02:11:37,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [161.0, 1000.0, 441.0, 880.0, 1000.0, 1000.0, 595.0, 1000.0, 112.0, 1000.0]
2026-01-23 02:11:37,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 32 minutes, 34 seconds)
2026-01-23 02:13:04,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:13:08,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1323.57092 ± 1262.332
2026-01-23 02:13:08,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [217.52747, 191.27962, 764.02594, 246.0649, 173.39996, 2818.6077, 2823.6792, 2906.3945, 2878.0493, 216.6819]
2026-01-23 02:13:08,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [107.0, 97.0, 245.0, 116.0, 94.0, 1000.0, 1000.0, 1000.0, 1000.0, 109.0]
2026-01-23 02:13:08,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 30 minutes, 53 seconds)
2026-01-23 02:14:27,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:31,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1440.56860 ± 1384.554
2026-01-23 02:14:31,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3031.6892, 2987.699, 3176.9678, 662.95483, 2994.5754, 1517.8112, -5.142863, -1.7725025, 26.312323, 14.592111]
2026-01-23 02:14:31,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 235.0, 1000.0, 524.0, 8.0, 10.0, 42.0, 22.0]
2026-01-23 02:14:31,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 27 minutes)
2026-01-23 02:15:51,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:58,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2306.72656 ± 1051.861
2026-01-23 02:15:58,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [214.42705, 2876.3503, 2918.7803, 2894.3276, 2849.173, 216.61243, 2853.0476, 2497.454, 2903.6086, 2843.4846]
2026-01-23 02:15:58,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [110.0, 1000.0, 1000.0, 1000.0, 1000.0, 117.0, 1000.0, 915.0, 1000.0, 1000.0]
2026-01-23 02:15:58,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 24 minutes, 45 seconds)
2026-01-23 02:17:21,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:17:30,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2823.53345 ± 227.892
2026-01-23 02:17:30,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3048.7114, 2836.5496, 2893.7744, 2793.2778, 3014.7065, 2797.6565, 2907.7083, 2847.2776, 2912.3496, 2183.322]
2026-01-23 02:17:30,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:17:30,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2823.53) for latency DatasetOffice
2026-01-23 02:17:30,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 24 minutes)
2026-01-23 02:18:51,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:56,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1679.53638 ± 1042.083
2026-01-23 02:18:56,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2170.2517, 2173.1213, 3075.211, 2817.3223, 2731.537, 1238.7108, 1257.9495, 1288.8174, -1.5080762, 43.949314]
2026-01-23 02:18:56,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [720.0, 699.0, 1000.0, 884.0, 923.0, 450.0, 453.0, 473.0, 36.0, 67.0]
2026-01-23 02:18:56,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 21 minutes, 54 seconds)
2026-01-23 02:20:22,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:31,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2966.62451 ± 39.004
2026-01-23 02:20:31,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3043.2405, 2969.1562, 2886.1594, 2954.0374, 2942.6794, 2939.198, 2995.244, 2984.6697, 2981.4004, 2970.4587]
2026-01-23 02:20:31,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 961.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:20:31,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2966.62) for latency DatasetOffice
2026-01-23 02:20:31,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 21 minutes, 14 seconds)
2026-01-23 02:21:53,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:59,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1774.18982 ± 1262.666
2026-01-23 02:21:59,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2929.3718, 2779.155, 313.58377, 603.2638, 28.545017, 29.734135, 2842.0742, 2860.1135, 2837.0144, 2519.0425]
2026-01-23 02:21:59,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 182.0, 364.0, 38.0, 63.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:21:59,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 20 minutes, 35 seconds)
2026-01-23 02:23:17,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:22,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1839.38965 ± 1202.422
2026-01-23 02:23:22,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [40.460228, 14.792223, 185.07776, 2811.4504, 2920.0872, 2942.489, 2528.4177, 2947.8918, 2189.1304, 1814.0991]
2026-01-23 02:23:22,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [52.0, 34.0, 123.0, 1000.0, 1000.0, 1000.0, 863.0, 1000.0, 765.0, 631.0]
2026-01-23 02:23:22,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 18 minutes, 28 seconds)
2026-01-23 02:24:45,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:52,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2472.52368 ± 779.498
2026-01-23 02:24:52,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [965.40686, 1342.9956, 3101.5566, 3068.844, 1665.9666, 3106.1663, 2698.6902, 3019.043, 3030.7583, 2725.8098]
2026-01-23 02:24:52,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [359.0, 476.0, 1000.0, 1000.0, 588.0, 1000.0, 938.0, 984.0, 1000.0, 893.0]
2026-01-23 02:24:52,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 16 minutes, 33 seconds)
2026-01-23 02:26:16,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:23,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2392.37817 ± 616.405
2026-01-23 02:26:23,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2780.6567, 1087.586, 2801.4812, 2812.315, 2146.3506, 2725.7952, 2709.5894, 2774.8232, 1357.2795, 2727.905]
2026-01-23 02:26:23,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 386.0, 1000.0, 1000.0, 758.0, 1000.0, 1000.0, 1000.0, 524.0, 1000.0]
2026-01-23 02:26:23,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 16 minutes)
2026-01-23 02:27:43,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:27:49,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1829.74670 ± 1430.478
2026-01-23 02:27:49,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [10.739183, 64.60588, 198.97488, 41.9693, 2978.373, 2965.6375, 3081.7507, 2994.82, 2963.5938, 2997.0012]
2026-01-23 02:27:49,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [29.0, 88.0, 119.0, 57.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:27:49,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 13 minutes, 1 second)
2026-01-23 02:29:17,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:25,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2845.31909 ± 270.878
2026-01-23 02:29:25,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2886.2341, 2905.5317, 3024.7224, 2997.298, 2999.4182, 2696.6018, 2906.9954, 3050.903, 2900.5525, 2084.9346]
2026-01-23 02:29:25,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 703.0]
2026-01-23 02:29:25,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 12 minutes, 50 seconds)
2026-01-23 02:30:42,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:50,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2513.33691 ± 773.822
2026-01-23 02:30:50,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2847.889, 2880.3142, 1930.6653, 2836.1343, 2879.4421, 2850.7156, 2853.1182, 2882.7954, 345.4745, 2826.8196]
2026-01-23 02:30:50,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 706.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 176.0, 1000.0]
2026-01-23 02:30:50,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 11 minutes, 33 seconds)
2026-01-23 02:32:11,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:32:19,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2880.00586 ± 484.219
2026-01-23 02:32:19,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2986.891, 2976.9053, 3022.8462, 3139.605, 3050.0588, 3013.9902, 3123.4116, 3006.4995, 1435.3923, 3044.4587]
2026-01-23 02:32:19,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 507.0, 1000.0]
2026-01-23 02:32:19,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 10 minutes, 5 seconds)
2026-01-23 02:33:44,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:51,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2947.08276 ± 620.140
2026-01-23 02:33:51,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3144.2468, 3138.6858, 3231.6904, 3142.1543, 3124.7605, 3192.7246, 3116.7239, 3217.76, 1091.8279, 3070.2534]
2026-01-23 02:33:51,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 396.0, 1000.0]
2026-01-23 02:33:51,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 8 minutes, 46 seconds)
2026-01-23 02:35:18,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:35:23,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1784.57739 ± 1464.698
2026-01-23 02:35:23,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3177.1853, 3142.61, 3179.113, 3152.4248, 3196.4475, 1742.2255, 7.8173103, 44.892876, 30.11134, 172.94562]
2026-01-23 02:35:23,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 584.0, 20.0, 53.0, 43.0, 119.0]
2026-01-23 02:35:23,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 8 minutes, 5 seconds)
2026-01-23 02:36:39,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:48,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3108.44653 ± 29.828
2026-01-23 02:36:48,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3100.2734, 3134.6108, 3131.1714, 3096.4792, 3044.8079, 3099.8445, 3134.2717, 3126.7788, 3142.4556, 3073.771]
2026-01-23 02:36:48,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:36:48,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3108.45) for latency DatasetOffice
2026-01-23 02:36:48,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 4 minutes, 58 seconds)
2026-01-23 02:38:10,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:38:17,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2677.91138 ± 977.286
2026-01-23 02:38:17,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3155.6196, 3093.642, 3125.928, 1798.3207, 0.50618416, 3052.294, 3218.7136, 3171.9478, 3131.8643, 3030.2776]
2026-01-23 02:38:17,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 600.0, 20.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:38:17,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 4 minutes, 7 seconds)
2026-01-23 02:39:46,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:50,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1475.30005 ± 1288.824
2026-01-23 02:39:50,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3221.8206, 2744.3296, 3242.4702, 2265.964, 1946.4166, 20.018568, 973.8135, 27.616255, 277.48007, 33.06967]
2026-01-23 02:39:50,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 871.0, 1000.0, 710.0, 644.0, 30.0, 367.0, 40.0, 140.0, 52.0]
2026-01-23 02:39:50,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 3 minutes, 5 seconds)
2026-01-23 02:41:08,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:41:16,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2807.55566 ± 931.373
2026-01-23 02:41:16,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [13.837776, 3127.499, 3143.4224, 3092.5251, 3110.3667, 3116.4067, 3140.603, 3121.3237, 3095.3762, 3114.1958]
2026-01-23 02:41:16,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [27.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:41:16,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 41 seconds)
2026-01-23 02:42:37,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:45,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2684.89209 ± 506.640
2026-01-23 02:42:45,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2315.3743, 2992.567, 2968.5146, 2957.4062, 2977.855, 3033.068, 2104.9187, 1491.6559, 2931.9946, 3075.5693]
2026-01-23 02:42:45,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [794.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 733.0, 521.0, 1000.0, 1000.0]
2026-01-23 02:42:45,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 61/100 (estimated time remaining: 58 minutes, 56 seconds)
2026-01-23 02:44:08,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:44:15,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2813.15698 ± 938.997
2026-01-23 02:44:15,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3150.602, 3108.0217, 3180.2585, 3171.3525, 3173.216, 3113.9111, -1.4147085, 3075.4702, 3074.6287, 3085.5232]
2026-01-23 02:44:15,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 9.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:44:15,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 62/100 (estimated time remaining: 58 minutes, 9 seconds)
2026-01-23 02:45:40,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:45:49,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3149.45605 ± 34.688
2026-01-23 02:45:49,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3140.9, 3116.856, 3137.102, 3119.1667, 3171.4485, 3239.0864, 3134.5134, 3159.56, 3156.9368, 3118.9895]
2026-01-23 02:45:49,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:45:49,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3149.46) for latency DatasetOffice
2026-01-23 02:45:49,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 63/100 (estimated time remaining: 57 minutes, 13 seconds)
2026-01-23 02:47:11,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:47:16,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1879.91602 ± 1331.666
2026-01-23 02:47:16,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3063.1606, 3136.3984, 3168.913, 1063.3108, 3050.971, 164.84282, 3121.3108, 1895.3307, 35.428654, 99.494316]
2026-01-23 02:47:16,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 394.0, 1000.0, 101.0, 1000.0, 628.0, 38.0, 106.0]
2026-01-23 02:47:16,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 64/100 (estimated time remaining: 55 minutes, 4 seconds)
2026-01-23 02:48:39,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:48:47,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2992.66992 ± 350.783
2026-01-23 02:48:47,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3173.3237, 3169.3481, 3197.9165, 3215.9502, 3161.8113, 2245.5034, 2355.1914, 3029.205, 3226.7017, 3151.7478]
2026-01-23 02:48:47,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 722.0, 764.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:48:47,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 65/100 (estimated time remaining: 54 minutes, 7 seconds)
2026-01-23 02:50:11,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:50:18,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2510.38818 ± 1022.716
2026-01-23 02:50:18,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3021.252, -8.867172, 2925.474, 2934.194, 3001.3372, 3079.7043, 3051.9626, 1055.0126, 3027.749, 3016.0635]
2026-01-23 02:50:18,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 21.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 401.0, 1000.0, 1000.0]
2026-01-23 02:50:18,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 66/100 (estimated time remaining: 52 minutes, 54 seconds)
2026-01-23 02:51:36,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:51:41,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1851.79321 ± 1455.614
2026-01-23 02:51:41,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3274.706, 3320.0942, 3031.2065, 3299.8945, 1035.6172, 3372.477, 1098.0637, 9.581816, 52.10663, 24.183039]
2026-01-23 02:51:41,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 308.0, 1000.0, 332.0, 29.0, 54.0, 35.0]
2026-01-23 02:51:41,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 67/100 (estimated time remaining: 50 minutes, 29 seconds)
2026-01-23 02:53:02,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:53:09,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2524.68530 ± 866.334
2026-01-23 02:53:09,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3137.6584, 1653.7042, 2209.3938, 3069.9644, 2123.5154, 3162.4233, 461.07858, 3116.8323, 3152.139, 3160.1438]
2026-01-23 02:53:09,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 562.0, 660.0, 1000.0, 675.0, 1000.0, 207.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:53:09,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 68/100 (estimated time remaining: 48 minutes, 23 seconds)
2026-01-23 02:54:32,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:54:39,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2758.62500 ± 915.243
2026-01-23 02:54:39,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3073.704, 3031.3557, 3111.9329, 3008.6902, 3137.8289, 15.833128, 3014.6902, 3076.5579, 3017.3157, 3098.34]
2026-01-23 02:54:39,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 27.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:54:39,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 69/100 (estimated time remaining: 47 minutes, 15 seconds)
2026-01-23 02:56:08,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:56:13,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1800.74902 ± 1250.182
2026-01-23 02:56:13,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3184.2456, 3214.508, 2708.788, 3227.9165, 2273.294, 16.9915, 1879.7404, 41.7738, 899.82745, 560.4038]
2026-01-23 02:56:13,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 842.0, 1000.0, 726.0, 28.0, 597.0, 61.0, 343.0, 191.0]
2026-01-23 02:56:13,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 70/100 (estimated time remaining: 46 minutes, 8 seconds)
2026-01-23 02:57:30,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:57:38,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2930.42993 ± 771.567
2026-01-23 02:57:38,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [655.61053, 3239.7114, 3251.7688, 3201.0251, 3220.046, 3273.9458, 3228.9692, 3255.1028, 3215.9124, 2762.2078]
2026-01-23 02:57:38,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [247.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 878.0]
2026-01-23 02:57:38,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 71/100 (estimated time remaining: 43 minutes, 58 seconds)
2026-01-23 02:59:04,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:11,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2752.41553 ± 751.967
2026-01-23 02:59:11,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1066.4651, 3324.9294, 3235.209, 2666.528, 3161.5508, 3251.7017, 2814.891, 1571.8043, 3231.026, 3200.0513]
2026-01-23 02:59:11,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [345.0, 1000.0, 1000.0, 843.0, 1000.0, 1000.0, 899.0, 527.0, 1000.0, 1000.0]
2026-01-23 02:59:11,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 72/100 (estimated time remaining: 43 minutes, 33 seconds)
2026-01-23 03:00:33,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:00:40,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2834.09863 ± 1030.079
2026-01-23 03:00:40,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3313.6912, 5.1922226, 3327.5833, 3318.2576, 3326.9333, 3295.4617, 3225.478, 3326.2588, 3286.7146, 1915.4164]
2026-01-23 03:00:40,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 16.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 574.0]
2026-01-23 03:00:40,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 73/100 (estimated time remaining: 42 minutes, 8 seconds)
2026-01-23 03:02:05,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:02:13,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3149.28564 ± 47.806
2026-01-23 03:02:13,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3230.1208, 3162.6482, 3160.6953, 3149.2412, 3075.1726, 3145.9688, 3122.7615, 3074.2825, 3158.7346, 3213.2322]
2026-01-23 03:02:13,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:02:13,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 74/100 (estimated time remaining: 40 minutes, 51 seconds)
2026-01-23 03:03:32,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:03:38,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2138.25146 ± 1139.596
2026-01-23 03:03:38,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3003.2825, 2958.5403, 2962.6028, 2969.2117, 2161.2236, 2775.511, 2935.7463, 1531.3793, 56.695263, 28.322075]
2026-01-23 03:03:38,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 684.0, 931.0, 1000.0, 539.0, 63.0, 32.0]
2026-01-23 03:03:38,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 75/100 (estimated time remaining: 38 minutes, 33 seconds)
2026-01-23 03:05:01,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:05:09,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2616.43799 ± 938.515
2026-01-23 03:05:09,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3104.2498, 3137.1848, 3126.529, 3159.1047, 2909.9377, 2217.2183, 2234.3425, 5.8824105, 3131.5623, 3138.3682]
2026-01-23 03:05:09,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 720.0, 740.0, 23.0, 1000.0, 1000.0]
2026-01-23 03:05:09,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 76/100 (estimated time remaining: 37 minutes, 32 seconds)
2026-01-23 03:06:33,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:06:40,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2686.91357 ± 834.291
2026-01-23 03:06:40,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3110.4492, 3137.0132, 3110.8743, 3102.8137, 3113.661, 3080.0378, 3067.9407, 1289.7614, 3077.3904, 779.1962]
2026-01-23 03:06:40,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 430.0, 1000.0, 264.0]
2026-01-23 03:06:40,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 77/100 (estimated time remaining: 35 minutes, 55 seconds)
2026-01-23 03:07:55,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:08:01,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2147.88770 ± 1410.566
2026-01-23 03:08:01,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1482.2963, 3141.3984, 3278.712, 3272.5837, 3286.4417, 3295.2644, 3270.9487, 377.92493, 7.14243, 66.16313]
2026-01-23 03:08:01,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [481.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 171.0, 18.0, 96.0]
2026-01-23 03:08:01,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 78/100 (estimated time remaining: 33 minutes, 46 seconds)
2026-01-23 03:09:25,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:09:32,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2526.59814 ± 1137.825
2026-01-23 03:09:32,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3039.984, 541.7923, 2880.5728, 3101.6006, -0.7750054, 3135.6377, 3085.6777, 3117.2913, 3118.4424, 3245.7573]
2026-01-23 03:09:32,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 237.0, 914.0, 1000.0, 9.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:09:32,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 79/100 (estimated time remaining: 32 minutes, 10 seconds)
2026-01-23 03:10:55,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:11:03,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3040.05518 ± 409.976
2026-01-23 03:11:03,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3122.6863, 3220.1353, 3176.6018, 3125.756, 1814.174, 3216.2776, 3167.9639, 3205.5972, 3199.782, 3151.5781]
2026-01-23 03:11:03,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 590.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:11:03,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 80/100 (estimated time remaining: 31 minutes, 7 seconds)
2026-01-23 03:12:24,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:12:28,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1738.09473 ± 1217.392
2026-01-23 03:12:28,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3096.229, 3099.7324, 2595.5364, 3081.7444, 2141.6108, 4.4588246, 1956.8695, 1035.7949, 333.536, 35.435238]
2026-01-23 03:12:28,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 848.0, 1000.0, 712.0, 22.0, 643.0, 408.0, 163.0, 57.0]
2026-01-23 03:12:28,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 81/100 (estimated time remaining: 29 minutes, 19 seconds)
2026-01-23 03:13:55,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:14:03,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2887.00488 ± 653.086
2026-01-23 03:14:03,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3216.3018, 3157.9104, 3213.4436, 3154.262, 3096.9878, 3139.5166, 3124.9387, 2595.776, 3174.2966, 996.6143]
2026-01-23 03:14:03,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 842.0, 1000.0, 372.0]
2026-01-23 03:14:03,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 82/100 (estimated time remaining: 28 minutes, 2 seconds)
2026-01-23 03:15:21,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:15:28,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2589.01904 ± 915.324
2026-01-23 03:15:28,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3211.3506, 3212.7793, 1695.7499, 834.2102, 3174.2659, 3130.5266, 3187.1492, 1141.6952, 3196.511, 3105.9536]
2026-01-23 03:15:28,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 549.0, 296.0, 1000.0, 1000.0, 1000.0, 403.0, 1000.0, 1000.0]
2026-01-23 03:15:28,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 83/100 (estimated time remaining: 26 minutes, 48 seconds)
2026-01-23 03:16:51,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:16:58,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2832.70532 ± 969.424
2026-01-23 03:16:58,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3240.549, 2474.1401, 3240.243, 3251.0571, 3250.6997, 3227.9531, 3215.2812, 3181.9133, 3240.3071, 4.9095745]
2026-01-23 03:16:58,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 767.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 20.0]
2026-01-23 03:16:58,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 84/100 (estimated time remaining: 25 minutes, 18 seconds)
2026-01-23 03:18:20,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:18:28,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3063.46631 ± 255.211
2026-01-23 03:18:28,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3117.2412, 3137.4883, 3168.1729, 3179.196, 3136.776, 3141.0972, 3187.1636, 2301.586, 3162.7373, 3103.202]
2026-01-23 03:18:28,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 710.0, 1000.0, 1000.0]
2026-01-23 03:18:28,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 85/100 (estimated time remaining: 23 minutes, 45 seconds)
2026-01-23 03:19:57,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:20:02,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1904.93298 ± 1344.490
2026-01-23 03:20:02,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3126.9717, 3108.436, 3148.9614, 1476.1206, 3112.2014, 3064.6257, 1936.173, 30.857624, 6.0147915, 38.9673]
2026-01-23 03:20:02,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 489.0, 1000.0, 1000.0, 639.0, 51.0, 19.0, 41.0]
2026-01-23 03:20:02,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 86/100 (estimated time remaining: 22 minutes, 40 seconds)
2026-01-23 03:21:17,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:21:24,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2673.34131 ± 702.190
2026-01-23 03:21:24,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3101.819, 2258.4653, 3226.506, 3274.5732, 2056.4834, 3141.881, 3094.4207, 975.7847, 2473.4878, 3129.9907]
2026-01-23 03:21:24,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 716.0, 1000.0, 1000.0, 634.0, 1000.0, 1000.0, 358.0, 804.0, 1000.0]
2026-01-23 03:21:24,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 87/100 (estimated time remaining: 20 minutes, 33 seconds)
2026-01-23 03:22:48,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:22:56,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2996.21631 ± 600.280
2026-01-23 03:22:56,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3206.51, 3217.0195, 3209.6846, 3222.8967, 3061.4246, 3204.0122, 3229.766, 3221.815, 1200.7572, 3188.277]
2026-01-23 03:22:56,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 403.0, 1000.0]
2026-01-23 03:22:56,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 88/100 (estimated time remaining: 19 minutes, 25 seconds)
2026-01-23 03:24:22,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:24:27,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1730.31775 ± 1442.035
2026-01-23 03:24:27,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3066.9214, 3089.8271, 3124.24, 3083.4336, 3070.0388, 1755.5692, 2.6792264, 29.720726, 70.71622, 10.030463]
2026-01-23 03:24:27,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 593.0, 24.0, 43.0, 76.0, 24.0]
2026-01-23 03:24:27,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 89/100 (estimated time remaining: 17 minutes, 57 seconds)
2026-01-23 03:25:44,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:25:52,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3037.10278 ± 451.409
2026-01-23 03:25:52,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3181.5857, 3176.0496, 2911.728, 3228.0908, 3211.897, 1712.5477, 3234.7075, 3236.6187, 3235.482, 3242.3186]
2026-01-23 03:25:52,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 908.0, 1000.0, 1000.0, 548.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:25:52,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 90/100 (estimated time remaining: 16 minutes, 17 seconds)
2026-01-23 03:27:15,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:27:23,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3113.27100 ± 198.286
2026-01-23 03:27:23,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3146.1152, 3151.526, 3128.0828, 3185.2542, 3130.437, 3182.4062, 3241.302, 3206.2673, 3232.0137, 2529.3064]
2026-01-23 03:27:23,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 792.0]
2026-01-23 03:27:23,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 91/100 (estimated time remaining: 14 minutes, 41 seconds)
2026-01-23 03:28:46,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:28:50,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1453.16968 ± 1245.848
2026-01-23 03:28:50,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3241.843, 2396.3987, 3190.4365, 2266.6765, 1874.6256, 26.498041, 1209.7281, 0.54755145, 300.75488, 24.186186]
2026-01-23 03:28:50,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 754.0, 1000.0, 711.0, 625.0, 37.0, 412.0, 13.0, 140.0, 37.0]
2026-01-23 03:28:50,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 92/100 (estimated time remaining: 13 minutes, 24 seconds)
2026-01-23 03:30:17,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:25,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3033.80518 ± 824.653
2026-01-23 03:30:25,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3334.4214, 3284.9663, 3303.57, 3281.5103, 3295.759, 3435.1096, 3278.313, 3247.2998, 3313.0996, 564.0047]
2026-01-23 03:30:25,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 215.0]
2026-01-23 03:30:25,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 93/100 (estimated time remaining: 11 minutes, 57 seconds)
2026-01-23 03:31:42,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:31:48,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2271.14185 ± 1166.849
2026-01-23 03:31:48,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [6.166646, 3222.237, 3157.6567, 1235.5945, 3080.6096, 3209.386, 3187.5989, 941.9803, 1448.4941, 3221.6953]
2026-01-23 03:31:48,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [23.0, 1000.0, 1000.0, 428.0, 1000.0, 1000.0, 1000.0, 336.0, 491.0, 1000.0]
2026-01-23 03:31:48,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 94/100 (estimated time remaining: 10 minutes, 17 seconds)
2026-01-23 03:33:08,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:33:12,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1263.21216 ± 1530.316
2026-01-23 03:33:12,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [-7.621416, 22.254774, 36.414658, 1.1719867, 31.461206, 2.277497, 3011.3264, 3166.3823, 3174.771, 3193.682]
2026-01-23 03:33:12,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [33.0, 53.0, 42.0, 15.0, 34.0, 19.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:33:12,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 95/100 (estimated time remaining: 8 minutes, 47 seconds)
2026-01-23 03:34:37,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:34:45,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3083.89136 ± 8.677
2026-01-23 03:34:45,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3075.5303, 3098.2756, 3087.3035, 3095.0657, 3083.1523, 3078.263, 3070.4004, 3083.988, 3075.5613, 3091.3748]
2026-01-23 03:34:45,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:34:45,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 96/100 (estimated time remaining: 7 minutes, 22 seconds)
2026-01-23 03:36:07,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:36:14,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2862.61865 ± 764.728
2026-01-23 03:36:14,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3234.517, 3196.3394, 3215.3376, 3225.6553, 727.0314, 2269.312, 3152.0586, 3211.726, 3201.7131, 3192.4985]
2026-01-23 03:36:14,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 272.0, 698.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:36:14,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 97/100 (estimated time remaining: 5 minutes, 55 seconds)
2026-01-23 03:37:34,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:37:38,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1678.68420 ± 1408.320
2026-01-23 03:37:38,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [25.603708, 985.24457, 510.03214, 28.476316, 42.91129, 3234.5527, 3184.687, 3217.502, 2344.364, 3213.467]
2026-01-23 03:37:38,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [38.0, 338.0, 196.0, 45.0, 67.0, 1000.0, 1000.0, 1000.0, 740.0, 1000.0]
2026-01-23 03:37:38,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 19 seconds)
2026-01-23 03:39:07,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:39:14,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2986.92114 ± 514.166
2026-01-23 03:39:14,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3413.422, 2138.441, 3203.4814, 3304.1484, 1818.3746, 3187.7925, 3242.1123, 3145.5195, 3192.5454, 3223.374]
2026-01-23 03:39:14,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 678.0, 1000.0, 1000.0, 579.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:39:14,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 99/100 (estimated time remaining: 2 minutes, 58 seconds)
2026-01-23 03:40:31,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:40:39,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3082.22900 ± 320.895
2026-01-23 03:40:39,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3203.1558, 3132.8757, 3150.0366, 3221.209, 2126.1523, 3209.563, 3146.907, 3170.7483, 3197.9778, 3263.6624]
2026-01-23 03:40:39,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 674.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:40:39,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 29 seconds)
2026-01-23 03:42:01,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:42:09,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3134.19336 ± 410.674
2026-01-23 03:42:09,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3286.9084, 3212.463, 3269.5803, 3213.7568, 3293.763, 3322.7458, 3290.183, 1906.2614, 3255.1687, 3291.1062]
2026-01-23 03:42:09,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 602.0, 1000.0, 1000.0]
2026-01-23 03:42:09,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1299 [DEBUG]: Training session finished
