2026-01-23 01:15:14,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-walker2d/DatasetOffice-sac-aug-mem2
2026-01-23 01:15:14,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-walker2d/DatasetOffice-sac-aug-mem2
2026-01-23 01:15:14,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x1522469f7f50>}
2026-01-23 01:15:14,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1159 [DEBUG]: using device: cuda
2026-01-23 01:15:14,392 baseline-sac-noisy-walker2d:77 [WARNING]: args.memorize_actions != args.horizon: 2 != 32
2026-01-23 01:15:14,532 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1181 [INFO]: Creating new trainer
2026-01-23 01:15:14,550 baseline-sac-noisy-walker2d:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=29, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:15:14,550 baseline-sac-noisy-walker2d:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:15:15,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1242 [DEBUG]: Starting training session...
2026-01-23 01:15:15,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 1/100
2026-01-23 01:16:36,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:38,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 90.01404 ± 75.255
2026-01-23 01:16:38,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [30.47711, 73.086716, 142.89935, 61.739025, 78.17808, 77.936806, 293.74576, 67.41725, 11.332959, 63.32734]
2026-01-23 01:16:38,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [132.0, 207.0, 213.0, 235.0, 236.0, 259.0, 223.0, 222.0, 99.0, 232.0]
2026-01-23 01:16:38,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (90.01) for latency DatasetOffice
2026-01-23 01:16:38,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 16 minutes, 59 seconds)
2026-01-23 01:18:07,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:08,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 75.91898 ± 96.532
2026-01-23 01:18:08,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [53.67566, 8.901594, 36.275528, 60.660088, 274.81473, 7.5923915, 26.26821, 17.210775, 16.857738, 256.93317]
2026-01-23 01:18:08,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [231.0, 117.0, 188.0, 214.0, 156.0, 154.0, 46.0, 33.0, 123.0, 210.0]
2026-01-23 01:18:08,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 21 minutes, 25 seconds)
2026-01-23 01:19:38,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:40,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 265.10474 ± 101.793
2026-01-23 01:19:40,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [204.72906, 282.71442, 215.92815, 234.95186, 429.4032, 285.8217, 241.51254, 330.88782, 383.58087, 41.51792]
2026-01-23 01:19:40,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [132.0, 166.0, 353.0, 393.0, 282.0, 178.0, 144.0, 188.0, 238.0, 224.0]
2026-01-23 01:19:40,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (265.10) for latency DatasetOffice
2026-01-23 01:19:40,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 22 minutes, 46 seconds)
2026-01-23 01:21:08,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:10,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 277.36002 ± 68.206
2026-01-23 01:21:10,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [432.14764, 198.67961, 223.61581, 325.52756, 292.75598, 283.19705, 219.12862, 202.88393, 321.45392, 274.21]
2026-01-23 01:21:10,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [252.0, 121.0, 128.0, 332.0, 162.0, 135.0, 123.0, 124.0, 260.0, 150.0]
2026-01-23 01:21:10,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (277.36) for latency DatasetOffice
2026-01-23 01:21:10,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 21 minutes, 58 seconds)
2026-01-23 01:22:38,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:39,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 290.15704 ± 110.687
2026-01-23 01:22:39,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [466.19415, 379.77478, 236.51942, 135.78232, 284.614, 159.6994, 180.33658, 447.16504, 270.10983, 341.375]
2026-01-23 01:22:39,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [281.0, 226.0, 148.0, 74.0, 137.0, 88.0, 113.0, 434.0, 149.0, 184.0]
2026-01-23 01:22:39,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (290.16) for latency DatasetOffice
2026-01-23 01:22:39,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 20 minutes, 46 seconds)
2026-01-23 01:24:10,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:12,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 368.87921 ± 261.491
2026-01-23 01:24:12,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [334.94974, 135.55763, 403.12045, 368.91574, 200.32321, -4.870702, 1008.6424, 557.48926, 262.84952, 421.81476]
2026-01-23 01:24:12,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [179.0, 120.0, 227.0, 178.0, 131.0, 20.0, 499.0, 236.0, 156.0, 212.0]
2026-01-23 01:24:12,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (368.88) for latency DatasetOffice
2026-01-23 01:24:12,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 22 minutes, 8 seconds)
2026-01-23 01:25:39,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:41,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 272.40585 ± 163.283
2026-01-23 01:25:41,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [159.7535, 127.491035, 168.70445, 141.4863, 407.42908, 199.25685, 560.3997, 408.5959, 475.39655, 75.54532]
2026-01-23 01:25:41,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [190.0, 397.0, 128.0, 143.0, 261.0, 113.0, 251.0, 253.0, 293.0, 83.0]
2026-01-23 01:25:41,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 20 minutes, 30 seconds)
2026-01-23 01:27:10,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:11,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 240.08902 ± 137.417
2026-01-23 01:27:11,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [309.20538, 357.03766, 245.08864, 141.72939, 363.53094, 188.56197, 14.062389, 179.55498, 501.80856, 100.31029]
2026-01-23 01:27:11,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [148.0, 178.0, 128.0, 87.0, 208.0, 112.0, 64.0, 91.0, 244.0, 166.0]
2026-01-23 01:27:11,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 18 minutes, 26 seconds)
2026-01-23 01:28:40,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:41,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 346.93945 ± 131.338
2026-01-23 01:28:41,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [420.25812, 516.18854, 207.17984, 454.18402, 391.09723, 297.3308, 168.34222, 255.14954, 205.84102, 553.8229]
2026-01-23 01:28:41,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [263.0, 291.0, 112.0, 233.0, 189.0, 153.0, 105.0, 165.0, 123.0, 246.0]
2026-01-23 01:28:41,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 16 minutes, 57 seconds)
2026-01-23 01:30:12,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:13,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 285.82004 ± 185.466
2026-01-23 01:30:13,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [24.11798, 50.17784, 383.9127, 360.002, 153.93576, 553.0129, 167.21924, 202.73457, 385.45914, 577.62836]
2026-01-23 01:30:13,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [88.0, 138.0, 203.0, 214.0, 99.0, 255.0, 90.0, 113.0, 219.0, 342.0]
2026-01-23 01:30:13,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 16 minutes, 5 seconds)
2026-01-23 01:31:42,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:44,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 367.10278 ± 149.995
2026-01-23 01:31:44,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [349.87454, 449.7575, 86.57489, 439.11716, 245.10532, 594.0062, 556.6512, 214.63258, 432.57825, 302.73035]
2026-01-23 01:31:44,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [193.0, 251.0, 121.0, 244.0, 235.0, 288.0, 286.0, 135.0, 191.0, 173.0]
2026-01-23 01:31:44,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 14 minutes, 6 seconds)
2026-01-23 01:33:12,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:14,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 446.00128 ± 272.746
2026-01-23 01:33:14,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [525.017, 572.96985, 969.21094, 406.48486, 403.43326, 222.98576, 424.30725, 25.771067, 125.17346, 784.6592]
2026-01-23 01:33:14,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [385.0, 265.0, 532.0, 190.0, 204.0, 144.0, 257.0, 38.0, 83.0, 471.0]
2026-01-23 01:33:14,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (446.00) for latency DatasetOffice
2026-01-23 01:33:14,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 12 minutes, 54 seconds)
2026-01-23 01:34:45,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:47,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 506.29941 ± 222.326
2026-01-23 01:34:47,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [541.37756, 439.92795, 202.59709, 414.0346, 283.60095, 892.47314, 909.6318, 564.5493, 365.28876, 449.51343]
2026-01-23 01:34:47,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [239.0, 180.0, 106.0, 212.0, 160.0, 662.0, 435.0, 281.0, 181.0, 253.0]
2026-01-23 01:34:47,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (506.30) for latency DatasetOffice
2026-01-23 01:34:47,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 12 minutes, 14 seconds)
2026-01-23 01:36:14,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:16,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 402.63989 ± 140.660
2026-01-23 01:36:16,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [643.7352, 427.99142, 191.5117, 194.02396, 415.48242, 623.4639, 381.11844, 379.5119, 399.29526, 370.265]
2026-01-23 01:36:16,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [273.0, 204.0, 120.0, 114.0, 197.0, 294.0, 188.0, 166.0, 231.0, 179.0]
2026-01-23 01:36:16,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 10 minutes, 22 seconds)
2026-01-23 01:37:46,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:47,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 17.12020 ± 16.921
2026-01-23 01:37:47,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [26.36101, 2.8277698, 35.91401, -0.56008804, 0.572338, 2.115729, 5.1834764, 43.11697, 14.402442, 41.26833]
2026-01-23 01:37:47,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [38.0, 29.0, 43.0, 11.0, 25.0, 19.0, 27.0, 53.0, 19.0, 63.0]
2026-01-23 01:37:47,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 8 minutes, 32 seconds)
2026-01-23 01:39:15,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:17,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 555.96149 ± 157.796
2026-01-23 01:39:17,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [755.2007, 578.20667, 503.68832, 435.23187, 932.98706, 410.27475, 427.4472, 556.204, 472.3546, 488.01968]
2026-01-23 01:39:17,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [391.0, 363.0, 232.0, 200.0, 448.0, 214.0, 209.0, 248.0, 231.0, 206.0]
2026-01-23 01:39:17,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (555.96) for latency DatasetOffice
2026-01-23 01:39:17,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 7 minutes, 4 seconds)
2026-01-23 01:40:47,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:49,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 528.15833 ± 275.998
2026-01-23 01:40:49,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [358.59772, 532.0257, 373.37933, 407.52164, 520.0658, 490.9634, 423.00977, 1339.1759, 410.17172, 426.67215]
2026-01-23 01:40:49,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [215.0, 216.0, 203.0, 213.0, 227.0, 244.0, 240.0, 551.0, 234.0, 196.0]
2026-01-23 01:40:49,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 5 minutes, 47 seconds)
2026-01-23 01:42:18,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:20,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 440.00836 ± 217.652
2026-01-23 01:42:20,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [547.888, 436.29846, 405.95544, 424.35345, 100.13784, 21.54518, 571.0407, 474.68607, 650.50775, 767.67035]
2026-01-23 01:42:20,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [248.0, 200.0, 203.0, 204.0, 70.0, 43.0, 284.0, 289.0, 274.0, 360.0]
2026-01-23 01:42:20,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 3 minutes, 48 seconds)
2026-01-23 01:43:50,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:53,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 498.06000 ± 324.292
2026-01-23 01:43:53,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1298.274, 546.66925, 159.26501, 217.20924, 618.09064, 219.6715, 517.37915, 749.0241, 297.09033, 357.9268]
2026-01-23 01:43:53,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [729.0, 288.0, 187.0, 195.0, 388.0, 128.0, 333.0, 525.0, 180.0, 216.0]
2026-01-23 01:43:53,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 3 minutes, 23 seconds)
2026-01-23 01:45:23,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:24,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 363.47510 ± 183.684
2026-01-23 01:45:24,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [253.96599, 727.9974, 249.95967, 657.3933, 247.43892, 277.12958, 250.42651, 234.7901, 221.26678, 514.3827]
2026-01-23 01:45:24,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [121.0, 300.0, 128.0, 264.0, 127.0, 139.0, 128.0, 122.0, 112.0, 222.0]
2026-01-23 01:45:24,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 1 minute, 58 seconds)
2026-01-23 01:46:52,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:55,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 635.27356 ± 553.066
2026-01-23 01:46:55,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [336.33636, 538.3503, 955.0067, 205.9118, 266.7885, 1057.3131, 2057.0532, 327.7951, 194.05997, 414.12]
2026-01-23 01:46:55,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [164.0, 239.0, 372.0, 133.0, 140.0, 400.0, 803.0, 174.0, 134.0, 182.0]
2026-01-23 01:46:55,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (635.27) for latency DatasetOffice
2026-01-23 01:46:55,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 24 seconds)
2026-01-23 01:48:25,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:28,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 962.04529 ± 513.839
2026-01-23 01:48:28,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [618.33136, 330.32367, 1488.1725, 517.3886, 1106.1871, 1183.7506, 498.15933, 971.76074, 789.5224, 2116.857]
2026-01-23 01:48:28,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [236.0, 181.0, 535.0, 268.0, 481.0, 571.0, 256.0, 422.0, 357.0, 900.0]
2026-01-23 01:48:28,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (962.05) for latency DatasetOffice
2026-01-23 01:48:28,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 23/100 (estimated time remaining: 1 hour, 59 minutes, 24 seconds)
2026-01-23 01:50:00,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:50:03,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 685.91681 ± 317.732
2026-01-23 01:50:03,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [749.6402, 695.2366, 28.779766, 327.25366, 641.63403, 601.56, 758.9329, 1235.5157, 792.5108, 1028.1047]
2026-01-23 01:50:03,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [263.0, 407.0, 37.0, 192.0, 260.0, 232.0, 328.0, 441.0, 315.0, 408.0]
2026-01-23 01:50:03,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 24/100 (estimated time remaining: 1 hour, 58 minutes, 44 seconds)
2026-01-23 01:51:30,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:35,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1240.27026 ± 575.247
2026-01-23 01:51:35,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1846.9705, 1849.1431, 1421.3125, 520.62756, 1049.5645, 1114.9668, 2336.5208, 854.2169, 851.4045, 557.9753]
2026-01-23 01:51:35,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [721.0, 705.0, 552.0, 234.0, 406.0, 475.0, 979.0, 355.0, 397.0, 225.0]
2026-01-23 01:51:35,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (1240.27) for latency DatasetOffice
2026-01-23 01:51:35,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 25/100 (estimated time remaining: 1 hour, 56 minutes, 52 seconds)
2026-01-23 01:53:04,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:12,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1865.79395 ± 821.072
2026-01-23 01:53:12,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2382.775, 2449.5688, 501.61407, 2416.774, 2299.411, 590.5948, 2428.5347, 756.74133, 2453.113, 2378.8132]
2026-01-23 01:53:12,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 246.0, 1000.0, 1000.0, 252.0, 1000.0, 299.0, 1000.0, 1000.0]
2026-01-23 01:53:12,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (1865.79) for latency DatasetOffice
2026-01-23 01:53:12,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 26/100 (estimated time remaining: 1 hour, 56 minutes, 50 seconds)
2026-01-23 01:54:43,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:51,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1927.41858 ± 713.255
2026-01-23 01:54:51,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2276.597, 2326.5132, 1214.0316, 2217.2075, 2278.403, 2160.332, 2317.4136, 2202.898, 2273.7507, 7.0397573]
2026-01-23 01:54:51,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 458.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 990.0, 24.0]
2026-01-23 01:54:51,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (1927.42) for latency DatasetOffice
2026-01-23 01:55:33,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 7 minutes, 58 seconds)
2026-01-23 01:57:04,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:13,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2168.59424 ± 630.838
2026-01-23 01:57:13,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2418.014, 2404.2908, 2370.4868, 2387.7634, 2434.0525, 2388.3076, 279.8567, 2279.8037, 2361.1106, 2362.2551]
2026-01-23 01:57:13,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 121.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:57:13,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2168.59) for latency DatasetOffice
2026-01-23 01:57:13,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 7 minutes, 39 seconds)
2026-01-23 01:58:47,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:55,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2146.58350 ± 453.642
2026-01-23 01:58:55,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2371.4246, 2349.9558, 999.84296, 1561.7684, 2291.5154, 2406.9846, 2348.6335, 2424.5518, 2441.3337, 2269.8235]
2026-01-23 01:58:55,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 424.0, 579.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:58:55,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 7 minutes, 49 seconds)
2026-01-23 02:00:24,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:29,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1211.42273 ± 1029.608
2026-01-23 02:00:29,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2401.1753, 2413.526, 2324.2222, 2327.971, 96.19408, 1481.7811, 765.37897, 269.41623, 28.509718, 6.0531216]
2026-01-23 02:00:29,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 85.0, 682.0, 372.0, 170.0, 46.0, 20.0]
2026-01-23 02:00:29,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 6 minutes, 34 seconds)
2026-01-23 02:01:57,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:06,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2127.93335 ± 674.074
2026-01-23 02:02:06,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2309.824, 2331.3953, 2337.7761, 2380.324, 2379.9248, 2340.9321, 2362.3816, 106.79519, 2352.2893, 2377.6907]
2026-01-23 02:02:06,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 120.0, 1000.0, 1000.0]
2026-01-23 02:02:06,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 4 minutes, 42 seconds)
2026-01-23 02:03:38,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:47,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2341.76221 ± 402.284
2026-01-23 02:03:47,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2506.2395, 2392.3396, 2500.8772, 2397.9666, 2452.4004, 2509.236, 1143.9521, 2473.8137, 2564.8008, 2475.9954]
2026-01-23 02:03:47,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 433.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:03:47,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2341.76) for latency DatasetOffice
2026-01-23 02:03:47,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 53 minutes, 31 seconds)
2026-01-23 02:05:20,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:28,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2133.32593 ± 414.021
2026-01-23 02:05:28,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2519.5127, 2515.7068, 1717.3014, 1489.945, 1700.5331, 2410.6738, 1622.2047, 2432.7788, 2471.3823, 2453.2212]
2026-01-23 02:05:28,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 693.0, 715.0, 726.0, 1000.0, 713.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:05:28,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 52 minutes, 11 seconds)
2026-01-23 02:06:54,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:03,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2420.84131 ± 272.019
2026-01-23 02:07:03,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1856.4207, 2527.636, 2691.1406, 2478.7283, 2662.572, 1941.1902, 2462.8386, 2512.4312, 2599.2874, 2476.1685]
2026-01-23 02:07:03,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [765.0, 1000.0, 1000.0, 1000.0, 1000.0, 759.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:07:03,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2420.84) for latency DatasetOffice
2026-01-23 02:07:03,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 48 minutes, 51 seconds)
2026-01-23 02:08:39,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:45,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1393.14575 ± 1081.904
2026-01-23 02:08:45,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2355.6536, 2438.3735, 2430.4844, 60.094044, 2341.4946, 27.165401, 2372.9834, 1587.177, 268.53094, 49.499752]
2026-01-23 02:08:45,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 56.0, 1000.0, 31.0, 1000.0, 644.0, 129.0, 56.0]
2026-01-23 02:08:45,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 48 minutes, 57 seconds)
2026-01-23 02:10:15,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:25,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2306.89111 ± 56.716
2026-01-23 02:10:25,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2264.6335, 2263.3782, 2245.6338, 2280.8442, 2398.3557, 2382.0344, 2227.3257, 2346.917, 2306.9365, 2352.852]
2026-01-23 02:10:25,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:10:25,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 48 minutes, 2 seconds)
2026-01-23 02:11:52,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:59,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1727.59595 ± 1001.958
2026-01-23 02:11:59,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2334.747, 515.8419, 2391.0278, 2416.983, 2370.435, 2409.7998, 2370.2532, 91.20354, 18.494614, 2357.1733]
2026-01-23 02:11:59,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 323.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 67.0, 25.0, 1000.0]
2026-01-23 02:11:59,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 45 minutes, 2 seconds)
2026-01-23 02:13:34,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:13:40,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1351.93250 ± 1094.908
2026-01-23 02:13:40,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2321.3171, 2382.0664, 2260.3237, 2362.8157, 2381.696, 1682.7208, 23.675207, 42.55367, 65.69653, -3.541041]
2026-01-23 02:13:40,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 665.0, 34.0, 49.0, 107.0, 18.0]
2026-01-23 02:13:40,297 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 43 minutes, 17 seconds)
2026-01-23 02:15:03,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:13,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2493.13916 ± 62.333
2026-01-23 02:15:13,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2468.1052, 2504.642, 2547.0356, 2440.1758, 2464.2126, 2616.5847, 2500.8958, 2372.5483, 2483.7952, 2533.3945]
2026-01-23 02:15:13,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:15:13,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2493.14) for latency DatasetOffice
2026-01-23 02:15:13,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 41 minutes, 14 seconds)
2026-01-23 02:16:44,297 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:53,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2405.13135 ± 44.636
2026-01-23 02:16:53,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2392.3447, 2359.2922, 2390.0332, 2328.5647, 2393.1594, 2379.4043, 2446.1033, 2424.809, 2485.9702, 2451.633]
2026-01-23 02:16:53,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:16:53,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 39 minutes, 21 seconds)
2026-01-23 02:18:27,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:35,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1952.96875 ± 840.032
2026-01-23 02:18:35,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2511.4285, 2479.247, 2330.8853, 546.17883, 1535.1001, 205.65935, 2467.4937, 2450.8882, 2545.4712, 2457.3357]
2026-01-23 02:18:35,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 231.0, 635.0, 106.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:18:35,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 38 minutes, 1 second)
2026-01-23 02:20:07,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:15,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2013.16504 ± 911.627
2026-01-23 02:20:15,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [259.31732, 2409.9265, 2548.489, 2514.0317, 2470.186, 126.05491, 2486.3699, 2467.589, 2411.3853, 2438.3013]
2026-01-23 02:20:15,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [110.0, 1000.0, 1000.0, 1000.0, 1000.0, 121.0, 1000.0, 1000.0, 993.0, 1000.0]
2026-01-23 02:20:15,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 37 minutes, 25 seconds)
2026-01-23 02:21:40,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:47,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1972.57324 ± 921.977
2026-01-23 02:21:47,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [844.47705, 2373.1882, 2592.3733, 2567.2258, 715.70764, 207.05235, 2544.0486, 2621.6526, 2731.164, 2528.8425]
2026-01-23 02:21:47,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [347.0, 1000.0, 1000.0, 1000.0, 315.0, 109.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:21:47,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 34 minutes, 15 seconds)
2026-01-23 02:23:20,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:29,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2334.31934 ± 425.900
2026-01-23 02:23:29,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2445.309, 2369.8352, 2585.1091, 2611.8513, 2517.6033, 2167.4775, 1110.8276, 2462.6604, 2491.4626, 2581.055]
2026-01-23 02:23:29,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 844.0, 443.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:23:29,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 34 minutes, 20 seconds)
2026-01-23 02:25:01,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:09,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2428.89062 ± 403.185
2026-01-23 02:25:09,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2606.7756, 2597.8408, 2512.6035, 2678.0784, 1236.7225, 2623.435, 2418.8242, 2565.5593, 2531.6855, 2517.381]
2026-01-23 02:25:09,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 547.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:25:09,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 32 minutes, 37 seconds)
2026-01-23 02:26:41,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:48,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1815.54175 ± 1111.074
2026-01-23 02:26:48,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2475.6755, 2613.5725, 2577.152, 2684.0312, 34.16913, 63.704464, 2388.4575, 2604.0745, 2435.6377, 278.9415]
2026-01-23 02:26:48,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 51.0, 84.0, 1000.0, 1000.0, 1000.0, 154.0]
2026-01-23 02:26:48,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 30 minutes, 26 seconds)
2026-01-23 02:28:17,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:25,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2026.26392 ± 584.542
2026-01-23 02:28:25,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1998.6089, 2418.9568, 2400.4883, 1446.642, 2350.9158, 500.03665, 2517.2336, 2257.5322, 2133.6267, 2238.5989]
2026-01-23 02:28:25,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [904.0, 1000.0, 1000.0, 605.0, 1000.0, 223.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:28:25,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 28 minutes, 14 seconds)
2026-01-23 02:29:54,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:04,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2548.73584 ± 97.136
2026-01-23 02:30:04,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2499.5776, 2419.201, 2635.3167, 2623.854, 2570.9648, 2633.4775, 2462.0957, 2371.2998, 2645.7698, 2625.8018]
2026-01-23 02:30:04,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:30:04,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2548.74) for latency DatasetOffice
2026-01-23 02:30:04,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 27 minutes, 42 seconds)
2026-01-23 02:31:32,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:39,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1931.58521 ± 933.639
2026-01-23 02:31:39,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2542.491, 1853.877, 214.8638, 6.5052376, 2430.462, 2222.6284, 2544.618, 2480.0728, 2461.3838, 2558.95]
2026-01-23 02:31:39,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 740.0, 147.0, 22.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:31:39,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 52 seconds)
2026-01-23 02:33:06,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:16,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2485.17920 ± 29.243
2026-01-23 02:33:16,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2482.9197, 2451.7249, 2451.2878, 2446.0889, 2503.3096, 2547.3738, 2494.357, 2485.9731, 2482.1443, 2506.6143]
2026-01-23 02:33:16,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:33:16,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 22 minutes, 40 seconds)
2026-01-23 02:34:48,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:57,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2441.00488 ± 428.345
2026-01-23 02:34:57,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2561.316, 2574.2024, 1174.4756, 2596.0916, 2461.44, 2554.5542, 2590.965, 2523.55, 2761.0918, 2612.3608]
2026-01-23 02:34:57,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 466.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:34:57,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes, 32 seconds)
2026-01-23 02:36:28,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:37,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2283.36206 ± 423.164
2026-01-23 02:36:37,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2496.6362, 2429.7922, 2450.0332, 2075.2727, 1076.7487, 2576.1663, 2429.4448, 2418.4849, 2545.4585, 2335.5837]
2026-01-23 02:36:37,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:36:37,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 20 minutes, 20 seconds)
2026-01-23 02:38:05,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:38:14,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2453.32422 ± 465.569
2026-01-23 02:38:14,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2592.2153, 2531.8477, 2675.2747, 2644.178, 2664.1003, 2612.8896, 2630.5962, 1063.1752, 2555.1787, 2563.7861]
2026-01-23 02:38:14,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 391.0, 1000.0, 1000.0]
2026-01-23 02:38:14,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 18 minutes, 21 seconds)
2026-01-23 02:39:47,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:53,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1421.30640 ± 1142.817
2026-01-23 02:39:53,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2538.8862, 2321.422, 2433.4922, 2448.626, 1810.3688, 18.654066, 114.85223, 11.194084, 22.128016, 2493.441]
2026-01-23 02:39:53,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 762.0, 52.0, 60.0, 23.0, 35.0, 1000.0]
2026-01-23 02:39:53,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 17 minutes, 23 seconds)
2026-01-23 02:41:23,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:41:32,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2614.49097 ± 132.476
2026-01-23 02:41:32,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2312.2068, 2589.6406, 2617.4106, 2586.7954, 2651.8772, 2756.2878, 2832.4832, 2691.7283, 2542.3855, 2564.0947]
2026-01-23 02:41:32,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:41:32,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2614.49) for latency DatasetOffice
2026-01-23 02:41:32,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 16 minutes, 7 seconds)
2026-01-23 02:43:04,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:43:13,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2352.58252 ± 637.814
2026-01-23 02:43:13,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2600.102, 2546.7607, 2659.8657, 2554.559, 2566.505, 2535.5623, 2518.7273, 442.46255, 2547.6035, 2553.6785]
2026-01-23 02:43:13,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 272.0, 1000.0, 1000.0]
2026-01-23 02:43:13,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 14 minutes, 21 seconds)
2026-01-23 02:44:35,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:44:42,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2134.37329 ± 732.570
2026-01-23 02:44:42,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2529.6274, 2723.7017, 1964.4727, 1541.7078, 1576.0374, 384.36694, 2691.3, 2700.127, 2700.2385, 2532.1528]
2026-01-23 02:44:42,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [872.0, 1000.0, 735.0, 576.0, 615.0, 170.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:44:42,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 11 minutes, 12 seconds)
2026-01-23 02:46:14,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:23,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2483.12354 ± 466.814
2026-01-23 02:46:23,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2757.6362, 2687.101, 2633.6428, 2830.874, 2440.1658, 1141.8068, 2799.3171, 2613.1018, 2466.0227, 2461.5686]
2026-01-23 02:46:23,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 472.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:46:23,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 10 minutes, 9 seconds)
2026-01-23 02:47:53,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:48:02,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2622.24854 ± 290.256
2026-01-23 02:48:02,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2666.7124, 1772.9576, 2643.6646, 2668.8887, 2693.5967, 2762.9956, 2798.4563, 2813.5266, 2624.3975, 2777.2883]
2026-01-23 02:48:02,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 716.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:48:02,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2622.25) for latency DatasetOffice
2026-01-23 02:48:02,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 8 minutes, 29 seconds)
2026-01-23 02:49:32,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:49:42,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2689.68018 ± 42.220
2026-01-23 02:49:42,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2611.3088, 2705.8477, 2656.975, 2684.9736, 2729.556, 2656.0156, 2738.8413, 2754.0188, 2659.0334, 2700.2317]
2026-01-23 02:49:42,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:49:42,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2689.68) for latency DatasetOffice
2026-01-23 02:49:42,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 6 minutes, 52 seconds)
2026-01-23 02:51:13,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:51:22,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2532.77783 ± 357.574
2026-01-23 02:51:22,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2683.4216, 2704.9832, 2645.811, 2686.6763, 2653.901, 1471.8828, 2680.344, 2691.9607, 2582.3264, 2526.4734]
2026-01-23 02:51:22,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 554.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:51:22,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 5 minutes, 12 seconds)
2026-01-23 02:52:49,224 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:52:55,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1881.51331 ± 1126.677
2026-01-23 02:52:55,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [7.755139, 356.8408, 2892.9636, 2762.0015, 2610.143, 2727.303, 2659.1694, 2840.179, 403.8816, 1554.8961]
2026-01-23 02:52:55,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [32.0, 187.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 230.0, 534.0]
2026-01-23 02:52:55,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 4 minutes, 3 seconds)
2026-01-23 02:54:30,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:54:39,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2540.73022 ± 565.703
2026-01-23 02:54:39,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2790.262, 2696.6218, 2736.1997, 2751.4185, 2802.5376, 2466.2524, 2733.1763, 2729.2708, 2833.713, 867.8514]
2026-01-23 02:54:39,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 316.0]
2026-01-23 02:54:39,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 2 minutes, 49 seconds)
2026-01-23 02:56:03,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:56:11,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2544.08984 ± 616.763
2026-01-23 02:56:11,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2707.2505, 2785.5283, 2774.8103, 2747.8508, 2667.2605, 2697.8193, 698.8037, 2831.876, 2772.8367, 2756.8638]
2026-01-23 02:56:11,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 299.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:56:11,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 19 seconds)
2026-01-23 02:57:42,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:57:51,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2662.92456 ± 68.052
2026-01-23 02:57:51,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2691.4043, 2629.7803, 2722.0112, 2610.5347, 2689.266, 2647.4377, 2496.193, 2744.516, 2708.1182, 2689.9841]
2026-01-23 02:57:51,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:57:51,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 65/100 (estimated time remaining: 58 minutes, 42 seconds)
2026-01-23 02:59:21,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:29,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2534.08105 ± 733.255
2026-01-23 02:59:29,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2883.0518, 2767.08, 2740.7222, 2772.293, 2711.5073, 2893.9302, 2726.4854, 2707.791, 2795.7786, 342.17188]
2026-01-23 02:59:29,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 141.0]
2026-01-23 02:59:29,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 66/100 (estimated time remaining: 56 minutes, 50 seconds)
2026-01-23 03:00:59,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:01:06,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1906.24451 ± 937.754
2026-01-23 03:01:06,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2606.1753, 2559.1318, 2643.639, 1666.1885, 2570.6982, 1199.3243, 340.26813, 223.60898, 2544.9043, 2708.5078]
2026-01-23 03:01:06,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 672.0, 1000.0, 488.0, 237.0, 166.0, 1000.0, 1000.0]
2026-01-23 03:01:06,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 67/100 (estimated time remaining: 55 minutes, 38 seconds)
2026-01-23 03:02:35,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:02:45,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2676.90479 ± 86.423
2026-01-23 03:02:45,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2590.7559, 2678.6968, 2640.8235, 2589.241, 2644.8997, 2722.7249, 2877.7537, 2771.2659, 2606.234, 2646.6511]
2026-01-23 03:02:45,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:02:45,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 68/100 (estimated time remaining: 53 minutes, 25 seconds)
2026-01-23 03:04:15,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:04:24,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2718.44580 ± 110.236
2026-01-23 03:04:24,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2744.7725, 2664.7278, 2718.7744, 2705.126, 2447.7776, 2849.4292, 2833.1301, 2831.443, 2706.7144, 2682.5645]
2026-01-23 03:04:24,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 985.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:04:24,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2718.45) for latency DatasetOffice
2026-01-23 03:04:24,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 69/100 (estimated time remaining: 52 minutes, 32 seconds)
2026-01-23 03:05:58,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:06:04,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1822.76562 ± 1204.650
2026-01-23 03:06:04,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1735.2438, -1.776559, 33.35541, 187.32234, 2830.804, 2914.1328, 2042.4785, 2840.9558, 2882.1643, 2762.9756]
2026-01-23 03:06:04,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [642.0, 18.0, 57.0, 201.0, 1000.0, 1000.0, 715.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:06:04,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 70/100 (estimated time remaining: 50 minutes, 59 seconds)
2026-01-23 03:07:33,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:07:42,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2679.75220 ± 57.319
2026-01-23 03:07:42,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2661.7249, 2641.4626, 2749.156, 2684.4333, 2606.921, 2616.9832, 2675.9097, 2670.3892, 2809.662, 2680.8792]
2026-01-23 03:07:42,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:07:42,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 71/100 (estimated time remaining: 49 minutes, 19 seconds)
2026-01-23 03:09:14,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:09:22,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2410.79346 ± 787.125
2026-01-23 03:09:22,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2780.7368, 2813.5215, 1711.7041, 2852.837, 2764.1707, 2720.1582, 2712.2275, 2783.0593, 2719.138, 250.38115]
2026-01-23 03:09:22,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 628.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 114.0]
2026-01-23 03:09:22,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 72/100 (estimated time remaining: 47 minutes, 54 seconds)
2026-01-23 03:10:51,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:11:00,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2735.39844 ± 117.110
2026-01-23 03:11:00,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2774.421, 2785.176, 2708.9058, 2693.9316, 2840.1465, 2809.1929, 2425.2847, 2705.9062, 2866.4844, 2744.5344]
2026-01-23 03:11:00,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:11:00,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2735.40) for latency DatasetOffice
2026-01-23 03:11:00,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 73/100 (estimated time remaining: 46 minutes, 13 seconds)
2026-01-23 03:12:30,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:12:38,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2550.29736 ± 772.204
2026-01-23 03:12:38,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2837.116, 2811.3645, 2813.2893, 2842.8071, 2722.0889, 2780.3643, 2768.8577, 2829.3508, 236.60948, 2861.127]
2026-01-23 03:12:38,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 142.0, 1000.0]
2026-01-23 03:12:38,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 74/100 (estimated time remaining: 44 minutes, 29 seconds)
2026-01-23 03:14:03,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:14:08,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1536.70361 ± 1271.588
2026-01-23 03:14:08,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2742.783, 251.7528, 2758.6765, 248.65211, 2829.2466, 2849.9878, 271.2891, 2858.2603, 293.58533, 262.8039]
2026-01-23 03:14:08,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 107.0, 1000.0, 105.0, 1000.0, 1000.0, 119.0, 1000.0, 133.0, 112.0]
2026-01-23 03:14:08,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 53 seconds)
2026-01-23 03:15:41,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:15:50,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2833.47314 ± 83.363
2026-01-23 03:15:50,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2837.8416, 2863.5605, 3004.676, 2668.6924, 2910.264, 2764.9268, 2825.0366, 2803.7947, 2843.7625, 2812.1777]
2026-01-23 03:15:50,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:15:50,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2833.47) for latency DatasetOffice
2026-01-23 03:15:50,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 76/100 (estimated time remaining: 40 minutes, 38 seconds)
2026-01-23 03:17:23,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:17:32,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2553.97168 ± 748.489
2026-01-23 03:17:32,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2823.0735, 2832.5562, 2855.6606, 2744.3872, 2850.2095, 2811.3105, 2704.5789, 2847.0242, 2757.7058, 313.2108]
2026-01-23 03:17:32,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 133.0]
2026-01-23 03:17:32,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 77/100 (estimated time remaining: 39 minutes, 11 seconds)
2026-01-23 03:18:53,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:19:00,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2157.81934 ± 866.720
2026-01-23 03:19:00,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2766.916, 2832.916, 2807.814, 2863.9111, 1643.2905, 1594.5487, 816.7378, 585.86304, 2851.2542, 2814.9421]
2026-01-23 03:19:00,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 652.0, 591.0, 424.0, 333.0, 1000.0, 1000.0]
2026-01-23 03:19:00,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 78/100 (estimated time remaining: 36 minutes, 50 seconds)
2026-01-23 03:20:34,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:20:43,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2821.16089 ± 67.515
2026-01-23 03:20:43,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2829.4705, 2774.993, 2714.0417, 2898.046, 2819.1265, 2884.0615, 2816.0132, 2936.8628, 2810.0176, 2728.9763]
2026-01-23 03:20:43,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:20:43,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 32 seconds)
2026-01-23 03:22:13,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:22:22,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2747.87817 ± 137.112
2026-01-23 03:22:22,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2830.0613, 2818.776, 2365.9138, 2720.395, 2819.093, 2787.5908, 2741.7822, 2785.3228, 2718.4097, 2891.4368]
2026-01-23 03:22:22,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 850.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:22:22,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 36 seconds)
2026-01-23 03:23:54,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:24:03,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2789.21240 ± 47.740
2026-01-23 03:24:03,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2778.4233, 2864.5283, 2774.9739, 2800.4226, 2675.9856, 2821.4336, 2822.6323, 2771.3975, 2817.37, 2764.9573]
2026-01-23 03:24:03,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:24:04,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 53 seconds)
2026-01-23 03:25:33,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:25:43,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2741.52881 ± 50.388
2026-01-23 03:25:43,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2788.4048, 2723.675, 2728.2322, 2716.333, 2754.6997, 2650.0723, 2679.5593, 2763.029, 2783.2688, 2828.0144]
2026-01-23 03:25:43,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:25:43,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 6 seconds)
2026-01-23 03:27:07,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:27:14,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1978.71191 ± 1011.508
2026-01-23 03:27:14,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2479.067, 27.538439, 275.89804, 2615.8074, 2744.896, 2817.7654, 2811.2239, 2764.966, 1535.506, 1714.4507]
2026-01-23 03:27:14,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [881.0, 28.0, 125.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 599.0, 730.0]
2026-01-23 03:27:14,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 36 seconds)
2026-01-23 03:28:42,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:28:51,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2625.43970 ± 681.556
2026-01-23 03:28:51,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [589.0317, 2837.1697, 2794.6406, 2943.5596, 2764.2568, 2938.8723, 2933.2205, 2822.993, 2805.6558, 2824.9983]
2026-01-23 03:28:51,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [235.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:28:51,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 39 seconds)
2026-01-23 03:30:21,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:30,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2833.39331 ± 32.273
2026-01-23 03:30:30,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2843.3508, 2785.082, 2843.2363, 2836.0518, 2771.9626, 2877.75, 2860.9602, 2853.102, 2809.0876, 2853.349]
2026-01-23 03:30:30,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:30:30,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 1 second)
2026-01-23 03:32:01,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:32:10,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2868.69287 ± 51.320
2026-01-23 03:32:10,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2927.432, 2856.295, 2820.5562, 2875.9966, 2855.1665, 2779.383, 2944.1072, 2915.3704, 2901.2063, 2811.4158]
2026-01-23 03:32:10,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:32:10,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2868.69) for latency DatasetOffice
2026-01-23 03:32:10,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 19 seconds)
2026-01-23 03:33:40,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:33:48,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2715.26221 ± 617.141
2026-01-23 03:33:48,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3047.747, 2918.5703, 2807.2734, 2899.739, 2914.7153, 878.74976, 2874.9622, 2816.647, 3056.4333, 2937.783]
2026-01-23 03:33:48,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 279.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:33:48,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 39 seconds)
2026-01-23 03:35:22,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:35:28,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1906.70630 ± 1324.321
2026-01-23 03:35:28,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2937.3853, 2890.6123, 24.5805, 2931.428, 2949.5298, 1442.5562, 1.4576942, -28.94846, 2858.8037, 3059.6572]
2026-01-23 03:35:28,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 32.0, 1000.0, 1000.0, 499.0, 19.0, 84.0, 1000.0, 1000.0]
2026-01-23 03:35:28,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 24 seconds)
2026-01-23 03:36:59,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:37:08,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2851.93335 ± 39.977
2026-01-23 03:37:08,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2857.786, 2820.9373, 2850.822, 2812.4275, 2865.4167, 2879.2898, 2843.7417, 2819.2375, 2952.636, 2817.0386]
2026-01-23 03:37:08,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:37:08,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 53 seconds)
2026-01-23 03:38:38,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:38:45,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2305.47388 ± 1087.389
2026-01-23 03:38:45,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2718.7031, 2877.3123, 2789.4158, 284.25732, 2755.667, -6.6074905, 2888.3257, 2940.9167, 2902.6692, 2904.0789]
2026-01-23 03:38:45,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 117.0, 1000.0, 29.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:38:45,998 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 10 seconds)
2026-01-23 03:40:15,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:40:21,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1823.25220 ± 1345.531
2026-01-23 03:40:21,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2858.9333, 2870.5942, 2977.0881, 741.47034, 13.950726, 37.71013, -15.61929, 2848.0474, 2943.154, 2957.1924]
2026-01-23 03:40:21,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 277.0, 55.0, 55.0, 127.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:40:21,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 21 seconds)
2026-01-23 03:41:51,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:42:00,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2881.48071 ± 34.169
2026-01-23 03:42:00,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2817.0776, 2855.2417, 2884.6223, 2883.004, 2901.2834, 2936.8257, 2906.831, 2868.582, 2844.8843, 2916.4534]
2026-01-23 03:42:00,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:42:00,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2881.48) for latency DatasetOffice
2026-01-23 03:42:00,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 45 seconds)
2026-01-23 03:43:31,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:43:40,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2608.12720 ± 652.917
2026-01-23 03:43:40,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2887.3142, 2909.0686, 2844.2815, 2881.2043, 2705.1616, 659.47876, 2762.1567, 2855.5696, 2727.387, 2849.6492]
2026-01-23 03:43:40,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 268.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:43:40,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 6 seconds)
2026-01-23 03:45:07,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:45:12,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1438.76270 ± 1228.720
2026-01-23 03:45:12,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1648.8943, 14.279824, 1005.34174, 5.0146503, 376.3132, 37.946266, 2859.4941, 2794.9946, 2796.2754, 2849.0737]
2026-01-23 03:45:12,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [575.0, 24.0, 391.0, 17.0, 176.0, 51.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:45:12,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 17 seconds)
2026-01-23 03:46:37,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:46:47,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2952.08911 ± 60.759
2026-01-23 03:46:47,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2917.2666, 2914.6663, 2984.77, 2919.7236, 2963.1697, 2927.6968, 2880.6, 3031.8481, 2898.2842, 3082.8682]
2026-01-23 03:46:47,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:46:47,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2952.09) for latency DatasetOffice
2026-01-23 03:46:47,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 37 seconds)
2026-01-23 03:48:16,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:48:25,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2827.74805 ± 78.877
2026-01-23 03:48:25,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2892.822, 2682.4675, 2916.1992, 2919.351, 2846.448, 2800.7354, 2835.5427, 2831.1821, 2860.7092, 2692.0234]
2026-01-23 03:48:25,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:48:25,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 4 seconds)
2026-01-23 03:49:55,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:50:04,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2631.05786 ± 756.792
2026-01-23 03:50:04,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2992.8813, 2847.702, 2823.1995, 2945.4285, 2890.2466, 366.5116, 2851.5967, 2891.1494, 2902.9824, 2798.8813]
2026-01-23 03:50:04,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 149.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:50:04,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 26 seconds)
2026-01-23 03:51:33,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:51:41,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2559.12695 ± 769.077
2026-01-23 03:51:41,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2728.4734, 2796.7642, 257.819, 2834.1724, 2850.8, 2806.4756, 2860.7703, 2930.1843, 2763.6963, 2762.114]
2026-01-23 03:51:41,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 117.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:51:41,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 48 seconds)
2026-01-23 03:53:14,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:53:21,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1907.76819 ± 1127.862
2026-01-23 03:53:21,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1770.6393, -0.05444946, 179.11447, 2613.1343, 2818.5986, 2816.4802, 2750.5193, 2860.3767, 2672.1892, 596.6844]
2026-01-23 03:53:21,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [644.0, 13.0, 157.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 226.0]
2026-01-23 03:53:21,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 15 seconds)
2026-01-23 03:54:47,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:54:55,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2546.85596 ± 490.492
2026-01-23 03:54:55,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2028.8408, 2810.253, 2685.9272, 1240.2351, 2756.278, 2791.9053, 2838.1511, 2784.9304, 2766.731, 2765.3088]
2026-01-23 03:54:55,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [665.0, 1000.0, 1000.0, 444.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:54:55,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 37 seconds)
2026-01-23 03:56:21,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:56:30,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2761.61670 ± 263.181
2026-01-23 03:56:30,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2874.2908, 2809.6113, 2831.972, 2914.1929, 2813.0537, 2902.8972, 2891.2332, 1982.7007, 2815.9263, 2780.2878]
2026-01-23 03:56:30,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 745.0, 1000.0, 1000.0]
2026-01-23 03:56:30,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1299 [DEBUG]: Training session finished
