2026-01-23 01:14:46,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-walker2d/DatasetOffice-sac-aug-mem5 
2026-01-23 01:14:46,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-walker2d/DatasetOffice-sac-aug-mem5 
2026-01-23 01:14:46,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x152bfff25bd0>}
2026-01-23 01:14:46,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1159 [DEBUG]: using device: cuda
2026-01-23 01:14:46,507 baseline-sac-noisy-walker2d:77 [WARNING]: args.memorize_actions != args.horizon: 5 != 32
2026-01-23 01:14:46,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1181 [INFO]: Creating new trainer
2026-01-23 01:14:46,666 baseline-sac-noisy-walker2d:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=47, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:14:46,666 baseline-sac-noisy-walker2d:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=53, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:14:47,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1242 [DEBUG]: Starting training session...
2026-01-23 01:14:47,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 1/100
2026-01-23 01:16:11,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:12,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: -14.60696 ± 3.625
2026-01-23 01:16:12,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [-11.838172, -13.356264, -18.244215, -15.812878, -17.250538, -9.850876, -19.050676, -11.844038, -9.390092, -19.431828]
2026-01-23 01:16:12,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [69.0, 69.0, 59.0, 64.0, 59.0, 69.0, 58.0, 68.0, 69.0, 59.0]
2026-01-23 01:16:12,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (-14.61) for latency DatasetOffice
2026-01-23 01:16:12,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 19 minutes, 43 seconds)
2026-01-23 01:17:44,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:46,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 163.78221 ± 207.909
2026-01-23 01:17:46,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2.7669172, -10.413727, 52.36069, 247.29831, 9.382131, 143.84956, 9.850127, 92.0245, 623.8707, 466.83286]
2026-01-23 01:17:46,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [200.0, 112.0, 230.0, 152.0, 170.0, 245.0, 176.0, 117.0, 640.0, 317.0]
2026-01-23 01:17:46,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (163.78) for latency DatasetOffice
2026-01-23 01:17:46,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 26 minutes, 11 seconds)
2026-01-23 01:19:17,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:18,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 32.64918 ± 49.824
2026-01-23 01:19:18,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [15.290683, 13.219502, 16.720926, 33.343334, 14.625499, 18.050905, 180.60132, 9.116804, 20.477419, 5.04541]
2026-01-23 01:19:18,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [143.0, 154.0, 119.0, 148.0, 93.0, 102.0, 314.0, 160.0, 93.0, 122.0]
2026-01-23 01:19:18,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 26 minutes, 5 seconds)
2026-01-23 01:20:50,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:52,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 201.89780 ± 188.944
2026-01-23 01:20:52,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [259.80188, 25.461653, 73.17111, 79.889694, 123.986755, 177.4316, 713.09717, 250.9409, 249.07689, 66.12022]
2026-01-23 01:20:52,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [154.0, 72.0, 137.0, 157.0, 168.0, 237.0, 649.0, 294.0, 157.0, 111.0]
2026-01-23 01:20:52,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (201.90) for latency DatasetOffice
2026-01-23 01:20:52,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 26 minutes, 9 seconds)
2026-01-23 01:22:25,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:27,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 221.29688 ± 122.643
2026-01-23 01:22:27,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [420.41638, 94.082306, 324.84857, 355.78552, 280.19562, 45.37012, 285.21252, 194.6297, 111.977295, 100.4506]
2026-01-23 01:22:27,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [354.0, 151.0, 215.0, 282.0, 212.0, 90.0, 206.0, 204.0, 181.0, 175.0]
2026-01-23 01:22:27,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (221.30) for latency DatasetOffice
2026-01-23 01:22:27,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 25 minutes, 33 seconds)
2026-01-23 01:23:57,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:58,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 186.53169 ± 148.418
2026-01-23 01:23:58,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [196.60417, 38.56578, 496.7886, 210.75482, 311.721, 345.26898, 34.359035, 54.876835, 107.372734, 69.00505]
2026-01-23 01:23:58,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [246.0, 70.0, 223.0, 132.0, 162.0, 200.0, 63.0, 107.0, 281.0, 121.0]
2026-01-23 01:23:58,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 26 minutes, 14 seconds)
2026-01-23 01:25:32,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:34,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 166.90851 ± 113.042
2026-01-23 01:25:34,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [162.5928, 314.1285, 398.2609, 119.71446, 269.61444, 102.47097, 39.425583, 106.43045, 77.4688, 78.978134]
2026-01-23 01:25:34,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [252.0, 273.0, 208.0, 226.0, 156.0, 195.0, 102.0, 163.0, 166.0, 125.0]
2026-01-23 01:25:34,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 25 minutes)
2026-01-23 01:27:05,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:07,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 350.09418 ± 152.464
2026-01-23 01:27:07,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [399.99664, 422.98746, 700.77673, 315.6654, 302.30704, 276.26395, 244.01382, 397.60385, 69.44095, 371.886]
2026-01-23 01:27:07,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [205.0, 236.0, 555.0, 140.0, 172.0, 310.0, 354.0, 185.0, 108.0, 297.0]
2026-01-23 01:27:07,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (350.09) for latency DatasetOffice
2026-01-23 01:27:07,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 23 minutes, 55 seconds)
2026-01-23 01:28:39,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:41,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 260.28055 ± 113.752
2026-01-23 01:28:41,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [74.16435, 114.55673, 267.9144, 404.7234, 324.43338, 306.17334, 239.91154, 122.344444, 352.6307, 395.95316]
2026-01-23 01:28:41,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [185.0, 150.0, 175.0, 243.0, 208.0, 184.0, 154.0, 231.0, 181.0, 283.0]
2026-01-23 01:28:41,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 22 minutes, 5 seconds)
2026-01-23 01:30:13,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:15,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 384.23065 ± 225.219
2026-01-23 01:30:15,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [270.31442, 358.8538, 568.4004, 318.57983, 70.68271, 275.24673, 241.67029, 434.0002, 353.3674, 951.1908]
2026-01-23 01:30:15,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [139.0, 240.0, 271.0, 218.0, 104.0, 160.0, 144.0, 313.0, 206.0, 1000.0]
2026-01-23 01:30:15,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (384.23) for latency DatasetOffice
2026-01-23 01:30:15,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 20 minutes, 33 seconds)
2026-01-23 01:31:52,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:54,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 269.74585 ± 114.195
2026-01-23 01:31:54,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [391.80762, 330.9211, 336.18964, 275.64377, 400.86325, 97.55431, 240.20726, 354.3317, 224.73694, 45.203033]
2026-01-23 01:31:54,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [203.0, 178.0, 207.0, 144.0, 309.0, 135.0, 125.0, 167.0, 269.0, 73.0]
2026-01-23 01:31:54,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 20 minutes, 59 seconds)
2026-01-23 01:33:22,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:24,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 333.82162 ± 139.394
2026-01-23 01:33:24,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [336.44022, 459.5465, 279.24762, 336.85428, 230.13095, 241.60182, 648.9341, 169.50858, 196.62413, 439.32822]
2026-01-23 01:33:24,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [258.0, 264.0, 253.0, 185.0, 140.0, 159.0, 473.0, 198.0, 173.0, 274.0]
2026-01-23 01:33:24,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 18 minutes, 1 second)
2026-01-23 01:34:56,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:59,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 404.71344 ± 178.929
2026-01-23 01:34:59,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [345.23407, 375.2513, 278.85516, 149.50308, 368.24048, 432.0803, 314.2251, 863.4301, 523.17633, 397.1382]
2026-01-23 01:34:59,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [178.0, 173.0, 126.0, 232.0, 241.0, 250.0, 289.0, 731.0, 482.0, 245.0]
2026-01-23 01:34:59,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (404.71) for latency DatasetOffice
2026-01-23 01:34:59,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 16 minutes, 43 seconds)
2026-01-23 01:36:30,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:32,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 275.84528 ± 164.476
2026-01-23 01:36:32,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [88.159935, 28.057611, 238.53877, 226.67636, 581.9421, 460.92078, 292.9503, 443.70807, 168.87837, 228.62032]
2026-01-23 01:36:32,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [181.0, 45.0, 128.0, 134.0, 487.0, 365.0, 161.0, 349.0, 153.0, 147.0]
2026-01-23 01:36:32,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 14 minutes, 56 seconds)
2026-01-23 01:38:11,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:14,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 386.38766 ± 223.739
2026-01-23 01:38:14,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [244.65034, 343.85825, 188.93556, 196.25964, 320.0772, 325.69885, 436.21042, 1013.0557, 393.69632, 401.43427]
2026-01-23 01:38:14,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [142.0, 223.0, 107.0, 122.0, 162.0, 165.0, 311.0, 1000.0, 221.0, 244.0]
2026-01-23 01:38:14,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 15 minutes, 35 seconds)
2026-01-23 01:39:43,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:44,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 147.24484 ± 125.384
2026-01-23 01:39:44,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [174.0131, 165.23874, 399.33066, 13.656904, 212.91945, 216.40341, 12.349272, 253.27249, 11.499724, 13.764701]
2026-01-23 01:39:44,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [125.0, 104.0, 231.0, 31.0, 137.0, 121.0, 28.0, 162.0, 28.0, 31.0]
2026-01-23 01:39:44,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 11 minutes, 39 seconds)
2026-01-23 01:41:15,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:17,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 426.97574 ± 82.217
2026-01-23 01:41:17,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [532.83276, 419.98315, 257.1726, 419.3721, 307.2524, 475.6843, 502.34683, 486.90564, 407.9211, 460.2862]
2026-01-23 01:41:17,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [372.0, 338.0, 136.0, 217.0, 205.0, 307.0, 313.0, 332.0, 251.0, 302.0]
2026-01-23 01:41:17,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (426.98) for latency DatasetOffice
2026-01-23 01:41:17,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 10 minutes, 51 seconds)
2026-01-23 01:42:52,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:55,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 359.14975 ± 193.022
2026-01-23 01:42:55,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [229.25443, 770.0988, 437.34525, 165.60602, 372.58466, 603.065, 359.48672, 122.63434, 337.03546, 194.38698]
2026-01-23 01:42:55,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [152.0, 583.0, 224.0, 183.0, 227.0, 446.0, 210.0, 147.0, 228.0, 110.0]
2026-01-23 01:42:55,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 10 minutes, 4 seconds)
2026-01-23 01:44:27,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:29,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 417.21411 ± 175.490
2026-01-23 01:44:29,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [303.02713, 529.70636, 666.36426, 567.16833, 647.12274, 208.738, 211.75096, 507.80692, 216.3922, 314.06454]
2026-01-23 01:44:29,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [148.0, 287.0, 368.0, 380.0, 409.0, 110.0, 115.0, 224.0, 146.0, 162.0]
2026-01-23 01:44:29,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 8 minutes, 52 seconds)
2026-01-23 01:46:01,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:03,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 296.53751 ± 101.646
2026-01-23 01:46:03,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [259.83002, 214.65169, 330.16696, 547.6621, 259.11267, 257.49673, 374.03055, 153.46521, 310.76202, 258.1971]
2026-01-23 01:46:03,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [136.0, 123.0, 148.0, 396.0, 155.0, 150.0, 278.0, 290.0, 140.0, 133.0]
2026-01-23 01:46:03,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 5 minutes, 5 seconds)
2026-01-23 01:47:35,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:37,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 386.90771 ± 146.633
2026-01-23 01:47:37,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [589.7148, 499.9419, 436.8309, 128.3935, 334.18732, 267.17224, 613.2371, 233.91693, 381.67288, 384.00955]
2026-01-23 01:47:37,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [372.0, 243.0, 267.0, 96.0, 161.0, 125.0, 300.0, 106.0, 163.0, 202.0]
2026-01-23 01:47:37,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 4 minutes, 38 seconds)
2026-01-23 01:49:10,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:12,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 285.71353 ± 48.986
2026-01-23 01:49:12,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [270.77435, 228.49652, 221.76486, 346.75714, 231.32318, 342.91415, 257.36316, 320.9235, 352.11154, 284.70667]
2026-01-23 01:49:12,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [121.0, 136.0, 109.0, 172.0, 113.0, 158.0, 128.0, 173.0, 137.0, 138.0]
2026-01-23 01:49:12,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 3 minutes, 17 seconds)
2026-01-23 01:51:03,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:06,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 366.92151 ± 220.728
2026-01-23 01:51:06,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [249.01103, 274.33688, 344.25983, 328.40952, 729.9942, 817.63257, 316.9294, 351.69934, 203.72217, 53.220375]
2026-01-23 01:51:06,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [128.0, 167.0, 164.0, 161.0, 464.0, 606.0, 218.0, 212.0, 148.0, 115.0]
2026-01-23 01:51:06,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 5 minutes, 57 seconds)
2026-01-23 01:52:40,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:42,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 428.64972 ± 159.276
2026-01-23 01:52:42,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [357.2864, 488.1707, 863.9996, 338.546, 340.32846, 458.49875, 297.10873, 289.7929, 393.8372, 458.92862]
2026-01-23 01:52:42,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [150.0, 232.0, 603.0, 148.0, 162.0, 219.0, 128.0, 156.0, 207.0, 255.0]
2026-01-23 01:52:42,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (428.65) for latency DatasetOffice
2026-01-23 01:52:42,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 4 minutes, 52 seconds)
2026-01-23 01:54:13,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:14,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 301.91443 ± 140.323
2026-01-23 01:54:14,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [437.51028, 272.96896, 256.73474, 356.76794, 168.68076, 197.93369, 182.34671, 447.89914, 577.1818, 121.12013]
2026-01-23 01:54:14,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [234.0, 145.0, 110.0, 148.0, 79.0, 126.0, 113.0, 242.0, 359.0, 83.0]
2026-01-23 01:54:14,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 2 minutes, 52 seconds)
2026-01-23 01:55:55,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:57,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 456.06729 ± 218.800
2026-01-23 01:55:57,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [321.23215, 405.6647, 300.8956, 321.57574, 434.59427, 365.67307, 431.42142, 883.30054, 226.35095, 869.96454]
2026-01-23 01:55:57,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [145.0, 196.0, 144.0, 169.0, 221.0, 183.0, 211.0, 479.0, 114.0, 465.0]
2026-01-23 01:55:57,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (456.07) for latency DatasetOffice
2026-01-23 01:55:57,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 3 minutes, 18 seconds)
2026-01-23 01:57:24,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:26,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 489.85748 ± 141.243
2026-01-23 01:57:26,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [472.2771, 448.1367, 820.5434, 462.79016, 697.7657, 444.02036, 398.1752, 394.01663, 364.74765, 396.1022]
2026-01-23 01:57:26,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [197.0, 210.0, 421.0, 249.0, 335.0, 212.0, 192.0, 176.0, 170.0, 183.0]
2026-01-23 01:57:26,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (489.86) for latency DatasetOffice
2026-01-23 01:57:26,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 22 seconds)
2026-01-23 01:58:57,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:59,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 392.31732 ± 135.675
2026-01-23 01:58:59,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [376.91583, 439.01566, 523.5416, 355.1192, 272.83197, 332.359, 460.12555, 261.14993, 692.3866, 209.72798]
2026-01-23 01:58:59,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [179.0, 232.0, 304.0, 193.0, 145.0, 182.0, 237.0, 143.0, 316.0, 126.0]
2026-01-23 01:58:59,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 53 minutes, 30 seconds)
2026-01-23 02:00:32,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:34,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 326.99323 ± 140.894
2026-01-23 02:00:34,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [121.772644, 485.78473, 158.70087, 248.58337, 355.98697, 306.16858, 519.7558, 336.46545, 202.87741, 533.83636]
2026-01-23 02:00:34,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [103.0, 228.0, 105.0, 129.0, 178.0, 133.0, 213.0, 165.0, 121.0, 239.0]
2026-01-23 02:00:34,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 51 minutes, 37 seconds)
2026-01-23 02:02:06,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:10,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 906.33655 ± 677.138
2026-01-23 02:02:10,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1239.3726, 707.1006, 569.2502, 300.0813, 1878.2408, 2314.1362, 338.47763, 282.3868, 1066.1046, 368.2145]
2026-01-23 02:02:10,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [510.0, 319.0, 260.0, 170.0, 842.0, 988.0, 165.0, 157.0, 465.0, 195.0]
2026-01-23 02:02:10,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (906.34) for latency DatasetOffice
2026-01-23 02:02:10,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 50 minutes, 56 seconds)
2026-01-23 02:03:46,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:49,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 617.10297 ± 309.030
2026-01-23 02:03:49,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1102.4026, 634.33, 401.5506, 526.19684, 1287.6499, 492.3429, 478.07388, 604.1055, 376.0852, 268.29263]
2026-01-23 02:03:49,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [431.0, 278.0, 204.0, 238.0, 484.0, 258.0, 230.0, 261.0, 200.0, 155.0]
2026-01-23 02:03:49,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 48 minutes, 25 seconds)
2026-01-23 02:05:18,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:20,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 348.96960 ± 111.677
2026-01-23 02:05:20,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [418.28113, 248.74762, 345.90756, 441.38962, 300.1022, 318.65826, 103.90488, 533.7664, 405.65378, 373.28445]
2026-01-23 02:05:20,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [182.0, 155.0, 178.0, 207.0, 130.0, 170.0, 129.0, 217.0, 186.0, 160.0]
2026-01-23 02:05:20,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 47 minutes, 20 seconds)
2026-01-23 02:06:56,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:57,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 434.50766 ± 75.895
2026-01-23 02:06:57,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [400.07736, 489.88217, 346.66058, 423.15515, 364.19568, 414.0357, 426.21906, 520.9079, 599.2877, 360.65506]
2026-01-23 02:06:57,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [173.0, 208.0, 167.0, 184.0, 168.0, 176.0, 177.0, 223.0, 255.0, 163.0]
2026-01-23 02:06:57,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 46 minutes, 56 seconds)
2026-01-23 02:08:25,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:29,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 981.11200 ± 739.727
2026-01-23 02:08:29,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1602.9783, 2683.1165, 652.2579, 562.5524, 1767.4197, 680.85394, 305.2441, 775.4202, 401.31082, 379.96576]
2026-01-23 02:08:29,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [710.0, 1000.0, 329.0, 226.0, 702.0, 334.0, 177.0, 351.0, 212.0, 239.0]
2026-01-23 02:08:29,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (981.11) for latency DatasetOffice
2026-01-23 02:08:29,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 44 minutes, 42 seconds)
2026-01-23 02:10:02,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:03,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 512.30066 ± 161.426
2026-01-23 02:10:03,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [509.28326, 205.87996, 531.305, 805.45984, 533.84595, 403.32474, 699.72766, 515.64136, 342.49496, 576.044]
2026-01-23 02:10:03,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [208.0, 97.0, 221.0, 275.0, 221.0, 173.0, 252.0, 203.0, 158.0, 228.0]
2026-01-23 02:10:03,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 42 minutes, 36 seconds)
2026-01-23 02:11:36,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:38,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 486.28375 ± 401.781
2026-01-23 02:11:38,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [362.59256, 408.49207, 504.63538, 441.9898, 255.86765, 361.7775, 184.10141, 516.7437, 1643.0718, 183.56586]
2026-01-23 02:11:38,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [149.0, 187.0, 211.0, 213.0, 107.0, 163.0, 89.0, 216.0, 625.0, 98.0]
2026-01-23 02:11:38,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 40 minutes, 4 seconds)
2026-01-23 02:13:12,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:13:13,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 385.58939 ± 147.507
2026-01-23 02:13:13,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [209.82535, 468.31082, 220.4625, 282.0313, 473.90875, 180.03313, 369.58502, 577.85284, 470.73676, 603.14716]
2026-01-23 02:13:13,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [104.0, 206.0, 113.0, 143.0, 209.0, 95.0, 184.0, 220.0, 201.0, 269.0]
2026-01-23 02:13:13,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 39 minutes, 22 seconds)
2026-01-23 02:14:43,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:45,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 668.09216 ± 283.380
2026-01-23 02:14:45,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [679.63995, 692.11957, 1271.6964, 778.33575, 356.85574, 647.4086, 927.89484, 582.53265, 173.91925, 570.5192]
2026-01-23 02:14:45,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [248.0, 265.0, 522.0, 288.0, 168.0, 246.0, 339.0, 249.0, 100.0, 230.0]
2026-01-23 02:14:45,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 36 minutes, 39 seconds)
2026-01-23 02:16:18,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:20,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 543.04407 ± 130.888
2026-01-23 02:16:20,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [687.96906, 445.59915, 457.8585, 477.5973, 353.8636, 540.6701, 797.2221, 677.60205, 552.38025, 439.67807]
2026-01-23 02:16:20,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [314.0, 181.0, 187.0, 213.0, 160.0, 222.0, 291.0, 254.0, 213.0, 175.0]
2026-01-23 02:16:20,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 35 minutes, 41 seconds)
2026-01-23 02:17:53,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:17:54,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 360.36783 ± 204.388
2026-01-23 02:17:54,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [665.52075, 259.20975, 232.88109, 603.8493, 229.5807, 169.14423, 346.03308, 712.442, 240.82237, 144.19508]
2026-01-23 02:17:54,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [260.0, 131.0, 112.0, 257.0, 115.0, 103.0, 180.0, 390.0, 112.0, 88.0]
2026-01-23 02:17:54,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 34 minutes, 11 seconds)
2026-01-23 02:19:33,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:19:39,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1512.47131 ± 823.909
2026-01-23 02:19:39,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [513.8302, 632.76733, 1165.7311, 598.5703, 1048.209, 2507.4783, 2433.7703, 2346.3667, 1250.5, 2627.4905]
2026-01-23 02:19:39,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [204.0, 233.0, 506.0, 233.0, 427.0, 1000.0, 1000.0, 1000.0, 553.0, 1000.0]
2026-01-23 02:19:39,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (1512.47) for latency DatasetOffice
2026-01-23 02:19:39,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 34 minutes, 38 seconds)
2026-01-23 02:21:05,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:07,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 465.85004 ± 716.676
2026-01-23 02:21:07,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [359.6818, 147.10008, 2563.1465, 141.29985, 165.41774, 154.92126, 665.5953, 164.75804, 138.20288, 158.3775]
2026-01-23 02:21:07,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [180.0, 89.0, 1000.0, 84.0, 88.0, 87.0, 249.0, 91.0, 87.0, 87.0]
2026-01-23 02:21:07,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 31 minutes, 35 seconds)
2026-01-23 02:22:48,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:22:50,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 708.63855 ± 677.160
2026-01-23 02:22:50,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [337.5988, 328.43393, 763.09344, 779.5538, 648.48456, 775.946, 2630.339, 400.4527, 164.48386, 257.9997]
2026-01-23 02:22:50,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [145.0, 158.0, 250.0, 248.0, 225.0, 308.0, 1000.0, 167.0, 83.0, 113.0]
2026-01-23 02:22:50,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 32 minutes, 9 seconds)
2026-01-23 02:24:17,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:23,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1667.83398 ± 903.059
2026-01-23 02:24:23,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1911.4728, 1641.7639, 1356.8337, 2532.002, 2706.8596, 240.40744, 721.4273, 2632.2253, 2523.2822, 412.0648]
2026-01-23 02:24:23,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [712.0, 661.0, 531.0, 1000.0, 1000.0, 111.0, 247.0, 1000.0, 1000.0, 165.0]
2026-01-23 02:24:23,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (1667.83) for latency DatasetOffice
2026-01-23 02:24:23,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 30 minutes, 12 seconds)
2026-01-23 02:26:00,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:09,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2396.49951 ± 717.025
2026-01-23 02:26:09,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2771.477, 2818.923, 2791.5884, 2744.3313, 378.0806, 1955.4309, 2435.1924, 2790.3486, 2612.0632, 2667.5608]
2026-01-23 02:26:09,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 161.0, 720.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:26:09,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2396.50) for latency DatasetOffice
2026-01-23 02:26:09,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 30 minutes, 43 seconds)
2026-01-23 02:27:36,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:27:41,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1439.23889 ± 1172.717
2026-01-23 02:27:41,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2591.0452, 151.5456, 2620.534, 193.54828, 388.7162, 2594.161, 2609.0603, 250.68643, 2635.7285, 357.36295]
2026-01-23 02:27:41,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 90.0, 1000.0, 91.0, 187.0, 1000.0, 1000.0, 109.0, 1000.0, 221.0]
2026-01-23 02:27:41,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 26 minutes, 48 seconds)
2026-01-23 02:29:15,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:20,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1536.35022 ± 1205.078
2026-01-23 02:29:20,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2751.7546, 2750.1846, 445.57364, 2712.5046, 2781.5608, 259.01886, 268.6185, 278.99252, 411.50903, 2703.7852]
2026-01-23 02:29:20,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 181.0, 1000.0, 1000.0, 107.0, 110.0, 122.0, 169.0, 1000.0]
2026-01-23 02:29:20,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 27 minutes, 9 seconds)
2026-01-23 02:30:54,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:00,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1714.56873 ± 1179.387
2026-01-23 02:31:00,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2756.7542, 242.95546, 2677.2046, 2655.8118, 311.81107, 2625.833, 2675.4563, 319.0236, 210.413, 2670.4246]
2026-01-23 02:31:00,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 113.0, 1000.0, 1000.0, 135.0, 1000.0, 1000.0, 129.0, 111.0, 1000.0]
2026-01-23 02:31:00,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 58 seconds)
2026-01-23 02:32:35,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:32:42,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2218.80835 ± 1227.984
2026-01-23 02:32:42,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2833.0952, 3099.525, 3007.9026, 3021.7932, 457.0, 3046.3347, 3095.8098, 289.29822, 295.20975, 3042.1172]
2026-01-23 02:32:42,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 979.0, 184.0, 1000.0, 1000.0, 130.0, 136.0, 1000.0]
2026-01-23 02:32:42,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 24 minutes, 48 seconds)
2026-01-23 02:34:17,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:21,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1202.37097 ± 1193.487
2026-01-23 02:34:21,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1716.3643, 22.007235, 185.98201, 764.1358, 2882.5398, 259.57462, 2857.0393, 2917.0632, 204.26926, 214.73404]
2026-01-23 02:34:21,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [605.0, 33.0, 93.0, 300.0, 1000.0, 118.0, 1000.0, 1000.0, 96.0, 99.0]
2026-01-23 02:34:21,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes, 55 seconds)
2026-01-23 02:35:58,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:06,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2334.61279 ± 943.543
2026-01-23 02:36:06,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2710.4326, 2818.9512, 2793.5618, 2857.4414, 2834.6782, 2785.096, 425.30307, 2831.0212, 472.98315, 2816.6611]
2026-01-23 02:36:06,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 188.0, 1000.0, 207.0, 1000.0]
2026-01-23 02:36:06,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 22 minutes, 25 seconds)
2026-01-23 02:37:38,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:37:42,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1156.55017 ± 1214.282
2026-01-23 02:37:42,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3016.5488, 349.25116, 325.0856, 398.00534, 233.46997, 3004.753, 326.92865, 276.41556, 642.33136, 2992.7131]
2026-01-23 02:37:42,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 145.0, 133.0, 161.0, 102.0, 1000.0, 141.0, 121.0, 236.0, 1000.0]
2026-01-23 02:37:42,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 20 minutes, 15 seconds)
2026-01-23 02:39:13,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:22,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2673.53784 ± 338.415
2026-01-23 02:39:22,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1979.7781, 2888.0803, 2861.2986, 2018.9397, 2863.6501, 2827.5803, 2788.4316, 2872.658, 2817.3313, 2817.63]
2026-01-23 02:39:22,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [720.0, 1000.0, 1000.0, 718.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:39:22,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2673.54) for latency DatasetOffice
2026-01-23 02:39:22,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 18 minutes, 32 seconds)
2026-01-23 02:40:49,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:40:56,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2226.04541 ± 1289.446
2026-01-23 02:40:56,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3066.9678, 3067.43, 3086.951, 3091.2605, 3070.2842, 3052.0112, 250.6456, 233.10587, 285.8831, 3055.9148]
2026-01-23 02:40:56,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 118.0, 110.0, 128.0, 1000.0]
2026-01-23 02:40:56,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 15 minutes, 42 seconds)
2026-01-23 02:42:29,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:37,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2600.08130 ± 939.381
2026-01-23 02:42:37,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3154.773, 3178.5479, 2055.6548, 871.7701, 3169.128, 815.0557, 3208.6062, 3164.1162, 3179.5557, 3203.6062]
2026-01-23 02:42:37,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 658.0, 277.0, 1000.0, 260.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:42:37,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 14 minutes, 29 seconds)
2026-01-23 02:44:11,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:44:21,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3110.07666 ± 32.579
2026-01-23 02:44:21,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3147.9058, 3149.2434, 3094.4587, 3123.4478, 3114.4944, 3142.1487, 3072.607, 3118.2986, 3043.147, 3095.0159]
2026-01-23 02:44:21,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:44:21,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3110.08) for latency DatasetOffice
2026-01-23 02:44:21,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 12 minutes, 34 seconds)
2026-01-23 02:45:52,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:02,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2906.69678 ± 521.991
2026-01-23 02:46:02,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3075.3706, 3054.8574, 3120.2195, 3077.8489, 3048.996, 3094.6982, 3006.3083, 3147.7754, 1344.6776, 3096.214]
2026-01-23 02:46:02,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 451.0, 1000.0]
2026-01-23 02:46:02,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 11 minutes, 37 seconds)
2026-01-23 02:47:25,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:47:33,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2590.80298 ± 1034.400
2026-01-23 02:47:33,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3263.6853, 3262.9077, 3222.322, 3165.5408, 3260.356, 3200.4912, 2002.8431, 940.1742, 396.8496, 3192.8596]
2026-01-23 02:47:33,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 602.0, 327.0, 201.0, 1000.0]
2026-01-23 02:47:33,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 8 minutes, 46 seconds)
2026-01-23 02:49:04,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:49:12,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2587.47998 ± 1041.904
2026-01-23 02:49:12,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [574.4884, 3231.1934, 3321.6118, 3214.2732, 1612.2948, 3257.4136, 3192.5718, 3317.8745, 928.5438, 3224.5352]
2026-01-23 02:49:12,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [203.0, 1000.0, 1000.0, 1000.0, 512.0, 1000.0, 1000.0, 1000.0, 300.0, 1000.0]
2026-01-23 02:49:12,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 7 minutes, 45 seconds)
2026-01-23 02:50:49,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:50:57,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2796.97095 ± 1004.610
2026-01-23 02:50:57,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3295.0522, 616.9305, 3354.2239, 3267.0088, 3273.039, 973.34344, 3340.6667, 3257.4854, 3261.8416, 3330.117]
2026-01-23 02:50:57,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 211.0, 1000.0, 1000.0, 1000.0, 310.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:50:57,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 6 minutes, 40 seconds)
2026-01-23 02:52:22,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:52:29,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2545.62646 ± 983.033
2026-01-23 02:52:29,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3351.2651, 3273.6577, 3038.351, 3312.9048, 2276.042, 2078.7532, 1057.8372, 522.8123, 3292.998, 3251.6423]
2026-01-23 02:52:29,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 904.0, 1000.0, 700.0, 668.0, 369.0, 212.0, 1000.0, 1000.0]
2026-01-23 02:52:29,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 3 minutes, 31 seconds)
2026-01-23 02:54:06,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:54:15,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2900.27539 ± 708.009
2026-01-23 02:54:15,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3134.2957, 3159.4692, 3136.3726, 3192.6133, 3099.5159, 3079.6194, 3167.518, 780.3268, 3056.5493, 3196.4746]
2026-01-23 02:54:15,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 275.0, 1000.0, 1000.0]
2026-01-23 02:54:15,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 2 minutes, 29 seconds)
2026-01-23 02:55:47,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:55:57,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3237.58838 ± 38.777
2026-01-23 02:55:57,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3158.492, 3224.4814, 3221.9993, 3197.6265, 3221.292, 3269.5903, 3282.6855, 3271.1863, 3243.1353, 3285.395]
2026-01-23 02:55:57,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:55:57,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3237.59) for latency DatasetOffice
2026-01-23 02:55:57,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 2 minutes, 9 seconds)
2026-01-23 02:57:28,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:57:37,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2836.20898 ± 770.066
2026-01-23 02:57:37,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3098.7388, 3072.5176, 3108.0273, 3036.8076, 3115.977, 3068.0002, 3123.3606, 3106.5789, 3104.8582, 527.2256]
2026-01-23 02:57:37,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 222.0]
2026-01-23 02:57:37,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 37 seconds)
2026-01-23 02:59:01,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:10,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2998.02979 ± 732.673
2026-01-23 02:59:10,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3246.739, 3235.9727, 3219.8608, 802.8729, 3253.7166, 3266.726, 3221.005, 3331.297, 3222.366, 3179.7415]
2026-01-23 02:59:10,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 304.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:59:10,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 66/100 (estimated time remaining: 57 minutes, 27 seconds)
2026-01-23 03:00:43,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:00:50,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2193.51709 ± 1305.268
2026-01-23 03:00:50,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3042.6453, 3154.2043, 2489.04, 571.6994, 20.302694, 101.49248, 3135.785, 3136.5715, 3120.884, 3162.547]
2026-01-23 03:00:50,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 803.0, 244.0, 32.0, 92.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:00:50,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 67/100 (estimated time remaining: 56 minutes, 44 seconds)
2026-01-23 03:02:17,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:02:27,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3116.51489 ± 37.877
2026-01-23 03:02:27,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3120.5981, 3175.6838, 3115.1497, 3061.0652, 3130.782, 3189.4531, 3091.01, 3091.3604, 3091.7104, 3098.3352]
2026-01-23 03:02:27,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:02:27,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 68/100 (estimated time remaining: 54 minutes, 5 seconds)
2026-01-23 03:03:58,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:04:07,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3237.66699 ± 32.932
2026-01-23 03:04:07,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3251.7725, 3232.2693, 3216.345, 3227.106, 3203.7231, 3303.9604, 3227.5735, 3291.0735, 3220.7002, 3202.147]
2026-01-23 03:04:07,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:04:07,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3237.67) for latency DatasetOffice
2026-01-23 03:04:07,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 69/100 (estimated time remaining: 52 minutes, 18 seconds)
2026-01-23 03:05:45,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:05:55,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3235.72705 ± 53.364
2026-01-23 03:05:55,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3327.199, 3334.862, 3237.5415, 3247.3994, 3209.298, 3236.6733, 3208.5376, 3193.058, 3159.3342, 3203.365]
2026-01-23 03:05:55,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:05:55,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 70/100 (estimated time remaining: 51 minutes, 25 seconds)
2026-01-23 03:07:26,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:07:35,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3142.96558 ± 40.958
2026-01-23 03:07:35,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3222.166, 3152.88, 3189.1033, 3123.3914, 3148.3738, 3177.8628, 3124.0535, 3094.4092, 3096.1711, 3101.2449]
2026-01-23 03:07:35,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:07:35,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 71/100 (estimated time remaining: 50 minutes, 33 seconds)
2026-01-23 03:08:59,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:09:07,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2713.81396 ± 1124.285
2026-01-23 03:09:07,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3330.5613, 3289.999, 3233.405, 3292.2786, 3184.1648, 3267.8503, 879.5877, 106.77846, 3267.2227, 3286.2927]
2026-01-23 03:09:07,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 338.0, 71.0, 1000.0, 1000.0]
2026-01-23 03:09:07,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 72/100 (estimated time remaining: 48 minutes, 5 seconds)
2026-01-23 03:10:45,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:10:55,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3025.53271 ± 40.016
2026-01-23 03:10:55,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3002.0825, 2966.2778, 2977.1638, 3054.639, 3002.9248, 3085.7476, 2995.8179, 3037.1917, 3056.4207, 3077.063]
2026-01-23 03:10:55,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:10:55,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 28 seconds)
2026-01-23 03:12:25,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:12:33,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2745.35205 ± 926.313
2026-01-23 03:12:33,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3209.6963, 3233.8486, 3153.6946, 1411.8143, 3191.7502, 3224.9036, 474.23874, 3106.2678, 3279.502, 3167.8052]
2026-01-23 03:12:33,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 467.0, 1000.0, 1000.0, 201.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:12:33,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 30 seconds)
2026-01-23 03:14:06,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:14:13,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2290.28711 ± 1175.364
2026-01-23 03:14:13,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3041.7734, 1731.604, 19.812737, 116.36688, 2820.8647, 3086.9097, 2993.3577, 3003.1523, 3026.7607, 3062.27]
2026-01-23 03:14:13,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 605.0, 27.0, 128.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:14:13,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 75/100 (estimated time remaining: 43 minutes, 12 seconds)
2026-01-23 03:15:44,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:15:54,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3340.59058 ± 29.138
2026-01-23 03:15:54,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3331.655, 3338.9702, 3323.8105, 3364.9204, 3328.2808, 3378.5376, 3336.227, 3372.4517, 3358.2598, 3272.796]
2026-01-23 03:15:54,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:15:54,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3340.59) for latency DatasetOffice
2026-01-23 03:15:54,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 76/100 (estimated time remaining: 41 minutes, 32 seconds)
2026-01-23 03:17:25,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:17:34,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3043.32764 ± 342.780
2026-01-23 03:17:34,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3173.7422, 3098.3398, 2023.0267, 3138.534, 3171.855, 3116.601, 3178.971, 3100.8765, 3246.2537, 3185.0767]
2026-01-23 03:17:34,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 636.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:17:34,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 33 seconds)
2026-01-23 03:19:05,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:19:15,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3203.26904 ± 24.382
2026-01-23 03:19:15,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3134.6218, 3201.4705, 3211.754, 3207.3389, 3212.4436, 3229.206, 3200.2253, 3216.5999, 3217.2122, 3201.8167]
2026-01-23 03:19:15,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:19:15,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 17 seconds)
2026-01-23 03:20:46,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:20:55,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2937.11743 ± 835.615
2026-01-23 03:20:55,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3217.1936, 3164.8582, 3196.4744, 3192.5461, 3268.5457, 3185.1755, 3192.5723, 433.74368, 3321.8428, 3198.2214]
2026-01-23 03:20:55,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 191.0, 1000.0, 1000.0]
2026-01-23 03:20:55,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 79/100 (estimated time remaining: 36 minutes, 47 seconds)
2026-01-23 03:22:20,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:22:29,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2802.85840 ± 847.140
2026-01-23 03:22:29,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3182.2664, 3188.4849, 3216.701, 3178.2112, 3178.557, 2068.4302, 469.02777, 3092.6309, 3240.4592, 3213.8164]
2026-01-23 03:22:29,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 693.0, 210.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:22:29,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 40 seconds)
2026-01-23 03:24:00,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:24:09,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3275.69580 ± 50.206
2026-01-23 03:24:09,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3270.1558, 3210.833, 3248.2415, 3254.903, 3252.3464, 3308.205, 3302.7275, 3351.021, 3202.611, 3355.9128]
2026-01-23 03:24:09,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:24:09,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 1 second)
2026-01-23 03:25:38,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:25:48,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3185.41553 ± 281.952
2026-01-23 03:25:48,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3285.3533, 3294.5496, 3279.6506, 3282.7617, 3303.0088, 2341.1626, 3238.0137, 3280.0398, 3289.301, 3260.315]
2026-01-23 03:25:48,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 724.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:25:48,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 15 seconds)
2026-01-23 03:27:22,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:27:31,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2961.52881 ± 573.024
2026-01-23 03:27:31,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1244.652, 3189.99, 3107.0527, 3149.7583, 3112.6016, 3143.208, 3153.2446, 3163.725, 3207.4294, 3143.6277]
2026-01-23 03:27:31,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [449.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:27:31,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 45 seconds)
2026-01-23 03:29:02,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:29:11,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3403.34961 ± 27.122
2026-01-23 03:29:11,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3436.0142, 3422.623, 3422.7854, 3439.8088, 3403.9836, 3379.9333, 3395.2148, 3382.3481, 3347.183, 3403.5999]
2026-01-23 03:29:11,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:29:11,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3403.35) for latency DatasetOffice
2026-01-23 03:29:11,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 9 seconds)
2026-01-23 03:30:46,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:54,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2663.41089 ± 1170.470
2026-01-23 03:30:54,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3250.7249, 3289.2295, 3305.4019, 3200.095, 3238.2952, 3223.4832, 3196.071, 259.4744, 388.9295, 3282.4045]
2026-01-23 03:30:54,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 130.0, 181.0, 1000.0]
2026-01-23 03:30:54,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 58 seconds)
2026-01-23 03:32:25,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:32:35,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3265.05225 ± 38.854
2026-01-23 03:32:35,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3182.1333, 3210.7922, 3297.187, 3308.1995, 3256.4712, 3274.8147, 3268.5894, 3299.341, 3296.0293, 3256.963]
2026-01-23 03:32:35,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:32:35,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 17 seconds)
2026-01-23 03:34:06,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:34:15,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3015.52783 ± 359.133
2026-01-23 03:34:15,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3200.1494, 2537.7063, 3177.2688, 3151.141, 3193.9587, 3231.9453, 3216.6558, 2111.7234, 3148.4377, 3186.2922]
2026-01-23 03:34:15,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 794.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 671.0, 1000.0, 1000.0]
2026-01-23 03:34:15,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 40 seconds)
2026-01-23 03:35:45,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:35:53,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2935.86719 ± 832.760
2026-01-23 03:35:53,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3371.1409, 2919.3604, 2001.4753, 773.56146, 3366.0369, 3373.536, 3369.921, 3409.4287, 3385.6846, 3388.5273]
2026-01-23 03:35:53,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 876.0, 605.0, 395.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:35:53,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 46 seconds)
2026-01-23 03:37:24,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:37:33,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3068.84668 ± 1017.664
2026-01-23 03:37:33,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3426.5906, 3396.3948, 3421.8367, 3417.481, 16.356575, 3361.4895, 3411.6133, 3426.3315, 3412.4841, 3397.8882]
2026-01-23 03:37:33,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 28.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:37:33,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 3 seconds)
2026-01-23 03:38:58,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:39:07,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3057.78198 ± 761.869
2026-01-23 03:39:07,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3331.2812, 774.303, 3326.839, 3316.4692, 3310.5833, 3238.3743, 3350.5085, 3339.4766, 3326.2002, 3263.7847]
2026-01-23 03:39:07,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 282.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:39:07,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 4 seconds)
2026-01-23 03:40:39,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:40:49,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3197.99707 ± 13.370
2026-01-23 03:40:49,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3174.4258, 3204.6646, 3201.6538, 3189.1077, 3195.9688, 3203.0127, 3214.4646, 3212.2798, 3176.02, 3208.3723]
2026-01-23 03:40:49,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:40:49,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 27 seconds)
2026-01-23 03:42:20,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:42:29,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3263.89868 ± 240.940
2026-01-23 03:42:29,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3358.4229, 3350.2344, 3345.2458, 3354.4678, 3352.768, 2542.5273, 3309.6975, 3357.5732, 3321.1616, 3346.8865]
2026-01-23 03:42:29,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 772.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:42:29,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 49 seconds)
2026-01-23 03:43:58,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:44:06,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2869.34839 ± 773.959
2026-01-23 03:44:06,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3258.0515, 3347.0784, 3372.2168, 3350.4407, 3326.6292, 2280.4724, 1979.7487, 1054.548, 3359.364, 3364.935]
2026-01-23 03:44:06,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 693.0, 606.0, 368.0, 1000.0, 1000.0]
2026-01-23 03:44:06,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 9 seconds)
2026-01-23 03:45:37,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:45:46,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3019.93921 ± 961.231
2026-01-23 03:45:46,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3343.0415, 3329.502, 3336.0063, 3347.5425, 3350.3142, 3341.895, 3366.6584, 3330.0803, 136.49689, 3317.857]
2026-01-23 03:45:46,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 112.0, 1000.0]
2026-01-23 03:45:46,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 30 seconds)
2026-01-23 03:47:20,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:47:29,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2997.91675 ± 708.207
2026-01-23 03:47:29,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3246.2861, 3209.1768, 3310.554, 3248.8262, 876.80707, 3258.5183, 3197.123, 3255.3486, 3148.7378, 3227.7886]
2026-01-23 03:47:29,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 314.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:47:29,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 2 seconds)
2026-01-23 03:49:01,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:49:10,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3312.43994 ± 21.386
2026-01-23 03:49:10,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3357.5762, 3334.1042, 3301.639, 3288.2148, 3322.626, 3324.2026, 3289.8818, 3288.8005, 3312.2231, 3305.1282]
2026-01-23 03:49:10,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:49:10,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 21 seconds)
2026-01-23 03:50:38,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:50:48,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3377.49072 ± 73.618
2026-01-23 03:50:48,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3383.0698, 3371.083, 3420.1624, 3415.4739, 3438.837, 3427.0645, 3393.5107, 3356.4258, 3400.1716, 3169.1055]
2026-01-23 03:50:48,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:50:48,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 38 seconds)
2026-01-23 03:52:21,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:52:29,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2815.40674 ± 685.703
2026-01-23 03:52:29,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3250.2532, 3280.3206, 1142.1743, 2848.455, 3279.3572, 3284.1746, 3272.642, 3170.9487, 2644.7615, 1980.981]
2026-01-23 03:52:29,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 384.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 824.0, 625.0]
2026-01-23 03:52:29,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 1 second)
2026-01-23 03:54:01,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:54:10,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3299.38989 ± 24.717
2026-01-23 03:54:10,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3294.315, 3292.3965, 3320.1052, 3310.381, 3255.9998, 3312.2617, 3326.6997, 3310.1377, 3252.5945, 3319.0093]
2026-01-23 03:54:10,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:54:10,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 21 seconds)
2026-01-23 03:55:36,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:55:45,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3092.73242 ± 557.971
2026-01-23 03:55:45,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3246.3154, 3316.6287, 3296.6062, 3248.0457, 3189.1414, 3302.3828, 1422.6166, 3311.803, 3293.1382, 3300.647]
2026-01-23 03:55:45,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 456.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:55:45,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 39 seconds)
2026-01-23 03:57:15,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:57:25,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3366.73975 ± 30.158
2026-01-23 03:57:25,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3381.6519, 3351.4412, 3378.954, 3403.9526, 3338.2104, 3319.2795, 3417.513, 3387.903, 3344.8298, 3343.664]
2026-01-23 03:57:25,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:57:25,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1299 [DEBUG]: Training session finished
