2026-01-25 17:02:41,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-walker2d/DatasetOffice-sac
2026-01-25 17:02:41,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-walker2d/DatasetOffice-sac
2026-01-25 17:02:41,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x149c42697ad0>}
2026-01-25 17:02:41,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1159 [DEBUG]: using device: cuda
2026-01-25 17:02:42,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1181 [INFO]: Creating new trainer
2026-01-25 17:02:42,138 baseline-sac-noisy-walker2d:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=17, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-25 17:02:42,138 baseline-sac-noisy-walker2d:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-25 17:02:42,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1242 [DEBUG]: Starting training session...
2026-01-25 17:02:42,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 1/100
2026-01-25 17:04:05,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:04:06,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 56.61874 ± 111.533
2026-01-25 17:04:06,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [26.543455, -9.275061, 1.8574828, 26.354021, -18.065845, 1.3629146, 249.06493, -21.404156, 6.261997, 303.48767]
2026-01-25 17:04:06,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [109.0, 81.0, 91.0, 65.0, 77.0, 48.0, 176.0, 124.0, 50.0, 196.0]
2026-01-25 17:04:06,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (56.62) for latency DatasetOffice
2026-01-25 17:04:06,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 17 minutes, 51 seconds)
2026-01-25 17:05:36,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:05:37,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 147.43973 ± 132.724
2026-01-25 17:05:37,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [307.88913, 32.829037, 33.56621, 165.89014, 95.74051, 61.677418, 409.61975, 291.56854, 23.81141, 51.805126]
2026-01-25 17:05:37,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [203.0, 100.0, 92.0, 157.0, 168.0, 141.0, 273.0, 165.0, 160.0, 116.0]
2026-01-25 17:05:37,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (147.44) for latency DatasetOffice
2026-01-25 17:05:37,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 22 minutes, 47 seconds)
2026-01-25 17:07:09,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:07:10,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 113.96327 ± 122.366
2026-01-25 17:07:10,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [93.81749, 356.66913, 25.263182, 77.56765, 31.881805, 50.090508, 350.6368, 32.60676, 92.74876, 28.35054]
2026-01-25 17:07:10,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [71.0, 275.0, 58.0, 108.0, 63.0, 67.0, 176.0, 69.0, 114.0, 41.0]
2026-01-25 17:07:10,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 24 minutes, 11 seconds)
2026-01-25 17:08:39,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:08:40,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 85.24147 ± 95.485
2026-01-25 17:08:40,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [6.4665856, 312.9889, 39.96664, 66.77782, -3.2532218, 34.498707, 25.056053, 206.14487, 113.82086, 49.94744]
2026-01-25 17:08:40,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [44.0, 194.0, 133.0, 157.0, 73.0, 95.0, 94.0, 152.0, 150.0, 169.0]
2026-01-25 17:08:40,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 23 minutes)
2026-01-25 17:10:10,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:10:10,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 55.58361 ± 78.358
2026-01-25 17:10:10,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [33.70788, 0.36446604, 198.70627, 19.803375, 31.281996, 3.2514844, 14.599772, 16.894226, 14.553214, 222.67336]
2026-01-25 17:10:10,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [48.0, 11.0, 112.0, 34.0, 44.0, 29.0, 27.0, 30.0, 47.0, 177.0]
2026-01-25 17:10:10,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 21 minutes, 51 seconds)
2026-01-25 17:11:41,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:11:43,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 128.52835 ± 148.767
2026-01-25 17:11:43,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [72.17964, 374.31772, 459.5403, 11.964742, 88.725525, 31.876995, 30.063757, 38.788635, 51.768257, 126.05778]
2026-01-25 17:11:43,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [122.0, 276.0, 336.0, 77.0, 136.0, 167.0, 80.0, 106.0, 126.0, 177.0]
2026-01-25 17:11:43,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 23 minutes, 5 seconds)
2026-01-25 17:13:13,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:13:14,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 197.34970 ± 116.919
2026-01-25 17:13:14,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [33.056015, 232.7893, 215.90994, 73.31283, 293.17783, 63.18979, 116.47364, 253.88321, 269.37112, 422.3333]
2026-01-25 17:13:14,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [49.0, 126.0, 148.0, 126.0, 248.0, 142.0, 140.0, 189.0, 172.0, 262.0]
2026-01-25 17:13:14,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (197.35) for latency DatasetOffice
2026-01-25 17:13:14,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 21 minutes, 43 seconds)
2026-01-25 17:14:44,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:14:45,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 184.89876 ± 108.436
2026-01-25 17:14:45,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [106.32402, 266.18747, 256.88635, 134.56084, 95.38222, 38.49224, 319.60986, 38.048405, 258.63995, 334.85614]
2026-01-25 17:14:45,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [197.0, 160.0, 307.0, 126.0, 210.0, 143.0, 265.0, 106.0, 215.0, 182.0]
2026-01-25 17:14:45,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 19 minutes, 40 seconds)
2026-01-25 17:16:16,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:16:17,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 168.73489 ± 150.076
2026-01-25 17:16:17,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [191.02702, 261.21204, 321.79037, 51.204266, 35.36866, 504.85632, 59.00307, 187.49013, 10.852195, 64.5449]
2026-01-25 17:16:17,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [127.0, 183.0, 209.0, 84.0, 131.0, 257.0, 113.0, 144.0, 37.0, 109.0]
2026-01-25 17:16:17,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 18 minutes, 48 seconds)
2026-01-25 17:17:48,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:17:49,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 250.83353 ± 102.197
2026-01-25 17:17:49,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [245.58226, 290.20825, 244.43051, 287.0941, 400.5264, 32.036434, 272.2711, 290.80838, 105.245384, 340.13223]
2026-01-25 17:17:49,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [148.0, 238.0, 188.0, 250.0, 185.0, 84.0, 228.0, 152.0, 193.0, 209.0]
2026-01-25 17:17:49,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (250.83) for latency DatasetOffice
2026-01-25 17:17:49,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 17 minutes, 42 seconds)
2026-01-25 17:19:19,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:19:20,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 179.00117 ± 109.653
2026-01-25 17:19:20,357 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [10.787373, 302.40717, 320.84012, 260.93356, 290.7301, 164.81378, 41.106205, 44.332657, 181.43912, 172.62172]
2026-01-25 17:19:20,357 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [41.0, 167.0, 176.0, 177.0, 208.0, 116.0, 59.0, 63.0, 171.0, 125.0]
2026-01-25 17:19:20,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 15 minutes, 40 seconds)
2026-01-25 17:20:52,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:20:54,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 228.32674 ± 102.196
2026-01-25 17:20:54,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [324.23572, 329.19614, 27.35747, 302.196, 266.97516, 227.65361, 257.8151, 253.81209, 39.809353, 254.21675]
2026-01-25 17:20:54,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [200.0, 190.0, 76.0, 188.0, 151.0, 134.0, 177.0, 202.0, 75.0, 173.0]
2026-01-25 17:20:54,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 14 minutes, 44 seconds)
2026-01-25 17:22:22,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:22:23,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 129.37245 ± 99.318
2026-01-25 17:22:23,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [277.63702, 199.68053, 196.30363, 44.19218, 237.9372, 53.397232, 40.615856, 215.63249, 30.847857, -2.5195873]
2026-01-25 17:22:23,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [160.0, 138.0, 131.0, 54.0, 159.0, 105.0, 101.0, 132.0, 61.0, 18.0]
2026-01-25 17:22:23,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 12 minutes, 41 seconds)
2026-01-25 17:23:54,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:23:55,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 250.03841 ± 85.957
2026-01-25 17:23:55,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [167.35565, 380.97424, 275.77438, 116.3626, 242.35556, 349.08356, 267.29803, 348.97897, 191.64676, 160.5541]
2026-01-25 17:23:55,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [105.0, 210.0, 134.0, 106.0, 167.0, 217.0, 142.0, 181.0, 140.0, 141.0]
2026-01-25 17:23:55,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 11 minutes, 11 seconds)
2026-01-25 17:25:25,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:25:26,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 228.31071 ± 118.654
2026-01-25 17:25:26,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [183.32623, 195.62175, 210.1375, 152.09131, 251.48958, 50.968616, 533.8023, 287.81696, 177.696, 240.15692]
2026-01-25 17:25:26,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [139.0, 133.0, 136.0, 134.0, 157.0, 93.0, 247.0, 264.0, 139.0, 158.0]
2026-01-25 17:25:26,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 9 minutes, 22 seconds)
2026-01-25 17:26:57,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:26:58,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 269.57315 ± 92.617
2026-01-25 17:26:58,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [386.4605, 308.44928, 331.5019, 399.21326, 169.07892, 174.88536, 192.45462, 159.20947, 365.17334, 209.3048]
2026-01-25 17:26:58,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [168.0, 184.0, 191.0, 195.0, 120.0, 117.0, 241.0, 117.0, 158.0, 123.0]
2026-01-25 17:26:58,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (269.57) for latency DatasetOffice
2026-01-25 17:26:58,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 8 minutes, 25 seconds)
2026-01-25 17:28:28,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:28:29,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 311.02338 ± 131.816
2026-01-25 17:28:29,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [342.48178, 476.33276, 345.36514, 218.2218, 221.92041, 516.1961, 364.52072, 311.88898, 24.199696, 289.10635]
2026-01-25 17:28:29,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [198.0, 226.0, 200.0, 119.0, 135.0, 275.0, 178.0, 150.0, 32.0, 167.0]
2026-01-25 17:28:29,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (311.02) for latency DatasetOffice
2026-01-25 17:28:29,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 6 minutes)
2026-01-25 17:30:00,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:30:01,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 217.28142 ± 153.584
2026-01-25 17:30:01,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [431.4891, 391.43198, 224.86981, 362.87183, 243.27435, 9.902656, 283.50162, 6.83211, 2.6783469, 215.96236]
2026-01-25 17:30:01,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [240.0, 194.0, 129.0, 210.0, 152.0, 20.0, 178.0, 16.0, 16.0, 128.0]
2026-01-25 17:30:01,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 5 minutes, 17 seconds)
2026-01-25 17:31:30,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:31:31,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 240.80904 ± 49.457
2026-01-25 17:31:31,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [177.33646, 224.98041, 295.57312, 222.95491, 161.64664, 256.26242, 211.5422, 280.62753, 329.51312, 247.65347]
2026-01-25 17:31:31,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [114.0, 115.0, 169.0, 145.0, 99.0, 120.0, 123.0, 170.0, 171.0, 140.0]
2026-01-25 17:31:31,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 3 minutes, 12 seconds)
2026-01-25 17:33:02,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:33:03,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 321.35275 ± 105.024
2026-01-25 17:33:03,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [328.96924, 197.66054, 417.8599, 394.4688, 494.84348, 329.39276, 225.52303, 166.06897, 238.00887, 420.7321]
2026-01-25 17:33:03,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [166.0, 120.0, 188.0, 161.0, 212.0, 160.0, 117.0, 114.0, 129.0, 215.0]
2026-01-25 17:33:03,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (321.35) for latency DatasetOffice
2026-01-25 17:33:03,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 1 minute, 54 seconds)
2026-01-25 17:34:34,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:34:35,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 278.41705 ± 98.655
2026-01-25 17:34:35,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [263.80765, 369.81586, 70.47349, 253.61333, 207.45705, 240.19534, 419.33783, 235.0889, 398.50824, 325.87256]
2026-01-25 17:34:35,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [111.0, 183.0, 71.0, 130.0, 108.0, 129.0, 231.0, 112.0, 196.0, 203.0]
2026-01-25 17:34:35,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 20 seconds)
2026-01-25 17:36:05,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:36:07,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 282.23126 ± 129.107
2026-01-25 17:36:07,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [194.26274, 432.78827, 230.09193, 186.88403, 406.33524, 377.7107, 267.76263, 435.30588, 6.0009694, 285.17032]
2026-01-25 17:36:07,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [136.0, 193.0, 128.0, 117.0, 227.0, 188.0, 134.0, 214.0, 16.0, 166.0]
2026-01-25 17:36:07,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 23/100 (estimated time remaining: 1 hour, 58 minutes, 58 seconds)
2026-01-25 17:37:37,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:37:38,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 278.71344 ± 105.355
2026-01-25 17:37:38,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [261.60803, 237.08109, 493.72238, 213.40686, 192.52855, 444.59314, 140.25613, 313.49918, 223.8942, 266.5446]
2026-01-25 17:37:38,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [178.0, 128.0, 199.0, 123.0, 116.0, 225.0, 83.0, 205.0, 126.0, 156.0]
2026-01-25 17:37:38,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 24/100 (estimated time remaining: 1 hour, 57 minutes, 15 seconds)
2026-01-25 17:39:09,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:39:10,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 358.94400 ± 149.748
2026-01-25 17:39:10,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [449.18216, 402.81714, 206.30588, 234.8646, 346.6293, 600.3201, 171.3155, 498.009, 164.20413, 515.7921]
2026-01-25 17:39:10,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [190.0, 180.0, 118.0, 122.0, 161.0, 258.0, 105.0, 230.0, 102.0, 218.0]
2026-01-25 17:39:10,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (358.94) for latency DatasetOffice
2026-01-25 17:39:10,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 25/100 (estimated time remaining: 1 hour, 56 minutes, 17 seconds)
2026-01-25 17:40:42,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:40:43,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 338.47809 ± 105.769
2026-01-25 17:40:43,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [370.3979, 167.43861, 383.07855, 470.33423, 351.35776, 201.86598, 417.71884, 408.82056, 184.06166, 429.70642]
2026-01-25 17:40:43,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [197.0, 111.0, 165.0, 220.0, 164.0, 115.0, 185.0, 195.0, 112.0, 200.0]
2026-01-25 17:40:43,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 26/100 (estimated time remaining: 1 hour, 55 minutes, 1 second)
2026-01-25 17:42:12,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:42:14,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 450.78760 ± 45.378
2026-01-25 17:42:14,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [438.17557, 432.47202, 414.2212, 463.5396, 511.0587, 443.09372, 391.60358, 531.21954, 489.98618, 392.50568]
2026-01-25 17:42:14,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [178.0, 209.0, 159.0, 184.0, 260.0, 199.0, 206.0, 268.0, 220.0, 162.0]
2026-01-25 17:42:14,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (450.79) for latency DatasetOffice
2026-01-25 17:42:14,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 53 minutes)
2026-01-25 17:43:45,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:43:46,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 324.60989 ± 166.742
2026-01-25 17:43:46,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [411.8419, 449.5631, 18.38522, 315.5107, 320.13248, 448.18295, 262.66226, 574.5049, 51.503536, 393.81183]
2026-01-25 17:43:46,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [165.0, 199.0, 36.0, 151.0, 157.0, 182.0, 145.0, 288.0, 76.0, 180.0]
2026-01-25 17:43:46,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 51 minutes, 49 seconds)
2026-01-25 17:45:18,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:45:19,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 470.87515 ± 25.724
2026-01-25 17:45:19,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [506.54703, 476.35168, 440.7943, 445.01782, 452.13715, 453.52692, 462.7637, 457.71664, 518.08124, 495.81485]
2026-01-25 17:45:19,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [208.0, 193.0, 184.0, 173.0, 188.0, 182.0, 188.0, 189.0, 212.0, 199.0]
2026-01-25 17:45:19,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (470.88) for latency DatasetOffice
2026-01-25 17:45:19,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 50 minutes, 43 seconds)
2026-01-25 17:46:49,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:46:51,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 468.14584 ± 22.099
2026-01-25 17:46:51,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [437.66864, 503.5702, 460.18146, 482.8986, 469.5556, 466.38293, 430.28952, 476.96136, 496.29895, 457.65137]
2026-01-25 17:46:51,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [163.0, 197.0, 176.0, 185.0, 184.0, 183.0, 159.0, 187.0, 177.0, 171.0]
2026-01-25 17:46:51,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 48 minutes, 59 seconds)
2026-01-25 17:48:22,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:48:24,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 455.42645 ± 37.396
2026-01-25 17:48:24,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [531.86316, 456.4726, 455.87582, 485.13992, 380.55557, 448.38626, 428.029, 478.9875, 449.8336, 439.1212]
2026-01-25 17:48:24,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [232.0, 179.0, 183.0, 214.0, 138.0, 181.0, 172.0, 200.0, 184.0, 188.0]
2026-01-25 17:48:24,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 47 minutes, 26 seconds)
2026-01-25 17:49:53,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:49:55,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 370.63092 ± 217.117
2026-01-25 17:49:55,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [542.8689, 440.96964, 512.7888, 535.1001, 97.500145, 49.488186, 394.64423, 3.5992064, 589.1462, 540.20386]
2026-01-25 17:49:55,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [229.0, 201.0, 193.0, 201.0, 129.0, 58.0, 184.0, 59.0, 239.0, 242.0]
2026-01-25 17:49:55,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 46 minutes, 6 seconds)
2026-01-25 17:51:25,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:51:27,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 490.63437 ± 38.399
2026-01-25 17:51:27,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [411.4161, 528.83167, 494.54578, 526.13336, 506.2405, 504.1988, 431.7999, 491.79306, 479.32156, 532.06287]
2026-01-25 17:51:27,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [155.0, 227.0, 176.0, 203.0, 203.0, 209.0, 164.0, 182.0, 215.0, 211.0]
2026-01-25 17:51:27,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (490.63) for latency DatasetOffice
2026-01-25 17:51:27,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 44 minutes, 23 seconds)
2026-01-25 17:52:57,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:52:58,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 459.94083 ± 19.499
2026-01-25 17:52:58,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [449.39273, 457.10297, 431.4116, 468.78662, 453.84778, 471.77673, 451.82407, 461.6626, 508.2325, 445.3705]
2026-01-25 17:52:58,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [187.0, 193.0, 170.0, 189.0, 185.0, 197.0, 182.0, 186.0, 217.0, 179.0]
2026-01-25 17:52:58,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 42 minutes, 27 seconds)
2026-01-25 17:54:29,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:54:30,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 450.23447 ± 23.218
2026-01-25 17:54:30,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [458.5417, 466.24783, 425.21637, 422.64383, 437.4878, 460.20926, 476.1196, 425.21768, 494.18115, 436.47955]
2026-01-25 17:54:30,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [179.0, 199.0, 160.0, 163.0, 175.0, 190.0, 198.0, 169.0, 213.0, 176.0]
2026-01-25 17:54:30,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 41 minutes)
2026-01-25 17:56:01,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:56:03,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 394.66840 ± 141.956
2026-01-25 17:56:03,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [466.09512, 420.78696, 8.564598, 334.03903, 318.7675, 476.71014, 495.3501, 484.1982, 489.20135, 452.97095]
2026-01-25 17:56:03,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [172.0, 177.0, 25.0, 214.0, 183.0, 196.0, 186.0, 201.0, 199.0, 178.0]
2026-01-25 17:56:03,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 39 minutes, 31 seconds)
2026-01-25 17:57:34,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:57:35,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 491.62119 ± 20.460
2026-01-25 17:57:35,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [492.79535, 486.90878, 469.79947, 487.42926, 467.84567, 513.84174, 505.00232, 482.96396, 536.49524, 473.1302]
2026-01-25 17:57:35,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [185.0, 198.0, 186.0, 191.0, 194.0, 206.0, 197.0, 199.0, 216.0, 182.0]
2026-01-25 17:57:35,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (491.62) for latency DatasetOffice
2026-01-25 17:57:35,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 38 minutes, 12 seconds)
2026-01-25 17:59:06,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:59:08,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 489.20370 ± 26.619
2026-01-25 17:59:08,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [469.11786, 505.43332, 464.4964, 507.32254, 515.7337, 487.59787, 455.18506, 451.3112, 501.62115, 534.2185]
2026-01-25 17:59:08,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [170.0, 211.0, 179.0, 205.0, 238.0, 206.0, 186.0, 190.0, 218.0, 201.0]
2026-01-25 17:59:08,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 36 minutes, 50 seconds)
2026-01-25 18:00:37,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:00:39,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 426.67490 ± 127.961
2026-01-25 18:00:39,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [466.8504, 512.485, 476.0128, 486.17102, 467.6804, 436.09848, 356.73148, 508.0108, 492.26215, 64.44631]
2026-01-25 18:00:39,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [188.0, 239.0, 194.0, 208.0, 188.0, 191.0, 160.0, 214.0, 204.0, 105.0]
2026-01-25 18:00:39,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 35 minutes, 14 seconds)
2026-01-25 18:02:11,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:02:12,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 468.28052 ± 22.100
2026-01-25 18:02:12,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [483.2377, 483.8041, 453.43283, 441.90485, 470.8108, 457.02063, 429.49408, 471.5927, 508.893, 482.61475]
2026-01-25 18:02:12,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [190.0, 200.0, 184.0, 172.0, 191.0, 181.0, 161.0, 202.0, 224.0, 199.0]
2026-01-25 18:02:12,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 33 minutes, 59 seconds)
2026-01-25 18:03:43,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:03:44,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 462.59369 ± 74.511
2026-01-25 18:03:44,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [455.21646, 528.3797, 432.0297, 481.08286, 528.2647, 489.91608, 518.8815, 460.57285, 472.77664, 258.81635]
2026-01-25 18:03:44,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [174.0, 218.0, 167.0, 195.0, 208.0, 202.0, 209.0, 191.0, 202.0, 125.0]
2026-01-25 18:03:44,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 32 minutes, 16 seconds)
2026-01-25 18:05:14,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:05:16,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 464.10220 ± 20.163
2026-01-25 18:05:16,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [451.08212, 492.51453, 427.49496, 449.0459, 469.12915, 484.6904, 471.54794, 442.3631, 487.3819, 465.7722]
2026-01-25 18:05:16,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [180.0, 208.0, 165.0, 184.0, 195.0, 202.0, 200.0, 180.0, 209.0, 189.0]
2026-01-25 18:05:16,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 30 minutes, 31 seconds)
2026-01-25 18:06:45,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:06:47,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 468.99194 ± 27.520
2026-01-25 18:06:47,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [463.65823, 486.35583, 457.55505, 482.59348, 442.6403, 412.2469, 511.41565, 471.72546, 502.36176, 459.36673]
2026-01-25 18:06:47,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [205.0, 206.0, 187.0, 221.0, 191.0, 251.0, 236.0, 281.0, 241.0, 205.0]
2026-01-25 18:06:47,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 28 minutes, 50 seconds)
2026-01-25 18:08:18,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:08:20,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 526.32843 ± 38.412
2026-01-25 18:08:20,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [522.4671, 582.09344, 556.7193, 524.07495, 519.2728, 528.36847, 475.2973, 530.53516, 573.98486, 450.47076]
2026-01-25 18:08:20,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [194.0, 239.0, 200.0, 194.0, 208.0, 204.0, 197.0, 218.0, 232.0, 166.0]
2026-01-25 18:08:20,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (526.33) for latency DatasetOffice
2026-01-25 18:08:20,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 27 minutes, 29 seconds)
2026-01-25 18:09:50,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:09:51,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 448.38907 ± 34.729
2026-01-25 18:09:51,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [433.0732, 437.94815, 468.22675, 414.70844, 496.5812, 411.38272, 451.48706, 521.3782, 428.04507, 421.06003]
2026-01-25 18:09:51,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [160.0, 195.0, 221.0, 153.0, 199.0, 152.0, 164.0, 213.0, 157.0, 158.0]
2026-01-25 18:09:51,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 25 minutes, 39 seconds)
2026-01-25 18:11:23,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:11:25,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 522.03210 ± 48.748
2026-01-25 18:11:25,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [422.40732, 481.5364, 559.12714, 482.8025, 492.67795, 534.42523, 578.6213, 551.2244, 586.2929, 531.2058]
2026-01-25 18:11:25,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [166.0, 191.0, 199.0, 172.0, 181.0, 217.0, 212.0, 215.0, 234.0, 191.0]
2026-01-25 18:11:25,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 24 minutes, 26 seconds)
2026-01-25 18:12:54,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:12:56,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 526.54633 ± 25.928
2026-01-25 18:12:56,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [572.4183, 528.61005, 553.7096, 504.55856, 532.1169, 483.42444, 503.84644, 522.84467, 553.3537, 510.58102]
2026-01-25 18:12:56,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [223.0, 207.0, 216.0, 181.0, 192.0, 172.0, 184.0, 191.0, 200.0, 174.0]
2026-01-25 18:12:56,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (526.55) for latency DatasetOffice
2026-01-25 18:12:56,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 22 minutes, 50 seconds)
2026-01-25 18:14:26,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:14:28,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 476.00714 ± 93.117
2026-01-25 18:14:28,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [496.89462, 293.49643, 529.88403, 510.47098, 491.94046, 549.66876, 292.66647, 540.2887, 536.51935, 518.2412]
2026-01-25 18:14:28,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [192.0, 172.0, 202.0, 205.0, 183.0, 222.0, 162.0, 217.0, 220.0, 190.0]
2026-01-25 18:14:28,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 21 minutes, 18 seconds)
2026-01-25 18:15:58,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:16:00,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 539.66962 ± 23.468
2026-01-25 18:16:00,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [598.88666, 532.3742, 522.29236, 532.5025, 529.959, 559.05524, 525.3695, 550.05835, 534.55457, 511.6442]
2026-01-25 18:16:00,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [222.0, 213.0, 195.0, 193.0, 204.0, 218.0, 199.0, 219.0, 186.0, 194.0]
2026-01-25 18:16:00,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (539.67) for latency DatasetOffice
2026-01-25 18:16:00,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 19 minutes, 44 seconds)
2026-01-25 18:17:31,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:17:33,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 514.81030 ± 13.158
2026-01-25 18:17:33,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [494.7739, 528.1581, 502.86865, 506.4137, 520.49713, 519.6895, 501.01392, 539.26624, 523.9213, 511.50034]
2026-01-25 18:17:33,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [180.0, 220.0, 181.0, 197.0, 205.0, 198.0, 192.0, 211.0, 211.0, 192.0]
2026-01-25 18:17:33,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 18 minutes, 31 seconds)
2026-01-25 18:19:02,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:19:04,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 533.14020 ± 26.680
2026-01-25 18:19:04,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [476.74023, 555.1525, 515.4288, 570.4189, 537.92944, 554.57153, 550.335, 509.4074, 544.7901, 516.6281]
2026-01-25 18:19:04,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [172.0, 230.0, 195.0, 222.0, 221.0, 239.0, 206.0, 199.0, 227.0, 208.0]
2026-01-25 18:19:04,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 16 minutes, 33 seconds)
2026-01-25 18:20:34,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:20:36,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 591.54340 ± 58.166
2026-01-25 18:20:36,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [571.30164, 604.619, 579.6348, 606.24274, 590.12994, 608.8337, 611.6683, 596.386, 447.80988, 698.80756]
2026-01-25 18:20:36,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [204.0, 238.0, 216.0, 223.0, 229.0, 232.0, 230.0, 233.0, 192.0, 293.0]
2026-01-25 18:20:36,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (591.54) for latency DatasetOffice
2026-01-25 18:20:36,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 15 minutes, 11 seconds)
2026-01-25 18:22:06,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:22:08,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 575.72449 ± 53.256
2026-01-25 18:22:08,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [563.8657, 548.2519, 676.7305, 550.1735, 576.10455, 619.77997, 464.52026, 606.0118, 602.23975, 549.5662]
2026-01-25 18:22:08,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [204.0, 220.0, 303.0, 215.0, 229.0, 244.0, 218.0, 239.0, 241.0, 213.0]
2026-01-25 18:22:08,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 13 minutes, 40 seconds)
2026-01-25 18:23:36,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:23:38,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 588.45056 ± 93.098
2026-01-25 18:23:38,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [616.9659, 814.68463, 486.5085, 519.1921, 551.85156, 635.56305, 630.93774, 599.6433, 553.85144, 475.3071]
2026-01-25 18:23:38,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [218.0, 293.0, 182.0, 206.0, 199.0, 219.0, 240.0, 218.0, 218.0, 230.0]
2026-01-25 18:23:38,772 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 11 minutes, 51 seconds)
2026-01-25 18:25:09,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:25:11,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 590.02710 ± 51.718
2026-01-25 18:25:11,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [538.0828, 667.6233, 568.21124, 513.0848, 554.3901, 661.6368, 601.0139, 595.9157, 649.51794, 550.79443]
2026-01-25 18:25:11,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [195.0, 265.0, 217.0, 192.0, 225.0, 248.0, 217.0, 234.0, 256.0, 203.0]
2026-01-25 18:25:11,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 10 minutes, 8 seconds)
2026-01-25 18:26:39,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:26:41,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 618.54132 ± 41.036
2026-01-25 18:26:41,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [696.1341, 668.5739, 656.71765, 595.6475, 609.01447, 617.9649, 561.5337, 614.1396, 597.42755, 568.25903]
2026-01-25 18:26:41,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [279.0, 264.0, 241.0, 236.0, 242.0, 224.0, 204.0, 230.0, 235.0, 220.0]
2026-01-25 18:26:41,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (618.54) for latency DatasetOffice
2026-01-25 18:26:41,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 8 minutes, 34 seconds)
2026-01-25 18:28:11,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:28:13,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 410.80524 ± 259.531
2026-01-25 18:28:13,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [582.0616, 850.51044, 534.5304, 601.1589, 536.9252, 485.5184, 332.105, 38.00488, 125.090126, 22.147434]
2026-01-25 18:28:13,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [232.0, 381.0, 234.0, 264.0, 240.0, 209.0, 185.0, 36.0, 110.0, 50.0]
2026-01-25 18:28:13,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 6 minutes, 58 seconds)
2026-01-25 18:29:43,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:29:45,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 530.11053 ± 81.691
2026-01-25 18:29:45,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [589.81757, 535.24585, 587.6883, 580.92725, 497.53732, 554.2928, 594.94495, 309.64163, 562.56195, 488.4476]
2026-01-25 18:29:45,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [288.0, 215.0, 286.0, 275.0, 261.0, 258.0, 293.0, 149.0, 271.0, 215.0]
2026-01-25 18:29:45,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 5 minutes, 27 seconds)
2026-01-25 18:31:10,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:31:13,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 887.58710 ± 301.713
2026-01-25 18:31:13,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [553.7721, 600.3106, 901.8457, 1343.0369, 775.10785, 995.9793, 633.7087, 543.43304, 1170.7133, 1357.9631]
2026-01-25 18:31:13,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [249.0, 273.0, 351.0, 455.0, 284.0, 395.0, 276.0, 241.0, 435.0, 520.0]
2026-01-25 18:31:13,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (887.59) for latency DatasetOffice
2026-01-25 18:31:13,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 3 minutes, 43 seconds)
2026-01-25 18:32:43,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:32:46,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 649.68683 ± 155.353
2026-01-25 18:32:46,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [495.81497, 572.7973, 547.1831, 517.37714, 540.95074, 736.84546, 809.8765, 516.39087, 966.3523, 793.2792]
2026-01-25 18:32:46,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [236.0, 265.0, 254.0, 231.0, 247.0, 335.0, 340.0, 239.0, 373.0, 311.0]
2026-01-25 18:32:46,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 2 minutes, 12 seconds)
2026-01-25 18:34:15,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:34:17,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 638.38403 ± 23.430
2026-01-25 18:34:17,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [669.6945, 615.7205, 644.0984, 623.7849, 663.1759, 635.3675, 623.93066, 649.45825, 665.5661, 593.04395]
2026-01-25 18:34:17,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [251.0, 272.0, 257.0, 253.0, 261.0, 258.0, 243.0, 302.0, 287.0, 245.0]
2026-01-25 18:34:17,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 45 seconds)
2026-01-25 18:35:46,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:35:49,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 790.85889 ± 313.488
2026-01-25 18:35:49,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [826.44415, 1104.0652, 769.5867, 20.814516, 651.31616, 960.2771, 1153.825, 630.42035, 719.5326, 1072.3073]
2026-01-25 18:35:49,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [300.0, 432.0, 305.0, 59.0, 255.0, 360.0, 401.0, 264.0, 298.0, 366.0]
2026-01-25 18:35:49,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 62/100 (estimated time remaining: 59 minutes, 18 seconds)
2026-01-25 18:37:18,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:37:20,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 667.95392 ± 44.734
2026-01-25 18:37:20,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [639.0074, 749.9213, 624.4951, 666.1564, 654.8468, 700.96545, 629.4137, 613.4365, 663.8667, 737.42957]
2026-01-25 18:37:20,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [269.0, 275.0, 244.0, 263.0, 263.0, 283.0, 267.0, 248.0, 272.0, 279.0]
2026-01-25 18:37:20,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 63/100 (estimated time remaining: 57 minutes, 39 seconds)
2026-01-25 18:38:49,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:38:51,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 663.93927 ± 315.486
2026-01-25 18:38:51,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [657.10095, 1089.1426, 562.7746, 576.62836, 977.431, 776.6913, 46.041122, 280.7475, 1057.2028, 615.63226]
2026-01-25 18:38:51,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [253.0, 365.0, 240.0, 233.0, 528.0, 290.0, 88.0, 159.0, 363.0, 237.0]
2026-01-25 18:38:51,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 64/100 (estimated time remaining: 56 minutes, 27 seconds)
2026-01-25 18:40:19,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:40:21,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 449.26215 ± 237.877
2026-01-25 18:40:21,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [568.0816, 318.89767, 31.716946, 451.74936, 10.908997, 608.1239, 533.40466, 615.8994, 629.5113, 724.32776]
2026-01-25 18:40:21,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [211.0, 147.0, 74.0, 180.0, 44.0, 242.0, 199.0, 253.0, 259.0, 248.0]
2026-01-25 18:40:21,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 65/100 (estimated time remaining: 54 minutes, 38 seconds)
2026-01-25 18:41:51,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:41:53,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 600.96106 ± 211.703
2026-01-25 18:41:53,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [594.94714, 614.4796, 602.2959, 611.167, 755.6946, 607.0458, 629.3686, 705.2419, 18.449965, 870.91974]
2026-01-25 18:41:53,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [241.0, 259.0, 250.0, 245.0, 280.0, 251.0, 255.0, 284.0, 48.0, 345.0]
2026-01-25 18:41:53,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 66/100 (estimated time remaining: 53 minutes, 11 seconds)
2026-01-25 18:43:24,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:43:27,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1120.16040 ± 382.168
2026-01-25 18:43:27,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1106.886, 1457.2167, 1032.0697, 1129.027, 692.0527, 2105.124, 896.7037, 965.6567, 803.525, 1013.34143]
2026-01-25 18:43:27,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [400.0, 496.0, 350.0, 396.0, 248.0, 671.0, 306.0, 366.0, 317.0, 380.0]
2026-01-25 18:43:27,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (1120.16) for latency DatasetOffice
2026-01-25 18:43:27,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 67/100 (estimated time remaining: 51 minutes, 56 seconds)
2026-01-25 18:44:53,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:44:56,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 985.33704 ± 397.500
2026-01-25 18:44:56,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1057.6827, 1058.1527, 567.98047, 979.7113, 722.7898, 913.8371, 595.9855, 1443.9122, 619.41925, 1893.899]
2026-01-25 18:44:56,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [360.0, 381.0, 230.0, 382.0, 263.0, 344.0, 229.0, 457.0, 240.0, 630.0]
2026-01-25 18:44:56,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 68/100 (estimated time remaining: 50 minutes, 6 seconds)
2026-01-25 18:46:25,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:46:27,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 808.10376 ± 467.397
2026-01-25 18:46:27,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1212.2255, 1677.6122, 554.3902, 1226.8119, 928.94196, 596.80334, 244.88295, 779.883, 4.3146343, 855.1717]
2026-01-25 18:46:27,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [366.0, 538.0, 207.0, 444.0, 311.0, 233.0, 121.0, 281.0, 23.0, 352.0]
2026-01-25 18:46:27,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 69/100 (estimated time remaining: 48 minutes, 36 seconds)
2026-01-25 18:47:58,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:48:02,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1398.66577 ± 956.542
2026-01-25 18:48:02,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [876.7419, 1643.5682, 1267.6033, 984.1322, 3121.1528, 745.18256, 588.8897, 3312.8645, 755.67523, 690.8486]
2026-01-25 18:48:02,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [316.0, 575.0, 427.0, 324.0, 1000.0, 276.0, 231.0, 1000.0, 285.0, 250.0]
2026-01-25 18:48:02,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (1398.67) for latency DatasetOffice
2026-01-25 18:48:02,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 70/100 (estimated time remaining: 47 minutes, 37 seconds)
2026-01-25 18:49:31,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:49:35,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1346.77979 ± 539.741
2026-01-25 18:49:35,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [892.8258, 972.51483, 2382.0588, 811.9324, 844.2227, 1206.6592, 1210.7373, 2296.845, 1382.4053, 1467.5968]
2026-01-25 18:49:35,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [327.0, 363.0, 780.0, 291.0, 306.0, 421.0, 440.0, 782.0, 487.0, 523.0]
2026-01-25 18:49:35,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 71/100 (estimated time remaining: 46 minutes, 11 seconds)
2026-01-25 18:51:08,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:51:14,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1850.79944 ± 995.572
2026-01-25 18:51:14,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3195.4504, 3046.2478, 2766.165, 539.4012, 1291.3481, 2683.397, 652.7455, 2330.508, 969.24713, 1033.4817]
2026-01-25 18:51:14,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 957.0, 855.0, 196.0, 419.0, 778.0, 241.0, 735.0, 329.0, 371.0]
2026-01-25 18:51:14,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (1850.80) for latency DatasetOffice
2026-01-25 18:51:14,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 72/100 (estimated time remaining: 45 minutes, 4 seconds)
2026-01-25 18:52:37,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:52:42,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1782.33362 ± 935.010
2026-01-25 18:52:42,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1500.1299, 1617.0975, 1468.141, 3516.861, 699.5066, 1187.028, 1212.3258, 1037.4913, 2057.122, 3527.6338]
2026-01-25 18:52:42,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [460.0, 514.0, 471.0, 983.0, 244.0, 387.0, 381.0, 335.0, 593.0, 1000.0]
2026-01-25 18:52:42,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 73/100 (estimated time remaining: 43 minutes, 31 seconds)
2026-01-25 18:54:13,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:54:21,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2859.69116 ± 963.357
2026-01-25 18:54:21,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3600.0317, 738.5334, 3267.598, 3349.9612, 3429.7532, 3215.551, 3327.8655, 1174.2743, 3334.6033, 3158.7422]
2026-01-25 18:54:21,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 271.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 378.0, 1000.0, 1000.0]
2026-01-25 18:54:21,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2859.69) for latency DatasetOffice
2026-01-25 18:54:21,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 74/100 (estimated time remaining: 42 minutes, 39 seconds)
2026-01-25 18:55:51,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:55:55,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1351.68286 ± 1380.801
2026-01-25 18:55:55,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1007.91833, 3304.4348, 303.58755, 3309.3906, 1547.2822, 52.291103, 62.52272, 29.196232, 475.35358, 3424.8528]
2026-01-25 18:55:55,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [331.0, 1000.0, 137.0, 1000.0, 484.0, 92.0, 108.0, 47.0, 245.0, 1000.0]
2026-01-25 18:55:55,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 2 seconds)
2026-01-25 18:57:26,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:57:34,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2999.77881 ± 629.053
2026-01-25 18:57:34,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3298.504, 3253.4187, 3218.0093, 1244.4147, 3370.2446, 2525.4788, 3311.9976, 3351.6187, 3189.8958, 3234.2058]
2026-01-25 18:57:34,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 383.0, 1000.0, 782.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:57:34,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (2999.78) for latency DatasetOffice
2026-01-25 18:57:34,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 76/100 (estimated time remaining: 39 minutes, 56 seconds)
2026-01-25 18:59:05,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:59:13,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3053.86621 ± 827.914
2026-01-25 18:59:13,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3284.2856, 3357.1108, 3366.1912, 3391.5276, 3346.4456, 3070.6191, 3384.0398, 3301.2043, 587.14233, 3450.095]
2026-01-25 18:59:13,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 226.0, 1000.0]
2026-01-25 18:59:13,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3053.87) for latency DatasetOffice
2026-01-25 18:59:13,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 77/100 (estimated time remaining: 38 minutes, 22 seconds)
2026-01-25 19:00:37,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:00:40,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 1330.15771 ± 847.916
2026-01-25 19:00:40,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1001.4805, 1784.0947, 1479.6918, 1460.9398, 1480.8986, 1359.5721, 3387.7622, 852.99677, 296.5391, 197.60173]
2026-01-25 19:00:40,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [331.0, 533.0, 442.0, 438.0, 443.0, 424.0, 936.0, 286.0, 148.0, 108.0]
2026-01-25 19:00:40,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 78/100 (estimated time remaining: 36 minutes, 40 seconds)
2026-01-25 19:02:16,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:02:23,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2668.59326 ± 1000.595
2026-01-25 19:02:23,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2298.466, 2959.1013, 3398.5906, 2162.4866, 3447.319, 177.95473, 3392.5723, 3424.9932, 1973.4142, 3451.035]
2026-01-25 19:02:23,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [666.0, 846.0, 1000.0, 620.0, 1000.0, 135.0, 1000.0, 1000.0, 592.0, 1000.0]
2026-01-25 19:02:23,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 22 seconds)
2026-01-25 19:03:50,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:03:56,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2307.21533 ± 796.342
2026-01-25 19:03:56,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2454.4421, 3211.5642, 3399.714, 1973.5244, 2205.0288, 886.54694, 1895.7742, 1275.2317, 3249.9375, 2520.3906]
2026-01-25 19:03:56,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [729.0, 991.0, 1000.0, 601.0, 664.0, 317.0, 643.0, 408.0, 1000.0, 757.0]
2026-01-25 19:03:56,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 80/100 (estimated time remaining: 33 minutes, 40 seconds)
2026-01-25 19:05:27,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:05:28,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 617.47186 ± 758.093
2026-01-25 19:05:28,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [1052.9414, 1348.9833, 985.183, 18.17766, 3.2862756, -2.6240335, 392.7651, 6.0024686, 23.616932, 2346.3867]
2026-01-25 19:05:28,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [367.0, 421.0, 314.0, 32.0, 20.0, 29.0, 176.0, 32.0, 44.0, 684.0]
2026-01-25 19:05:28,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 81/100 (estimated time remaining: 31 minutes, 37 seconds)
2026-01-25 19:06:54,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:07:01,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2837.54321 ± 600.158
2026-01-25 19:07:01,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2583.549, 1743.973, 3102.1785, 3552.2317, 2809.1165, 3305.6362, 1770.9972, 3038.7527, 3388.7866, 3080.211]
2026-01-25 19:07:01,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [731.0, 519.0, 868.0, 1000.0, 781.0, 929.0, 542.0, 834.0, 951.0, 878.0]
2026-01-25 19:07:01,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 82/100 (estimated time remaining: 29 minutes, 37 seconds)
2026-01-25 19:08:35,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:08:43,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2927.28076 ± 794.491
2026-01-25 19:08:43,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3558.421, 1420.678, 3393.351, 1877.6713, 1890.576, 3426.3245, 3417.245, 3391.466, 3414.4976, 3482.5784]
2026-01-25 19:08:43,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 442.0, 1000.0, 552.0, 566.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:08:43,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 83/100 (estimated time remaining: 28 minutes, 56 seconds)
2026-01-25 19:10:14,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:10:20,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2344.29004 ± 1485.021
2026-01-25 19:10:20,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3503.916, 3329.506, 3452.0671, 3416.763, 3451.574, 2459.141, 6.305285, 13.730276, 349.77713, 3460.1213]
2026-01-25 19:10:20,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 739.0, 22.0, 25.0, 238.0, 1000.0]
2026-01-25 19:10:20,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes)
2026-01-25 19:11:50,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:11:58,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3066.68262 ± 727.096
2026-01-25 19:11:58,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3372.5889, 3387.711, 3362.9685, 3443.1064, 3436.1812, 2472.5813, 3430.346, 3399.1199, 1049.7766, 3312.4453]
2026-01-25 19:11:58,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 985.0, 1000.0, 1000.0, 1000.0, 730.0, 1000.0, 1000.0, 359.0, 1000.0]
2026-01-25 19:11:58,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3066.68) for latency DatasetOffice
2026-01-25 19:11:58,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 41 seconds)
2026-01-25 19:13:28,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:13:36,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2700.21436 ± 959.216
2026-01-25 19:13:36,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3207.5996, 3290.5344, 2386.0186, 299.22055, 3303.2031, 3252.072, 3208.514, 3251.7053, 3214.8442, 1588.4333]
2026-01-25 19:13:36,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 743.0, 130.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 542.0]
2026-01-25 19:13:36,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 21 seconds)
2026-01-25 19:15:07,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:15:14,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2703.98779 ± 940.611
2026-01-25 19:15:14,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3510.903, 2755.8208, 3492.6045, 2550.2004, 2153.401, 1134.669, 986.48206, 3472.1172, 3470.1562, 3513.5247]
2026-01-25 19:15:14,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 800.0, 1000.0, 732.0, 647.0, 372.0, 330.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:15:14,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes)
2026-01-25 19:16:45,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:16:51,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2396.92407 ± 714.490
2026-01-25 19:16:51,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3492.6296, 2821.698, 3413.0354, 2665.5825, 2217.5996, 2692.5227, 1741.7089, 2022.908, 1186.033, 1715.5231]
2026-01-25 19:16:51,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 844.0, 1000.0, 786.0, 654.0, 801.0, 536.0, 620.0, 405.0, 519.0]
2026-01-25 19:16:51,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 9 seconds)
2026-01-25 19:18:20,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:18:29,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3165.29785 ± 529.477
2026-01-25 19:18:29,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3352.6223, 3372.0476, 3398.79, 3353.326, 3418.6187, 3413.6875, 2965.3186, 3409.6309, 1623.327, 3345.6094]
2026-01-25 19:18:29,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 988.0, 1000.0, 1000.0, 1000.0, 1000.0, 869.0, 1000.0, 538.0, 1000.0]
2026-01-25 19:18:29,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3165.30) for latency DatasetOffice
2026-01-25 19:18:29,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 33 seconds)
2026-01-25 19:19:58,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:20:06,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3312.59912 ± 366.697
2026-01-25 19:20:06,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3448.1108, 3373.338, 3496.5588, 2301.5886, 3509.1267, 3027.7947, 3461.4397, 3590.5454, 3409.9087, 3507.584]
2026-01-25 19:20:06,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 671.0, 1000.0, 883.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:20:06,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3312.60) for latency DatasetOffice
2026-01-25 19:20:06,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 54 seconds)
2026-01-25 19:21:38,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:21:47,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3350.81445 ± 314.783
2026-01-25 19:21:47,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3515.8843, 3451.8696, 3482.0764, 3500.2937, 3443.7083, 3445.519, 3449.3796, 3381.5938, 3425.1235, 2412.6968]
2026-01-25 19:21:47,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 698.0]
2026-01-25 19:21:47,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3350.81) for latency DatasetOffice
2026-01-25 19:21:47,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 22 seconds)
2026-01-25 19:23:15,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:23:21,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2325.35986 ± 1424.795
2026-01-25 19:23:21,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3328.3408, 3167.3813, 3206.4314, 3308.777, 424.0128, 13.63405, 29.795097, 3231.408, 3257.7188, 3286.0974]
2026-01-25 19:23:21,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 167.0, 21.0, 50.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:23:21,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 36 seconds)
2026-01-25 19:24:48,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:24:57,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3292.60693 ± 368.206
2026-01-25 19:24:57,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3501.2507, 3403.9475, 3475.5762, 3289.937, 3394.5156, 3402.77, 2203.554, 3390.7576, 3363.4922, 3500.271]
2026-01-25 19:24:57,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 656.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:24:57,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 57 seconds)
2026-01-25 19:26:27,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:26:36,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3306.04028 ± 268.275
2026-01-25 19:26:36,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3393.3506, 3424.2456, 3476.0874, 2705.7148, 3472.1648, 3381.3574, 3505.242, 3456.5425, 3393.1157, 2852.5803]
2026-01-25 19:26:36,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 810.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 816.0]
2026-01-25 19:26:36,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 22 seconds)
2026-01-25 19:28:10,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:28:17,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2533.71338 ± 1204.034
2026-01-25 19:28:17,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [2862.5403, 13.63691, 413.66983, 3206.3853, 3437.8606, 2578.9307, 3397.9067, 3408.6975, 2616.5115, 3400.9944]
2026-01-25 19:28:17,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [886.0, 62.0, 154.0, 1000.0, 1000.0, 764.0, 1000.0, 1000.0, 802.0, 1000.0]
2026-01-25 19:28:17,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 48 seconds)
2026-01-25 19:29:38,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:29:48,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3362.68408 ± 25.403
2026-01-25 19:29:48,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3350.1719, 3309.7122, 3396.1506, 3397.1306, 3387.415, 3353.2703, 3353.7163, 3375.5334, 3353.344, 3350.396]
2026-01-25 19:29:48,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:29:48,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1274 [INFO]: New best (3362.68) for latency DatasetOffice
2026-01-25 19:29:48,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 1 second)
2026-01-25 19:31:18,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:31:25,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2702.85400 ± 1064.042
2026-01-25 19:31:25,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3419.0596, 3393.198, 2678.8884, 142.8075, 2131.6106, 3403.1133, 3452.4302, 3433.3743, 3428.4504, 1545.608]
2026-01-25 19:31:25,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 802.0, 95.0, 642.0, 1000.0, 1000.0, 1000.0, 1000.0, 484.0]
2026-01-25 19:31:25,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 26 seconds)
2026-01-25 19:33:02,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:33:11,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3076.93433 ± 619.601
2026-01-25 19:33:11,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3310.1455, 3293.9846, 3218.3645, 3270.1143, 3236.0767, 3353.369, 3220.3755, 3304.073, 1222.9697, 3339.87]
2026-01-25 19:33:11,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 445.0, 1000.0]
2026-01-25 19:33:11,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 56 seconds)
2026-01-25 19:34:34,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:34:43,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 3059.95142 ± 670.926
2026-01-25 19:34:43,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3284.116, 3215.0706, 3356.8489, 3374.434, 3280.519, 3275.5002, 3299.0833, 3199.0837, 3261.8096, 1053.0482]
2026-01-25 19:34:43,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 360.0]
2026-01-25 19:34:43,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 14 seconds)
2026-01-25 19:36:18,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:36:24,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2258.68408 ± 1530.229
2026-01-25 19:36:24,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3467.1409, 3479.3872, 3475.7686, 3467.5178, 3369.4753, 3486.1448, 1634.272, 145.64214, 40.61233, 20.880384]
2026-01-25 19:36:24,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 520.0, 176.0, 70.0, 56.0]
2026-01-25 19:36:24,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 37 seconds)
2026-01-25 19:37:53,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:38:01,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1269 [DEBUG]: Total Reward: 2925.48096 ± 990.338
2026-01-25 19:38:01,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1270 [DEBUG]: All rewards: [3400.8374, 3309.37, 2739.7114, 3355.66, 3277.7766, 5.2987638, 3380.3171, 3211.5295, 3206.8604, 3367.4514]
2026-01-25 19:38:01,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 802.0, 1000.0, 1000.0, 32.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:38:01,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1299 [DEBUG]: Training session finished
