2026-01-22 23:52:42,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-ant/DatasetOffice-sac-aug-mem2
2026-01-22 23:52:42,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-ant/DatasetOffice-sac-aug-mem2
2026-01-22 23:52:42,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14de85231690>}
2026-01-22 23:52:42,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1159 [DEBUG]: using device: cuda
2026-01-22 23:52:42,501 baseline-sac-noisy-ant:77 [WARNING]: args.memorize_actions != args.horizon: 2 != 32
2026-01-22 23:52:42,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1181 [INFO]: Creating new trainer
2026-01-22 23:52:42,659 baseline-sac-noisy-ant:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=43, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:52:42,659 baseline-sac-noisy-ant:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=51, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:52:43,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1242 [DEBUG]: Starting training session...
2026-01-22 23:52:43,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 1/100
2026-01-22 23:54:14,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:54:17,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: -513.57092 ± 688.879
2026-01-22 23:54:17,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-3.902417, -89.7448, -40.55866, -1678.5006, -122.726326, -36.718037, -1456.873, -34.04976, -1549.4503, -123.18492]
2026-01-22 23:54:17,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [12.0, 95.0, 44.0, 1000.0, 81.0, 32.0, 1000.0, 26.0, 1000.0, 65.0]
2026-01-22 23:54:17,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (-513.57) for latency DatasetOffice
2026-01-22 23:54:17,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 35 minutes, 35 seconds)
2026-01-22 23:55:47,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:51,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: -16.61887 ± 41.171
2026-01-22 23:55:51,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3.3292615, -107.16854, 2.2796633, -19.104134, 0.6528414, 48.60457, 17.59747, -19.843977, -62.048542, -30.487316]
2026-01-22 23:55:51,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [252.0, 453.0, 1000.0, 208.0, 43.0, 624.0, 80.0, 105.0, 341.0, 319.0]
2026-01-22 23:55:51,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (-16.62) for latency DatasetOffice
2026-01-22 23:55:51,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 33 minutes, 17 seconds)
2026-01-22 23:57:23,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:25,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 56.80046 ± 62.820
2026-01-22 23:57:25,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [130.72762, -4.5127716, 187.54665, 29.153532, 9.793807, 56.175568, 103.852425, 4.780273, 65.60928, -15.12176]
2026-01-22 23:57:25,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [721.0, 123.0, 437.0, 75.0, 59.0, 153.0, 435.0, 48.0, 370.0, 188.0]
2026-01-22 23:57:25,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (56.80) for latency DatasetOffice
2026-01-22 23:57:25,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 32 minutes, 5 seconds)
2026-01-22 23:58:48,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:53,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 157.48386 ± 103.875
2026-01-22 23:58:53,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [297.4039, 47.197773, 10.56219, 166.35295, 265.73724, 248.50761, 18.880968, 265.20746, 96.86175, 158.12685]
2026-01-22 23:58:53,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 67.0, 76.0, 333.0, 549.0, 1000.0, 42.0, 1000.0, 220.0, 305.0]
2026-01-22 23:58:53,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (157.48) for latency DatasetOffice
2026-01-22 23:58:53,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 27 minutes, 53 seconds)
2026-01-23 00:00:24,923 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:30,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 163.44504 ± 126.573
2026-01-23 00:00:30,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [358.85278, 46.420643, 326.87122, 298.72745, 58.44038, 236.3297, 41.48446, 15.694263, 64.803215, 186.82648]
2026-01-23 00:00:30,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 183.0, 1000.0, 1000.0, 107.0, 1000.0, 92.0, 30.0, 74.0, 395.0]
2026-01-23 00:00:30,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (163.45) for latency DatasetOffice
2026-01-23 00:00:30,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 27 minutes, 45 seconds)
2026-01-23 00:01:58,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:05,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 241.43637 ± 149.048
2026-01-23 00:02:05,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [80.0215, 362.81088, 171.21147, 77.55053, 9.507729, 467.18152, 388.64297, 196.09804, 279.81256, 381.5267]
2026-01-23 00:02:05,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [224.0, 1000.0, 277.0, 125.0, 77.0, 1000.0, 1000.0, 398.0, 1000.0, 1000.0]
2026-01-23 00:02:05,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (241.44) for latency DatasetOffice
2026-01-23 00:02:05,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 26 minutes, 31 seconds)
2026-01-23 00:03:37,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:44,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 277.39981 ± 180.169
2026-01-23 00:03:44,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [23.887308, 24.232653, 194.96432, 428.25824, 482.84592, 409.6873, 182.23071, 487.58185, 106.378395, 433.9315]
2026-01-23 00:03:44,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [92.0, 48.0, 283.0, 1000.0, 1000.0, 1000.0, 284.0, 1000.0, 214.0, 1000.0]
2026-01-23 00:03:44,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (277.40) for latency DatasetOffice
2026-01-23 00:03:44,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 26 minutes, 33 seconds)
2026-01-23 00:05:18,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:25,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 320.70279 ± 158.801
2026-01-23 00:05:25,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [569.88965, 360.99313, 210.67094, 454.9642, 308.12756, -2.5259006, 410.0632, 461.33023, 263.23477, 170.27988]
2026-01-23 00:05:25,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 298.0, 1000.0, 1000.0, 21.0, 1000.0, 1000.0, 434.0, 308.0]
2026-01-23 00:05:25,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (320.70) for latency DatasetOffice
2026-01-23 00:05:25,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 27 minutes, 9 seconds)
2026-01-23 00:06:58,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:07:06,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 429.70322 ± 254.655
2026-01-23 00:07:06,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [343.6995, 481.30466, 643.10425, 31.975763, 63.672558, 632.0175, 662.2744, 147.09032, 773.0605, 518.83264]
2026-01-23 00:07:06,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 44.0, 98.0, 1000.0, 908.0, 196.0, 1000.0, 1000.0]
2026-01-23 00:07:06,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (429.70) for latency DatasetOffice
2026-01-23 00:07:06,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 29 minutes, 37 seconds)
2026-01-23 00:08:34,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:43,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 483.79410 ± 191.882
2026-01-23 00:08:43,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [643.6404, 709.7469, 472.2171, 192.92032, 496.2111, 507.70236, 316.25455, 506.72748, 198.59798, 793.92303]
2026-01-23 00:08:43,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 207.0, 1000.0, 1000.0, 514.0, 1000.0, 289.0, 1000.0]
2026-01-23 00:08:43,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (483.79) for latency DatasetOffice
2026-01-23 00:08:43,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 27 minutes, 59 seconds)
2026-01-23 00:10:09,532 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:15,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 385.86249 ± 199.152
2026-01-23 00:10:15,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [147.09685, 647.18866, 155.61534, 408.73096, 440.01715, 675.77747, 253.79645, 173.53897, 318.53522, 638.32776]
2026-01-23 00:10:15,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [226.0, 1000.0, 204.0, 534.0, 890.0, 1000.0, 311.0, 219.0, 472.0, 1000.0]
2026-01-23 00:10:15,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 25 minutes, 27 seconds)
2026-01-23 00:11:51,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:01,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 646.36340 ± 71.611
2026-01-23 00:12:01,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [634.9896, 683.6526, 497.27908, 703.2989, 625.4878, 738.97754, 585.78156, 622.3505, 748.78455, 623.0324]
2026-01-23 00:12:01,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 971.0, 1000.0, 1000.0, 1000.0, 1000.0, 691.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:12:01,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (646.36) for latency DatasetOffice
2026-01-23 00:12:01,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 26 minutes, 4 seconds)
2026-01-23 00:13:32,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:13:42,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 690.34064 ± 151.836
2026-01-23 00:13:42,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [781.2107, 684.19366, 521.14935, 725.9163, 811.7153, 858.4543, 717.2918, 714.7493, 773.94855, 314.7771]
2026-01-23 00:13:42,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 373.0]
2026-01-23 00:13:42,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (690.34) for latency DatasetOffice
2026-01-23 00:13:42,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 24 minutes, 3 seconds)
2026-01-23 00:15:07,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:18,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 817.02026 ± 78.875
2026-01-23 00:15:18,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [884.75684, 829.3953, 801.0718, 780.1267, 812.0339, 842.4922, 605.5645, 854.59015, 856.61005, 903.5617]
2026-01-23 00:15:18,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:15:18,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (817.02) for latency DatasetOffice
2026-01-23 00:15:18,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 20 minutes, 56 seconds)
2026-01-23 00:16:50,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:58,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 532.24628 ± 315.394
2026-01-23 00:16:58,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [98.65196, 833.2519, 66.54675, 633.50684, 549.89417, 66.548485, 884.57825, 788.773, 587.78296, 812.92883]
2026-01-23 00:16:58,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [127.0, 1000.0, 96.0, 1000.0, 1000.0, 67.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:16:58,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 20 minutes, 10 seconds)
2026-01-23 00:18:31,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:40,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 743.88318 ± 279.284
2026-01-23 00:18:40,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1018.30646, 822.3212, 958.28955, 995.8407, 500.1814, 856.1376, 92.339355, 670.99634, 554.1249, 970.2941]
2026-01-23 00:18:40,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 843.0, 1000.0, 1000.0, 507.0, 1000.0, 96.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:18:40,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 21 minutes, 22 seconds)
2026-01-23 00:20:14,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:22,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 636.75159 ± 306.982
2026-01-23 00:20:22,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [236.97334, 757.557, 716.0756, 861.53107, 298.51764, 743.96155, 895.8102, 1115.5792, 99.31935, 642.1911]
2026-01-23 00:20:22,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [214.0, 1000.0, 1000.0, 1000.0, 268.0, 1000.0, 1000.0, 1000.0, 71.0, 1000.0]
2026-01-23 00:20:22,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 18 minutes, 34 seconds)
2026-01-23 00:21:44,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:54,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 813.14166 ± 193.112
2026-01-23 00:21:54,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [703.1106, 1091.8665, 415.43808, 835.648, 700.1032, 1011.1003, 989.0934, 959.4234, 711.8084, 713.8248]
2026-01-23 00:21:54,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 406.0, 1000.0, 615.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:21:54,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 14 minutes, 33 seconds)
2026-01-23 00:23:25,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:34,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 719.91669 ± 300.857
2026-01-23 00:23:34,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [357.34784, 146.20587, 701.26434, 975.61224, 751.20496, 1120.8976, 665.114, 801.75714, 541.0721, 1138.6908]
2026-01-23 00:23:34,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [307.0, 129.0, 1000.0, 853.0, 714.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:23:34,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 13 minutes, 55 seconds)
2026-01-23 00:25:06,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:11,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 428.09146 ± 355.808
2026-01-23 00:25:11,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [23.07014, 290.08102, 337.36346, 611.6802, 173.02242, 748.54724, 1242.1106, 262.41602, 568.71375, 23.909437]
2026-01-23 00:25:11,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [17.0, 264.0, 288.0, 547.0, 182.0, 1000.0, 1000.0, 212.0, 481.0, 19.0]
2026-01-23 00:25:11,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 11 minutes, 27 seconds)
2026-01-23 00:26:39,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:45,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 590.36462 ± 434.668
2026-01-23 00:26:45,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [381.48798, 880.4064, 985.0315, 1132.4625, 770.2365, 111.79269, 163.98224, 23.276608, 222.70773, 1232.2622]
2026-01-23 00:26:45,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [348.0, 793.0, 842.0, 1000.0, 1000.0, 74.0, 141.0, 30.0, 253.0, 1000.0]
2026-01-23 00:26:45,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 7 minutes, 39 seconds)
2026-01-23 00:28:20,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:26,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 464.52231 ± 313.497
2026-01-23 00:28:26,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [309.47827, 769.2303, 313.36365, 18.583107, 840.56573, 638.85333, 817.59546, 7.5673895, 201.4236, 728.56213]
2026-01-23 00:28:26,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [317.0, 1000.0, 234.0, 16.0, 1000.0, 1000.0, 668.0, 19.0, 141.0, 1000.0]
2026-01-23 00:28:26,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 5 minutes, 47 seconds)
2026-01-23 00:29:56,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:30:02,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 698.44611 ± 366.756
2026-01-23 00:30:02,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [223.45558, 790.5739, 1258.9167, 846.9397, 817.1546, 501.93808, 22.722635, 686.19104, 627.8526, 1208.716]
2026-01-23 00:30:02,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [186.0, 1000.0, 1000.0, 1000.0, 608.0, 315.0, 16.0, 541.0, 472.0, 919.0]
2026-01-23 00:30:02,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 5 minutes, 16 seconds)
2026-01-23 00:31:29,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:36,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 890.47839 ± 471.862
2026-01-23 00:31:36,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [421.53165, 1463.3627, 1412.7354, 990.44794, 507.0714, 949.1684, 1321.5759, 133.93336, 360.47122, 1344.486]
2026-01-23 00:31:36,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [272.0, 1000.0, 1000.0, 1000.0, 381.0, 1000.0, 1000.0, 89.0, 251.0, 1000.0]
2026-01-23 00:31:36,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (890.48) for latency DatasetOffice
2026-01-23 00:31:36,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 2 minutes, 16 seconds)
2026-01-23 00:33:11,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:21,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 771.98431 ± 226.002
2026-01-23 00:33:21,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [904.6642, 415.58096, 837.6778, 683.07495, 1126.8961, 1088.4265, 695.5812, 836.47864, 437.77243, 693.6903]
2026-01-23 00:33:21,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 322.0, 674.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 384.0, 1000.0]
2026-01-23 00:33:21,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 2 minutes, 29 seconds)
2026-01-23 00:34:54,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:02,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 895.28107 ± 390.773
2026-01-23 00:35:02,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1099.664, 228.78586, 1168.7025, 1059.0575, 1362.8804, 1305.7968, 382.06088, 379.29846, 923.4058, 1043.1578]
2026-01-23 00:35:02,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [829.0, 201.0, 1000.0, 1000.0, 1000.0, 1000.0, 296.0, 257.0, 1000.0, 1000.0]
2026-01-23 00:35:02,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (895.28) for latency DatasetOffice
2026-01-23 00:35:02,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 2 minutes, 41 seconds)
2026-01-23 00:36:33,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:38,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 540.19031 ± 334.727
2026-01-23 00:36:38,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [62.393326, 419.78006, 490.63156, 841.7431, 243.5776, 849.5678, 884.0831, 553.1828, 1019.31165, 37.632866]
2026-01-23 00:36:38,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [56.0, 289.0, 362.0, 1000.0, 174.0, 1000.0, 1000.0, 403.0, 1000.0, 37.0]
2026-01-23 00:36:38,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 59 minutes, 45 seconds)
2026-01-23 00:38:10,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:19,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 787.57684 ± 269.355
2026-01-23 00:38:19,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [730.8238, 732.5047, 745.81085, 595.3806, 210.09915, 963.0264, 1218.0509, 1049.3662, 640.8055, 989.9005]
2026-01-23 00:38:19,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 393.0, 204.0, 1000.0, 1000.0, 806.0, 455.0, 1000.0]
2026-01-23 00:38:19,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 59 minutes, 9 seconds)
2026-01-23 00:39:42,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:39:50,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 922.27087 ± 489.604
2026-01-23 00:39:50,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [758.15845, 1397.839, 90.767235, 21.577078, 1045.6229, 1304.5809, 1116.3358, 806.4238, 1142.2731, 1539.1306]
2026-01-23 00:39:50,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [522.0, 1000.0, 72.0, 18.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:39:50,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (922.27) for latency DatasetOffice
2026-01-23 00:39:50,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 56 minutes, 50 seconds)
2026-01-23 00:41:21,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:31,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1192.05359 ± 213.256
2026-01-23 00:41:31,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [865.0356, 1400.0143, 1365.1772, 1337.3429, 919.1984, 1376.138, 1351.0132, 913.7613, 1050.7894, 1342.066]
2026-01-23 00:41:31,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 749.0, 1000.0]
2026-01-23 00:41:31,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1192.05) for latency DatasetOffice
2026-01-23 00:41:31,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 54 minutes, 32 seconds)
2026-01-23 00:43:03,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:43:10,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 689.02698 ± 481.328
2026-01-23 00:43:10,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [500.23676, 112.696556, 808.8646, 27.455324, 1263.3265, 767.556, 20.858772, 1122.3679, 1390.0477, 876.86]
2026-01-23 00:43:10,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [334.0, 101.0, 1000.0, 18.0, 1000.0, 1000.0, 18.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:43:10,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 52 minutes, 12 seconds)
2026-01-23 00:44:44,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:52,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 962.39050 ± 511.138
2026-01-23 00:44:52,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1336.3739, 847.18176, 1545.0164, 1192.7777, 203.252, 1484.3774, 184.49832, 1494.4066, 947.92926, 388.0915]
2026-01-23 00:44:52,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [855.0, 1000.0, 1000.0, 1000.0, 163.0, 1000.0, 120.0, 1000.0, 1000.0, 255.0]
2026-01-23 00:44:52,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 51 minutes, 55 seconds)
2026-01-23 00:46:17,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:46:23,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 862.59552 ± 606.626
2026-01-23 00:46:23,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [126.87395, 1220.1771, 1411.3578, 153.96014, 1014.65784, 1510.4277, 105.69287, 168.67554, 1498.7178, 1415.4148]
2026-01-23 00:46:23,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [83.0, 817.0, 1000.0, 111.0, 1000.0, 1000.0, 70.0, 126.0, 1000.0, 838.0]
2026-01-23 00:46:23,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 48 minutes, 12 seconds)
2026-01-23 00:47:54,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:02,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 761.15997 ± 292.923
2026-01-23 00:48:02,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [776.64526, 952.9417, 509.7055, 534.1309, 696.2961, 1448.8556, 906.1307, 730.2007, 306.1563, 750.53735]
2026-01-23 00:48:02,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 642.0, 369.0, 426.0, 620.0, 1000.0, 1000.0, 1000.0, 188.0, 1000.0]
2026-01-23 00:48:02,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 48 minutes, 15 seconds)
2026-01-23 00:49:39,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:45,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 771.44482 ± 598.458
2026-01-23 00:49:45,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [432.40042, 710.88574, 677.9107, 1487.2593, 1117.1848, 1556.7522, 60.573685, 1576.3475, 94.035866, 1.0979892]
2026-01-23 00:49:45,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [284.0, 1000.0, 428.0, 980.0, 693.0, 1000.0, 38.0, 1000.0, 54.0, 17.0]
2026-01-23 00:49:45,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 46 minutes, 53 seconds)
2026-01-23 00:51:09,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:51:16,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 704.73376 ± 515.332
2026-01-23 00:51:16,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [738.2115, 895.7182, 437.1612, 804.4713, 1547.3835, 9.895861, 1009.4919, 122.369064, 67.16038, 1415.4747]
2026-01-23 00:51:16,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 553.0, 306.0, 1000.0, 935.0, 17.0, 1000.0, 108.0, 59.0, 1000.0]
2026-01-23 00:51:16,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 43 minutes, 31 seconds)
2026-01-23 00:52:51,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:59,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 748.55536 ± 382.261
2026-01-23 00:52:59,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1007.58765, 308.2196, 994.1268, 778.6604, 930.56824, 280.8711, 663.21063, 117.3647, 1388.7617, 1016.18335]
2026-01-23 00:52:59,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 222.0, 1000.0, 1000.0, 1000.0, 193.0, 1000.0, 75.0, 823.0, 1000.0]
2026-01-23 00:52:59,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 42 minutes, 20 seconds)
2026-01-23 00:54:27,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:35,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 890.58301 ± 419.324
2026-01-23 00:54:35,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1388.7996, 184.29538, 744.87775, 914.2298, 897.23114, 110.26064, 1037.9431, 1312.2262, 1020.10596, 1295.8604]
2026-01-23 00:54:35,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 113.0, 1000.0, 1000.0, 1000.0, 100.0, 636.0, 924.0, 658.0, 1000.0]
2026-01-23 00:54:35,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 41 minutes, 36 seconds)
2026-01-23 00:56:04,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:10,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 806.88007 ± 407.947
2026-01-23 00:56:10,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1068.2628, 962.6195, 919.99335, 334.49274, 535.89795, 1559.4176, 602.93146, 1098.9152, 924.52216, 61.748318]
2026-01-23 00:56:10,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 563.0, 222.0, 315.0, 1000.0, 419.0, 1000.0, 535.0, 40.0]
2026-01-23 00:56:10,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 39 minutes, 14 seconds)
2026-01-23 00:57:45,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:55,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1165.01990 ± 305.554
2026-01-23 00:57:55,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1108.38, 1495.0182, 746.20654, 781.9644, 1242.6753, 963.3739, 1531.2399, 964.5, 1691.1603, 1125.6802]
2026-01-23 00:57:55,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 593.0, 567.0, 782.0, 1000.0, 923.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:57:55,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 38 minutes, 4 seconds)
2026-01-23 00:59:26,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:32,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 659.65454 ± 408.928
2026-01-23 00:59:32,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [542.9308, 70.98252, 670.7906, 283.12152, 861.44147, 766.3682, 20.129755, 1195.9406, 1245.539, 939.30115]
2026-01-23 00:59:32,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [330.0, 49.0, 1000.0, 188.0, 1000.0, 469.0, 17.0, 1000.0, 749.0, 563.0]
2026-01-23 00:59:32,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 37 minutes, 39 seconds)
2026-01-23 01:01:04,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:11,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 840.46094 ± 350.580
2026-01-23 01:01:11,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1378.3198, 1055.7977, 808.05035, 896.87103, 651.04724, 778.31976, 23.444736, 918.81384, 1223.6383, 670.3071]
2026-01-23 01:01:11,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 629.0, 623.0, 557.0, 377.0, 1000.0, 18.0, 1000.0, 773.0, 404.0]
2026-01-23 01:01:11,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 35 minutes, 2 seconds)
2026-01-23 01:02:40,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:47,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 873.24933 ± 477.905
2026-01-23 01:02:47,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1180.4904, 630.5504, 1158.9309, 193.56627, 1005.7386, 1331.6559, 1182.2214, 1557.8347, 183.56015, 307.9443]
2026-01-23 01:02:47,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 396.0, 681.0, 137.0, 1000.0, 1000.0, 1000.0, 1000.0, 135.0, 185.0]
2026-01-23 01:02:47,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 33 minutes, 29 seconds)
2026-01-23 01:04:10,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:17,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 829.87439 ± 422.297
2026-01-23 01:04:17,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1685.6581, 71.92913, 621.1354, 1220.1846, 671.90753, 844.02344, 660.30597, 604.0756, 1226.178, 693.34686]
2026-01-23 01:04:17,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 47.0, 397.0, 1000.0, 1000.0, 440.0, 338.0, 402.0, 1000.0, 1000.0]
2026-01-23 01:04:17,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 30 minutes, 58 seconds)
2026-01-23 01:05:48,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:05:55,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 775.56616 ± 404.083
2026-01-23 01:05:55,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [964.4387, 387.50153, 1036.866, 1603.6351, 767.07965, 418.98596, 467.49466, 977.00146, 980.0462, 152.6122]
2026-01-23 01:05:55,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 232.0, 1000.0, 1000.0, 486.0, 273.0, 256.0, 1000.0, 1000.0, 90.0]
2026-01-23 01:05:55,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 27 minutes, 55 seconds)
2026-01-23 01:07:26,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:30,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 484.17612 ± 349.194
2026-01-23 01:07:30,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1014.2495, 235.47704, 284.02, 1203.9457, 299.8091, 305.6848, 660.5161, 66.93455, 267.40814, 503.71616]
2026-01-23 01:07:30,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 135.0, 173.0, 1000.0, 198.0, 245.0, 379.0, 47.0, 255.0, 338.0]
2026-01-23 01:07:30,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 26 minutes)
2026-01-23 01:09:02,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:11,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1101.45251 ± 529.128
2026-01-23 01:09:11,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [811.6508, 1948.38, 1546.4426, 701.31854, 1793.9219, 751.38165, 1140.4193, 966.0512, 103.48272, 1251.4763]
2026-01-23 01:09:11,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 842.0, 1000.0, 1000.0, 454.0, 1000.0, 1000.0, 49.0, 1000.0]
2026-01-23 01:09:11,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 24 minutes, 46 seconds)
2026-01-23 01:10:43,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:51,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 993.69080 ± 573.986
2026-01-23 01:10:51,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1821.8806, 533.2596, 49.561817, 723.27264, 1676.8978, 405.88605, 990.66144, 751.5699, 1671.8651, 1312.0538]
2026-01-23 01:10:51,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 350.0, 32.0, 408.0, 1000.0, 1000.0, 517.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:10:51,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 23 minutes, 54 seconds)
2026-01-23 01:12:24,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:32,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 943.46130 ± 636.013
2026-01-23 01:12:32,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [774.5613, 1861.4628, 1015.22345, 1698.7034, 1172.6661, 7.1886654, 222.00748, 917.44244, 127.48706, 1637.8716]
2026-01-23 01:12:32,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 632.0, 1000.0, 674.0, 16.0, 165.0, 1000.0, 84.0, 1000.0]
2026-01-23 01:12:32,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 24 minutes)
2026-01-23 01:13:59,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:07,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1061.36890 ± 593.859
2026-01-23 01:14:07,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [906.98083, 453.982, 76.992386, 1583.5431, 1606.2229, 213.57353, 1756.4872, 1576.783, 1022.39124, 1416.7322]
2026-01-23 01:14:07,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 248.0, 40.0, 1000.0, 1000.0, 123.0, 1000.0, 1000.0, 617.0, 972.0]
2026-01-23 01:14:07,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 22 minutes, 3 seconds)
2026-01-23 01:15:40,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:15:49,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 974.39636 ± 500.102
2026-01-23 01:15:49,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [714.9059, 173.62453, 790.6941, 1323.0449, 702.4089, 1235.8328, 253.23889, 1308.739, 1503.5745, 1737.9006]
2026-01-23 01:15:49,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 91.0, 1000.0, 1000.0, 1000.0, 1000.0, 148.0, 768.0, 1000.0, 1000.0]
2026-01-23 01:15:49,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 21 minutes, 28 seconds)
2026-01-23 01:17:19,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:30,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1226.02917 ± 593.603
2026-01-23 01:17:30,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [21.13389, 889.28064, 1466.8488, 1715.7491, 1950.9946, 1851.0189, 774.7593, 1771.4468, 762.90753, 1056.1521]
2026-01-23 01:17:30,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [36.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:17:30,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1226.03) for latency DatasetOffice
2026-01-23 01:17:30,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 19 minutes, 46 seconds)
2026-01-23 01:18:58,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:06,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1016.31628 ± 584.500
2026-01-23 01:19:06,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1703.8534, 203.9271, 104.64451, 454.61288, 935.0463, 846.3535, 1683.4524, 1179.2219, 1760.9277, 1291.123]
2026-01-23 01:19:06,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 123.0, 58.0, 305.0, 1000.0, 1000.0, 1000.0, 651.0, 1000.0, 700.0]
2026-01-23 01:19:06,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 17 minutes, 28 seconds)
2026-01-23 01:20:36,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:43,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 940.49579 ± 686.893
2026-01-23 01:20:43,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1603.5165, 1927.6678, 1622.549, 450.86572, 1151.0162, 59.1705, 1573.1636, 235.92891, 762.3875, 18.692005]
2026-01-23 01:20:43,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 293.0, 1000.0, 29.0, 1000.0, 131.0, 1000.0, 16.0]
2026-01-23 01:20:43,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 15 minutes, 17 seconds)
2026-01-23 01:22:14,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:24,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1206.74451 ± 425.853
2026-01-23 01:22:24,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1334.1034, 749.4853, 1671.3099, 1392.7429, 1059.8102, 1247.4385, 753.6916, 1790.7428, 438.83493, 1629.2861]
2026-01-23 01:22:24,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 794.0, 1000.0, 736.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:22:24,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 14 minutes, 33 seconds)
2026-01-23 01:23:55,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:06,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1373.80798 ± 357.747
2026-01-23 01:24:06,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1735.4255, 1077.8553, 920.65643, 1815.8608, 1099.2335, 1788.6555, 1196.823, 1795.4012, 1386.0352, 922.13336]
2026-01-23 01:24:06,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 696.0, 1000.0, 739.0, 1000.0]
2026-01-23 01:24:06,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1373.81) for latency DatasetOffice
2026-01-23 01:24:06,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 12 minutes, 53 seconds)
2026-01-23 01:25:35,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:38,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 575.99042 ± 486.629
2026-01-23 01:25:38,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [80.4353, 1508.6879, 1292.2731, 559.57434, 353.95367, 11.18219, 21.519218, 676.03485, 783.1393, 473.1046]
2026-01-23 01:25:38,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [56.0, 1000.0, 788.0, 297.0, 211.0, 16.0, 17.0, 407.0, 406.0, 249.0]
2026-01-23 01:25:38,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 10 minutes, 1 second)
2026-01-23 01:27:05,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:10,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 629.96539 ± 559.764
2026-01-23 01:27:10,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1949.2219, 1105.6642, 90.03637, 58.85213, 185.5864, 466.63382, 195.46771, 577.77856, 678.6296, 991.783]
2026-01-23 01:27:10,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 42.0, 34.0, 85.0, 201.0, 113.0, 316.0, 335.0, 1000.0]
2026-01-23 01:27:10,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 7 minutes, 47 seconds)
2026-01-23 01:28:44,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:47,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 477.96979 ± 387.156
2026-01-23 01:28:47,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [365.44766, 1343.768, 154.10129, 946.9884, 433.5335, 40.20926, 666.3751, 270.7234, 453.5579, 104.99336]
2026-01-23 01:28:47,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [199.0, 1000.0, 71.0, 1000.0, 235.0, 24.0, 342.0, 168.0, 254.0, 65.0]
2026-01-23 01:28:47,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 6 minutes, 12 seconds)
2026-01-23 01:30:14,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:21,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 991.64111 ± 661.932
2026-01-23 01:30:21,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [37.224934, 673.9222, 853.70935, 408.61447, 1824.1141, 1859.019, 1766.2224, 91.45834, 1017.5315, 1384.5947]
2026-01-23 01:30:21,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [20.0, 375.0, 496.0, 243.0, 1000.0, 1000.0, 1000.0, 56.0, 1000.0, 1000.0]
2026-01-23 01:30:21,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 3 minutes, 36 seconds)
2026-01-23 01:31:59,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:07,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1181.12634 ± 529.147
2026-01-23 01:32:07,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1078.4854, 871.24695, 1134.3379, 632.4163, 1553.6344, 121.478134, 1238.8346, 2068.285, 1724.591, 1387.954]
2026-01-23 01:32:07,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [571.0, 411.0, 1000.0, 346.0, 1000.0, 61.0, 648.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:32:07,167 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 2 minutes, 31 seconds)
2026-01-23 01:33:28,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:36,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1031.73010 ± 624.830
2026-01-23 01:33:36,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [448.6788, 1945.5604, 661.00684, 854.19916, 1518.4667, 1301.6857, 335.10477, 993.4223, 195.65991, 2063.5159]
2026-01-23 01:33:36,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [224.0, 1000.0, 357.0, 1000.0, 834.0, 1000.0, 182.0, 1000.0, 99.0, 1000.0]
2026-01-23 01:33:36,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 28 seconds)
2026-01-23 01:35:10,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:15,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 702.20312 ± 442.428
2026-01-23 01:35:15,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [260.26852, 406.96487, 94.24766, 1672.0409, 533.16846, 755.3152, 379.36667, 1033.0055, 982.2194, 905.4339]
2026-01-23 01:35:15,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [144.0, 226.0, 47.0, 1000.0, 311.0, 346.0, 184.0, 1000.0, 1000.0, 482.0]
2026-01-23 01:35:15,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 64/100 (estimated time remaining: 59 minutes, 53 seconds)
2026-01-23 01:36:41,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:49,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1061.85706 ± 623.807
2026-01-23 01:36:49,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1645.6438, 1879.6243, 725.6455, 1949.8201, 61.409092, 1304.124, 854.88696, 331.10562, 538.9921, 1327.3198]
2026-01-23 01:36:49,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 870.0, 1000.0, 1000.0, 35.0, 784.0, 1000.0, 280.0, 334.0, 725.0]
2026-01-23 01:36:49,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 65/100 (estimated time remaining: 57 minutes, 46 seconds)
2026-01-23 01:38:26,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:32,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 753.64417 ± 589.949
2026-01-23 01:38:32,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [893.2433, 1193.3536, 357.86398, 1775.4401, 303.27548, 1232.7974, 143.16753, 191.61688, 35.824345, 1409.8589]
2026-01-23 01:38:32,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 182.0, 1000.0, 168.0, 1000.0, 82.0, 138.0, 26.0, 781.0]
2026-01-23 01:38:32,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 66/100 (estimated time remaining: 57 minutes, 17 seconds)
2026-01-23 01:39:56,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:00,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 734.79504 ± 513.845
2026-01-23 01:40:00,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [415.33386, 13.301284, 1498.1141, 1247.5382, 263.2838, 1308.8506, 888.1399, 498.49588, 1111.7002, 103.19267]
2026-01-23 01:40:00,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [228.0, 16.0, 727.0, 638.0, 140.0, 1000.0, 442.0, 278.0, 510.0, 59.0]
2026-01-23 01:40:00,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 67/100 (estimated time remaining: 53 minutes, 41 seconds)
2026-01-23 01:41:32,510 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:36,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 725.75287 ± 656.879
2026-01-23 01:41:36,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [121.201355, 392.60043, 109.76684, 2042.0106, 48.52504, 1772.7188, 853.8491, 751.7561, 820.862, 344.2387]
2026-01-23 01:41:36,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [75.0, 199.0, 85.0, 1000.0, 25.0, 1000.0, 475.0, 375.0, 401.0, 170.0]
2026-01-23 01:41:36,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 68/100 (estimated time remaining: 52 minutes, 50 seconds)
2026-01-23 01:43:09,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:16,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1040.42041 ± 575.245
2026-01-23 01:43:16,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2020.2838, 886.4745, 805.1081, 946.3479, 260.39746, 1939.0183, 1108.2919, 277.2764, 1437.5022, 723.50397]
2026-01-23 01:43:16,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 426.0, 1000.0, 548.0, 135.0, 1000.0, 1000.0, 153.0, 1000.0, 382.0]
2026-01-23 01:43:16,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 69/100 (estimated time remaining: 51 minutes, 17 seconds)
2026-01-23 01:44:44,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:47,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 504.90057 ± 404.759
2026-01-23 01:44:47,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [316.55594, 558.0564, 951.3547, 123.25458, 687.62164, 297.0298, 1466.6499, 181.26535, 261.20712, 206.01077]
2026-01-23 01:44:47,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [153.0, 281.0, 555.0, 61.0, 421.0, 177.0, 1000.0, 84.0, 162.0, 100.0]
2026-01-23 01:44:47,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 70/100 (estimated time remaining: 49 minutes, 25 seconds)
2026-01-23 01:46:23,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:28,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 801.77289 ± 718.610
2026-01-23 01:46:28,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [69.44373, 433.78452, 59.74097, 498.97134, 90.02869, 977.58075, 1673.9926, 420.77335, 1963.0441, 1830.369]
2026-01-23 01:46:28,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [43.0, 244.0, 31.0, 228.0, 53.0, 1000.0, 856.0, 225.0, 1000.0, 1000.0]
2026-01-23 01:46:28,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 71/100 (estimated time remaining: 47 minutes, 36 seconds)
2026-01-23 01:47:57,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:03,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 795.98376 ± 484.426
2026-01-23 01:48:03,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [236.58244, 856.3807, 1507.4543, 16.735168, 824.55066, 837.91425, 768.3436, 247.94588, 1193.2994, 1470.6312]
2026-01-23 01:48:03,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [232.0, 626.0, 822.0, 15.0, 1000.0, 411.0, 1000.0, 96.0, 572.0, 726.0]
2026-01-23 01:48:03,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 72/100 (estimated time remaining: 46 minutes, 40 seconds)
2026-01-23 01:49:33,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:35,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 489.81689 ± 375.593
2026-01-23 01:49:35,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1487.0605, 293.84174, 698.6739, 172.62788, 273.00522, 360.47818, 322.28128, 584.90393, 134.59427, 570.702]
2026-01-23 01:49:35,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [742.0, 139.0, 357.0, 96.0, 151.0, 180.0, 154.0, 286.0, 78.0, 318.0]
2026-01-23 01:49:35,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 73/100 (estimated time remaining: 44 minutes, 44 seconds)
2026-01-23 01:51:01,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:10,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 876.77802 ± 507.536
2026-01-23 01:51:10,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [956.0415, 1056.2815, 170.35754, 997.2797, 771.5139, 83.6731, 1990.0629, 735.54254, 788.68695, 1218.3405]
2026-01-23 01:51:10,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 82.0, 1000.0, 1000.0, 67.0, 968.0, 1000.0, 1000.0, 666.0]
2026-01-23 01:51:10,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 74/100 (estimated time remaining: 42 minutes, 35 seconds)
2026-01-23 01:52:44,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:48,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 555.98499 ± 391.001
2026-01-23 01:52:48,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [42.77909, 13.408124, 423.9881, 995.2583, 894.2794, 999.915, 1026.5984, 89.89991, 467.21228, 606.51117]
2026-01-23 01:52:48,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [26.0, 16.0, 209.0, 475.0, 416.0, 728.0, 1000.0, 62.0, 260.0, 267.0]
2026-01-23 01:52:48,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 42 seconds)
2026-01-23 01:54:16,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:23,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1161.19360 ± 747.167
2026-01-23 01:54:23,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [677.4551, 690.2157, 1827.0792, 1956.9216, 212.37485, 2157.4954, 693.3311, 54.250088, 1361.2037, 1981.6097]
2026-01-23 01:54:23,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [304.0, 358.0, 931.0, 1000.0, 107.0, 1000.0, 395.0, 39.0, 1000.0, 1000.0]
2026-01-23 01:54:23,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 76/100 (estimated time remaining: 39 minutes, 32 seconds)
2026-01-23 01:55:52,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:58,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1136.81946 ± 660.021
2026-01-23 01:55:58,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [537.4865, 549.11426, 303.09644, 1261.0559, 2072.6565, 2121.1628, 878.1999, 874.2901, 738.7011, 2032.431]
2026-01-23 01:55:58,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [349.0, 315.0, 111.0, 545.0, 1000.0, 1000.0, 1000.0, 451.0, 341.0, 1000.0]
2026-01-23 01:55:58,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 77/100 (estimated time remaining: 38 minutes)
2026-01-23 01:57:29,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:35,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 715.76434 ± 373.394
2026-01-23 01:57:35,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1113.17, 127.43721, 1082.1205, 620.23425, 739.43933, 168.13496, 443.09583, 631.32104, 968.0892, 1264.6008]
2026-01-23 01:57:35,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 76.0, 485.0, 308.0, 361.0, 92.0, 248.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:57:35,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 78/100 (estimated time remaining: 36 minutes, 45 seconds)
2026-01-23 01:59:09,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:15,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 640.21570 ± 485.920
2026-01-23 01:59:15,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [87.57207, 152.31006, 1035.4302, 720.50977, 173.18324, 1146.7356, 847.2518, 356.10596, 1603.054, 280.0042]
2026-01-23 01:59:15,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [55.0, 103.0, 1000.0, 1000.0, 85.0, 1000.0, 471.0, 160.0, 1000.0, 180.0]
2026-01-23 01:59:15,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 35 seconds)
2026-01-23 02:00:43,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:48,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 889.44202 ± 711.371
2026-01-23 02:00:48,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [774.1506, 8.376539, 2045.4279, 1997.9364, 1385.1576, 137.25899, 718.23865, 520.86426, 97.69443, 1209.314]
2026-01-23 02:00:48,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [408.0, 14.0, 1000.0, 947.0, 709.0, 94.0, 1000.0, 248.0, 58.0, 1000.0]
2026-01-23 02:00:48,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 80/100 (estimated time remaining: 33 minutes, 37 seconds)
2026-01-23 02:02:20,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:29,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1228.52625 ± 403.493
2026-01-23 02:02:29,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1334.2505, 1699.7018, 1665.9039, 1377.5856, 565.98944, 1790.6869, 649.1752, 1090.2335, 1133.284, 978.45197]
2026-01-23 02:02:29,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [684.0, 944.0, 1000.0, 1000.0, 311.0, 1000.0, 1000.0, 1000.0, 1000.0, 517.0]
2026-01-23 02:02:29,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 24 seconds)
2026-01-23 02:03:58,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:04,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1116.22229 ± 597.674
2026-01-23 02:04:04,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1867.2688, 899.18915, 731.28046, 19.433016, 605.66724, 861.8141, 1909.0215, 1921.8369, 1159.5325, 1187.1786]
2026-01-23 02:04:04,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [968.0, 430.0, 392.0, 16.0, 319.0, 425.0, 1000.0, 962.0, 739.0, 567.0]
2026-01-23 02:04:04,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes, 46 seconds)
2026-01-23 02:05:40,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:45,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 904.05792 ± 717.501
2026-01-23 02:05:45,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2240.2651, 367.33472, 358.5855, 1778.9117, 429.23688, 265.4333, 1651.3743, 5.1047997, 1007.32666, 937.00616]
2026-01-23 02:05:45,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 171.0, 183.0, 1000.0, 277.0, 140.0, 734.0, 14.0, 454.0, 513.0]
2026-01-23 02:05:45,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 23 seconds)
2026-01-23 02:07:08,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:14,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1085.79968 ± 692.533
2026-01-23 02:07:14,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1422.3333, 300.58142, 59.037926, 553.7641, 2208.582, 856.8194, 2025.3197, 1730.9603, 795.68, 904.9186]
2026-01-23 02:07:14,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 163.0, 40.0, 268.0, 1000.0, 1000.0, 967.0, 1000.0, 386.0, 453.0]
2026-01-23 02:07:14,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 9 seconds)
2026-01-23 02:08:46,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:54,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1217.22241 ± 647.639
2026-01-23 02:08:54,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [821.7681, 79.6198, 1685.2373, 1406.8883, 2138.5613, 1306.6212, 1479.085, 589.7614, 2088.3762, 576.30426]
2026-01-23 02:08:54,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 54.0, 807.0, 607.0, 1000.0, 1000.0, 1000.0, 272.0, 1000.0, 262.0]
2026-01-23 02:08:54,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 53 seconds)
2026-01-23 02:10:23,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:31,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1328.72229 ± 629.941
2026-01-23 02:10:31,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [255.03229, 729.6349, 1050.1147, 1968.8082, 2015.5902, 1884.7078, 1846.3232, 482.5869, 1276.5681, 1777.8568]
2026-01-23 02:10:31,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [153.0, 1000.0, 552.0, 1000.0, 1000.0, 1000.0, 1000.0, 270.0, 589.0, 1000.0]
2026-01-23 02:10:31,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 7 seconds)
2026-01-23 02:12:08,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:13,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 907.46130 ± 657.373
2026-01-23 02:12:13,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [858.2611, 14.314482, 1168.5066, 1384.4296, 468.19147, 66.986916, 1136.9113, 292.72733, 1569.693, 2114.5923]
2026-01-23 02:12:13,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [418.0, 16.0, 537.0, 1000.0, 239.0, 66.0, 513.0, 153.0, 824.0, 1000.0]
2026-01-23 02:12:13,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 48 seconds)
2026-01-23 02:13:38,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:13:44,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1160.95129 ± 736.703
2026-01-23 02:13:44,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2171.143, 272.31964, 2249.4888, 909.61615, 1529.5693, 329.70218, 1551.7153, 79.66061, 940.7804, 1575.5178]
2026-01-23 02:13:44,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 124.0, 1000.0, 390.0, 648.0, 161.0, 1000.0, 46.0, 1000.0, 731.0]
2026-01-23 02:13:44,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 46 seconds)
2026-01-23 02:15:14,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:21,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1019.87830 ± 723.145
2026-01-23 02:15:21,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [807.51746, 39.076176, 2048.482, 1947.7471, 1174.7786, 971.4213, 1073.5696, 1861.8185, 78.1707, 196.2022]
2026-01-23 02:15:21,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 27.0, 1000.0, 1000.0, 1000.0, 1000.0, 502.0, 916.0, 60.0, 120.0]
2026-01-23 02:15:21,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 28 seconds)
2026-01-23 02:16:52,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:58,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1153.93237 ± 645.062
2026-01-23 02:16:58,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2420.2803, 1602.4492, 1006.2556, 67.48831, 122.979774, 1204.1882, 1382.0266, 1331.5613, 1182.7069, 1219.387]
2026-01-23 02:16:58,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 516.0, 49.0, 71.0, 493.0, 1000.0, 1000.0, 449.0, 510.0]
2026-01-23 02:16:58,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 45 seconds)
2026-01-23 02:18:30,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:38,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 890.33087 ± 321.156
2026-01-23 02:18:38,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [127.7311, 1063.8936, 488.31512, 1178.7614, 1061.7358, 893.5891, 795.9869, 1147.4877, 1116.7004, 1029.1077]
2026-01-23 02:18:38,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [80.0, 1000.0, 239.0, 537.0, 1000.0, 1000.0, 1000.0, 512.0, 1000.0, 1000.0]
2026-01-23 02:18:38,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 13 seconds)
2026-01-23 02:20:11,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:15,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 887.84363 ± 601.767
2026-01-23 02:20:15,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [337.9053, 304.809, 465.17538, 570.6351, 887.75024, 692.79987, 2407.0579, 887.288, 844.17487, 1480.8407]
2026-01-23 02:20:15,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [191.0, 173.0, 254.0, 259.0, 1000.0, 344.0, 1000.0, 377.0, 433.0, 651.0]
2026-01-23 02:20:15,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 28 seconds)
2026-01-23 02:21:48,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:55,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 916.24451 ± 433.195
2026-01-23 02:21:55,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [793.7477, 1810.3949, 592.1989, 598.2805, 1063.3821, 933.68744, 666.7516, 1351.9865, 193.06541, 1158.95]
2026-01-23 02:21:55,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 970.0, 308.0, 266.0, 1000.0, 503.0, 309.0, 688.0, 123.0, 1000.0]
2026-01-23 02:21:55,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 4 seconds)
2026-01-23 02:23:24,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:29,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 817.44983 ± 604.625
2026-01-23 02:23:29,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [277.44363, 718.12085, 120.618805, 437.68515, 1182.2323, 2268.405, 694.542, 356.62717, 793.3733, 1325.4503]
2026-01-23 02:23:29,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [126.0, 1000.0, 64.0, 201.0, 502.0, 1000.0, 313.0, 184.0, 344.0, 1000.0]
2026-01-23 02:23:29,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 23 seconds)
2026-01-23 02:24:54,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:58,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 594.68085 ± 448.994
2026-01-23 02:24:58,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1150.7373, 605.3286, 480.4684, 297.45227, 199.22739, 185.97855, 1518.944, 72.07955, 469.40277, 967.19]
2026-01-23 02:24:58,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 309.0, 211.0, 150.0, 126.0, 106.0, 638.0, 37.0, 187.0, 1000.0]
2026-01-23 02:24:58,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 35 seconds)
2026-01-23 02:26:33,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:39,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 763.28125 ± 553.049
2026-01-23 02:26:39,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [114.62256, 177.13672, 531.0381, 1870.9419, 1045.6873, 1417.6628, 685.0059, 1121.8064, 373.425, 295.48608]
2026-01-23 02:26:39,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [39.0, 92.0, 211.0, 1000.0, 430.0, 1000.0, 1000.0, 1000.0, 193.0, 113.0]
2026-01-23 02:26:39,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes)
2026-01-23 02:28:10,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:16,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 960.43665 ± 672.550
2026-01-23 02:28:16,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [725.5091, 1216.4646, 965.82654, 76.83146, 2454.212, 117.67855, 1065.177, 376.30258, 1087.5789, 1518.7854]
2026-01-23 02:28:16,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [366.0, 576.0, 1000.0, 35.0, 1000.0, 56.0, 592.0, 172.0, 484.0, 633.0]
2026-01-23 02:28:16,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 24 seconds)
2026-01-23 02:29:47,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:52,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 614.04199 ± 483.328
2026-01-23 02:29:52,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1327.8789, 802.8364, 1441.707, 359.90985, 161.49554, 105.04847, 981.2971, 9.638336, 614.3949, 336.2135]
2026-01-23 02:29:52,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 625.0, 154.0, 99.0, 61.0, 1000.0, 16.0, 322.0, 208.0]
2026-01-23 02:29:52,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 46 seconds)
2026-01-23 02:31:17,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:22,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 945.88879 ± 713.312
2026-01-23 02:31:22,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1989.3959, 620.8643, 638.492, 1286.719, 288.98322, 295.82178, 244.21657, 1726.246, 2082.422, 285.72693]
2026-01-23 02:31:22,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 302.0, 302.0, 1000.0, 182.0, 175.0, 103.0, 684.0, 1000.0, 142.0]
2026-01-23 02:31:22,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 9 seconds)
2026-01-23 02:32:52,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:32:58,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1049.37610 ± 701.834
2026-01-23 02:32:58,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [586.3165, 1977.1454, 1256.3441, 1676.6649, 1885.2651, 655.43097, 1771.5829, 258.74545, 126.50388, 299.76215]
2026-01-23 02:32:58,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [393.0, 939.0, 1000.0, 688.0, 903.0, 296.0, 826.0, 114.0, 88.0, 128.0]
2026-01-23 02:32:58,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 36 seconds)
2026-01-23 02:34:33,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:35,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 429.24982 ± 323.672
2026-01-23 02:34:35,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [178.63461, 88.47865, 581.8393, 938.6797, 21.739382, 422.893, 308.02448, 1050.3347, 410.45728, 291.4172]
2026-01-23 02:34:35,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [81.0, 48.0, 247.0, 1000.0, 17.0, 200.0, 145.0, 515.0, 219.0, 103.0]
2026-01-23 02:34:35,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1299 [DEBUG]: Training session finished
