2026-01-22 23:52:06,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-ant/DatasetOffice-sac-aug-mem1
2026-01-22 23:52:06,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-ant/DatasetOffice-sac-aug-mem1
2026-01-22 23:52:06,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14a983b95690>}
2026-01-22 23:52:06,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1159 [DEBUG]: using device: cuda
2026-01-22 23:52:06,522 baseline-sac-noisy-ant:77 [WARNING]: args.memorize_actions != args.horizon: 1 != 32
2026-01-22 23:52:06,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1181 [INFO]: Creating new trainer
2026-01-22 23:52:06,681 baseline-sac-noisy-ant:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=35, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:52:06,681 baseline-sac-noisy-ant:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=43, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:52:07,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1242 [DEBUG]: Starting training session...
2026-01-22 23:52:07,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 1/100
2026-01-22 23:53:35,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:39,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: -647.92938 ± 569.648
2026-01-22 23:53:39,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-251.90173, -1022.4735, -548.1652, -454.59433, -2176.7358, -632.2374, -673.0703, -97.4657, -183.27254, -439.37723]
2026-01-22 23:53:39,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [138.0, 606.0, 307.0, 283.0, 1000.0, 368.0, 392.0, 60.0, 107.0, 288.0]
2026-01-22 23:53:39,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (-647.93) for latency DatasetOffice
2026-01-22 23:53:39,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 31 minutes, 44 seconds)
2026-01-22 23:55:08,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:11,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: -87.64396 ± 124.212
2026-01-22 23:55:11,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-8.6138525, -68.11147, -45.597775, -34.105064, 19.753408, -320.93463, -31.388578, -32.709793, -342.8687, -11.863085]
2026-01-22 23:55:11,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [30.0, 192.0, 169.0, 108.0, 140.0, 1000.0, 115.0, 68.0, 1000.0, 12.0]
2026-01-22 23:55:11,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (-87.64) for latency DatasetOffice
2026-01-22 23:55:11,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 30 minutes, 28 seconds)
2026-01-22 23:56:48,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:56:53,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 58.98851 ± 39.822
2026-01-22 23:56:53,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [44.45656, 23.215122, 13.68016, 66.369865, 135.71526, 85.24848, 46.876152, 100.58831, -0.7320726, 74.46722]
2026-01-22 23:56:53,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [239.0, 142.0, 28.0, 144.0, 1000.0, 1000.0, 105.0, 1000.0, 102.0, 1000.0]
2026-01-22 23:56:53,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (58.99) for latency DatasetOffice
2026-01-22 23:56:53,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 34 minutes, 17 seconds)
2026-01-22 23:58:21,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:28,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 234.10037 ± 144.667
2026-01-22 23:58:28,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [402.88174, 253.2933, 237.80013, 114.87939, 331.17313, 11.068501, 19.817701, 236.08789, 480.55643, 253.44557]
2026-01-22 23:58:28,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 173.0, 1000.0, 27.0, 35.0, 1000.0, 1000.0, 557.0]
2026-01-22 23:58:28,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (234.10) for latency DatasetOffice
2026-01-22 23:58:28,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 32 minutes, 33 seconds)
2026-01-23 00:00:04,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:08,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 140.40027 ± 122.812
2026-01-23 00:00:08,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [263.72733, 259.29694, 23.20858, 65.80469, 17.00743, 168.10728, 62.666878, 88.79406, 404.1086, 51.280865]
2026-01-23 00:00:08,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 566.0, 58.0, 81.0, 27.0, 394.0, 174.0, 135.0, 1000.0, 76.0]
2026-01-23 00:00:08,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 32 minutes, 11 seconds)
2026-01-23 00:01:40,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:01:45,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 205.43954 ± 166.160
2026-01-23 00:01:45,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [346.63425, 147.75557, 128.1062, 21.492174, 21.437647, 370.05835, 388.94836, 490.25955, 85.46093, 54.24245]
2026-01-23 00:01:45,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 429.0, 238.0, 22.0, 49.0, 1000.0, 1000.0, 1000.0, 119.0, 97.0]
2026-01-23 00:01:45,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 32 minutes, 16 seconds)
2026-01-23 00:03:12,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:21,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 332.01016 ± 157.067
2026-01-23 00:03:21,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [329.3057, 624.2196, 417.87894, 369.1104, 39.056717, 295.3598, 452.4537, 409.73172, 243.22586, 139.75891]
2026-01-23 00:03:21,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 67.0, 1000.0, 1000.0, 1000.0, 350.0, 215.0]
2026-01-23 00:03:21,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (332.01) for latency DatasetOffice
2026-01-23 00:03:21,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 31 minutes, 48 seconds)
2026-01-23 00:04:52,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:00,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 489.79816 ± 225.408
2026-01-23 00:05:00,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [647.1798, 694.40265, 726.58716, 568.76086, 129.76039, 611.5886, 164.57263, 369.15375, 253.87985, 732.0958]
2026-01-23 00:05:00,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 129.0, 1000.0, 226.0, 1000.0, 251.0, 1000.0]
2026-01-23 00:05:00,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (489.80) for latency DatasetOffice
2026-01-23 00:05:00,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 29 minutes, 19 seconds)
2026-01-23 00:06:30,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:06:37,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 519.65411 ± 295.182
2026-01-23 00:06:37,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [686.4484, 405.642, 740.1795, 809.0194, 217.87608, 14.661596, 609.7486, 776.2546, 99.01307, 837.6972]
2026-01-23 00:06:37,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 565.0, 939.0, 1000.0, 223.0, 35.0, 821.0, 1000.0, 147.0, 1000.0]
2026-01-23 00:06:37,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (519.65) for latency DatasetOffice
2026-01-23 00:06:37,885 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 28 minutes, 21 seconds)
2026-01-23 00:08:17,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:23,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 429.23584 ± 303.065
2026-01-23 00:08:23,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [612.03735, 275.4519, 791.71594, 16.538706, 750.8963, 838.7589, 569.2681, 33.044678, 135.32626, 269.32025]
2026-01-23 00:08:23,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 377.0, 1000.0, 29.0, 1000.0, 1000.0, 1000.0, 131.0, 197.0, 322.0]
2026-01-23 00:08:23,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 28 minutes, 41 seconds)
2026-01-23 00:09:52,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:09:58,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 445.34088 ± 278.098
2026-01-23 00:09:58,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [682.84534, 827.6009, 89.990234, 232.90558, 276.64243, 521.63574, 559.0405, 865.92413, 35.00346, 361.82077]
2026-01-23 00:09:58,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 93.0, 281.0, 310.0, 1000.0, 639.0, 1000.0, 35.0, 378.0]
2026-01-23 00:09:58,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 26 minutes, 19 seconds)
2026-01-23 00:11:26,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:32,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 415.26581 ± 306.028
2026-01-23 00:11:32,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [111.605194, 745.86615, 54.28582, 50.06367, 224.83481, 880.9649, 817.126, 295.70804, 613.35254, 358.85107]
2026-01-23 00:11:32,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [168.0, 1000.0, 49.0, 47.0, 285.0, 1000.0, 1000.0, 296.0, 1000.0, 327.0]
2026-01-23 00:11:32,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 24 minutes, 1 second)
2026-01-23 00:13:08,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:13:18,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 650.27734 ± 155.079
2026-01-23 00:13:18,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [864.6653, 477.429, 472.16324, 478.12167, 652.27045, 565.0155, 952.66833, 667.19867, 657.8897, 715.35156]
2026-01-23 00:13:18,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 524.0, 649.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:13:18,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (650.28) for latency DatasetOffice
2026-01-23 00:13:18,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 24 minutes, 30 seconds)
2026-01-23 00:14:53,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:01,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 461.03555 ± 240.483
2026-01-23 00:15:01,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [618.064, 592.6171, 575.61975, 670.8275, 24.31705, 612.2421, 692.7292, 286.74594, 36.761066, 500.43164]
2026-01-23 00:15:01,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 19.0, 1000.0, 1000.0, 283.0, 41.0, 1000.0]
2026-01-23 00:15:01,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 24 minutes, 26 seconds)
2026-01-23 00:16:26,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:34,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 605.06885 ± 352.280
2026-01-23 00:16:34,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [593.7638, 238.59619, 741.48627, 42.815582, 964.4429, 759.60645, 1176.1945, 755.4953, 686.4271, 91.860374]
2026-01-23 00:16:34,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 205.0, 1000.0, 32.0, 1000.0, 764.0, 1000.0, 1000.0, 1000.0, 77.0]
2026-01-23 00:16:34,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 19 minutes, 2 seconds)
2026-01-23 00:18:09,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:18,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 711.26190 ± 243.018
2026-01-23 00:18:18,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1296.0676, 617.37885, 444.53973, 773.14154, 956.1297, 707.5295, 470.4305, 728.1428, 576.26715, 542.99176]
2026-01-23 00:18:18,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 363.0, 1000.0, 1000.0, 1000.0, 386.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:18:18,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (711.26) for latency DatasetOffice
2026-01-23 00:18:18,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 20 minutes, 4 seconds)
2026-01-23 00:19:55,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:03,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 511.85474 ± 287.074
2026-01-23 00:20:03,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [628.03766, 755.4182, 798.2909, 367.59045, 145.15977, 150.1963, 911.2806, 341.4451, 198.64336, 822.4849]
2026-01-23 00:20:03,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [890.0, 1000.0, 1000.0, 317.0, 152.0, 1000.0, 779.0, 275.0, 193.0, 1000.0]
2026-01-23 00:20:03,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 21 minutes, 17 seconds)
2026-01-23 00:21:30,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:37,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 557.44073 ± 317.284
2026-01-23 00:21:37,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1074.1984, 51.632908, 714.20703, 785.3407, 423.1245, 465.84753, 80.99372, 719.4026, 871.8578, 387.80237]
2026-01-23 00:21:37,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 61.0, 1000.0, 645.0, 310.0, 1000.0, 83.0, 1000.0, 1000.0, 301.0]
2026-01-23 00:21:37,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 16 minutes, 17 seconds)
2026-01-23 00:23:11,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:21,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 926.65381 ± 334.116
2026-01-23 00:23:21,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [886.1864, 668.528, 910.4982, 140.43321, 1240.8787, 770.2662, 1173.2621, 1072.5419, 1011.2235, 1392.7198]
2026-01-23 00:23:21,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 127.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:23:21,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (926.65) for latency DatasetOffice
2026-01-23 00:23:21,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 15 minutes, 3 seconds)
2026-01-23 00:24:46,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:24:54,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 695.71619 ± 412.462
2026-01-23 00:24:54,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [668.34393, 130.54921, 283.26724, 768.7675, 1367.3934, 831.45447, 52.661385, 940.9774, 705.8977, 1207.8495]
2026-01-23 00:24:54,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 100.0, 201.0, 1000.0, 1000.0, 606.0, 57.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:24:54,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 13 minutes, 22 seconds)
2026-01-23 00:26:29,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:37,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 817.10168 ± 406.483
2026-01-23 00:26:37,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [778.06476, 663.47174, 1352.3041, 1440.249, 34.570675, 1221.7238, 629.35254, 924.67163, 581.7905, 544.8176]
2026-01-23 00:26:37,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [599.0, 475.0, 1000.0, 1000.0, 24.0, 922.0, 1000.0, 1000.0, 398.0, 446.0]
2026-01-23 00:26:37,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 11 minutes, 11 seconds)
2026-01-23 00:28:07,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:15,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 859.88983 ± 495.149
2026-01-23 00:28:15,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [846.0916, 973.2895, 1532.59, 532.55396, 1394.1635, 725.0763, 30.975573, 129.60637, 1440.34, 994.2115]
2026-01-23 00:28:15,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 652.0, 1000.0, 336.0, 1000.0, 1000.0, 32.0, 105.0, 1000.0, 1000.0]
2026-01-23 00:28:15,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 8 minutes, 4 seconds)
2026-01-23 00:29:53,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:30:02,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1001.16931 ± 432.975
2026-01-23 00:30:02,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [954.9972, 1673.2156, 781.9915, 1144.7799, 1527.2601, 223.77509, 1145.9813, 363.72485, 1209.9287, 986.0395]
2026-01-23 00:30:02,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 491.0, 1000.0, 1000.0, 169.0, 1000.0, 253.0, 1000.0, 641.0]
2026-01-23 00:30:02,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1001.17) for latency DatasetOffice
2026-01-23 00:30:02,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 9 minutes, 31 seconds)
2026-01-23 00:31:30,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:39,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 984.35089 ± 478.364
2026-01-23 00:31:39,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1231.2042, 688.0383, 1531.3776, 1369.0425, 1224.287, 40.786842, 951.72125, 1458.5448, 1078.1266, 270.3799]
2026-01-23 00:31:39,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 889.0, 1000.0, 36.0, 1000.0, 1000.0, 1000.0, 183.0]
2026-01-23 00:31:39,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 5 minutes, 58 seconds)
2026-01-23 00:33:13,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:24,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1426.67749 ± 190.168
2026-01-23 00:33:24,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1495.6387, 1521.0176, 1439.3324, 1095.6027, 1029.5596, 1549.8607, 1557.3872, 1620.7849, 1529.4938, 1428.0981]
2026-01-23 00:33:24,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:33:24,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1426.68) for latency DatasetOffice
2026-01-23 00:33:24,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 7 minutes, 25 seconds)
2026-01-23 00:34:55,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:03,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 691.30286 ± 385.956
2026-01-23 00:35:03,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [703.79193, 15.625733, 789.55505, 962.50574, 603.2126, 754.73676, 1311.6837, 36.98304, 690.1839, 1044.75]
2026-01-23 00:35:03,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 16.0, 1000.0, 1000.0, 336.0, 1000.0, 791.0, 36.0, 1000.0, 1000.0]
2026-01-23 00:35:03,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 4 minutes, 58 seconds)
2026-01-23 00:36:29,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:37,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1080.85693 ± 604.676
2026-01-23 00:36:37,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1926.2057, 1318.7699, 168.19235, 1070.1377, 517.4662, 1818.1741, 1396.4448, 803.3897, 1582.371, 207.41827]
2026-01-23 00:36:37,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 790.0, 88.0, 1000.0, 377.0, 1000.0, 1000.0, 1000.0, 1000.0, 122.0]
2026-01-23 00:36:37,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 2 minutes, 11 seconds)
2026-01-23 00:38:11,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:20,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1385.50452 ± 558.933
2026-01-23 00:38:20,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1898.3164, 1660.7863, 1579.5117, 15.943262, 968.09247, 1945.1613, 1106.3341, 1200.1068, 1639.6224, 1841.1714]
2026-01-23 00:38:20,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 28.0, 571.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:38:20,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 59 minutes, 41 seconds)
2026-01-23 00:39:51,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:39:59,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1065.70178 ± 492.156
2026-01-23 00:39:59,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1255.5098, 84.80345, 1332.6079, 1794.9938, 998.60376, 1836.0726, 732.7205, 889.1289, 840.5632, 892.0141]
2026-01-23 00:39:59,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 42.0, 1000.0, 1000.0, 549.0, 1000.0, 1000.0, 1000.0, 479.0, 471.0]
2026-01-23 00:39:59,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 58 minutes, 26 seconds)
2026-01-23 00:41:36,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:39,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 602.75018 ± 421.086
2026-01-23 00:41:39,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [489.62222, 455.73672, 1722.4474, 28.551355, 714.5982, 670.72614, 460.05905, 263.0796, 577.55383, 645.12787]
2026-01-23 00:41:39,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [277.0, 268.0, 1000.0, 21.0, 429.0, 349.0, 286.0, 166.0, 300.0, 411.0]
2026-01-23 00:41:39,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 55 minutes, 35 seconds)
2026-01-23 00:43:09,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:43:15,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 921.19904 ± 718.152
2026-01-23 00:43:15,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [471.56735, 1726.4097, 62.310135, 76.206474, 159.0095, 737.08856, 1848.6221, 702.02356, 1906.6549, 1522.0978]
2026-01-23 00:43:15,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [277.0, 882.0, 37.0, 46.0, 93.0, 1000.0, 1000.0, 388.0, 1000.0, 1000.0]
2026-01-23 00:43:15,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 53 minutes, 7 seconds)
2026-01-23 00:44:47,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:56,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1197.01160 ± 694.839
2026-01-23 00:44:56,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1768.2296, 656.8808, 1899.9551, 1558.7281, 2022.7928, 850.80316, 289.18066, 920.21545, 51.363705, 1951.9679]
2026-01-23 00:44:56,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 141.0, 1000.0, 45.0, 1000.0]
2026-01-23 00:44:56,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 53 minutes, 3 seconds)
2026-01-23 00:46:25,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:46:33,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1046.45972 ± 691.742
2026-01-23 00:46:33,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1778.0864, 785.63837, 1999.536, 1192.6876, 887.3103, 441.09937, 1909.019, 8.176909, 83.516884, 1379.5277]
2026-01-23 00:46:33,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 951.0, 1000.0, 1000.0, 234.0, 1000.0, 20.0, 51.0, 1000.0]
2026-01-23 00:46:33,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 49 minutes, 59 seconds)
2026-01-23 00:48:05,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:15,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1380.09155 ± 454.327
2026-01-23 00:48:15,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1814.9038, 643.8675, 898.2973, 1591.5587, 1802.4491, 1519.4369, 677.7596, 1544.9111, 1347.226, 1960.5059]
2026-01-23 00:48:15,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 387.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:48:15,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 49 minutes, 4 seconds)
2026-01-23 00:49:40,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:47,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1092.39062 ± 697.718
2026-01-23 00:49:47,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1887.9753, 788.50714, 763.69525, 1876.7797, 183.59917, 99.084335, 1517.4169, 2051.0205, 1333.6083, 422.2193]
2026-01-23 00:49:47,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [996.0, 1000.0, 386.0, 1000.0, 103.0, 54.0, 1000.0, 1000.0, 641.0, 253.0]
2026-01-23 00:49:47,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 45 minutes, 44 seconds)
2026-01-23 00:51:23,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:51:33,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1276.96021 ± 493.278
2026-01-23 00:51:33,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1874.8253, 687.25507, 849.7616, 2066.8142, 1684.8042, 1039.8258, 628.4101, 1735.8237, 1155.9382, 1046.1443]
2026-01-23 00:51:33,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 401.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 868.0, 1000.0, 1000.0]
2026-01-23 00:51:33,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 46 minutes, 9 seconds)
2026-01-23 00:53:03,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:53:12,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1563.27173 ± 685.743
2026-01-23 00:53:12,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2202.74, 910.6141, 272.46176, 2130.1257, 563.4176, 2007.1549, 1455.6105, 1923.8276, 2064.9663, 2101.7993]
2026-01-23 00:53:12,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 159.0, 1000.0, 303.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:53:12,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1563.27) for latency DatasetOffice
2026-01-23 00:53:12,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 44 minutes, 7 seconds)
2026-01-23 00:54:38,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:46,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1029.66187 ± 481.976
2026-01-23 00:54:46,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [868.12775, 1405.6125, 2053.4167, 800.36395, 552.4876, 1215.0186, 328.30902, 723.7242, 887.3272, 1462.2306]
2026-01-23 00:54:46,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [455.0, 1000.0, 1000.0, 1000.0, 306.0, 1000.0, 161.0, 364.0, 1000.0, 1000.0]
2026-01-23 00:54:46,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 41 minutes, 57 seconds)
2026-01-23 00:56:22,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:30,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1522.78882 ± 699.818
2026-01-23 00:56:30,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [940.0551, 2007.0087, 2085.5076, 2048.7083, 1928.724, 9.878736, 1700.303, 2007.2505, 593.83923, 1906.6129]
2026-01-23 00:56:30,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [465.0, 961.0, 1000.0, 1000.0, 1000.0, 17.0, 799.0, 1000.0, 312.0, 1000.0]
2026-01-23 00:56:30,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 40 minutes, 41 seconds)
2026-01-23 00:58:01,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:58:10,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1544.82349 ± 682.949
2026-01-23 00:58:10,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1947.0206, 645.4856, 1921.0536, 84.03042, 2078.7058, 1928.0864, 908.6933, 2054.2417, 1873.3451, 2007.5719]
2026-01-23 00:58:10,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 338.0, 1000.0, 47.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:58:10,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 40 minutes, 31 seconds)
2026-01-23 00:59:36,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:45,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1233.72192 ± 575.942
2026-01-23 00:59:45,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1437.0757, 1205.7269, 1995.9836, 197.18674, 1413.797, 1931.784, 483.41083, 1298.4453, 1700.8354, 672.9732]
2026-01-23 00:59:45,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 564.0, 1000.0, 102.0, 699.0, 1000.0, 1000.0, 1000.0, 908.0, 1000.0]
2026-01-23 00:59:45,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 36 minutes, 53 seconds)
2026-01-23 01:01:10,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:20,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1635.76233 ± 303.554
2026-01-23 01:01:20,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1094.4792, 1442.7926, 1895.8945, 2113.1357, 1981.9628, 1666.2777, 1458.5183, 1503.1887, 1352.2578, 1849.115]
2026-01-23 01:01:20,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [510.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 647.0, 647.0, 1000.0]
2026-01-23 01:01:20,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1635.76) for latency DatasetOffice
2026-01-23 01:01:20,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 34 minutes, 15 seconds)
2026-01-23 01:02:52,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:03:02,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1922.30432 ± 502.959
2026-01-23 01:03:02,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [879.24603, 2160.653, 1920.301, 2182.8123, 2399.336, 2258.281, 2143.2886, 1007.62805, 2134.3076, 2137.1885]
2026-01-23 01:03:02,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 530.0, 1000.0, 1000.0]
2026-01-23 01:03:02,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1922.30) for latency DatasetOffice
2026-01-23 01:03:02,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 34 minutes, 10 seconds)
2026-01-23 01:04:32,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:39,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1228.80835 ± 755.910
2026-01-23 01:04:39,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1699.8325, 1211.5074, 2154.693, 303.54114, 1707.4178, 2043.0181, 43.695053, 738.377, 405.95917, 1980.0426]
2026-01-23 01:04:39,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [751.0, 558.0, 1000.0, 186.0, 1000.0, 1000.0, 58.0, 404.0, 239.0, 1000.0]
2026-01-23 01:04:39,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 31 minutes, 10 seconds)
2026-01-23 01:06:09,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:19,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1695.45337 ± 422.029
2026-01-23 01:06:19,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1114.4631, 2073.5613, 1228.9312, 2065.7502, 1543.5802, 1129.6533, 2125.2188, 1443.0109, 2084.8406, 2145.5242]
2026-01-23 01:06:19,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [508.0, 1000.0, 617.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:06:19,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 29 minutes, 38 seconds)
2026-01-23 01:07:53,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:00,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1574.55591 ± 864.875
2026-01-23 01:08:00,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2071.0986, 2472.967, 239.13118, 324.75772, 2426.2493, 2375.9836, 2360.47, 1344.3663, 553.4261, 1577.1106]
2026-01-23 01:08:00,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 113.0, 149.0, 1000.0, 1000.0, 1000.0, 590.0, 234.0, 674.0]
2026-01-23 01:08:00,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 28 minutes, 59 seconds)
2026-01-23 01:09:23,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:34,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1888.96936 ± 487.514
2026-01-23 01:09:34,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2097.4294, 2144.9395, 863.77606, 2310.8608, 2195.8694, 2267.3813, 1739.9237, 2081.0464, 1061.0946, 2127.373]
2026-01-23 01:09:34,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 716.0, 924.0, 1000.0, 1000.0]
2026-01-23 01:09:34,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 27 minutes, 17 seconds)
2026-01-23 01:11:05,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:12,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1316.79041 ± 808.381
2026-01-23 01:11:12,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [187.34477, 2340.993, 1777.8713, 875.3064, 942.7906, 2347.6423, 233.52467, 2136.4548, 1757.7571, 568.2193]
2026-01-23 01:11:12,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [110.0, 1000.0, 1000.0, 375.0, 1000.0, 1000.0, 121.0, 1000.0, 1000.0, 258.0]
2026-01-23 01:11:12,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 56 seconds)
2026-01-23 01:12:39,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:48,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1735.02405 ± 889.070
2026-01-23 01:12:48,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [52.62083, 30.155628, 2275.7236, 2465.7126, 1716.2351, 2410.154, 1789.6746, 1870.9761, 2198.5957, 2540.3918]
2026-01-23 01:12:48,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [40.0, 27.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:12:48,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 23 minutes, 7 seconds)
2026-01-23 01:14:19,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:30,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1639.17969 ± 632.538
2026-01-23 01:14:30,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [747.3574, 1057.6069, 2291.0837, 1071.1294, 792.863, 2190.0295, 2340.6614, 1531.5719, 2208.985, 2160.5085]
2026-01-23 01:14:30,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:14:30,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes, 50 seconds)
2026-01-23 01:16:03,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:12,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1879.13281 ± 839.783
2026-01-23 01:16:12,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1667.0706, 2321.7239, 937.3761, 2481.7566, 98.03943, 2539.687, 1100.0835, 2576.2861, 2549.6948, 2519.6091]
2026-01-23 01:16:12,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [716.0, 1000.0, 1000.0, 1000.0, 53.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:16:12,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 20 minutes, 27 seconds)
2026-01-23 01:17:44,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:54,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2194.44678 ± 654.148
2026-01-23 01:17:54,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2588.369, 2720.2969, 2654.7974, 2653.4753, 2534.1836, 2426.9543, 2469.5364, 1801.9084, 603.3756, 1491.5742]
2026-01-23 01:17:54,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 258.0, 1000.0]
2026-01-23 01:17:54,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (2194.45) for latency DatasetOffice
2026-01-23 01:17:54,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 20 minutes, 6 seconds)
2026-01-23 01:19:24,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:33,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1976.04272 ± 768.332
2026-01-23 01:19:33,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2584.2834, 2442.1162, 1334.4023, 387.12494, 2333.7212, 2778.789, 1594.6552, 2703.731, 2460.7515, 1140.854]
2026-01-23 01:19:33,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 200.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 521.0]
2026-01-23 01:19:33,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 18 minutes, 29 seconds)
2026-01-23 01:21:03,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:12,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2149.38770 ± 554.158
2026-01-23 01:21:12,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2527.438, 2660.4395, 2473.1238, 977.2871, 2533.5374, 2576.565, 1631.35, 1854.6625, 1628.6655, 2630.8079]
2026-01-23 01:21:12,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 420.0, 1000.0, 1000.0, 680.0, 938.0, 1000.0, 1000.0]
2026-01-23 01:21:13,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 17 minutes, 24 seconds)
2026-01-23 01:22:43,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:52,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2049.41113 ± 808.465
2026-01-23 01:22:52,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2627.3354, 2711.7822, 179.72098, 2442.9182, 2696.8865, 1193.8865, 2391.5708, 2497.8394, 2416.1982, 1335.9711]
2026-01-23 01:22:52,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 91.0, 1000.0, 1000.0, 514.0, 1000.0, 1000.0, 1000.0, 473.0]
2026-01-23 01:22:52,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 15 minutes, 15 seconds)
2026-01-23 01:24:21,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:30,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2094.57007 ± 851.919
2026-01-23 01:24:30,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [868.2055, 2445.578, 2776.182, 2677.6125, 2846.467, 2492.2366, 1128.2793, 2583.8906, 2651.168, 476.08145]
2026-01-23 01:24:30,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [341.0, 1000.0, 1000.0, 1000.0, 1000.0, 909.0, 431.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:24:30,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 13 minutes, 2 seconds)
2026-01-23 01:25:54,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:03,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2128.77246 ± 770.071
2026-01-23 01:26:03,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2701.9717, 3028.2075, 1530.8633, 2849.4834, 1193.9093, 2186.5264, 817.56085, 1431.2833, 2777.902, 2770.017]
2026-01-23 01:26:03,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 273.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:26:03,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 10 minutes, 7 seconds)
2026-01-23 01:27:41,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:51,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2717.22095 ± 457.113
2026-01-23 01:27:51,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1365.5337, 2744.5696, 2947.1843, 2895.6912, 2796.348, 2821.571, 2836.9011, 3037.9163, 2865.7385, 2860.7566]
2026-01-23 01:27:51,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [480.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:27:51,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (2717.22) for latency DatasetOffice
2026-01-23 01:27:51,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 9 minutes, 41 seconds)
2026-01-23 01:29:15,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:25,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1975.17029 ± 793.282
2026-01-23 01:29:25,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2800.2947, 2271.8203, 3012.7432, 2104.8145, 2903.5798, 1330.4874, 1421.7844, 427.85977, 1326.0052, 2152.3123]
2026-01-23 01:29:25,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 691.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 701.0]
2026-01-23 01:29:25,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 7 minutes, 21 seconds)
2026-01-23 01:30:59,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:08,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2324.78052 ± 875.107
2026-01-23 01:31:08,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3091.8716, 3054.9092, 2918.9204, 2634.9377, 3132.1172, 1062.6818, 965.0032, 2780.2969, 2591.8318, 1015.2336]
2026-01-23 01:31:08,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 309.0, 1000.0, 1000.0, 352.0]
2026-01-23 01:31:08,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 6 minutes, 13 seconds)
2026-01-23 01:32:32,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:40,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1956.57715 ± 937.420
2026-01-23 01:32:40,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2505.8608, 3004.4111, 2641.3533, 1230.7675, 2358.1697, 781.7388, 1421.9352, 2878.739, 2602.162, 140.63269]
2026-01-23 01:32:40,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [948.0, 1000.0, 1000.0, 430.0, 1000.0, 262.0, 530.0, 1000.0, 1000.0, 79.0]
2026-01-23 01:32:40,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 3 minutes, 38 seconds)
2026-01-23 01:34:08,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:18,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2383.79175 ± 907.867
2026-01-23 01:34:18,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2603.9626, 2728.739, 2436.1948, 1161.0062, 2648.0925, 2912.2034, 181.91696, 2953.0344, 3181.0967, 3031.6729]
2026-01-23 01:34:18,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 90.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:34:18,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 2 minutes, 36 seconds)
2026-01-23 01:35:52,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:58,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1565.69177 ± 1109.827
2026-01-23 01:35:58,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [180.23532, 1349.9369, 2917.0913, 3032.1892, 2164.3193, 1687.9397, 600.8119, 740.30853, 2971.54, 12.546096]
2026-01-23 01:35:58,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [103.0, 525.0, 1000.0, 1000.0, 737.0, 1000.0, 221.0, 223.0, 1000.0, 29.0]
2026-01-23 01:35:58,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 6 seconds)
2026-01-23 01:37:23,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:33,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2727.51294 ± 751.786
2026-01-23 01:37:33,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [733.46844, 2895.5176, 3186.2297, 3183.2693, 3080.1086, 3062.3608, 1950.0858, 3201.0437, 3102.839, 2880.203]
2026-01-23 01:37:33,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [292.0, 968.0, 1000.0, 1000.0, 1000.0, 1000.0, 657.0, 1000.0, 1000.0, 956.0]
2026-01-23 01:37:33,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (2727.51) for latency DatasetOffice
2026-01-23 01:37:33,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 65/100 (estimated time remaining: 58 minutes, 28 seconds)
2026-01-23 01:39:09,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:15,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1463.99146 ± 1082.643
2026-01-23 01:39:15,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [602.6917, 80.11531, 713.0039, 845.5405, 2937.4944, 2185.8354, 193.46312, 1738.1776, 2032.0792, 3311.5125]
2026-01-23 01:39:15,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [200.0, 37.0, 217.0, 1000.0, 1000.0, 704.0, 234.0, 766.0, 712.0, 1000.0]
2026-01-23 01:39:15,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 66/100 (estimated time remaining: 56 minutes, 48 seconds)
2026-01-23 01:40:47,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:57,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2461.98364 ± 996.382
2026-01-23 01:40:57,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3068.264, 2764.2432, 2177.956, 469.4175, 3156.2798, 2994.303, 2981.8313, 3258.2432, 638.07416, 3111.2224]
2026-01-23 01:40:57,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 190.0, 1000.0]
2026-01-23 01:40:57,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 67/100 (estimated time remaining: 56 minutes, 19 seconds)
2026-01-23 01:42:21,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:26,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1418.44055 ± 1250.772
2026-01-23 01:42:26,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [185.99367, 1131.34, 3047.4856, 104.332146, 209.08897, 278.51163, 2302.1455, 3149.9412, 730.04193, 3045.5244]
2026-01-23 01:42:26,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [67.0, 351.0, 1000.0, 45.0, 120.0, 128.0, 795.0, 1000.0, 217.0, 1000.0]
2026-01-23 01:42:26,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 68/100 (estimated time remaining: 53 minutes, 40 seconds)
2026-01-23 01:44:01,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:11,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2612.67383 ± 851.727
2026-01-23 01:44:11,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2343.6672, 3194.4382, 302.55283, 2898.069, 3102.9263, 3026.1755, 2492.7876, 2241.7036, 3358.0002, 3166.418]
2026-01-23 01:44:11,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:44:11,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 69/100 (estimated time remaining: 52 minutes, 35 seconds)
2026-01-23 01:45:43,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:50,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2175.51514 ± 1307.251
2026-01-23 01:45:50,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2806.0964, 3001.8438, 9.949947, 3311.2004, 161.48267, 3167.5183, 3356.3887, 713.4288, 1881.2517, 3345.9883]
2026-01-23 01:45:50,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [924.0, 1000.0, 17.0, 1000.0, 69.0, 949.0, 1000.0, 241.0, 503.0, 1000.0]
2026-01-23 01:45:50,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 70/100 (estimated time remaining: 51 minutes, 23 seconds)
2026-01-23 01:47:13,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:19,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1829.94690 ± 1119.545
2026-01-23 01:47:19,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [579.2389, -2.1261759, 2488.2742, 1658.0474, 3134.7969, 327.88495, 2092.4395, 3123.0774, 3033.699, 1864.1361]
2026-01-23 01:47:19,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [205.0, 24.0, 818.0, 537.0, 1000.0, 145.0, 734.0, 1000.0, 1000.0, 570.0]
2026-01-23 01:47:19,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 71/100 (estimated time remaining: 48 minutes, 26 seconds)
2026-01-23 01:48:58,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:09,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2741.07959 ± 421.755
2026-01-23 01:49:09,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2037.8937, 2849.5105, 2704.5188, 3152.4016, 2773.101, 3076.824, 2989.159, 1908.681, 2696.145, 3222.5613]
2026-01-23 01:49:09,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:49:09,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (2741.08) for latency DatasetOffice
2026-01-23 01:49:09,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 72/100 (estimated time remaining: 47 minutes, 32 seconds)
2026-01-23 01:50:37,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:50:45,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2130.50049 ± 1195.337
2026-01-23 01:50:45,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3137.3713, 43.548206, 3216.9111, 3280.7485, 1902.5714, 1450.0393, 243.90596, 3042.576, 1704.394, 3282.9373]
2026-01-23 01:50:45,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 26.0, 1000.0, 1000.0, 1000.0, 466.0, 88.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:50:45,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 73/100 (estimated time remaining: 46 minutes, 34 seconds)
2026-01-23 01:52:10,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:17,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1752.32739 ± 1132.006
2026-01-23 01:52:17,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2940.1194, 2933.322, 1710.6693, 91.896996, 608.10516, 847.84424, 3404.3328, 1214.9303, 868.91876, 2903.1328]
2026-01-23 01:52:17,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 41.0, 1000.0, 302.0, 1000.0, 386.0, 319.0, 1000.0]
2026-01-23 01:52:17,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 74/100 (estimated time remaining: 43 minutes, 44 seconds)
2026-01-23 01:53:50,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:00,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2662.51611 ± 832.857
2026-01-23 01:54:00,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2677.9878, 1025.6935, 2606.4102, 3301.1492, 3346.5437, 3483.019, 3288.6953, 2052.4194, 3376.759, 1466.4829]
2026-01-23 01:54:00,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 327.0, 1000.0, 1000.0, 997.0, 1000.0, 1000.0, 1000.0, 995.0, 1000.0]
2026-01-23 01:54:00,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 75/100 (estimated time remaining: 42 minutes, 29 seconds)
2026-01-23 01:55:33,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:40,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2118.32080 ± 1106.104
2026-01-23 01:55:40,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [824.2293, 3091.2024, 688.8719, 1038.696, 2724.4314, 2319.901, 3352.5374, 3074.7856, 703.05023, 3365.5034]
2026-01-23 01:55:40,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [270.0, 1000.0, 211.0, 301.0, 1000.0, 732.0, 1000.0, 958.0, 1000.0, 1000.0]
2026-01-23 01:55:40,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 76/100 (estimated time remaining: 41 minutes, 45 seconds)
2026-01-23 01:57:05,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:16,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 3153.22974 ± 349.927
2026-01-23 01:57:16,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3448.064, 3290.0151, 2553.8826, 2469.3208, 3589.3215, 3278.5862, 3396.2048, 3123.1538, 3084.4443, 3299.3044]
2026-01-23 01:57:16,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:57:16,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (3153.23) for latency DatasetOffice
2026-01-23 01:57:16,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 77/100 (estimated time remaining: 38 minutes, 57 seconds)
2026-01-23 01:58:46,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:55,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2627.05078 ± 932.141
2026-01-23 01:58:55,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3414.6375, 3338.921, 1738.0331, 3334.2295, 3062.8655, 708.01556, 3042.1748, 3002.874, 3286.397, 1342.362]
2026-01-23 01:58:55,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 483.0, 1000.0, 1000.0, 226.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:58:55,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 78/100 (estimated time remaining: 37 minutes, 33 seconds)
2026-01-23 02:00:28,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:36,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2561.61475 ± 1108.343
2026-01-23 02:00:36,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3271.2332, 3564.8606, 1951.0863, 1370.093, 224.22916, 3403.686, 3435.7104, 3500.696, 1693.2217, 3201.3318]
2026-01-23 02:00:36,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 617.0, 375.0, 91.0, 1000.0, 1000.0, 1000.0, 1000.0, 981.0]
2026-01-23 02:00:36,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 79/100 (estimated time remaining: 36 minutes, 34 seconds)
2026-01-23 02:02:14,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:23,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2582.99683 ± 1061.967
2026-01-23 02:02:23,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3400.7593, 3234.7327, 941.3801, 3510.8088, 3344.808, 449.4823, 2227.9336, 3293.1648, 3358.3599, 2068.5388]
2026-01-23 02:02:23,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 311.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 645.0]
2026-01-23 02:02:23,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 12 seconds)
2026-01-23 02:03:52,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:02,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 3188.26709 ± 499.410
2026-01-23 02:04:02,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3125.1108, 3679.6597, 3443.3076, 3507.613, 3108.205, 3456.497, 2628.8855, 1962.8514, 3406.8728, 3563.669]
2026-01-23 02:04:02,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 823.0, 566.0, 1000.0, 1000.0]
2026-01-23 02:04:02,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (3188.27) for latency DatasetOffice
2026-01-23 02:04:02,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 27 seconds)
2026-01-23 02:05:33,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:41,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2556.98364 ± 1233.320
2026-01-23 02:05:41,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1341.3905, 3319.202, 3535.483, 3330.912, 3261.2122, 3614.6572, 884.94794, 5.6572747, 3121.836, 3154.5396]
2026-01-23 02:05:41,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [442.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 290.0, 17.0, 984.0, 1000.0]
2026-01-23 02:05:41,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 58 seconds)
2026-01-23 02:07:06,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:14,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2226.95850 ± 1259.114
2026-01-23 02:07:14,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3407.748, 3472.2507, 253.70534, 1064.8054, 3259.0098, 2202.7122, 245.40033, 3277.3074, 1716.0907, 3370.555]
2026-01-23 02:07:14,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 96.0, 374.0, 1000.0, 1000.0, 1000.0, 1000.0, 494.0, 1000.0]
2026-01-23 02:07:14,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 57 seconds)
2026-01-23 02:08:42,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:51,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2650.55225 ± 1155.196
2026-01-23 02:08:51,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3504.0671, 3620.5798, 2263.3235, 1000.2627, 3399.4346, 2154.1362, 193.15674, 3494.229, 3396.7917, 3479.5386]
2026-01-23 02:08:51,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 619.0, 291.0, 1000.0, 706.0, 89.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:08:51,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 1 second)
2026-01-23 02:10:23,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:30,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2253.28613 ± 897.619
2026-01-23 02:10:30,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2039.8917, 1577.962, 3383.596, 2815.5696, 1405.1619, 3165.456, 552.6166, 3439.7898, 1921.773, 2231.0457]
2026-01-23 02:10:30,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [717.0, 480.0, 990.0, 811.0, 439.0, 1000.0, 170.0, 1000.0, 636.0, 634.0]
2026-01-23 02:10:30,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 57 seconds)
2026-01-23 02:11:59,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:07,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2756.09131 ± 1180.894
2026-01-23 02:12:07,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3512.97, 3401.7688, 3133.5208, 3398.7908, 3293.757, 3307.094, 859.7174, 5.48201, 3313.2695, 3334.5432]
2026-01-23 02:12:07,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 239.0, 17.0, 1000.0, 1000.0]
2026-01-23 02:12:07,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 14 seconds)
2026-01-23 02:13:40,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:13:50,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 3378.08398 ± 293.607
2026-01-23 02:13:50,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3496.5906, 3461.9326, 3538.4846, 3474.3123, 3361.8386, 3568.349, 3517.6655, 3359.6152, 2518.6611, 3483.3894]
2026-01-23 02:13:50,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 776.0, 1000.0]
2026-01-23 02:13:50,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (3378.08) for latency DatasetOffice
2026-01-23 02:13:50,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 50 seconds)
2026-01-23 02:15:16,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:24,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2578.12061 ± 1239.374
2026-01-23 02:15:24,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [161.12038, 3540.47, 3538.7803, 3657.9568, 3477.807, 3120.69, 607.7388, 1729.8953, 3466.0205, 2480.724]
2026-01-23 02:15:24,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [88.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 223.0, 477.0, 1000.0, 1000.0]
2026-01-23 02:15:24,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 14 seconds)
2026-01-23 02:16:55,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:17:05,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2975.66479 ± 971.615
2026-01-23 02:17:05,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3521.0598, 489.2878, 3711.8704, 3505.9397, 3085.7944, 3318.0996, 1834.8175, 3191.7864, 3584.9448, 3513.047]
2026-01-23 02:17:05,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 153.0, 1000.0, 972.0, 999.0, 1000.0, 548.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:17:05,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 45 seconds)
2026-01-23 02:18:37,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:46,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2666.04053 ± 1215.017
2026-01-23 02:18:46,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3072.943, 959.99756, 3239.3762, 1152.7659, 3535.984, 426.7071, 3610.062, 3407.743, 3684.8713, 3569.9543]
2026-01-23 02:18:46,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 287.0, 1000.0, 1000.0, 1000.0, 140.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:18:46,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 10 seconds)
2026-01-23 02:20:17,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:24,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2223.36890 ± 940.792
2026-01-23 02:20:24,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2334.1067, 3363.009, 3487.566, 1794.599, 344.88458, 1657.2546, 3283.0735, 1921.1101, 2578.4915, 1469.5936]
2026-01-23 02:20:24,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [639.0, 1000.0, 1000.0, 559.0, 120.0, 431.0, 1000.0, 751.0, 1000.0, 409.0]
2026-01-23 02:20:24,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 33 seconds)
2026-01-23 02:21:49,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:56,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2004.61548 ± 998.069
2026-01-23 02:21:56,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3392.4392, 3203.1458, 1210.485, 1060.4485, 1904.7463, 617.557, 1933.3821, 1711.8464, 1377.0657, 3635.037]
2026-01-23 02:21:56,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 343.0, 597.0, 187.0, 525.0, 484.0, 381.0, 1000.0]
2026-01-23 02:21:56,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 34 seconds)
2026-01-23 02:23:31,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:40,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2975.51147 ± 1175.803
2026-01-23 02:23:40,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3486.875, 3727.6777, 3488.4663, 2150.1343, 3419.0393, 3614.7715, 2049.749, 1.0196542, 3878.255, 3939.1262]
2026-01-23 02:23:40,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 589.0, 1000.0, 1000.0, 1000.0, 24.0, 1000.0, 1000.0]
2026-01-23 02:23:40,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 13 seconds)
2026-01-23 02:25:10,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:20,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2652.76611 ± 1015.108
2026-01-23 02:25:20,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3476.9758, 789.6907, 3464.1843, 1418.8303, 1610.012, 3256.1663, 2028.7788, 3440.944, 3480.6814, 3561.3984]
2026-01-23 02:25:20,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 503.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:25:20,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 33 seconds)
2026-01-23 02:26:43,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:48,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1378.30530 ± 1220.673
2026-01-23 02:26:48,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [808.06647, 3579.2278, 27.48221, 1031.2583, 919.5179, 3643.2744, 1300.9592, 511.05093, 1809.3029, 152.9124]
2026-01-23 02:26:48,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [248.0, 1000.0, 38.0, 1000.0, 287.0, 1000.0, 372.0, 164.0, 1000.0, 73.0]
2026-01-23 02:26:48,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 39 seconds)
2026-01-23 02:28:17,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:27,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2974.60425 ± 1024.102
2026-01-23 02:28:27,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3461.4675, 1769.966, 3137.8257, 3334.954, 3280.2288, 374.605, 3922.8298, 3731.314, 3440.477, 3292.376]
2026-01-23 02:28:27,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 516.0, 1000.0, 1000.0, 1000.0, 211.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:28:27,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 2 seconds)
2026-01-23 02:29:59,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:07,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2633.72412 ± 905.537
2026-01-23 02:30:07,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2957.254, 2500.606, 2492.6072, 1889.397, 945.03827, 3306.486, 3791.5857, 3497.5315, 1462.4225, 3494.314]
2026-01-23 02:30:07,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 761.0, 571.0, 310.0, 1000.0, 1000.0, 1000.0, 499.0, 937.0]
2026-01-23 02:30:07,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 33 seconds)
2026-01-23 02:31:40,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:49,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2673.97656 ± 1242.679
2026-01-23 02:31:49,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3711.3306, 3748.9036, 3627.024, 1122.3915, 1990.6263, 3570.3657, 929.79865, 3660.6606, 750.7056, 3627.958]
2026-01-23 02:31:49,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 303.0, 542.0, 1000.0, 262.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:31:49,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 53 seconds)
2026-01-23 02:33:17,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:23,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1997.58081 ± 1319.963
2026-01-23 02:33:23,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3684.5066, 3862.9846, 276.1159, 1701.433, 1843.0654, 1035.7314, 1782.665, 55.204468, 3806.8457, 1927.2563]
2026-01-23 02:33:23,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [920.0, 1000.0, 94.0, 547.0, 537.0, 337.0, 1000.0, 37.0, 1000.0, 508.0]
2026-01-23 02:33:23,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 13 seconds)
2026-01-23 02:34:54,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:35:01,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2344.34839 ± 1168.324
2026-01-23 02:35:01,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1824.1895, 2904.5325, 3444.9473, 2081.308, 789.0199, 27.995926, 3709.7495, 1986.9358, 3203.233, 3471.5752]
2026-01-23 02:35:01,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [683.0, 1000.0, 1000.0, 582.0, 236.0, 32.0, 1000.0, 648.0, 1000.0, 1000.0]
2026-01-23 02:35:01,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 38 seconds)
2026-01-23 02:36:29,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:36,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 2428.02368 ± 1390.171
2026-01-23 02:36:36,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [300.99615, 474.25677, 3527.1904, 603.2135, 3464.819, 3772.6084, 3360.8286, 2797.313, 2008.0977, 3970.9126]
2026-01-23 02:36:36,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [99.0, 139.0, 1000.0, 175.0, 1000.0, 1000.0, 1000.0, 762.0, 591.0, 1000.0]
2026-01-23 02:36:36,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1299 [DEBUG]: Training session finished
