2026-01-25 17:02:41,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-ant/DatasetOffice-sac
2026-01-25 17:02:41,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-ant/DatasetOffice-sac
2026-01-25 17:02:41,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x145fb2a72ad0>}
2026-01-25 17:02:41,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1159 [DEBUG]: using device: cuda
2026-01-25 17:02:41,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1181 [INFO]: Creating new trainer
2026-01-25 17:02:41,994 baseline-sac-noisy-ant:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=27, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2026-01-25 17:02:41,994 baseline-sac-noisy-ant:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-25 17:02:42,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1242 [DEBUG]: Starting training session...
2026-01-25 17:02:42,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 1/100
2026-01-25 17:04:12,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:04:14,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: -304.76859 ± 536.366
2026-01-25 17:04:14,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-5.588251, -19.08758, -20.347496, -130.02235, -1403.9886, -11.282429, -91.1988, -1344.6619, -6.4888053, -15.019454]
2026-01-25 17:04:14,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [26.0, 29.0, 29.0, 110.0, 1000.0, 17.0, 53.0, 1000.0, 11.0, 15.0]
2026-01-25 17:04:14,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (-304.77) for latency DatasetOffice
2026-01-25 17:04:14,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 31 minutes, 59 seconds)
2026-01-25 17:05:51,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:05:54,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: -48.56083 ± 70.163
2026-01-25 17:05:54,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-133.15419, 0.50910676, -226.72841, 8.263056, -25.74651, -7.4341516, -12.363794, -26.802904, -24.583689, -37.56685]
2026-01-25 17:05:54,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 17.0, 1000.0, 74.0, 177.0, 26.0, 23.0, 25.0, 58.0, 42.0]
2026-01-25 17:05:54,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (-48.56) for latency DatasetOffice
2026-01-25 17:05:54,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 36 minutes, 33 seconds)
2026-01-25 17:07:22,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:07:25,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: -17.20424 ± 40.066
2026-01-25 17:07:25,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-32.558304, -9.951051, -25.496227, -44.511818, 9.120269, 34.972557, -34.037544, 57.290997, -90.145905, -36.725346]
2026-01-25 17:07:25,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [442.0, 35.0, 227.0, 187.0, 78.0, 73.0, 158.0, 106.0, 1000.0, 168.0]
2026-01-25 17:07:25,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (-17.20) for latency DatasetOffice
2026-01-25 17:07:25,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 32 minutes, 23 seconds)
2026-01-25 17:08:58,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:08:59,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 4.01266 ± 30.945
2026-01-25 17:08:59,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-16.638256, 3.4732568, 47.631184, -9.966, -28.93791, 18.60655, 69.422714, -31.673187, -7.358543, -4.433192]
2026-01-25 17:08:59,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [57.0, 60.0, 144.0, 85.0, 187.0, 158.0, 222.0, 161.0, 253.0, 83.0]
2026-01-25 17:08:59,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (4.01) for latency DatasetOffice
2026-01-25 17:08:59,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 30 minutes, 50 seconds)
2026-01-25 17:10:35,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:10:37,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 4.18037 ± 24.438
2026-01-25 17:10:37,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [15.501501, -27.106127, 8.227196, 14.4302845, 28.449549, -30.55864, -29.384588, 46.231007, 4.567356, 11.446167]
2026-01-25 17:10:37,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [99.0, 109.0, 193.0, 115.0, 40.0, 146.0, 1000.0, 99.0, 61.0, 36.0]
2026-01-25 17:10:37,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (4.18) for latency DatasetOffice
2026-01-25 17:10:37,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 30 minutes, 29 seconds)
2026-01-25 17:12:11,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:12:14,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 21.66619 ± 23.977
2026-01-25 17:12:14,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [27.116903, 11.350243, 12.8386345, 29.039255, -16.864336, 33.919582, 45.57725, -7.8228674, 11.808732, 69.69851]
2026-01-25 17:12:14,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [157.0, 1000.0, 61.0, 1000.0, 60.0, 310.0, 248.0, 18.0, 97.0, 196.0]
2026-01-25 17:12:14,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (21.67) for latency DatasetOffice
2026-01-25 17:12:14,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 30 minutes, 20 seconds)
2026-01-25 17:13:44,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:13:46,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 6.41671 ± 22.385
2026-01-25 17:13:46,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [27.677244, -5.1355, -0.051500898, -3.5911686, 30.987246, -3.5526404, 20.051464, 10.305223, 32.839584, -45.362873]
2026-01-25 17:13:46,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 43.0, 62.0, 54.0, 105.0, 25.0, 34.0, 44.0, 91.0, 138.0]
2026-01-25 17:13:46,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 26 minutes, 13 seconds)
2026-01-25 17:15:19,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:15:23,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 0.89089 ± 32.721
2026-01-25 17:15:23,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [7.628189, 6.4748015, -74.11532, 37.21641, -10.613584, 18.75723, -12.500841, 9.38227, 49.88692, -23.20718]
2026-01-25 17:15:23,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 195.0, 175.0, 47.0, 1000.0, 47.0, 1000.0, 140.0, 67.0, 145.0]
2026-01-25 17:15:23,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 26 minutes, 37 seconds)
2026-01-25 17:16:55,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:16:57,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 7.40941 ± 24.283
2026-01-25 17:16:57,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [21.576424, 2.381589, 31.952332, 13.197614, -54.815865, 39.16797, 11.11532, 10.760984, -0.43092644, -0.8113359]
2026-01-25 17:16:57,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [349.0, 160.0, 52.0, 56.0, 1000.0, 129.0, 16.0, 28.0, 21.0, 216.0]
2026-01-25 17:16:57,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 24 minutes, 51 seconds)
2026-01-25 17:18:38,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:18:41,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 31.66123 ± 110.151
2026-01-25 17:18:41,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-6.254382, 34.572105, 80.6245, -60.11404, 15.875569, 60.768982, 13.141296, -135.10422, 312.47098, 0.6315271]
2026-01-25 17:18:41,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [16.0, 151.0, 374.0, 496.0, 1000.0, 51.0, 27.0, 543.0, 655.0, 24.0]
2026-01-25 17:18:41,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (31.66) for latency DatasetOffice
2026-01-25 17:18:41,960 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 25 minutes, 12 seconds)
2026-01-25 17:20:11,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:20:15,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 26.56399 ± 58.896
2026-01-25 17:20:15,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [119.68458, 53.14952, -10.177715, -30.346397, -16.858463, -4.2145853, 4.8291774, -18.268608, 150.34941, 17.492935]
2026-01-25 17:20:15,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 307.0, 125.0, 50.0, 1000.0, 19.0, 24.0, 156.0, 1000.0, 66.0]
2026-01-25 17:20:15,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 22 minutes, 47 seconds)
2026-01-25 17:21:43,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:21:45,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 63.40808 ± 59.754
2026-01-25 17:21:45,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [17.332546, 79.824646, 57.101105, 17.61095, 7.532575, 16.526575, 70.57886, 108.61025, 216.17152, 42.79181]
2026-01-25 17:21:45,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [13.0, 119.0, 134.0, 75.0, 20.0, 186.0, 130.0, 236.0, 1000.0, 46.0]
2026-01-25 17:21:45,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (63.41) for latency DatasetOffice
2026-01-25 17:21:45,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 20 minutes, 42 seconds)
2026-01-25 17:23:19,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:23:23,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 121.33638 ± 169.128
2026-01-25 17:23:23,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [456.1811, 55.707172, 85.61912, 31.432919, 6.29404, 66.577255, 31.352701, 25.158676, 455.69193, -0.6511525]
2026-01-25 17:23:23,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 181.0, 1000.0, 170.0, 31.0, 146.0, 73.0, 92.0, 1000.0, 104.0]
2026-01-25 17:23:23,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (121.34) for latency DatasetOffice
2026-01-25 17:23:23,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 19 minutes, 11 seconds)
2026-01-25 17:24:57,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:25:00,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 64.02826 ± 90.857
2026-01-25 17:25:00,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [127.660545, 19.200426, -4.658684, 126.78133, 13.96807, 14.185021, 297.16742, -3.8677716, 45.21471, 4.6314807]
2026-01-25 17:25:00,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [440.0, 43.0, 23.0, 1000.0, 36.0, 46.0, 1000.0, 91.0, 161.0, 20.0]
2026-01-25 17:25:00,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 18 minutes, 28 seconds)
2026-01-25 17:26:40,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:26:45,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 116.62573 ± 168.797
2026-01-25 17:26:45,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2.6087794, 527.22534, 10.698567, 57.629253, 210.8466, 303.29944, 11.913172, -24.237715, 42.070927, 24.203045]
2026-01-25 17:26:45,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [66.0, 1000.0, 35.0, 1000.0, 1000.0, 1000.0, 60.0, 62.0, 196.0, 24.0]
2026-01-25 17:26:45,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 17 minutes, 3 seconds)
2026-01-25 17:28:20,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:28:24,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 136.43623 ± 231.284
2026-01-25 17:28:24,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [303.55164, 526.3902, 2.288975, 6.102374, 1.3414264, -103.53921, 583.33167, 19.491383, 19.678171, 5.725588]
2026-01-25 17:28:24,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 39.0, 30.0, 33.0, 243.0, 1000.0, 36.0, 61.0, 28.0]
2026-01-25 17:28:24,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (136.44) for latency DatasetOffice
2026-01-25 17:28:24,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 16 minutes, 46 seconds)
2026-01-25 17:29:52,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:29:58,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 196.06377 ± 232.401
2026-01-25 17:29:58,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [265.82327, 6.06813, 21.272587, 540.2442, 519.21497, 22.736433, -10.672105, 55.363045, 3.7402732, 536.8469]
2026-01-25 17:29:58,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 19.0, 80.0, 1000.0, 1000.0, 38.0, 100.0, 1000.0, 320.0, 1000.0]
2026-01-25 17:29:58,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (196.06) for latency DatasetOffice
2026-01-25 17:29:58,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 16 minutes, 22 seconds)
2026-01-25 17:31:35,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:31:40,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 233.55087 ± 253.232
2026-01-25 17:31:40,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [549.83636, -11.778461, 25.287745, 497.39944, 565.88855, 1.5981176, 104.09364, 11.93536, 550.79193, 40.456127]
2026-01-25 17:31:40,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 62.0, 57.0, 1000.0, 1000.0, 23.0, 185.0, 84.0, 1000.0, 79.0]
2026-01-25 17:31:40,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (233.55) for latency DatasetOffice
2026-01-25 17:31:40,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 15 minutes, 56 seconds)
2026-01-25 17:33:16,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:33:26,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 338.92871 ± 213.825
2026-01-25 17:33:26,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [511.1874, 3.4171362, 496.08844, 509.6369, 492.65997, 12.400361, 526.6934, 96.38889, 238.47002, 502.3444]
2026-01-25 17:33:26,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 18.0, 1000.0, 1000.0, 1000.0, 36.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:33:26,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (338.93) for latency DatasetOffice
2026-01-25 17:33:26,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 16 minutes, 31 seconds)
2026-01-25 17:34:56,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:35:03,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 311.75388 ± 251.855
2026-01-25 17:35:03,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [567.37744, 546.03827, 29.006363, 537.5002, 253.71011, 41.93937, 578.4431, -9.40249, 542.8403, 30.085817]
2026-01-25 17:35:03,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 134.0, 1000.0, 1000.0, 230.0, 1000.0, 25.0, 1000.0, 160.0]
2026-01-25 17:35:03,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 12 minutes, 51 seconds)
2026-01-25 17:36:34,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:36:42,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 359.40018 ± 235.293
2026-01-25 17:36:42,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [34.57826, 506.5138, 568.112, 555.5363, 13.945102, 267.00928, 25.297012, 556.80505, 586.17053, 480.03452]
2026-01-25 17:36:42,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [85.0, 1000.0, 1000.0, 1000.0, 26.0, 1000.0, 20.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:36:42,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (359.40) for latency DatasetOffice
2026-01-25 17:36:42,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 11 minutes, 3 seconds)
2026-01-25 17:38:22,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:38:32,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 426.44928 ± 193.813
2026-01-25 17:38:32,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [570.1007, 583.74304, 349.03366, 404.61356, 586.6441, 606.276, 606.1985, 318.32275, 249.5771, -10.016471]
2026-01-25 17:38:32,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 29.0]
2026-01-25 17:38:32,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (426.45) for latency DatasetOffice
2026-01-25 17:38:32,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 13 minutes, 34 seconds)
2026-01-25 17:39:59,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:40:07,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 337.24722 ± 228.329
2026-01-25 17:40:07,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-32.228294, 524.8002, 33.0763, 337.27594, 595.2235, 38.960346, 384.12268, 386.69443, 504.87662, 599.6703]
2026-01-25 17:40:07,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [39.0, 1000.0, 59.0, 1000.0, 1000.0, 72.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:40:07,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 10 minutes, 1 second)
2026-01-25 17:41:41,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:41:48,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 289.28806 ± 221.661
2026-01-25 17:41:48,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [319.3189, 430.88043, 352.13812, 33.586826, 524.1476, 550.7268, 576.5063, 59.124065, 38.92553, 7.5259047]
2026-01-25 17:41:48,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 51.0, 1000.0, 1000.0, 1000.0, 83.0, 73.0, 12.0]
2026-01-25 17:41:48,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 7 minutes, 15 seconds)
2026-01-25 17:43:25,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:43:29,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 150.70081 ± 235.574
2026-01-25 17:43:29,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [30.374353, 3.0226123, 120.52438, -9.857378, 107.24195, 0.15873715, 615.54156, 18.986776, 8.083315, 612.9317]
2026-01-25 17:43:29,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [47.0, 20.0, 1000.0, 34.0, 114.0, 26.0, 1000.0, 105.0, 18.0, 1000.0]
2026-01-25 17:43:29,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 6 minutes, 20 seconds)
2026-01-25 17:45:04,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:45:14,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 452.57339 ± 208.951
2026-01-25 17:45:14,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [613.5413, 479.25378, 597.63684, 482.9418, 629.1615, 21.619713, 556.40375, 516.3689, 71.42211, 557.38416]
2026-01-25 17:45:14,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 79.0, 1000.0, 1000.0, 95.0, 1000.0]
2026-01-25 17:45:14,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (452.57) for latency DatasetOffice
2026-01-25 17:45:14,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 6 minutes, 17 seconds)
2026-01-25 17:46:49,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:46:56,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 363.32901 ± 297.694
2026-01-25 17:46:56,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [615.8187, 625.0843, 625.5077, 595.9213, -4.2987914, 31.535078, 527.53033, -39.02316, 639.4371, 15.777705]
2026-01-25 17:46:56,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 76.0, 39.0, 1000.0, 163.0, 1000.0, 46.0]
2026-01-25 17:46:56,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 2 minutes, 41 seconds)
2026-01-25 17:48:27,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:48:36,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 483.26328 ± 239.912
2026-01-25 17:48:36,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [580.4542, 460.195, 632.3164, 626.8451, 671.73224, 658.1817, 559.2943, 14.383557, 609.6887, 19.542107]
2026-01-25 17:48:36,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 44.0, 1000.0, 75.0]
2026-01-25 17:48:36,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (483.26) for latency DatasetOffice
2026-01-25 17:48:36,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 2 minutes, 9 seconds)
2026-01-25 17:50:11,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:50:20,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 439.34961 ± 271.331
2026-01-25 17:50:20,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [616.18866, 599.0853, 6.9726343, 57.329544, 615.6832, 614.0241, 612.0722, 636.42773, 623.4108, 12.302098]
2026-01-25 17:50:20,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 32.0, 355.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 101.0]
2026-01-25 17:50:20,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 1 minute, 7 seconds)
2026-01-25 17:51:56,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:52:04,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 414.13184 ± 264.011
2026-01-25 17:52:04,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [6.290556, 530.33075, 606.5321, 591.65295, 602.856, 23.648882, 570.40814, 6.7696934, 596.9999, 605.829]
2026-01-25 17:52:04,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [10.0, 1000.0, 1000.0, 1000.0, 1000.0, 60.0, 1000.0, 58.0, 1000.0, 1000.0]
2026-01-25 17:52:04,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 10 seconds)
2026-01-25 17:53:40,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:53:50,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 548.29065 ± 173.935
2026-01-25 17:53:50,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [589.3652, 631.51965, 622.58606, 436.03043, 621.7232, 638.9743, 625.21893, 55.98907, 643.0867, 618.41345]
2026-01-25 17:53:50,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 189.0, 1000.0, 1000.0]
2026-01-25 17:53:50,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (548.29) for latency DatasetOffice
2026-01-25 17:53:50,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 58 minutes, 46 seconds)
2026-01-25 17:55:16,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:55:26,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 259.46957 ± 231.335
2026-01-25 17:55:26,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [113.42907, 215.45853, 666.9139, 70.3142, 326.62683, 47.343037, 316.75018, 143.47252, 685.1797, 9.207792]
2026-01-25 17:55:26,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [959.0, 1000.0, 1000.0, 1000.0, 1000.0, 130.0, 1000.0, 1000.0, 1000.0, 59.0]
2026-01-25 17:55:26,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 55 minutes, 27 seconds)
2026-01-25 17:57:03,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:57:11,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 444.65558 ± 282.205
2026-01-25 17:57:11,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [6.2935295, 668.8543, 642.1446, 36.83698, 659.1059, 604.25336, 637.3042, 5.4077396, 554.3026, 632.0521]
2026-01-25 17:57:11,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [30.0, 1000.0, 1000.0, 77.0, 1000.0, 1000.0, 1000.0, 73.0, 1000.0, 1000.0]
2026-01-25 17:57:11,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 55 minutes, 3 seconds)
2026-01-25 17:58:49,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:59:00,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 634.50336 ± 60.313
2026-01-25 17:59:00,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [685.18243, 645.3567, 463.3338, 670.96643, 655.8786, 659.3133, 664.827, 606.77606, 647.9757, 645.4237]
2026-01-25 17:59:00,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:59:00,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (634.50) for latency DatasetOffice
2026-01-25 17:59:00,772 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 54 minutes, 31 seconds)
2026-01-25 18:00:25,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:00:33,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 453.25684 ± 277.688
2026-01-25 18:00:33,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [80.994125, 621.47296, 7.5091267, 592.2139, 640.67584, 635.99054, 637.6538, 4.3997617, 664.9544, 646.7038]
2026-01-25 18:00:33,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [73.0, 1000.0, 17.0, 1000.0, 1000.0, 1000.0, 1000.0, 37.0, 1000.0, 1000.0]
2026-01-25 18:00:33,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 50 minutes, 25 seconds)
2026-01-25 18:02:11,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:02:19,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 409.49399 ± 249.534
2026-01-25 18:02:19,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [656.38196, 545.69867, 659.83887, 577.2852, 23.69357, 45.743565, 452.69028, 97.570435, 673.6916, 362.346]
2026-01-25 18:02:19,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 100.0, 78.0, 1000.0, 148.0, 1000.0, 1000.0]
2026-01-25 18:02:19,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 48 minutes, 34 seconds)
2026-01-25 18:03:58,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:04:06,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 333.39078 ± 268.967
2026-01-25 18:04:06,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [651.6419, 4.5915203, 534.0979, 366.80875, 663.2459, 28.394522, 550.4205, 37.550247, 495.3627, 1.7938738]
2026-01-25 18:04:06,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 16.0, 1000.0, 1000.0, 1000.0, 41.0, 1000.0, 1000.0, 1000.0, 15.0]
2026-01-25 18:04:06,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 49 minutes, 24 seconds)
2026-01-25 18:05:34,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:05:44,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 482.01822 ± 215.231
2026-01-25 18:05:44,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [532.36914, 631.66296, 632.8147, 546.68036, 590.983, 542.9023, 71.46526, 593.85016, 633.4196, 44.0347]
2026-01-25 18:05:44,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 198.0, 1000.0, 1000.0, 79.0]
2026-01-25 18:05:44,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 45 minutes, 51 seconds)
2026-01-25 18:07:23,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:07:30,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 401.07755 ± 324.035
2026-01-25 18:07:30,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [671.84814, 12.206987, 689.55585, 640.434, 676.36414, 660.14923, 3.2957814, 654.0498, -12.444693, 15.316115]
2026-01-25 18:07:30,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 29.0, 1000.0, 1000.0, 1000.0, 1000.0, 19.0, 1000.0, 12.0, 26.0]
2026-01-25 18:07:30,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 43 minutes, 33 seconds)
2026-01-25 18:08:55,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:09:04,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 444.73895 ± 245.277
2026-01-25 18:09:04,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [657.5859, 571.57544, 15.781329, 672.8811, 651.567, 592.737, 617.10284, 19.737537, 317.4975, 330.92407]
2026-01-25 18:09:04,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 37.0, 1000.0, 1000.0, 1000.0, 1000.0, 45.0, 1000.0, 1000.0]
2026-01-25 18:09:04,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 42 minutes, 4 seconds)
2026-01-25 18:10:41,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:10:48,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 364.25946 ± 265.844
2026-01-25 18:10:48,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [609.64355, 148.35736, 570.9569, 25.46947, 604.58405, 626.22656, 434.7477, 605.3088, 4.956213, 12.344]
2026-01-25 18:10:48,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 324.0, 1000.0, 46.0, 1000.0, 1000.0, 1000.0, 1000.0, 18.0, 50.0]
2026-01-25 18:10:48,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 40 minutes, 4 seconds)
2026-01-25 18:12:22,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:12:29,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 394.71393 ± 292.338
2026-01-25 18:12:29,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [653.8808, 34.50215, 70.171036, 645.77435, 10.755738, 632.2905, 641.892, 576.1109, 645.5694, 36.192623]
2026-01-25 18:12:29,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 67.0, 102.0, 1000.0, 16.0, 1000.0, 1000.0, 1000.0, 1000.0, 101.0]
2026-01-25 18:12:29,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 37 minutes, 11 seconds)
2026-01-25 18:14:02,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:14:10,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 452.26056 ± 288.151
2026-01-25 18:14:10,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [652.9995, 43.672035, 593.52954, -6.866982, 600.14343, 677.7666, 7.7685075, 619.9487, 636.58606, 697.0582]
2026-01-25 18:14:10,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 32.0, 1000.0, 32.0, 1000.0, 1000.0, 76.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:14:10,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 36 minutes, 11 seconds)
2026-01-25 18:15:37,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:15:43,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 235.09415 ± 270.826
2026-01-25 18:15:43,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [616.4479, 4.486261, 632.3465, 453.463, 35.407665, 20.101427, 38.64205, 42.439125, 542.6434, -35.035954]
2026-01-25 18:15:43,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 67.0, 1000.0, 1000.0, 128.0, 55.0, 62.0, 87.0, 1000.0, 1000.0]
2026-01-25 18:15:43,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 32 minutes, 3 seconds)
2026-01-25 18:17:15,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:17:27,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 465.21143 ± 253.858
2026-01-25 18:17:27,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [176.49971, 708.25433, 643.31964, 120.31348, 690.44836, 223.15701, 642.65137, 627.85724, 708.0825, 111.530594]
2026-01-25 18:17:27,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:17:27,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 32 minutes, 11 seconds)
2026-01-25 18:18:58,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:19:08,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 542.72644 ± 236.184
2026-01-25 18:19:08,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [645.4206, 687.8594, 647.579, 666.3758, 615.4651, 671.70685, 0.06808961, 673.6235, 153.7095, 665.45636]
2026-01-25 18:19:08,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 50.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:19:08,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 30 minutes, 5 seconds)
2026-01-25 18:20:42,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:20:52,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 471.07022 ± 160.990
2026-01-25 18:20:52,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [538.8768, 550.86084, -3.438162, 535.6321, 550.28625, 553.8462, 448.12177, 505.38776, 508.61188, 522.51666]
2026-01-25 18:20:52,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 67.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:20:52,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 28 minutes, 52 seconds)
2026-01-25 18:22:26,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:22:33,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 368.01285 ± 314.082
2026-01-25 18:22:33,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-5.6323905, 656.12775, -40.882336, 710.7823, 19.66649, 659.6896, 13.8803625, 392.60266, 669.20465, 604.6894]
2026-01-25 18:22:33,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [15.0, 1000.0, 465.0, 1000.0, 24.0, 1000.0, 27.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:22:33,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 27 minutes, 10 seconds)
2026-01-25 18:24:00,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:24:09,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 514.39227 ± 249.756
2026-01-25 18:24:09,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [652.55145, 634.4483, 667.02155, 656.94434, 626.6849, 628.58954, 627.0223, 25.221945, 619.06305, 6.375379]
2026-01-25 18:24:09,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 34.0, 1000.0, 9.0]
2026-01-25 18:24:09,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 26 minutes, 4 seconds)
2026-01-25 18:25:41,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:25:49,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 408.53500 ± 267.949
2026-01-25 18:25:49,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [611.25714, 625.06335, 515.67957, -17.13222, 501.03677, 22.494772, 29.704485, 678.3996, 464.93182, 653.91473]
2026-01-25 18:25:49,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 117.0, 1000.0, 55.0, 39.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:25:49,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 23 minutes, 44 seconds)
2026-01-25 18:27:23,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:27:29,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 325.75070 ± 310.948
2026-01-25 18:27:29,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2.885309, 679.8089, 25.340384, 17.427572, 597.4021, 664.1428, 680.3286, 48.56792, 549.2591, -7.6555324]
2026-01-25 18:27:29,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [11.0, 1000.0, 37.0, 32.0, 1000.0, 1000.0, 1000.0, 90.0, 1000.0, 30.0]
2026-01-25 18:27:29,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 21 minutes, 42 seconds)
2026-01-25 18:29:01,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:29:09,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 412.33710 ± 267.773
2026-01-25 18:29:09,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [29.846329, 561.51483, 654.63354, 532.53955, 553.9081, 617.7815, 44.712612, 633.7306, -47.997578, 542.7018]
2026-01-25 18:29:09,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [111.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 80.0, 1000.0, 131.0, 1000.0]
2026-01-25 18:29:09,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 19 minutes, 25 seconds)
2026-01-25 18:30:45,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:30:53,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 414.85083 ± 261.861
2026-01-25 18:30:53,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [598.2774, 595.93036, 601.38727, 48.64215, 457.49594, 597.60333, 622.2561, 40.483994, 611.33496, -24.903097]
2026-01-25 18:30:53,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 56.0, 1000.0, 1000.0, 1000.0, 87.0, 1000.0, 35.0]
2026-01-25 18:30:53,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 18 minutes, 19 seconds)
2026-01-25 18:32:20,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:32:26,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 324.96802 ± 313.558
2026-01-25 18:32:26,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [683.77515, 581.342, 689.3384, 594.5342, 2.854949, 12.3006115, 17.68829, 9.791607, 22.454573, 635.6002]
2026-01-25 18:32:26,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 25.0, 36.0, 58.0, 25.0, 32.0, 1000.0]
2026-01-25 18:32:26,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 16 minutes, 10 seconds)
2026-01-25 18:34:01,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:34:08,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 360.76257 ± 292.176
2026-01-25 18:34:08,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [598.6002, 575.65656, -3.8438482, 649.6355, 17.801605, 621.7916, 16.194178, 670.55005, 454.45706, 6.783003]
2026-01-25 18:34:08,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 29.0, 1000.0, 174.0, 1000.0, 21.0, 1000.0, 1000.0, 21.0]
2026-01-25 18:34:08,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 14 minutes, 48 seconds)
2026-01-25 18:35:41,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:35:48,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 377.07788 ± 292.291
2026-01-25 18:35:48,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [649.5763, 644.41797, 672.2759, 588.13104, -5.3982267, 24.40466, 18.883554, 501.34274, 56.124664, 621.02]
2026-01-25 18:35:48,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 53.0, 46.0, 62.0, 1000.0, 119.0, 1000.0]
2026-01-25 18:35:48,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 13 minutes, 15 seconds)
2026-01-25 18:37:14,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:37:26,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 613.32922 ± 68.303
2026-01-25 18:37:26,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [707.2883, 693.7775, 559.1567, 605.4584, 515.5848, 631.45325, 651.55005, 581.8009, 506.31427, 680.9074]
2026-01-25 18:37:26,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:37:26,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 11 minutes, 14 seconds)
2026-01-25 18:39:01,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:39:11,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 520.00226 ± 175.402
2026-01-25 18:39:11,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [642.1555, 592.1912, 480.84622, 633.88184, 23.242079, 556.29, 605.0018, 550.8695, 474.38477, 641.1595]
2026-01-25 18:39:11,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 34.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:39:11,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 9 minutes, 47 seconds)
2026-01-25 18:40:44,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:40:53,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 514.42291 ± 237.651
2026-01-25 18:40:53,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [638.62476, 649.27747, 618.77985, 583.53906, 594.6702, 13.845381, 73.47329, 644.71484, 628.05963, 699.24475]
2026-01-25 18:40:53,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 45.0, 99.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:40:53,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 17 seconds)
2026-01-25 18:42:26,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:42:34,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 437.44385 ± 288.318
2026-01-25 18:42:34,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-5.3259897, 682.5322, 10.499348, 661.1368, 611.8652, 639.8625, 591.5637, -7.2667937, 580.4828, 609.0891]
2026-01-25 18:42:34,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [36.0, 1000.0, 222.0, 1000.0, 1000.0, 1000.0, 1000.0, 128.0, 1000.0, 1000.0]
2026-01-25 18:42:34,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 7 minutes, 27 seconds)
2026-01-25 18:44:07,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:44:12,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 335.48956 ± 328.572
2026-01-25 18:44:12,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [31.108458, 617.0706, 632.6174, 718.46515, 675.59564, 8.037274, -6.4375033, 16.646313, 670.88934, -9.097094]
2026-01-25 18:44:12,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [61.0, 1000.0, 1000.0, 1000.0, 1000.0, 26.0, 31.0, 67.0, 1000.0, 14.0]
2026-01-25 18:44:12,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 5 minutes, 33 seconds)
2026-01-25 18:45:46,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:45:54,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 429.90137 ± 281.271
2026-01-25 18:45:54,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [644.10645, 641.4399, -1.266423, 665.5147, 542.3323, 16.22128, 655.6442, 0.23280825, 536.35376, 598.43494]
2026-01-25 18:45:54,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 25.0, 1000.0, 1000.0, 39.0, 1000.0, 19.0, 1000.0, 1000.0]
2026-01-25 18:45:54,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 23 seconds)
2026-01-25 18:47:18,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:47:27,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 316.71323 ± 238.937
2026-01-25 18:47:27,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [4.2731724, 426.73038, 497.9457, 8.770228, 610.53937, 0.5922466, 278.71017, 202.22285, 622.1794, 515.1687]
2026-01-25 18:47:27,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [17.0, 1000.0, 1000.0, 69.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:47:27,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 1 minute, 11 seconds)
2026-01-25 18:48:59,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:49:08,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 438.30795 ± 223.289
2026-01-25 18:49:08,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [609.99603, 9.583009, 553.9205, 525.2188, 478.07983, 628.281, 339.8337, 612.48865, 34.286076, 591.39215]
2026-01-25 18:49:08,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 44.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 80.0, 1000.0]
2026-01-25 18:49:08,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 65/100 (estimated time remaining: 59 minutes, 27 seconds)
2026-01-25 18:50:42,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:50:52,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 562.41614 ± 177.253
2026-01-25 18:50:52,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [689.31995, 597.69025, 625.76044, 691.2396, 80.173096, 661.36664, 460.23523, 607.9913, 699.2748, 511.10983]
2026-01-25 18:50:52,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 215.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:50:52,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 66/100 (estimated time remaining: 58 minutes, 7 seconds)
2026-01-25 18:52:29,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:52:36,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 374.58374 ± 291.411
2026-01-25 18:52:36,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [566.14813, 603.3512, -4.5856943, 48.429558, 590.38745, 732.70264, 48.235638, 20.444927, 454.96716, 685.7564]
2026-01-25 18:52:36,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 57.0, 50.0, 1000.0, 1000.0, 63.0, 75.0, 1000.0, 1000.0]
2026-01-25 18:52:36,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 67/100 (estimated time remaining: 57 minutes, 7 seconds)
2026-01-25 18:54:09,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:54:20,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 534.88000 ± 184.140
2026-01-25 18:54:20,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [584.63165, 587.33453, 593.8519, 620.1202, 655.3891, 510.83124, 647.9581, 632.5243, 515.41626, 0.74220693]
2026-01-25 18:54:20,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 121.0]
2026-01-25 18:54:20,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 68/100 (estimated time remaining: 55 minutes, 36 seconds)
2026-01-25 18:55:45,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:55:56,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 585.08606 ± 146.424
2026-01-25 18:55:56,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [607.2488, 609.5183, 174.13925, 634.64417, 586.148, 675.9924, 528.5925, 659.73615, 642.6789, 732.1628]
2026-01-25 18:55:56,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:55:56,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 16 seconds)
2026-01-25 18:57:30,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:57:38,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 416.65088 ± 285.048
2026-01-25 18:57:38,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [648.12885, 354.02997, 595.36383, 676.1236, 657.48425, -7.018743, 671.26996, 6.1301923, 9.635747, 555.3609]
2026-01-25 18:57:38,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 26.0, 1000.0, 39.0, 17.0, 1000.0]
2026-01-25 18:57:38,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 41 seconds)
2026-01-25 18:59:11,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:59:22,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 559.30566 ± 220.935
2026-01-25 18:59:22,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [690.7863, 649.5401, 345.7555, 641.3011, 672.23315, 639.4231, -38.369823, 658.9464, 688.7887, 644.65186]
2026-01-25 18:59:22,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 177.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:59:22,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 71/100 (estimated time remaining: 50 minutes, 58 seconds)
2026-01-25 19:00:55,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:01:06,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 610.21844 ± 58.275
2026-01-25 19:01:06,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [666.773, 553.5775, 495.40604, 665.9834, 662.66876, 533.9179, 629.674, 610.7364, 628.9735, 654.4738]
2026-01-25 19:01:06,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:01:06,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 14 seconds)
2026-01-25 19:02:38,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:02:48,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 567.38025 ± 194.365
2026-01-25 19:02:48,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [647.4267, 625.66895, 637.721, 0.029029392, 671.23694, 502.61612, 632.6576, 656.06146, 662.74554, 637.63873]
2026-01-25 19:02:48,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 20.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:02:48,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 27 seconds)
2026-01-25 19:04:14,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:04:24,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 536.33160 ± 185.993
2026-01-25 19:04:24,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-3.2649863, 587.9814, 498.53256, 592.38464, 609.7481, 637.2326, 578.0031, 560.9736, 691.98865, 609.7365]
2026-01-25 19:04:24,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [13.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:04:24,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 41 seconds)
2026-01-25 19:05:58,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:06:06,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 460.86108 ± 307.039
2026-01-25 19:06:06,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [682.4896, 382.58972, 672.67773, 709.7183, 730.77875, -3.9335184, 13.331118, 34.369823, 721.31006, 665.27936]
2026-01-25 19:06:06,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 28.0, 31.0, 104.0, 1000.0, 1000.0]
2026-01-25 19:06:06,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes)
2026-01-25 19:07:39,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:07:48,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 489.47812 ± 234.114
2026-01-25 19:07:48,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [635.4537, 479.51733, 568.24304, 633.58856, 615.587, 567.75793, -13.749836, 611.9949, 708.6509, 87.73778]
2026-01-25 19:07:48,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 71.0, 1000.0, 1000.0, 162.0]
2026-01-25 19:07:48,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 11 seconds)
2026-01-25 19:09:22,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:09:32,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 540.87146 ± 219.715
2026-01-25 19:09:32,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [712.3277, 594.09375, 611.0542, 485.50342, 725.1429, 6.3907466, 703.86774, 719.2393, 563.24146, 287.8533]
2026-01-25 19:09:32,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 14.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:09:32,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 28 seconds)
2026-01-25 19:11:05,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:11:15,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 527.05945 ± 198.361
2026-01-25 19:11:15,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [585.8456, 655.1166, 3.5253232, 661.304, 625.55475, 592.1967, 539.29395, 619.7621, 325.72476, 662.2704]
2026-01-25 19:11:15,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 21.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:11:15,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 50 seconds)
2026-01-25 19:12:40,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:12:49,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 529.92261 ± 272.327
2026-01-25 19:12:49,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [725.31476, 690.43134, 687.115, 620.1268, 758.8729, -9.263189, 636.4532, 614.65967, 575.74884, -0.23358886]
2026-01-25 19:12:49,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 112.0, 1000.0, 1000.0, 1000.0, 31.0]
2026-01-25 19:12:49,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 4 seconds)
2026-01-25 19:14:22,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:14:32,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 559.42639 ± 200.150
2026-01-25 19:14:32,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [708.8709, 73.718796, 591.0398, 657.01465, 337.91675, 711.73425, 652.56415, 714.8525, 458.49005, 688.0614]
2026-01-25 19:14:32,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 114.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:14:32,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 25 seconds)
2026-01-25 19:16:05,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:16:14,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 480.09735 ± 240.638
2026-01-25 19:16:14,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [599.20795, 576.93823, 540.93774, 12.351416, 597.9454, -3.115847, 696.28796, 604.92096, 595.86975, 579.6297]
2026-01-25 19:16:14,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 85.0, 1000.0, 66.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:16:14,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 44 seconds)
2026-01-25 19:17:49,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:17:58,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 520.29199 ± 251.055
2026-01-25 19:17:58,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [734.53534, 466.5112, -0.67947286, 660.3363, 677.56683, 72.78834, 637.8014, 660.85394, 646.946, 646.26013]
2026-01-25 19:17:58,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 24.0, 1000.0, 1000.0, 90.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:17:58,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 2 seconds)
2026-01-25 19:19:31,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:19:42,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 623.22498 ± 67.049
2026-01-25 19:19:42,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [710.22925, 469.92285, 635.0997, 711.8915, 581.7368, 675.15717, 629.6163, 610.382, 587.8734, 620.341]
2026-01-25 19:19:42,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:19:42,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 25 seconds)
2026-01-25 19:21:14,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:21:26,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 609.82111 ± 62.584
2026-01-25 19:21:26,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [583.03265, 662.7432, 580.65314, 648.85693, 622.4701, 442.69604, 653.9535, 642.1243, 605.2533, 656.4275]
2026-01-25 19:21:26,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:21:26,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 15 seconds)
2026-01-25 19:22:58,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:23:08,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 552.67291 ± 172.452
2026-01-25 19:23:08,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [524.61383, 634.7574, 586.5497, 623.9723, 568.72156, 695.8856, 56.110176, 645.1721, 550.3215, 640.6257]
2026-01-25 19:23:08,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 67.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:23:08,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 30 seconds)
2026-01-25 19:24:41,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:24:51,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 574.48395 ± 192.247
2026-01-25 19:24:51,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [649.5226, 655.45483, 649.94586, 577.902, 655.9901, 644.5574, 579.2046, 6.022704, 638.02094, 688.21844]
2026-01-25 19:24:51,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 38.0, 1000.0, 1000.0]
2026-01-25 19:24:51,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 49 seconds)
2026-01-25 19:26:23,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:26:33,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 565.67743 ± 192.383
2026-01-25 19:26:33,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [644.1556, 3.2600765, 604.50366, 712.333, 684.37396, 652.6223, 587.0744, 602.3818, 566.1445, 599.9246]
2026-01-25 19:26:33,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 15.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:26:33,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 2 seconds)
2026-01-25 19:28:04,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:28:16,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 612.73859 ± 55.192
2026-01-25 19:28:16,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [624.56573, 672.0035, 599.6208, 562.1123, 625.9852, 506.23306, 572.2191, 594.6125, 676.7523, 693.281]
2026-01-25 19:28:16,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:28:16,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 15 seconds)
2026-01-25 19:29:48,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:29:59,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 634.69019 ± 62.531
2026-01-25 19:29:59,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [707.8967, 609.61426, 630.0511, 697.02966, 482.4922, 613.5567, 673.02344, 649.5044, 598.2646, 685.46857]
2026-01-25 19:29:59,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:29:59,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (634.69) for latency DatasetOffice
2026-01-25 19:29:59,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 32 seconds)
2026-01-25 19:31:23,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:31:34,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 612.50763 ± 59.992
2026-01-25 19:31:34,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [643.97766, 642.7499, 591.7701, 676.4219, 637.2631, 620.21826, 458.1495, 567.3699, 620.0904, 667.0652]
2026-01-25 19:31:34,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:31:34,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 33 seconds)
2026-01-25 19:33:06,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:33:16,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 541.41888 ± 202.458
2026-01-25 19:33:16,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [676.8151, 672.566, 612.49005, 616.8265, 24.305166, 640.6076, 620.47235, 288.45236, 622.24896, 639.4048]
2026-01-25 19:33:16,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 64.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:33:16,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 51 seconds)
2026-01-25 19:34:48,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:34:55,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 403.02954 ± 318.656
2026-01-25 19:34:55,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [678.736, 669.9307, 22.368004, 731.77966, 732.36115, 687.8835, 33.336838, 8.964178, 38.250584, 426.6848]
2026-01-25 19:34:55,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 32.0, 1000.0, 1000.0, 1000.0, 77.0, 27.0, 63.0, 1000.0]
2026-01-25 19:34:55,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 4 seconds)
2026-01-25 19:36:27,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:36:38,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 608.55542 ± 124.351
2026-01-25 19:36:38,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [651.5643, 643.56305, 241.32887, 645.1247, 639.21246, 616.35815, 644.17334, 645.61676, 650.06555, 708.54724]
2026-01-25 19:36:38,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:36:39,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 24 seconds)
2026-01-25 19:38:11,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:38:18,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 404.72427 ± 325.511
2026-01-25 19:38:18,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [589.4285, 631.10846, 647.90405, 737.9903, 30.786093, 1.1446239, 694.7904, 3.796083, 709.761, 0.5335199]
2026-01-25 19:38:18,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 52.0, 23.0, 1000.0, 33.0, 1000.0, 16.0]
2026-01-25 19:38:18,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 38 seconds)
2026-01-25 19:39:51,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:40:00,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 515.57709 ± 254.088
2026-01-25 19:40:00,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [660.8551, 627.8268, 612.63617, 9.4040985, 663.98376, 9.694867, 598.5732, 632.68866, 670.70013, 669.40814]
2026-01-25 19:40:00,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 17.0, 1000.0, 21.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:40:00,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 7 seconds)
2026-01-25 19:41:33,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:41:42,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 514.66052 ± 251.355
2026-01-25 19:41:42,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [722.8095, 633.6631, 665.16876, 588.5189, 592.2848, 33.504818, 705.85284, 568.50854, 8.332204, 627.96173]
2026-01-25 19:41:42,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 41.0, 1000.0, 1000.0, 41.0, 1000.0]
2026-01-25 19:41:42,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 25 seconds)
2026-01-25 19:43:12,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:43:18,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 328.62299 ± 318.630
2026-01-25 19:43:18,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [662.3379, 11.970041, 596.31793, -2.0622454, 689.2036, 633.9334, 11.654195, 650.4803, 9.598143, 22.796728]
2026-01-25 19:43:18,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 27.0, 1000.0, 60.0, 1000.0, 1000.0, 52.0, 1000.0, 50.0, 23.0]
2026-01-25 19:43:18,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 42 seconds)
2026-01-25 19:44:50,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:44:58,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 527.56317 ± 261.973
2026-01-25 19:44:58,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [689.5699, 2.4003756, 677.18585, 689.3382, 694.7, 651.4635, 630.15784, 11.176282, 616.0669, 613.57294]
2026-01-25 19:44:58,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 14.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 65.0, 1000.0, 1000.0]
2026-01-25 19:44:58,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 59 seconds)
2026-01-25 19:46:29,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:46:38,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 544.01917 ± 258.770
2026-01-25 19:46:38,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [712.57935, 667.3643, 646.8002, 689.3714, 644.0711, 34.31029, 695.1016, 655.25745, 673.38416, 21.951715]
2026-01-25 19:46:38,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 38.0, 1000.0, 1000.0, 1000.0, 66.0]
2026-01-25 19:46:38,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 19 seconds)
2026-01-25 19:48:00,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:48:09,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 538.24121 ± 259.617
2026-01-25 19:48:09,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [6.923895, 634.7483, 725.31555, 712.6938, 648.5503, 659.7043, 632.6969, 37.91042, 671.7909, 652.0777]
2026-01-25 19:48:09,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [20.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 104.0, 1000.0, 1000.0]
2026-01-25 19:48:09,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 37 seconds)
2026-01-25 19:49:39,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:49:47,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 464.98520 ± 304.942
2026-01-25 19:49:47,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [633.68726, 728.07306, 607.6004, 4.661764, -0.19712844, 721.80164, 676.4466, 639.5265, 635.2801, 2.9717839]
2026-01-25 19:49:47,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 15.0, 22.0, 1000.0, 1000.0, 1000.0, 1000.0, 16.0]
2026-01-25 19:49:47,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1299 [DEBUG]: Training session finished
