2026-01-22 23:51:41,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-ant/DatasetOffice-sac-aug-mem5 
2026-01-22 23:51:41,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-ant/DatasetOffice-sac-aug-mem5 
2026-01-22 23:51:41,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x1475c08bbf50>}
2026-01-22 23:51:41,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1159 [DEBUG]: using device: cuda
2026-01-22 23:51:41,451 baseline-sac-noisy-ant:77 [WARNING]: args.memorize_actions != args.horizon: 5 != 32
2026-01-22 23:51:41,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1181 [INFO]: Creating new trainer
2026-01-22 23:51:41,612 baseline-sac-noisy-ant:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=67, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:51:41,612 baseline-sac-noisy-ant:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=75, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:51:42,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1242 [DEBUG]: Starting training session...
2026-01-22 23:51:42,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 1/100
2026-01-22 23:53:15,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:17,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: -209.65479 ± 494.765
2026-01-22 23:53:17,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-10.163131, -150.83835, -0.58004856, -27.750195, -11.254916, -77.074165, -57.16323, -72.675354, -1687.9663, -1.0819607]
2026-01-22 23:53:17,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [25.0, 96.0, 20.0, 31.0, 24.0, 54.0, 43.0, 54.0, 1000.0, 21.0]
2026-01-22 23:53:17,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (-209.65) for latency DatasetOffice
2026-01-22 23:53:17,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 36 minutes, 11 seconds)
2026-01-22 23:54:46,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:54:50,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: -268.49216 ± 371.543
2026-01-22 23:54:50,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-36.830555, -22.375223, -16.004498, -855.72205, -860.97345, -9.861292, -788.8542, -22.948498, -25.274979, -46.077003]
2026-01-22 23:54:50,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [83.0, 33.0, 34.0, 1000.0, 1000.0, 33.0, 1000.0, 36.0, 54.0, 44.0]
2026-01-22 23:54:50,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 33 minutes, 43 seconds)
2026-01-22 23:56:30,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:56:34,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: -94.55370 ± 104.204
2026-01-22 23:56:34,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-10.061313, -252.06943, -287.84885, -209.44746, -20.769232, -47.65533, -24.045528, -1.4031701, -52.73209, -39.504593]
2026-01-22 23:56:34,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [73.0, 1000.0, 1000.0, 1000.0, 134.0, 119.0, 154.0, 33.0, 139.0, 88.0]
2026-01-22 23:56:34,923 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (-94.55) for latency DatasetOffice
2026-01-22 23:56:34,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 37 minutes, 35 seconds)
2026-01-22 23:58:05,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:08,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 8.67072 ± 29.293
2026-01-22 23:58:08,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-54.439304, -14.012276, 41.022823, 39.288914, 35.117126, 13.20903, -20.660606, 5.428704, 10.622478, 31.130322]
2026-01-22 23:58:08,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [179.0, 49.0, 52.0, 300.0, 1000.0, 56.0, 77.0, 284.0, 238.0, 40.0]
2026-01-22 23:58:08,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (8.67) for latency DatasetOffice
2026-01-22 23:58:08,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 34 minutes, 14 seconds)
2026-01-22 23:59:47,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:59:49,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 38.39285 ± 53.256
2026-01-22 23:59:49,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [24.78803, 25.69235, 18.789196, 98.71509, 6.1354256, 20.241184, -27.99051, 18.2619, 171.43405, 27.861786]
2026-01-22 23:59:49,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [319.0, 188.0, 46.0, 1000.0, 70.0, 56.0, 98.0, 156.0, 447.0, 76.0]
2026-01-22 23:59:49,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (38.39) for latency DatasetOffice
2026-01-22 23:59:49,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 34 minutes, 16 seconds)
2026-01-23 00:01:18,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:01:19,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 10.28884 ± 8.198
2026-01-23 00:01:19,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [25.065731, 17.618872, 7.5496287, 5.1094337, 23.273802, 9.50198, 6.955575, 4.3537426, 0.40676647, 3.0528238]
2026-01-23 00:01:19,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [23.0, 1000.0, 20.0, 56.0, 68.0, 62.0, 28.0, 66.0, 14.0, 23.0]
2026-01-23 00:01:19,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 31 minutes, 10 seconds)
2026-01-23 00:03:02,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:03,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: -3.32470 ± 26.214
2026-01-23 00:03:03,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-69.446945, 2.4552364, -34.67962, 15.388071, 19.374706, 3.636982, 10.821721, 4.4304104, 13.693783, 1.0786537]
2026-01-23 00:03:03,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [397.0, 74.0, 59.0, 23.0, 118.0, 34.0, 58.0, 56.0, 37.0, 74.0]
2026-01-23 00:03:03,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 32 minutes, 39 seconds)
2026-01-23 00:04:32,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:04:36,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 62.45897 ± 82.127
2026-01-23 00:04:36,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-20.58075, 46.868988, 25.713333, -2.767136, 22.620794, 158.63243, 204.70883, -13.2128725, 187.58504, 15.021068]
2026-01-23 00:04:36,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [52.0, 254.0, 95.0, 24.0, 56.0, 494.0, 1000.0, 218.0, 1000.0, 17.0]
2026-01-23 00:04:36,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (62.46) for latency DatasetOffice
2026-01-23 00:04:36,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 27 minutes, 34 seconds)
2026-01-23 00:06:11,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:06:14,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 57.58332 ± 91.705
2026-01-23 00:06:14,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [207.4971, 10.0012455, 174.84421, -2.2980788, 2.4493861, -4.8432035, 205.7, 20.740095, -15.825761, -22.431807]
2026-01-23 00:06:14,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 57.0, 657.0, 22.0, 19.0, 24.0, 1000.0, 31.0, 78.0, 135.0]
2026-01-23 00:06:14,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 27 minutes, 32 seconds)
2026-01-23 00:07:48,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:07:51,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 69.43494 ± 150.317
2026-01-23 00:07:51,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-1.7742836, 404.92017, 6.778002, 17.641804, -15.283216, -80.01272, 295.29526, -44.331722, 118.28277, -7.1666136]
2026-01-23 00:07:51,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [41.0, 1000.0, 58.0, 104.0, 32.0, 124.0, 1000.0, 127.0, 561.0, 146.0]
2026-01-23 00:07:51,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (69.43) for latency DatasetOffice
2026-01-23 00:07:51,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 24 minutes, 40 seconds)
2026-01-23 00:09:29,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:09:31,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 37.24273 ± 127.900
2026-01-23 00:09:31,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-40.075512, 22.363243, -63.14298, 6.9003415, 16.303467, -21.163929, 12.736745, -0.4186006, 26.96693, 411.9576]
2026-01-23 00:09:31,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [75.0, 272.0, 290.0, 58.0, 25.0, 126.0, 35.0, 43.0, 95.0, 1000.0]
2026-01-23 00:09:31,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 25 minutes, 51 seconds)
2026-01-23 00:11:08,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:12,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 60.60512 ± 111.442
2026-01-23 00:11:12,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [13.583896, 11.028002, -2.9218316, 295.29807, -21.299635, -33.18748, 139.28888, -6.274113, -17.71225, 228.2477]
2026-01-23 00:11:12,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [18.0, 38.0, 190.0, 1000.0, 134.0, 67.0, 1000.0, 43.0, 124.0, 1000.0]
2026-01-23 00:11:12,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 23 minutes, 26 seconds)
2026-01-23 00:12:47,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:52,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 157.90868 ± 175.071
2026-01-23 00:12:52,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [71.53279, 101.9566, 513.52606, -18.122019, 61.46677, 285.7762, 31.616745, 427.99338, 77.54002, 25.800142]
2026-01-23 00:12:52,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [121.0, 229.0, 1000.0, 81.0, 188.0, 1000.0, 48.0, 1000.0, 199.0, 46.0]
2026-01-23 00:12:52,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (157.91) for latency DatasetOffice
2026-01-23 00:12:52,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 23 minutes, 53 seconds)
2026-01-23 00:14:28,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:14:31,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 89.98051 ± 144.494
2026-01-23 00:14:31,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [11.442711, 8.639454, 21.197176, 26.30693, 22.416353, -11.767484, 65.27619, 351.86288, 4.8420014, 399.58896]
2026-01-23 00:14:31,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [75.0, 19.0, 45.0, 39.0, 56.0, 236.0, 186.0, 1000.0, 53.0, 1000.0]
2026-01-23 00:14:31,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 22 minutes, 31 seconds)
2026-01-23 00:16:01,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:04,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 53.42032 ± 133.137
2026-01-23 00:16:04,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [420.67432, 148.36288, -13.685958, -30.197353, -2.8487573, -19.992393, -45.95048, 48.54399, 0.3615108, 28.93542]
2026-01-23 00:16:04,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 53.0, 134.0, 301.0, 116.0, 181.0, 154.0, 129.0, 136.0]
2026-01-23 00:16:04,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 19 minutes, 35 seconds)
2026-01-23 00:17:42,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:17:46,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 156.31645 ± 197.306
2026-01-23 00:17:46,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-20.507624, 4.8561745, 19.253479, 462.37735, 38.69122, 510.54153, 96.585, 6.777987, 66.278656, 378.31085]
2026-01-23 00:17:46,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [57.0, 83.0, 40.0, 1000.0, 114.0, 1000.0, 183.0, 49.0, 153.0, 1000.0]
2026-01-23 00:17:46,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 18 minutes, 38 seconds)
2026-01-23 00:19:21,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:19:25,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 97.68417 ± 168.252
2026-01-23 00:19:25,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [450.09546, 74.268234, 19.262691, 19.904066, -11.133555, 411.37805, 11.679314, -2.7965488, 10.159844, -5.9758425]
2026-01-23 00:19:25,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 555.0, 99.0, 106.0, 27.0, 1000.0, 33.0, 34.0, 227.0, 81.0]
2026-01-23 00:19:25,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 16 minutes, 24 seconds)
2026-01-23 00:20:58,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:06,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 354.37875 ± 198.804
2026-01-23 00:21:06,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [571.8639, 475.05896, 492.8352, 14.669271, 377.4387, 171.46231, 9.482414, 497.73175, 459.51498, 473.73013]
2026-01-23 00:21:06,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 91.0, 1000.0, 333.0, 15.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:21:06,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (354.38) for latency DatasetOffice
2026-01-23 00:21:06,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 15 minutes, 12 seconds)
2026-01-23 00:22:42,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:22:49,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 412.61017 ± 329.647
2026-01-23 00:22:49,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [537.6967, 697.9192, 75.49676, 6.1557374, 692.6329, 3.449663, 716.97815, -27.23788, 720.6136, 702.3968]
2026-01-23 00:22:49,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 181.0, 73.0, 1000.0, 20.0, 1000.0, 160.0, 1000.0, 1000.0]
2026-01-23 00:22:49,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (412.61) for latency DatasetOffice
2026-01-23 00:22:49,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 14 minutes, 29 seconds)
2026-01-23 00:24:28,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:24:36,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 369.08505 ± 256.493
2026-01-23 00:24:36,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [327.57547, 0.9767974, 655.2995, 507.4769, 401.0937, 653.9922, 606.2121, 5.398054, 13.617477, 519.2084]
2026-01-23 00:24:36,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 55.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 22.0, 134.0, 1000.0]
2026-01-23 00:24:36,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 16 minutes, 36 seconds)
2026-01-23 00:26:13,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:21,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 492.22900 ± 386.937
2026-01-23 00:26:21,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [604.05396, 52.947166, 16.07871, 14.174477, 881.0635, 746.65765, 857.4647, 825.87585, 894.4215, 29.552563]
2026-01-23 00:26:21,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 72.0, 31.0, 65.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 44.0]
2026-01-23 00:26:21,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (492.23) for latency DatasetOffice
2026-01-23 00:26:21,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 15 minutes, 30 seconds)
2026-01-23 00:27:58,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:08,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 560.15167 ± 291.475
2026-01-23 00:28:08,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [41.138054, 523.22955, 791.6369, 30.427729, 794.57117, 828.36304, 488.6202, 823.84534, 763.13245, 516.5523]
2026-01-23 00:28:08,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [105.0, 1000.0, 1000.0, 206.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:28:08,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (560.15) for latency DatasetOffice
2026-01-23 00:28:08,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 15 minutes, 56 seconds)
2026-01-23 00:29:43,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:29:54,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 575.41150 ± 37.515
2026-01-23 00:29:54,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [672.0702, 590.38055, 549.45636, 554.8509, 561.9046, 537.0027, 558.9604, 564.47894, 607.62274, 557.3881]
2026-01-23 00:29:54,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:29:54,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (575.41) for latency DatasetOffice
2026-01-23 00:29:54,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 15 minutes, 27 seconds)
2026-01-23 00:31:26,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:36,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 663.43890 ± 326.102
2026-01-23 00:31:36,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [827.8247, 852.6497, 820.7804, 756.6603, 26.14241, 841.45276, 0.31434134, 830.0906, 839.00165, 839.4728]
2026-01-23 00:31:36,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 56.0, 1000.0, 22.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:31:36,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (663.44) for latency DatasetOffice
2026-01-23 00:31:36,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 13 minutes, 22 seconds)
2026-01-23 00:33:06,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:16,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 651.97491 ± 335.439
2026-01-23 00:33:16,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [862.7566, 20.643293, 827.61444, 22.846601, 882.5471, 864.21686, 830.53284, 850.71484, 884.474, 473.4025]
2026-01-23 00:33:16,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 73.0, 1000.0, 66.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:33:16,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 9 minutes, 48 seconds)
2026-01-23 00:34:51,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:03,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 816.88251 ± 26.812
2026-01-23 00:35:03,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [831.4258, 838.4161, 857.2168, 847.5861, 825.1713, 813.9992, 773.2275, 780.93414, 808.39514, 792.4536]
2026-01-23 00:35:03,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:35:03,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (816.88) for latency DatasetOffice
2026-01-23 00:35:03,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 8 minutes, 47 seconds)
2026-01-23 00:36:37,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:47,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 678.71002 ± 337.991
2026-01-23 00:36:47,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [828.467, 862.4722, 874.2475, 846.4446, 28.631893, 837.14594, 843.6714, 868.20325, -20.860561, 818.6766]
2026-01-23 00:36:47,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 48.0, 1000.0, 1000.0, 1000.0, 144.0, 1000.0]
2026-01-23 00:36:47,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 6 minutes, 16 seconds)
2026-01-23 00:38:21,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:32,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 702.58606 ± 239.152
2026-01-23 00:38:32,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [520.0492, 780.7877, 862.9622, 762.4574, 39.132385, 776.71826, 807.7449, 853.7946, 816.366, 805.8478]
2026-01-23 00:38:32,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 53.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:38:32,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 4 minutes, 10 seconds)
2026-01-23 00:40:09,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:18,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 668.69073 ± 340.288
2026-01-23 00:40:18,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [3.9087214, 900.18115, -19.100992, 855.4064, 854.4962, 847.3698, 827.1596, 846.8717, 830.0525, 740.5623]
2026-01-23 00:40:18,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [37.0, 1000.0, 121.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:40:18,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 3 minutes, 36 seconds)
2026-01-23 00:41:54,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:05,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 675.24207 ± 253.765
2026-01-23 00:42:05,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [787.444, 831.7978, 814.4344, 803.8386, 602.82965, 448.89676, 830.4192, 797.8896, 831.2162, 3.654256]
2026-01-23 00:42:05,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 71.0]
2026-01-23 00:42:05,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 3 minutes, 25 seconds)
2026-01-23 00:43:38,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:43:47,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 635.75702 ± 328.350
2026-01-23 00:43:47,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [828.5275, 828.30774, 29.493393, -20.87106, 895.1714, 814.1579, 748.459, 548.772, 836.4858, 849.06665]
2026-01-23 00:43:47,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 81.0, 67.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:43:47,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 42 seconds)
2026-01-23 00:45:15,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:23,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 606.35901 ± 404.465
2026-01-23 00:45:23,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2.4299512, -16.711395, 806.7449, 977.43805, 865.8494, 914.2187, 943.8813, 865.5093, 11.056404, 693.1735]
2026-01-23 00:45:23,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [47.0, 45.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 32.0, 1000.0]
2026-01-23 00:45:23,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 56 minutes, 58 seconds)
2026-01-23 00:46:57,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:47:09,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 792.24701 ± 132.053
2026-01-23 00:47:09,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [947.4429, 871.14404, 868.9279, 883.4775, 554.1391, 848.4961, 657.4523, 866.6871, 582.87054, 841.8329]
2026-01-23 00:47:09,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:47:09,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 55 minutes, 28 seconds)
2026-01-23 00:48:44,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:53,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 673.26697 ± 361.630
2026-01-23 00:48:53,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [887.83234, 632.24084, 999.1922, -34.59971, 24.25585, 913.4395, 823.3624, 963.4669, 922.1158, 601.364]
2026-01-23 00:48:53,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 96.0, 49.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:48:53,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 53 minutes, 17 seconds)
2026-01-23 00:50:29,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:40,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 825.10791 ± 252.120
2026-01-23 00:50:40,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [171.19502, 823.45544, 1051.8727, 861.02435, 941.9433, 756.4951, 1103.348, 995.1458, 660.8926, 885.7069]
2026-01-23 00:50:40,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [257.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:50:40,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (825.11) for latency DatasetOffice
2026-01-23 00:50:40,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 51 minutes, 33 seconds)
2026-01-23 00:52:12,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:22,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 700.15967 ± 233.751
2026-01-23 00:52:22,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [762.67224, 676.02124, 787.6764, 804.0725, 804.50995, 785.015, 7.1144013, 792.7594, 788.5231, 793.23315]
2026-01-23 00:52:22,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 25.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:52:22,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 49 minutes, 46 seconds)
2026-01-23 00:53:57,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:07,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 622.01514 ± 203.941
2026-01-23 00:54:07,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [698.6402, 711.0567, 727.06036, 604.5189, 26.126268, 712.2697, 592.6335, 721.70374, 703.99725, 722.1443]
2026-01-23 00:54:07,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 54.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:54:07,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 50 minutes, 10 seconds)
2026-01-23 00:55:41,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:55:51,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 684.76489 ± 225.460
2026-01-23 00:55:51,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [774.9775, 772.7449, 773.50183, 786.4352, 11.76299, 747.5698, 748.2298, 719.4508, 726.17004, 786.80634]
2026-01-23 00:55:51,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 38.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:55:51,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 47 minutes, 57 seconds)
2026-01-23 00:57:19,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:29,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 653.31927 ± 330.367
2026-01-23 00:57:29,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [733.1014, 655.8942, 953.8824, 700.91425, 812.73444, 45.789062, 988.3815, 793.2478, -0.5199238, 849.7676]
2026-01-23 00:57:29,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 60.0, 1000.0, 1000.0, 21.0, 1000.0]
2026-01-23 00:57:29,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 44 minutes, 51 seconds)
2026-01-23 00:59:03,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:14,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 758.85681 ± 268.281
2026-01-23 00:59:14,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [876.6219, 866.10767, -6.288662, 598.12396, 846.14874, 879.193, 878.5676, 891.55426, 884.4285, 874.11115]
2026-01-23 00:59:14,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 33.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:59:14,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 42 minutes, 47 seconds)
2026-01-23 01:00:47,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:00:57,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 855.33301 ± 295.866
2026-01-23 01:00:57,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1081.093, 951.8169, 1008.5367, 864.00244, 3.5904405, 869.2774, 978.76373, 1027.2864, 982.9269, 786.03564]
2026-01-23 01:00:57,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 50.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:00:57,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (855.33) for latency DatasetOffice
2026-01-23 01:00:57,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 41 minutes, 20 seconds)
2026-01-23 01:02:32,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:42,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 693.26111 ± 244.168
2026-01-23 01:02:42,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [820.4403, 684.93097, 587.4638, 940.2214, 848.5385, 823.3776, 603.0355, 33.502254, 761.1824, 829.9174]
2026-01-23 01:02:42,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 73.0, 1000.0, 1000.0]
2026-01-23 01:02:42,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 39 minutes, 34 seconds)
2026-01-23 01:04:16,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:26,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 823.59064 ± 289.337
2026-01-23 01:04:26,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [577.29443, 886.02374, 996.8364, 982.0043, 38.29405, 859.47107, 972.49646, 898.9163, 1027.3582, 997.2107]
2026-01-23 01:04:26,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 92.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:04:26,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 37 minutes, 52 seconds)
2026-01-23 01:06:00,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:10,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 827.63440 ± 290.943
2026-01-23 01:06:10,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [878.3909, 867.974, 996.4287, 934.1241, 851.9804, -12.540582, 1002.0695, 943.9572, 1047.3502, 766.6098]
2026-01-23 01:06:10,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 42.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:06:10,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 37 minutes, 20 seconds)
2026-01-23 01:07:45,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:55,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 693.06097 ± 258.245
2026-01-23 01:07:55,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [748.2481, 898.94965, 578.7945, 799.23425, 920.3795, 24.527046, 737.6329, 515.008, 934.81915, 773.01654]
2026-01-23 01:07:55,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 107.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:07:55,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 35 minutes, 37 seconds)
2026-01-23 01:09:29,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:40,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 875.92938 ± 295.901
2026-01-23 01:09:40,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [950.13556, 967.5115, 969.44507, 891.3521, 1038.2133, 944.5543, 998.6792, 1007.62225, -4.2994366, 996.08026]
2026-01-23 01:09:40,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 29.0, 1000.0]
2026-01-23 01:09:40,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (875.93) for latency DatasetOffice
2026-01-23 01:09:40,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 34 minutes, 1 second)
2026-01-23 01:11:14,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:25,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1081.12463 ± 84.271
2026-01-23 01:11:25,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1093.8748, 1116.1653, 1063.7198, 993.436, 1121.9584, 893.43787, 1179.78, 1120.5985, 1189.2028, 1039.0718]
2026-01-23 01:11:25,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 783.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:11:25,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1081.12) for latency DatasetOffice
2026-01-23 01:11:25,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 32 minutes, 18 seconds)
2026-01-23 01:12:55,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:02,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 628.26270 ± 514.276
2026-01-23 01:13:02,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [862.0705, 16.046072, -13.506709, 1111.6923, 1104.9766, 1052.4308, 9.85026, 1.9101766, 1074.4786, 1062.6786]
2026-01-23 01:13:02,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 20.0, 53.0, 1000.0, 1000.0, 1000.0, 23.0, 18.0, 1000.0, 1000.0]
2026-01-23 01:13:02,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 29 minutes, 26 seconds)
2026-01-23 01:14:30,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:38,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 522.40771 ± 381.501
2026-01-23 01:14:38,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [544.30664, 578.07886, 967.55927, 870.2004, -1.1696577, 495.0846, 1054.7827, -0.923707, 6.978902, 709.1792]
2026-01-23 01:14:38,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 897.0, 30.0, 1000.0, 1000.0, 17.0, 19.0, 1000.0]
2026-01-23 01:14:38,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 26 minutes, 21 seconds)
2026-01-23 01:16:12,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:21,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 780.35217 ± 394.235
2026-01-23 01:16:21,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [950.91724, -1.3310077, 970.85333, 1064.4896, 830.386, 997.4244, 1023.1903, 930.7066, 4.305568, 1032.5797]
2026-01-23 01:16:21,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 25.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 37.0, 1000.0]
2026-01-23 01:16:21,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 24 minutes, 17 seconds)
2026-01-23 01:17:58,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:09,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 869.35236 ± 207.927
2026-01-23 01:18:09,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [913.9731, 995.2341, 866.5882, 850.91876, 323.51605, 1053.6561, 815.51, 802.2872, 937.531, 1134.3085]
2026-01-23 01:18:09,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:18:09,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 14 seconds)
2026-01-23 01:19:41,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:51,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 840.72217 ± 294.119
2026-01-23 01:19:51,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [922.0874, -17.239227, 914.32745, 795.7928, 891.2176, 997.5208, 1023.6858, 981.9231, 1022.88043, 875.02625]
2026-01-23 01:19:51,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 44.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:19:51,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 20 minutes, 59 seconds)
2026-01-23 01:21:24,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:33,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 723.27399 ± 390.183
2026-01-23 01:21:33,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [882.2685, -12.198878, 847.9605, -37.24237, 1069.6167, 774.1492, 855.0252, 972.27344, 1122.2036, 758.68353]
2026-01-23 01:21:33,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [852.0, 54.0, 1000.0, 98.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:21:33,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 20 minutes, 2 seconds)
2026-01-23 01:23:06,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:15,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 774.39551 ± 478.226
2026-01-23 01:23:15,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [-60.23419, 1004.2482, 1130.583, 1151.693, 1016.9756, 211.52602, 10.206313, 1040.198, 1117.8839, 1120.8754]
2026-01-23 01:23:15,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [115.0, 1000.0, 1000.0, 1000.0, 1000.0, 246.0, 48.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:23:15,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 19 minutes, 13 seconds)
2026-01-23 01:24:51,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:01,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 902.84814 ± 303.937
2026-01-23 01:25:01,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [923.01636, 1014.7732, 1057.3405, 1043.9324, 1099.7683, 990.82, 884.5232, 981.7669, 1023.76715, 8.773923]
2026-01-23 01:25:01,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 36.0]
2026-01-23 01:25:01,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 18 minutes, 3 seconds)
2026-01-23 01:26:37,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:47,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 905.08679 ± 452.842
2026-01-23 01:26:47,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1153.2755, 1069.0043, 1190.1769, 1027.9386, 1.3421509, 1202.1079, 1125.6808, 1126.2196, 8.028051, 1147.0946]
2026-01-23 01:26:47,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 20.0, 1000.0, 1000.0, 1000.0, 52.0, 1000.0]
2026-01-23 01:26:47,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 15 minutes, 52 seconds)
2026-01-23 01:28:20,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:31,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1020.07697 ± 343.225
2026-01-23 01:28:31,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1132.9349, 1173.9731, 1169.156, 1141.7812, 1189.1442, 1106.2701, 1065.7349, 1143.5653, 1081.5808, -3.3723702]
2026-01-23 01:28:31,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 45.0]
2026-01-23 01:28:31,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 14 minutes, 28 seconds)
2026-01-23 01:30:05,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:15,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 920.86914 ± 316.957
2026-01-23 01:30:15,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [2.4494793, 969.3241, 1070.2771, 1033.332, 1104.9426, 1066.0826, 1074.0522, 1025.5403, 800.58875, 1062.1016]
2026-01-23 01:30:15,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [39.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:30:15,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 13 minutes, 2 seconds)
2026-01-23 01:31:44,885 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:55,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1012.82336 ± 169.595
2026-01-23 01:31:55,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1060.927, 1179.1534, 1096.685, 586.0343, 850.3983, 943.44183, 1128.2717, 1111.5509, 1127.8134, 1043.9569]
2026-01-23 01:31:55,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 533.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:31:55,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 11 minutes, 7 seconds)
2026-01-23 01:33:24,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:35,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 980.41125 ± 214.481
2026-01-23 01:33:35,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1172.0541, 1148.349, 1137.9758, 1208.5928, 1109.9695, 683.5485, 1069.2963, 735.67474, 937.2136, 601.4373]
2026-01-23 01:33:35,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 561.0]
2026-01-23 01:33:35,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 8 minutes, 29 seconds)
2026-01-23 01:35:12,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:23,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 985.63977 ± 225.554
2026-01-23 01:35:23,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [693.2192, 1103.0044, 1148.2769, 1055.7804, 1134.0887, 1091.0558, 1214.9607, 1164.37, 598.7275, 652.9145]
2026-01-23 01:35:23,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 974.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:35:23,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 7 minutes, 10 seconds)
2026-01-23 01:36:52,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:00,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 759.92664 ± 340.391
2026-01-23 01:37:00,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [732.1347, 981.99475, 80.187355, 223.08508, 1021.88, 694.4848, 698.1365, 1162.9109, 1032.1893, 972.263]
2026-01-23 01:37:00,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 116.0, 201.0, 1000.0, 643.0, 635.0, 1000.0, 1000.0, 905.0]
2026-01-23 01:37:01,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 35 seconds)
2026-01-23 01:38:33,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:43,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 931.33313 ± 332.826
2026-01-23 01:38:43,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1110.2667, 1151.4613, 555.2667, 1100.154, 1141.6298, 801.5061, 1108.095, 1187.1576, 103.95706, 1053.8372]
2026-01-23 01:38:43,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 601.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 115.0, 1000.0]
2026-01-23 01:38:43,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 2 minutes, 42 seconds)
2026-01-23 01:40:22,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:32,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 977.45587 ± 374.047
2026-01-23 01:40:32,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [250.98357, 214.29947, 1141.1107, 1086.9104, 1138.7452, 1156.6234, 1202.6019, 1203.7332, 1187.4227, 1192.1279]
2026-01-23 01:40:32,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [197.0, 221.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:40:32,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 59 seconds)
2026-01-23 01:42:06,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:17,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 961.03601 ± 164.121
2026-01-23 01:42:17,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1082.2532, 849.0629, 787.60443, 978.3599, 1086.9084, 1158.1375, 881.76, 1083.0476, 1088.1897, 615.0369]
2026-01-23 01:42:17,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 791.0, 1000.0, 1000.0, 1000.0, 1000.0, 874.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:42:17,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 54 seconds)
2026-01-23 01:43:53,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:03,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1075.10376 ± 286.555
2026-01-23 01:44:03,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [217.51607, 1166.9033, 1157.8046, 1148.5173, 1140.5686, 1208.4199, 1159.6538, 1192.4382, 1169.3889, 1189.8264]
2026-01-23 01:44:03,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [213.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:44:03,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 67/100 (estimated time remaining: 58 minutes, 53 seconds)
2026-01-23 01:45:35,772 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:46,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1022.15558 ± 317.717
2026-01-23 01:45:46,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1124.1487, 188.63132, 1120.6158, 1212.9391, 1186.4835, 1203.201, 1210.0681, 1230.823, 679.1293, 1065.5162]
2026-01-23 01:45:46,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 175.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:45:46,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 68/100 (estimated time remaining: 57 minutes, 46 seconds)
2026-01-23 01:47:20,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:31,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1053.85254 ± 139.119
2026-01-23 01:47:31,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1150.4036, 971.792, 1050.2073, 1002.9406, 723.34607, 1153.0565, 995.1121, 1241.8164, 1183.7527, 1066.0977]
2026-01-23 01:47:31,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:47:31,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 69/100 (estimated time remaining: 56 minutes, 19 seconds)
2026-01-23 01:49:04,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:16,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1079.83203 ± 43.142
2026-01-23 01:49:16,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1098.5397, 1041.9542, 1026.1829, 1140.3414, 1068.6547, 1092.7825, 1105.3035, 1133.637, 1090.6803, 1000.2435]
2026-01-23 01:49:16,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:49:16,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 7 seconds)
2026-01-23 01:50:49,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:00,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1044.42261 ± 159.842
2026-01-23 01:51:00,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1092.5608, 730.39886, 1165.0283, 1201.7688, 1118.6115, 1007.62555, 1175.5691, 1181.2144, 990.9026, 780.5463]
2026-01-23 01:51:00,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 885.0, 1000.0, 1000.0, 852.0, 1000.0]
2026-01-23 01:51:00,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 71/100 (estimated time remaining: 52 minutes, 18 seconds)
2026-01-23 01:52:30,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:41,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1084.85864 ± 115.790
2026-01-23 01:52:41,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1033.0886, 1174.0171, 989.80804, 1186.9902, 1213.433, 1205.2542, 854.43256, 1156.9974, 964.7523, 1069.812]
2026-01-23 01:52:41,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [902.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:52:41,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1084.86) for latency DatasetOffice
2026-01-23 01:52:41,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 72/100 (estimated time remaining: 50 minutes, 4 seconds)
2026-01-23 01:54:12,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:22,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 936.41534 ± 348.284
2026-01-23 01:54:22,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [97.95764, 1209.3716, 1138.9075, 1051.6093, 434.1195, 1102.7405, 1004.70526, 1119.1531, 1040.1765, 1165.4135]
2026-01-23 01:54:22,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [110.0, 1000.0, 1000.0, 1000.0, 370.0, 1000.0, 1000.0, 1000.0, 898.0, 1000.0]
2026-01-23 01:54:22,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 8 seconds)
2026-01-23 01:55:54,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:04,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1094.73145 ± 322.450
2026-01-23 01:56:04,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [143.6296, 1138.729, 1188.84, 1197.8062, 1225.215, 1256.9045, 1244.9169, 1282.7834, 1065.8383, 1202.6519]
2026-01-23 01:56:04,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [137.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:56:04,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1094.73) for latency DatasetOffice
2026-01-23 01:56:04,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 8 seconds)
2026-01-23 01:57:38,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:49,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1106.07581 ± 238.685
2026-01-23 01:57:49,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1165.5421, 1223.8284, 1207.0558, 399.2943, 1147.4752, 1239.8599, 1125.8091, 1230.4515, 1137.0061, 1184.4358]
2026-01-23 01:57:49,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:57:49,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1106.08) for latency DatasetOffice
2026-01-23 01:57:49,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 29 seconds)
2026-01-23 01:59:27,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:37,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 998.50830 ± 231.627
2026-01-23 01:59:37,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [822.3291, 722.37585, 1250.7053, 1089.9756, 1230.1301, 1121.3488, 515.18146, 1138.1593, 1171.1736, 923.7038]
2026-01-23 01:59:37,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 936.0, 1000.0, 1000.0, 417.0, 875.0, 1000.0, 1000.0]
2026-01-23 01:59:37,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 5 seconds)
2026-01-23 02:01:08,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:01:18,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1054.61157 ± 316.128
2026-01-23 02:01:18,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1208.319, 1246.4978, 1118.8551, 1048.3541, 1248.8591, 1090.5226, 1206.5525, 1126.1885, 1126.0232, 125.94308]
2026-01-23 02:01:18,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 136.0]
2026-01-23 02:01:18,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 77/100 (estimated time remaining: 41 minutes, 19 seconds)
2026-01-23 02:02:55,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:05,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1039.20630 ± 274.722
2026-01-23 02:03:05,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [992.784, 1203.6301, 1264.4656, 1074.1604, 392.06927, 826.7961, 1258.8015, 1276.7959, 1280.0854, 822.47565]
2026-01-23 02:03:05,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 320.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:03:05,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 78/100 (estimated time remaining: 40 minutes, 8 seconds)
2026-01-23 02:04:42,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:53,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1049.90320 ± 164.232
2026-01-23 02:04:53,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1256.9299, 1115.0052, 988.5707, 744.2997, 879.46124, 1222.0753, 1168.9503, 992.5916, 909.3288, 1221.8184]
2026-01-23 02:04:53,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 804.0, 1000.0, 704.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:04:53,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 47 seconds)
2026-01-23 02:06:19,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:30,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1152.43628 ± 134.083
2026-01-23 02:06:30,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1141.2609, 1242.035, 1261.1877, 1057.0282, 1247.132, 1150.6249, 1244.2594, 793.33844, 1215.3577, 1172.14]
2026-01-23 02:06:30,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 903.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:06:30,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1152.44) for latency DatasetOffice
2026-01-23 02:06:30,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 28 seconds)
2026-01-23 02:08:05,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:15,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 986.01837 ± 306.642
2026-01-23 02:08:15,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1236.2563, 1054.528, 1219.5559, 453.54758, 615.01715, 1284.6475, 1281.5236, 829.81476, 1252.975, 632.31793]
2026-01-23 02:08:15,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 366.0, 1000.0, 1000.0, 1000.0, 663.0, 1000.0, 1000.0]
2026-01-23 02:08:15,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 32 seconds)
2026-01-23 02:09:46,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:09:56,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 850.70721 ± 261.107
2026-01-23 02:09:56,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1004.7939, 410.3669, 892.8876, 883.5824, 1254.4263, 449.32352, 808.15796, 1000.4409, 1133.6788, 669.41437]
2026-01-23 02:09:56,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 393.0, 1000.0, 1000.0, 1000.0, 373.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:09:56,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 48 seconds)
2026-01-23 02:11:30,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:39,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 877.55969 ± 355.564
2026-01-23 02:11:39,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [678.6214, 1176.6731, 654.475, 0.93671376, 1196.8547, 1204.4802, 763.66583, 1036.4215, 925.9049, 1137.5635]
2026-01-23 02:11:39,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 16.0, 1000.0, 1000.0, 629.0, 841.0, 1000.0, 1000.0]
2026-01-23 02:11:39,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 49 seconds)
2026-01-23 02:13:13,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:13:21,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 858.58575 ± 467.116
2026-01-23 02:13:21,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1294.9454, 1310.4167, 1237.5889, 284.1033, 1147.951, 1234.7012, 96.97974, 237.05885, 1117.9258, 624.1859]
2026-01-23 02:13:21,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 248.0, 1000.0, 1000.0, 96.0, 206.0, 1000.0, 572.0]
2026-01-23 02:13:21,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 47 seconds)
2026-01-23 02:14:55,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:06,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 982.48535 ± 314.376
2026-01-23 02:15:06,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1250.85, 264.5876, 771.4748, 943.7762, 1270.598, 886.89764, 1246.7574, 1261.8671, 1202.9613, 725.0845]
2026-01-23 02:15:06,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 218.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:15:06,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 29 seconds)
2026-01-23 02:16:39,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:49,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1207.03491 ± 124.326
2026-01-23 02:16:49,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1095.0775, 1306.4493, 1304.2711, 1354.5347, 1298.3594, 1246.0149, 1224.6045, 979.15826, 1247.6978, 1014.1829]
2026-01-23 02:16:49,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 845.0, 1000.0, 798.0]
2026-01-23 02:16:49,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1274 [INFO]: New best (1207.03) for latency DatasetOffice
2026-01-23 02:16:49,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 42 seconds)
2026-01-23 02:18:23,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:34,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1159.56958 ± 199.016
2026-01-23 02:18:34,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1248.0248, 1224.5898, 1231.9082, 714.32166, 832.9068, 1249.7633, 1330.4254, 1175.5425, 1289.0503, 1299.163]
2026-01-23 02:18:34,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:18:34,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 11 seconds)
2026-01-23 02:20:05,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:14,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1031.60095 ± 350.655
2026-01-23 02:20:14,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1278.0623, 1161.9038, 759.4971, 1284.5248, 917.6104, 313.0802, 1299.4146, 1337.5103, 1378.6903, 585.71594]
2026-01-23 02:20:14,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 798.0, 298.0, 1000.0, 1000.0, 1000.0, 475.0]
2026-01-23 02:20:14,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 20 seconds)
2026-01-23 02:21:51,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:22:01,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1065.38245 ± 329.286
2026-01-23 02:22:01,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1220.6008, 929.3571, 1293.89, 1238.2799, 122.84607, 1172.6567, 1186.7006, 1197.6471, 1231.1311, 1060.7153]
2026-01-23 02:22:01,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 116.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:22:01,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 47 seconds)
2026-01-23 02:23:34,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:43,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 959.63867 ± 388.037
2026-01-23 02:23:43,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1217.1041, 1272.3411, 298.46323, 936.2227, 901.6906, 1312.3906, 894.4818, 205.92673, 1277.3168, 1280.4492]
2026-01-23 02:23:43,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 235.0, 846.0, 1000.0, 1000.0, 717.0, 184.0, 1000.0, 1000.0]
2026-01-23 02:23:43,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 58 seconds)
2026-01-23 02:25:13,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:23,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1080.89844 ± 383.802
2026-01-23 02:25:23,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1330.3615, 1305.6696, 1300.7463, 1264.8705, 1257.0594, 938.0066, 62.10348, 1342.0638, 770.9236, 1237.1792]
2026-01-23 02:25:23,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 735.0, 73.0, 1000.0, 597.0, 1000.0]
2026-01-23 02:25:23,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 6 seconds)
2026-01-23 02:27:01,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:27:11,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1109.71216 ± 301.059
2026-01-23 02:27:11,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1312.4526, 978.07446, 1206.4913, 1286.0557, 1240.0956, 1205.5773, 244.01195, 1237.3179, 1221.0043, 1166.0408]
2026-01-23 02:27:11,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 793.0, 1000.0, 1000.0, 1000.0, 1000.0, 213.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:27:11,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 29 seconds)
2026-01-23 02:28:42,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:53,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1201.18750 ± 111.538
2026-01-23 02:28:53,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1097.654, 1248.4904, 1151.7981, 1253.9701, 1257.4752, 1358.9594, 935.4117, 1233.8755, 1281.1533, 1193.0873]
2026-01-23 02:28:53,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 929.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:28:53,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 49 seconds)
2026-01-23 02:30:22,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:32,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1127.13855 ± 373.161
2026-01-23 02:30:32,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1329.5876, 150.65863, 715.04913, 1371.8635, 1404.659, 1232.1144, 1244.7422, 1286.1221, 1293.7067, 1242.8818]
2026-01-23 02:30:32,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 139.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:30:32,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 55 seconds)
2026-01-23 02:32:05,800 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:32:17,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1169.76636 ± 103.955
2026-01-23 02:32:17,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1163.5635, 944.28754, 1271.1753, 1033.0029, 1271.1017, 1251.6941, 1217.7594, 1227.7115, 1107.1459, 1210.223]
2026-01-23 02:32:17,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:32:17,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 16 seconds)
2026-01-23 02:33:49,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:00,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1144.99805 ± 58.579
2026-01-23 02:34:00,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1121.9028, 1151.7369, 1122.4077, 1093.5914, 1134.679, 1022.5951, 1228.0898, 1185.109, 1163.3632, 1226.5063]
2026-01-23 02:34:00,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 920.0, 1000.0, 860.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:34:00,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 37 seconds)
2026-01-23 02:35:33,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:35:43,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1121.86206 ± 292.315
2026-01-23 02:35:43,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1270.2842, 1099.5004, 1207.6073, 1235.9464, 1283.3113, 1160.0878, 1277.0165, 1227.1173, 1197.9318, 259.81714]
2026-01-23 02:35:43,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 232.0]
2026-01-23 02:35:43,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 49 seconds)
2026-01-23 02:37:13,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:37:22,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 953.73779 ± 341.036
2026-01-23 02:37:22,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1088.546, 1162.676, 1000.7934, 932.12616, 1205.7529, 172.2306, 437.32498, 1227.84, 1181.672, 1128.4164]
2026-01-23 02:37:22,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [911.0, 1000.0, 783.0, 1000.0, 1000.0, 161.0, 393.0, 1000.0, 1000.0, 952.0]
2026-01-23 02:37:22,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 5 seconds)
2026-01-23 02:38:56,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:05,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 987.96716 ± 372.509
2026-01-23 02:39:05,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [863.2201, 1179.1633, 708.84247, 1244.8956, 1392.197, 1239.6663, 1324.1658, 332.613, 379.28726, 1215.6208]
2026-01-23 02:39:05,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [731.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 277.0, 318.0, 1000.0]
2026-01-23 02:39:05,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 25 seconds)
2026-01-23 02:40:38,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:40:49,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 1158.99902 ± 222.371
2026-01-23 02:40:49,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [1299.8457, 1273.3972, 1240.2233, 1277.6752, 1134.9735, 1328.1677, 1233.0402, 724.399, 1346.2102, 732.05865]
2026-01-23 02:40:49,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 925.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:40:49,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 42 seconds)
2026-01-23 02:42:24,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:32,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1269 [DEBUG]: Total Reward: 884.14441 ± 375.017
2026-01-23 02:42:32,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1270 [DEBUG]: All rewards: [355.43848, 1225.4734, 1179.552, 370.09412, 329.22955, 1203.7864, 1266.059, 908.2492, 1194.9233, 808.6395]
2026-01-23 02:42:32,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1271 [DEBUG]: All trajectory lengths: [305.0, 1000.0, 1000.0, 325.0, 288.0, 1000.0, 1000.0, 1000.0, 1000.0, 661.0]
2026-01-23 02:42:32,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1299 [DEBUG]: Training session finished
