2026-01-25 17:02:42,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-sac
2026-01-25 17:02:42,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-sac
2026-01-25 17:02:42,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14d18dcc0c90>}
2026-01-25 17:02:42,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1159 [DEBUG]: using device: cuda
2026-01-25 17:02:42,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1181 [INFO]: Creating new trainer
2026-01-25 17:02:42,383 baseline-sac-noisy-halfcheetah:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=17, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-25 17:02:42,383 baseline-sac-noisy-halfcheetah:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-25 17:02:42,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1242 [DEBUG]: Starting training session...
2026-01-25 17:02:42,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 1/100
2026-01-25 17:04:06,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:04:14,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -212.79971 ± 63.729
2026-01-25 17:04:14,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-233.846, -194.94437, -335.06543, -220.47678, -239.21828, -152.42442, -182.75848, -149.61646, -121.53561, -298.11108]
2026-01-25 17:04:14,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:04:14,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-212.80) for latency DatasetOffice
2026-01-25 17:04:14,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 31 minutes, 41 seconds)
2026-01-25 17:05:43,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:05:51,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -106.87643 ± 61.387
2026-01-25 17:05:51,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-36.972633, -189.88023, -228.77002, -53.012985, -62.821823, -141.2684, -37.45165, -100.793686, -99.9629, -117.83]
2026-01-25 17:05:51,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:05:51,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-106.88) for latency DatasetOffice
2026-01-25 17:05:51,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 34 minutes, 5 seconds)
2026-01-25 17:07:19,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:07:28,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -136.39413 ± 82.998
2026-01-25 17:07:28,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-263.52478, -113.27608, -197.62305, -215.28801, -49.369034, -75.231804, -37.511417, -207.16084, -185.59143, -19.36487]
2026-01-25 17:07:28,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:07:28,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 33 minutes, 49 seconds)
2026-01-25 17:08:56,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:09:05,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 59.72349 ± 51.219
2026-01-25 17:09:05,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-26.093904, 65.12997, 24.942251, 104.87112, 140.63911, 139.69048, 40.11888, 24.784288, 27.688366, 55.46435]
2026-01-25 17:09:05,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:09:05,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (59.72) for latency DatasetOffice
2026-01-25 17:09:05,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 32 minutes, 54 seconds)
2026-01-25 17:10:33,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:10:41,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 98.37016 ± 69.051
2026-01-25 17:10:41,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [77.13899, 125.5695, 135.47253, 161.25534, 41.13583, 49.12773, 34.799057, 102.720924, 6.5584593, 249.9232]
2026-01-25 17:10:41,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:10:41,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (98.37) for latency DatasetOffice
2026-01-25 17:10:41,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 31 minutes, 39 seconds)
2026-01-25 17:12:10,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:12:18,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 94.63444 ± 126.065
2026-01-25 17:12:18,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [153.67645, 97.130486, 2.762715, 111.735855, 59.849533, 313.73923, 148.27286, 100.125885, 164.0181, -204.96664]
2026-01-25 17:12:18,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:12:18,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 31 minutes, 33 seconds)
2026-01-25 17:13:46,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:13:55,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 114.04231 ± 90.237
2026-01-25 17:13:55,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [159.13907, 173.70123, 134.49205, 80.84182, -61.325478, -44.498432, 130.24956, 162.66554, 210.31021, 194.84756]
2026-01-25 17:13:55,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:13:55,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (114.04) for latency DatasetOffice
2026-01-25 17:13:55,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 29 minutes, 56 seconds)
2026-01-25 17:15:23,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:15:32,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 214.96445 ± 57.785
2026-01-25 17:15:32,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [144.41907, 152.44104, 318.491, 204.08232, 199.19609, 171.20016, 290.57513, 284.11655, 191.43277, 193.69052]
2026-01-25 17:15:32,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:15:32,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (214.96) for latency DatasetOffice
2026-01-25 17:15:32,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 28 minutes, 24 seconds)
2026-01-25 17:17:00,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:17:09,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 305.32120 ± 95.147
2026-01-25 17:17:09,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [456.50925, 213.6895, 327.34396, 290.20032, 262.3845, 387.34412, 311.8976, 114.8372, 417.2799, 271.72546]
2026-01-25 17:17:09,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:17:09,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (305.32) for latency DatasetOffice
2026-01-25 17:17:09,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 26 minutes, 49 seconds)
2026-01-25 17:18:37,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:18:46,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 234.84224 ± 34.799
2026-01-25 17:18:46,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [256.643, 222.00066, 293.84317, 258.18964, 280.35446, 179.02225, 220.56145, 193.50826, 225.43661, 218.86282]
2026-01-25 17:18:46,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:18:46,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 25 minutes, 16 seconds)
2026-01-25 17:20:14,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:20:23,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 412.93710 ± 67.681
2026-01-25 17:20:23,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [422.06314, 415.69653, 225.49805, 412.70975, 479.70547, 479.62238, 447.35437, 432.40894, 402.27493, 412.0375]
2026-01-25 17:20:23,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:20:23,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (412.94) for latency DatasetOffice
2026-01-25 17:20:23,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 23 minutes, 44 seconds)
2026-01-25 17:21:51,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:22:00,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 435.31454 ± 58.194
2026-01-25 17:22:00,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [500.05972, 497.06305, 419.00348, 542.52234, 420.96646, 444.13922, 355.6961, 360.28705, 396.41092, 416.99692]
2026-01-25 17:22:00,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:22:00,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (435.31) for latency DatasetOffice
2026-01-25 17:22:00,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 22 minutes, 11 seconds)
2026-01-25 17:23:28,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:23:36,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 535.65253 ± 49.352
2026-01-25 17:23:36,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [517.8759, 567.81476, 520.2728, 560.8949, 454.54114, 468.3985, 531.248, 576.68854, 526.95807, 631.8328]
2026-01-25 17:23:36,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:23:36,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (535.65) for latency DatasetOffice
2026-01-25 17:23:36,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 20 minutes, 30 seconds)
2026-01-25 17:25:05,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:25:13,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 635.95648 ± 68.629
2026-01-25 17:25:13,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [569.1807, 703.74133, 554.2596, 621.71954, 554.082, 688.3341, 748.4616, 575.0142, 635.66626, 709.1061]
2026-01-25 17:25:13,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:25:13,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (635.96) for latency DatasetOffice
2026-01-25 17:25:13,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 18 minutes, 50 seconds)
2026-01-25 17:26:41,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:26:50,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 708.97198 ± 139.064
2026-01-25 17:26:50,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [737.104, 828.2739, 905.2664, 629.5032, 682.0148, 403.47794, 671.21405, 837.57983, 800.69714, 594.5886]
2026-01-25 17:26:50,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:26:50,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (708.97) for latency DatasetOffice
2026-01-25 17:26:50,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 17 minutes, 10 seconds)
2026-01-25 17:28:18,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:28:26,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 633.62958 ± 302.059
2026-01-25 17:28:26,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [831.09863, -58.872894, 463.64334, 788.7439, 902.3449, 933.8438, 879.57874, 766.38617, 366.6497, 462.8792]
2026-01-25 17:28:26,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:28:26,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 15 minutes, 28 seconds)
2026-01-25 17:29:55,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:30:03,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 813.79419 ± 76.034
2026-01-25 17:30:03,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [863.90436, 824.23456, 790.4143, 951.9441, 690.5196, 752.5288, 859.84503, 894.59845, 777.7829, 732.1699]
2026-01-25 17:30:03,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:30:03,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (813.79) for latency DatasetOffice
2026-01-25 17:30:03,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 13 minutes, 50 seconds)
2026-01-25 17:31:32,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:31:40,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 942.35773 ± 133.173
2026-01-25 17:31:40,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [709.8634, 870.084, 982.72174, 1084.2206, 949.9726, 757.2731, 889.2059, 957.627, 1080.815, 1141.794]
2026-01-25 17:31:40,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:31:40,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (942.36) for latency DatasetOffice
2026-01-25 17:31:40,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 12 minutes, 12 seconds)
2026-01-25 17:33:08,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:33:17,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1076.77234 ± 67.298
2026-01-25 17:33:17,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1247.3699, 1117.3468, 1104.7551, 1050.3857, 1005.2555, 1062.0214, 1045.8729, 1022.5929, 1094.6644, 1017.46014]
2026-01-25 17:33:17,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:33:17,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (1076.77) for latency DatasetOffice
2026-01-25 17:33:17,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 10 minutes, 34 seconds)
2026-01-25 17:34:45,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:34:54,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1262.74158 ± 171.003
2026-01-25 17:34:54,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1123.9645, 1379.8768, 1354.8374, 1365.7788, 1213.972, 1441.9927, 857.0959, 1189.1558, 1249.1038, 1451.6385]
2026-01-25 17:34:54,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:34:54,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (1262.74) for latency DatasetOffice
2026-01-25 17:34:54,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 9 minutes)
2026-01-25 17:36:22,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:36:30,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 627.65808 ± 450.008
2026-01-25 17:36:30,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1088.2362, 161.4566, 511.92737, 1139.2703, 944.88245, 336.60074, 918.2658, 1158.8148, 39.991554, -22.86458]
2026-01-25 17:36:30,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:36:30,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 7 minutes, 25 seconds)
2026-01-25 17:37:59,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:38:07,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1049.11731 ± 380.762
2026-01-25 17:38:07,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1025.2877, 1167.4281, 1277.4125, 1252.7627, 1299.5842, 1357.2585, 1243.8617, 1245.2405, 407.15582, 215.18077]
2026-01-25 17:38:07,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:38:07,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 5 minutes, 48 seconds)
2026-01-25 17:39:36,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:39:44,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1653.39197 ± 166.528
2026-01-25 17:39:44,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1891.0143, 1476.1842, 1460.8118, 1937.548, 1584.0905, 1775.9972, 1465.7383, 1751.7426, 1586.2198, 1604.5728]
2026-01-25 17:39:44,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:39:44,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (1653.39) for latency DatasetOffice
2026-01-25 17:39:44,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 4 minutes, 13 seconds)
2026-01-25 17:41:13,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:41:21,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1487.15027 ± 341.695
2026-01-25 17:41:21,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1960.6682, 1827.15, 1509.9822, 661.99274, 1554.0497, 1606.2803, 1378.0844, 1510.1956, 1198.0454, 1665.0558]
2026-01-25 17:41:21,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:41:21,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 2 minutes, 41 seconds)
2026-01-25 17:42:49,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:42:58,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1390.33655 ± 589.639
2026-01-25 17:42:58,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [831.7232, 688.414, 1920.3346, 1370.5159, 1869.3666, 1665.2491, 184.97546, 2072.2402, 1778.0133, 1522.5319]
2026-01-25 17:42:58,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:42:58,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 1 minute, 5 seconds)
2026-01-25 17:44:26,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:44:35,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 748.63513 ± 730.174
2026-01-25 17:44:35,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-75.025185, 1616.0089, 315.0858, 1767.1547, 1539.1211, 17.829021, 94.341675, 1558.1576, 523.36395, 130.3138]
2026-01-25 17:44:35,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:44:35,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 59 minutes, 28 seconds)
2026-01-25 17:46:03,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:46:12,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1256.03735 ± 820.986
2026-01-25 17:46:12,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1990.5222, 1359.9144, 1857.4442, 2330.8933, 85.88344, 461.4702, 868.7099, 1812.9338, -91.96656, 1884.5688]
2026-01-25 17:46:12,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:46:12,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 57 minutes, 51 seconds)
2026-01-25 17:47:40,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:47:48,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1865.29077 ± 299.895
2026-01-25 17:47:48,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1973.911, 1985.9476, 2029.3069, 2280.9644, 1720.2854, 2217.4885, 1725.55, 1830.0017, 1721.9785, 1167.4744]
2026-01-25 17:47:48,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:47:48,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (1865.29) for latency DatasetOffice
2026-01-25 17:47:48,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 56 minutes, 13 seconds)
2026-01-25 17:49:17,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:49:25,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 997.22949 ± 589.104
2026-01-25 17:49:25,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [932.1056, 460.31265, 1432.6093, 1373.8586, 2063.6558, 757.62726, 67.859886, 1033.8341, 1548.2732, 302.1593]
2026-01-25 17:49:25,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:49:25,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 54 minutes, 35 seconds)
2026-01-25 17:50:53,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:51:02,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1287.34558 ± 933.282
2026-01-25 17:51:02,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2323.5457, 524.29645, 603.89325, 426.1564, 2170.8657, 2365.5264, 2017.0088, 2136.4175, 346.50037, -40.753654]
2026-01-25 17:51:02,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:51:02,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 52 minutes, 56 seconds)
2026-01-25 17:52:30,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:52:39,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1608.46289 ± 644.782
2026-01-25 17:52:39,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2383.5173, 1843.5795, 1185.8136, 1435.8677, 2059.3496, 1691.3081, 2084.3027, 2291.733, 264.54065, 844.6175]
2026-01-25 17:52:39,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:52:39,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 51 minutes, 18 seconds)
2026-01-25 17:54:07,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:54:15,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1158.88550 ± 870.251
2026-01-25 17:54:15,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [11.245645, 855.9699, 533.00604, 2046.591, 26.185228, 2240.4482, 1482.8062, 2199.6885, 1913.7727, 279.14114]
2026-01-25 17:54:15,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:54:15,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 49 minutes, 39 seconds)
2026-01-25 17:55:44,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:55:52,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1755.17310 ± 841.587
2026-01-25 17:55:52,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [239.29198, 1219.5653, 200.39688, 2456.9246, 1915.611, 2296.1038, 2276.0066, 2498.781, 2174.6182, 2274.4307]
2026-01-25 17:55:52,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:55:52,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 48 minutes, 3 seconds)
2026-01-25 17:57:20,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:57:29,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1636.34302 ± 764.019
2026-01-25 17:57:29,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1795.2722, 2057.6323, 235.16713, 1908.0867, 135.65479, 2414.6914, 1490.3586, 2163.315, 2229.791, 1933.4615]
2026-01-25 17:57:29,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:57:29,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 46 minutes, 26 seconds)
2026-01-25 17:58:57,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:59:06,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2191.44873 ± 380.722
2026-01-25 17:59:06,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2341.9958, 2449.413, 2399.078, 2326.256, 2363.5615, 2506.7104, 2452.3643, 2175.4595, 1421.1715, 1478.4758]
2026-01-25 17:59:06,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 17:59:06,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2191.45) for latency DatasetOffice
2026-01-25 17:59:06,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 44 minutes, 50 seconds)
2026-01-25 18:00:34,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:00:43,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1733.02124 ± 889.745
2026-01-25 18:00:43,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2313.1995, 2304.9888, 520.34766, 2508.019, 2172.8665, 2527.115, 2037.7488, 2284.158, 210.70934, 451.0611]
2026-01-25 18:00:43,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:00:43,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 43 minutes, 16 seconds)
2026-01-25 18:02:11,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:02:19,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2490.72241 ± 100.245
2026-01-25 18:02:19,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2662.9985, 2536.603, 2488.0479, 2453.0845, 2543.1562, 2543.3237, 2429.959, 2521.1838, 2476.7903, 2252.0762]
2026-01-25 18:02:19,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:02:19,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2490.72) for latency DatasetOffice
2026-01-25 18:02:19,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 41 minutes, 38 seconds)
2026-01-25 18:03:48,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:03:56,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2036.33142 ± 730.108
2026-01-25 18:03:56,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2298.3086, 1877.0698, 2439.157, 2261.809, 2202.6702, 2535.0295, 2121.3286, 2325.6294, -91.646194, 2393.9585]
2026-01-25 18:03:56,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:03:56,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 40 minutes, 4 seconds)
2026-01-25 18:05:25,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:05:33,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2329.11621 ± 133.993
2026-01-25 18:05:33,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2444.177, 2368.3123, 2309.0288, 2253.0684, 2546.5225, 2372.5925, 2370.0925, 2345.471, 2005.5527, 2276.3442]
2026-01-25 18:05:33,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:05:33,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 38 minutes, 27 seconds)
2026-01-25 18:07:02,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:07:10,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2349.80811 ± 215.943
2026-01-25 18:07:10,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2516.4795, 2298.844, 2411.3604, 2370.9587, 2375.6445, 2408.1711, 2462.1863, 2419.2163, 1728.7992, 2506.4216]
2026-01-25 18:07:10,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:07:10,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 36 minutes, 50 seconds)
2026-01-25 18:08:38,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:08:47,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1912.09509 ± 826.337
2026-01-25 18:08:47,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2294.6064, 2398.2634, 546.605, 2366.8765, 2507.3718, 2511.074, 2317.4656, 2496.9111, 185.20517, 1496.5729]
2026-01-25 18:08:47,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:08:47,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 35 minutes, 12 seconds)
2026-01-25 18:10:15,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:10:24,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2238.88281 ± 662.793
2026-01-25 18:10:24,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2345.1191, 2389.9036, 2624.3545, 2571.9543, 2494.42, 2466.1182, 2435.5654, 2390.4119, 2405.1897, 265.79095]
2026-01-25 18:10:24,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:10:24,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 33 minutes, 36 seconds)
2026-01-25 18:11:52,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:12:01,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2384.30200 ± 107.245
2026-01-25 18:12:01,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2490.331, 2311.0579, 2396.9233, 2471.9092, 2331.4927, 2579.31, 2445.8557, 2233.514, 2236.826, 2345.8013]
2026-01-25 18:12:01,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:12:01,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 32 minutes)
2026-01-25 18:13:29,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:13:37,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2419.65674 ± 207.147
2026-01-25 18:13:37,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2531.1619, 2553.094, 1816.0074, 2508.2463, 2414.6099, 2477.7627, 2423.525, 2513.597, 2419.0002, 2539.5627]
2026-01-25 18:13:37,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:13:37,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 30 minutes, 23 seconds)
2026-01-25 18:15:06,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:15:14,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2433.42651 ± 219.091
2026-01-25 18:15:14,800 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2596.1545, 2426.9185, 2519.657, 2329.8174, 2554.5266, 2530.1255, 2544.4958, 2461.928, 1814.4865, 2556.1538]
2026-01-25 18:15:14,800 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:15:14,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 28 minutes, 46 seconds)
2026-01-25 18:16:43,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:16:51,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2022.76624 ± 784.252
2026-01-25 18:16:51,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2448.3499, 2371.8572, 569.2603, 2506.5427, 2486.8113, 2633.409, 2371.0105, 2429.1843, 2008.9426, 402.2921]
2026-01-25 18:16:51,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:16:51,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 27 minutes, 10 seconds)
2026-01-25 18:18:19,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:18:28,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2359.05640 ± 85.284
2026-01-25 18:18:28,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2330.1104, 2418.9985, 2500.0217, 2319.2837, 2428.0479, 2345.183, 2199.665, 2266.3892, 2445.3245, 2337.5405]
2026-01-25 18:18:28,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:18:28,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 25 minutes, 32 seconds)
2026-01-25 18:19:56,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:20:05,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2430.37354 ± 127.748
2026-01-25 18:20:05,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2538.823, 2164.715, 2519.839, 2589.106, 2344.1243, 2568.497, 2319.35, 2490.941, 2405.137, 2363.2031]
2026-01-25 18:20:05,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:20:05,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 23 minutes, 52 seconds)
2026-01-25 18:21:33,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:21:41,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2527.90454 ± 147.772
2026-01-25 18:21:41,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2660.0437, 2470.0388, 2513.4353, 2774.272, 2469.2063, 2649.7717, 2620.0962, 2220.6675, 2492.9204, 2408.5938]
2026-01-25 18:21:41,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:21:41,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2527.90) for latency DatasetOffice
2026-01-25 18:21:41,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 22 minutes, 16 seconds)
2026-01-25 18:23:10,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:23:18,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2503.79834 ± 223.197
2026-01-25 18:23:18,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2680.46, 2609.991, 2612.0823, 2451.8274, 2474.1536, 2498.3477, 2642.6023, 2604.8206, 1869.244, 2594.4534]
2026-01-25 18:23:18,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:23:18,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 20 minutes, 38 seconds)
2026-01-25 18:24:46,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:24:55,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2303.66235 ± 654.204
2026-01-25 18:24:55,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2399.7102, 2756.4573, 472.40884, 2717.607, 2651.0798, 2546.4019, 2502.3572, 2770.5173, 2229.843, 1990.241]
2026-01-25 18:24:55,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:24:55,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 19 minutes, 1 second)
2026-01-25 18:26:23,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:26:32,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2572.69482 ± 89.970
2026-01-25 18:26:32,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2549.8923, 2700.289, 2655.9688, 2667.769, 2509.509, 2564.4045, 2400.3618, 2605.9824, 2609.4158, 2463.3542]
2026-01-25 18:26:32,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:26:32,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2572.69) for latency DatasetOffice
2026-01-25 18:26:32,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 17 minutes, 24 seconds)
2026-01-25 18:28:00,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:28:08,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2458.20508 ± 459.887
2026-01-25 18:28:08,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2717.1863, 1157.8433, 2732.3425, 2762.1917, 2292.4495, 2802.97, 2528.2021, 2616.5825, 2410.3022, 2561.9795]
2026-01-25 18:28:08,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:28:08,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 15 minutes, 47 seconds)
2026-01-25 18:29:37,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:29:45,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2583.62085 ± 106.131
2026-01-25 18:29:45,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2648.638, 2730.4126, 2581.2302, 2646.7056, 2493.336, 2708.403, 2612.8435, 2354.6729, 2521.8818, 2538.0847]
2026-01-25 18:29:45,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:29:45,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2583.62) for latency DatasetOffice
2026-01-25 18:29:45,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 14 minutes, 10 seconds)
2026-01-25 18:31:12,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:31:21,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2644.93872 ± 79.459
2026-01-25 18:31:21,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2762.6829, 2601.2942, 2716.129, 2677.2864, 2707.941, 2501.261, 2588.7786, 2603.2488, 2568.7341, 2722.0303]
2026-01-25 18:31:21,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:31:21,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2644.94) for latency DatasetOffice
2026-01-25 18:31:21,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 12 minutes, 25 seconds)
2026-01-25 18:32:48,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:32:56,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2335.22290 ± 672.127
2026-01-25 18:32:56,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2670.0984, 2571.7517, 569.3687, 2740.8296, 2768.1765, 2774.9226, 2503.7107, 2847.8025, 2178.9219, 1726.6467]
2026-01-25 18:32:56,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:32:56,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 10 minutes, 37 seconds)
2026-01-25 18:34:23,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:34:32,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2533.81250 ± 87.908
2026-01-25 18:34:32,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2609.132, 2514.6624, 2610.595, 2568.83, 2535.2014, 2557.8079, 2359.8794, 2497.6917, 2668.3398, 2415.9854]
2026-01-25 18:34:32,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:34:32,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 8 minutes, 48 seconds)
2026-01-25 18:35:58,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:36:07,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2503.87134 ± 179.899
2026-01-25 18:36:07,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2646.4792, 2094.398, 2594.3635, 2669.1782, 2227.1318, 2631.692, 2534.549, 2527.4346, 2563.9558, 2549.5317]
2026-01-25 18:36:07,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:36:07,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 6 minutes, 56 seconds)
2026-01-25 18:37:33,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:37:42,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2510.75073 ± 316.347
2026-01-25 18:37:42,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2687.228, 2690.843, 1573.2953, 2576.1765, 2569.7156, 2539.9285, 2633.0432, 2636.4336, 2638.5803, 2562.2644]
2026-01-25 18:37:42,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:37:42,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 5 minutes, 6 seconds)
2026-01-25 18:39:08,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:39:16,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2611.72241 ± 137.630
2026-01-25 18:39:16,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2797.9229, 2573.7454, 2733.3484, 2551.0054, 2399.4216, 2636.744, 2619.4255, 2697.6873, 2360.3005, 2747.6228]
2026-01-25 18:39:16,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:39:16,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 3 minutes, 20 seconds)
2026-01-25 18:40:42,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:40:50,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2294.01685 ± 438.817
2026-01-25 18:40:50,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2503.7378, 2552.396, 1262.3741, 2640.8235, 2542.621, 2547.2393, 2569.585, 2539.7546, 2037.2922, 1744.3442]
2026-01-25 18:40:50,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:40:50,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 1 minute, 34 seconds)
2026-01-25 18:42:16,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:42:24,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2658.14233 ± 86.892
2026-01-25 18:42:24,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2771.3137, 2746.097, 2779.2607, 2553.901, 2726.808, 2647.0356, 2537.329, 2570.477, 2619.4844, 2629.7163]
2026-01-25 18:42:24,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:42:24,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2658.14) for latency DatasetOffice
2026-01-25 18:42:24,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 63/100 (estimated time remaining: 59 minutes, 51 seconds)
2026-01-25 18:43:50,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:43:59,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2569.72119 ± 101.555
2026-01-25 18:43:59,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2660.0396, 2377.871, 2578.5825, 2635.8748, 2437.6316, 2728.606, 2580.371, 2628.9397, 2484.932, 2584.3657]
2026-01-25 18:43:59,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:43:59,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 64/100 (estimated time remaining: 58 minutes, 13 seconds)
2026-01-25 18:45:25,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:45:33,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2672.83447 ± 85.140
2026-01-25 18:45:33,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2810.4714, 2722.7168, 2655.6846, 2735.134, 2645.9546, 2741.1313, 2540.8423, 2598.5994, 2729.0664, 2548.7444]
2026-01-25 18:45:33,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:45:33,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2672.83) for latency DatasetOffice
2026-01-25 18:45:33,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 65/100 (estimated time remaining: 56 minutes, 34 seconds)
2026-01-25 18:46:59,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:47:07,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2463.94043 ± 585.904
2026-01-25 18:47:07,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2629.4731, 2635.147, 2628.8914, 2617.1216, 2651.252, 2675.639, 2661.9941, 2788.399, 711.7586, 2639.7302]
2026-01-25 18:47:07,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:47:07,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 66/100 (estimated time remaining: 54 minutes, 58 seconds)
2026-01-25 18:48:33,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:48:41,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2485.96655 ± 419.333
2026-01-25 18:48:41,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2691.6646, 2817.2302, 1598.9631, 2829.1838, 2791.6736, 2812.8735, 2517.872, 2687.702, 2268.895, 1843.6091]
2026-01-25 18:48:41,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:48:41,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 67/100 (estimated time remaining: 53 minutes, 25 seconds)
2026-01-25 18:50:07,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:50:16,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2446.67773 ± 702.068
2026-01-25 18:50:16,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2691.1243, 2727.9102, 2725.355, 2732.5208, 2686.6306, 2672.4482, 2672.581, 2502.8113, 2706.466, 348.92706]
2026-01-25 18:50:16,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:50:16,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 68/100 (estimated time remaining: 51 minutes, 51 seconds)
2026-01-25 18:51:42,167 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:51:50,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2550.54346 ± 161.001
2026-01-25 18:51:50,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2665.8464, 2288.1484, 2688.7776, 2694.7795, 2249.9326, 2740.9954, 2586.0022, 2594.4604, 2467.7632, 2528.7273]
2026-01-25 18:51:50,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:51:50,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 69/100 (estimated time remaining: 50 minutes, 16 seconds)
2026-01-25 18:53:16,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:53:24,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2510.21875 ± 80.338
2026-01-25 18:53:24,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2656.32, 2521.1863, 2471.6865, 2574.7434, 2483.8948, 2516.5623, 2584.5337, 2355.0396, 2512.1882, 2426.0342]
2026-01-25 18:53:24,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:53:24,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 70/100 (estimated time remaining: 48 minutes, 42 seconds)
2026-01-25 18:54:50,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:54:59,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2585.08423 ± 184.346
2026-01-25 18:54:59,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2689.2212, 2684.8997, 2733.8755, 2156.2085, 2846.7168, 2655.5732, 2599.7034, 2531.7805, 2396.2227, 2556.6404]
2026-01-25 18:54:59,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:54:59,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 71/100 (estimated time remaining: 47 minutes, 9 seconds)
2026-01-25 18:56:25,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:56:33,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2127.71533 ± 894.819
2026-01-25 18:56:33,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2531.0134, 2694.649, 584.86694, 2722.0903, 2723.081, 2736.3103, 2427.2715, 2760.822, 1837.3743, 259.6718]
2026-01-25 18:56:33,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:56:33,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 72/100 (estimated time remaining: 45 minutes, 35 seconds)
2026-01-25 18:57:59,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:58:07,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2410.61768 ± 616.167
2026-01-25 18:58:07,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2731.4573, 2664.3071, 2709.771, 2770.6162, 2536.5415, 604.8163, 2653.5334, 2501.2686, 2639.9753, 2293.8887]
2026-01-25 18:58:07,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:58:07,885 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 73/100 (estimated time remaining: 44 minutes)
2026-01-25 18:59:33,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:59:42,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2680.78296 ± 78.112
2026-01-25 18:59:42,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2772.2124, 2627.3599, 2711.8525, 2711.0173, 2518.508, 2811.6804, 2644.9543, 2684.7126, 2627.361, 2698.1724]
2026-01-25 18:59:42,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 18:59:42,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2680.78) for latency DatasetOffice
2026-01-25 18:59:42,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 74/100 (estimated time remaining: 42 minutes, 26 seconds)
2026-01-25 19:01:07,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:01:16,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2499.73462 ± 100.845
2026-01-25 19:01:16,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2484.4673, 2419.6665, 2285.2676, 2547.1177, 2398.4456, 2627.9492, 2613.283, 2502.5642, 2551.5732, 2567.0098]
2026-01-25 19:01:16,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:01:16,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 75/100 (estimated time remaining: 40 minutes, 51 seconds)
2026-01-25 19:02:42,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:02:50,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2680.33643 ± 253.292
2026-01-25 19:02:50,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2827.54, 2841.6956, 2733.4387, 2665.9668, 2807.553, 2799.888, 2691.32, 2719.8914, 1939.1555, 2776.913]
2026-01-25 19:02:50,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:02:50,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 76/100 (estimated time remaining: 39 minutes, 15 seconds)
2026-01-25 19:04:16,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:04:24,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2373.17896 ± 658.046
2026-01-25 19:04:24,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2747.9658, 2788.4873, 615.8564, 2808.046, 2615.3813, 2760.288, 2741.4434, 2674.9792, 2062.5754, 1916.7656]
2026-01-25 19:04:24,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:04:24,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 77/100 (estimated time remaining: 37 minutes, 40 seconds)
2026-01-25 19:05:50,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:05:58,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2559.91138 ± 703.192
2026-01-25 19:05:58,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2864.9536, 2763.1343, 2804.4211, 2858.066, 2709.6616, 2722.4827, 2871.2463, 2699.378, 2846.8552, 458.9135]
2026-01-25 19:05:58,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:05:58,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 78/100 (estimated time remaining: 36 minutes, 6 seconds)
2026-01-25 19:07:24,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:07:32,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2652.58252 ± 110.753
2026-01-25 19:07:32,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2746.8328, 2596.594, 2552.3584, 2798.5574, 2422.5151, 2661.0432, 2751.745, 2748.3547, 2570.8103, 2677.0151]
2026-01-25 19:07:32,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:07:32,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 79/100 (estimated time remaining: 34 minutes, 31 seconds)
2026-01-25 19:08:58,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:09:07,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2653.59473 ± 59.927
2026-01-25 19:09:07,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2665.2693, 2719.571, 2655.699, 2605.1128, 2705.2053, 2687.7556, 2668.5825, 2496.486, 2663.3364, 2668.93]
2026-01-25 19:09:07,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:09:07,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 80/100 (estimated time remaining: 32 minutes, 58 seconds)
2026-01-25 19:10:33,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:10:41,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2634.96143 ± 242.563
2026-01-25 19:10:41,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2797.6423, 2747.371, 2439.4958, 2720.8245, 2764.204, 2757.302, 2704.5024, 2617.7847, 1977.9277, 2822.5596]
2026-01-25 19:10:41,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:10:41,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 81/100 (estimated time remaining: 31 minutes, 24 seconds)
2026-01-25 19:12:07,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:12:15,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2246.48389 ± 816.008
2026-01-25 19:12:15,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2763.8145, 2852.0625, 424.16403, 2828.8025, 2727.9514, 2916.008, 1106.7471, 2732.8599, 2251.6003, 1860.8308]
2026-01-25 19:12:15,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:12:15,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 82/100 (estimated time remaining: 29 minutes, 49 seconds)
2026-01-25 19:13:40,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:13:49,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2573.65454 ± 103.707
2026-01-25 19:13:49,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2588.0493, 2509.624, 2634.873, 2686.5742, 2615.7537, 2564.0574, 2627.3406, 2506.517, 2685.164, 2318.5916]
2026-01-25 19:13:49,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:13:49,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 83/100 (estimated time remaining: 28 minutes, 14 seconds)
2026-01-25 19:15:15,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:15:23,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2633.83691 ± 160.475
2026-01-25 19:15:23,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2710.3313, 2456.8418, 2667.5547, 2803.994, 2285.6853, 2828.3792, 2760.266, 2696.0598, 2534.955, 2594.3005]
2026-01-25 19:15:23,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:15:23,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 84/100 (estimated time remaining: 26 minutes, 39 seconds)
2026-01-25 19:16:49,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:16:57,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2757.68066 ± 54.001
2026-01-25 19:16:57,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2838.2854, 2765.0322, 2783.2478, 2770.8308, 2628.3723, 2795.638, 2753.8242, 2751.0283, 2785.6182, 2704.9285]
2026-01-25 19:16:57,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:16:57,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2757.68) for latency DatasetOffice
2026-01-25 19:16:57,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 5 seconds)
2026-01-25 19:18:23,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:18:31,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2609.52734 ± 117.247
2026-01-25 19:18:31,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2722.1252, 2712.772, 2325.2717, 2501.5554, 2597.8064, 2666.207, 2694.286, 2573.6116, 2594.0798, 2707.5557]
2026-01-25 19:18:31,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:18:31,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 86/100 (estimated time remaining: 23 minutes, 29 seconds)
2026-01-25 19:19:57,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:20:05,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2158.45435 ± 807.947
2026-01-25 19:20:05,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2654.386, 2744.7107, 1233.938, 2594.9543, 2543.0728, 2771.5295, 2549.8606, 2609.9397, 217.18976, 1664.9634]
2026-01-25 19:20:05,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:20:05,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 87/100 (estimated time remaining: 21 minutes, 56 seconds)
2026-01-25 19:21:31,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:21:39,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2756.37744 ± 86.180
2026-01-25 19:21:39,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2802.9573, 2645.653, 2834.1658, 2857.8499, 2848.4734, 2816.857, 2745.8809, 2654.2927, 2745.123, 2612.5215]
2026-01-25 19:21:39,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:21:39,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 22 seconds)
2026-01-25 19:23:05,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:23:13,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2693.48682 ± 176.001
2026-01-25 19:23:13,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2715.6738, 2601.186, 2838.1394, 2786.1062, 2254.1558, 2873.031, 2848.4248, 2728.5046, 2734.3542, 2555.293]
2026-01-25 19:23:13,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:23:13,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 89/100 (estimated time remaining: 18 minutes, 48 seconds)
2026-01-25 19:24:39,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:24:47,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2692.36279 ± 87.180
2026-01-25 19:24:47,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2764.91, 2819.1912, 2712.2932, 2631.5884, 2643.968, 2802.7283, 2654.1484, 2687.744, 2700.4238, 2506.6353]
2026-01-25 19:24:47,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:24:47,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 13 seconds)
2026-01-25 19:26:12,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:26:21,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2774.48779 ± 83.814
2026-01-25 19:26:21,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2781.0293, 2873.6838, 2811.7927, 2736.97, 2837.7874, 2895.2388, 2732.6182, 2742.2063, 2586.2148, 2747.337]
2026-01-25 19:26:21,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:26:21,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2774.49) for latency DatasetOffice
2026-01-25 19:26:21,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 91/100 (estimated time remaining: 15 minutes, 39 seconds)
2026-01-25 19:27:46,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:27:55,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2421.83643 ± 462.195
2026-01-25 19:27:55,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2672.931, 2858.3403, 1434.35, 2681.2085, 2767.7327, 2782.1738, 2636.8281, 2564.833, 1902.6714, 1917.2958]
2026-01-25 19:27:55,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:27:55,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 5 seconds)
2026-01-25 19:29:20,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:29:28,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2719.60596 ± 88.566
2026-01-25 19:29:28,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2726.9587, 2741.0525, 2787.4832, 2742.033, 2700.8643, 2868.9346, 2669.3896, 2506.9482, 2760.3452, 2692.0493]
2026-01-25 19:29:28,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:29:28,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 30 seconds)
2026-01-25 19:30:54,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:31:02,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2750.38770 ± 104.821
2026-01-25 19:31:02,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2871.8667, 2507.386, 2817.6863, 2816.5867, 2689.6167, 2842.8528, 2834.1868, 2667.2905, 2729.0566, 2727.3474]
2026-01-25 19:31:02,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:31:02,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 94/100 (estimated time remaining: 10 minutes, 56 seconds)
2026-01-25 19:32:28,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:32:36,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2656.08252 ± 140.564
2026-01-25 19:32:36,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2820.9443, 2811.61, 2669.1821, 2643.498, 2568.43, 2624.6685, 2806.4973, 2641.6377, 2317.8792, 2656.4763]
2026-01-25 19:32:36,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:32:36,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 22 seconds)
2026-01-25 19:34:02,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:34:10,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2610.18506 ± 220.893
2026-01-25 19:34:10,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2732.822, 2691.0452, 2854.8315, 2663.296, 2535.3035, 2483.982, 2669.4795, 2620.803, 2031.4161, 2818.872]
2026-01-25 19:34:10,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:34:10,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 96/100 (estimated time remaining: 7 minutes, 49 seconds)
2026-01-25 19:35:35,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:35:44,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2228.84082 ± 802.983
2026-01-25 19:35:44,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1902.9541, 2858.098, 1610.2621, 2748.1199, 2902.081, 2759.5483, 2581.6099, 2796.4226, 227.63298, 1901.6792]
2026-01-25 19:35:44,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:35:44,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 15 seconds)
2026-01-25 19:37:09,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:37:17,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2785.08813 ± 91.006
2026-01-25 19:37:17,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2717.2139, 2807.549, 2938.0771, 2902.7224, 2762.3713, 2662.8289, 2805.6697, 2747.839, 2856.1914, 2650.418]
2026-01-25 19:37:17,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:37:17,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2785.09) for latency DatasetOffice
2026-01-25 19:37:17,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 41 seconds)
2026-01-25 19:38:43,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:38:51,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2670.27734 ± 154.041
2026-01-25 19:38:51,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2733.3557, 2345.6958, 2770.7192, 2886.9348, 2518.0615, 2833.3113, 2584.0283, 2758.154, 2593.2556, 2679.2566]
2026-01-25 19:38:51,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:38:51,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 7 seconds)
2026-01-25 19:40:16,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:40:24,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2619.84009 ± 62.668
2026-01-25 19:40:24,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2659.5803, 2619.2783, 2624.9944, 2661.3147, 2694.7446, 2543.0166, 2721.2485, 2569.2244, 2587.984, 2517.0137]
2026-01-25 19:40:24,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:40:24,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 33 seconds)
2026-01-25 19:41:50,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:41:58,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2738.74951 ± 111.623
2026-01-25 19:41:58,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2848.9211, 2819.4038, 2573.659, 2806.2593, 2766.7249, 2779.583, 2808.362, 2816.6084, 2503.7334, 2664.243]
2026-01-25 19:41:58,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-25 19:41:58,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1299 [DEBUG]: Training session finished
