2026-01-23 01:04:29,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-hopper/DatasetOffice-sac-aug-mem1
2026-01-23 01:04:29,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-hopper/DatasetOffice-sac-aug-mem1
2026-01-23 01:04:29,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x151008b1ead0>}
2026-01-23 01:04:29,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1159 [DEBUG]: using device: cuda
2026-01-23 01:04:29,535 baseline-sac-noisy-hopper:77 [WARNING]: args.memorize_actions != args.horizon: 1 != 32
2026-01-23 01:04:29,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1181 [INFO]: Creating new trainer
2026-01-23 01:04:29,693 baseline-sac-noisy-hopper:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=14, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2026-01-23 01:04:29,693 baseline-sac-noisy-hopper:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:04:30,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1242 [DEBUG]: Starting training session...
2026-01-23 01:04:30,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 1/100
2026-01-23 01:06:04,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:04,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 93.08952 ± 16.888
2026-01-23 01:06:04,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [97.8004, 96.133995, 100.82424, 99.06325, 101.195786, 97.38446, 42.61393, 98.901764, 99.22474, 97.75255]
2026-01-23 01:06:04,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [52.0, 52.0, 53.0, 52.0, 53.0, 53.0, 25.0, 53.0, 53.0, 52.0]
2026-01-23 01:06:04,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (93.09) for latency DatasetOffice
2026-01-23 01:06:04,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 35 minutes, 25 seconds)
2026-01-23 01:07:42,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:42,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 173.94913 ± 15.216
2026-01-23 01:07:42,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [178.12724, 183.45645, 155.26329, 161.37004, 152.95186, 184.50842, 159.03438, 178.38246, 202.26727, 184.12984]
2026-01-23 01:07:42,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [86.0, 91.0, 77.0, 79.0, 76.0, 88.0, 80.0, 86.0, 97.0, 88.0]
2026-01-23 01:07:42,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (173.95) for latency DatasetOffice
2026-01-23 01:07:42,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 37 minutes, 2 seconds)
2026-01-23 01:09:23,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:24,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 76.11754 ± 31.963
2026-01-23 01:09:24,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [39.309143, 94.71311, 110.70065, 115.3682, 83.312904, 32.51035, 58.85703, 38.09076, 120.904274, 67.40895]
2026-01-23 01:09:24,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [49.0, 107.0, 122.0, 131.0, 99.0, 46.0, 73.0, 51.0, 132.0, 78.0]
2026-01-23 01:09:24,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 38 minutes, 30 seconds)
2026-01-23 01:11:09,836 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:10,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 187.12631 ± 32.870
2026-01-23 01:11:10,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [169.59708, 173.80324, 226.8899, 231.10956, 218.12315, 161.71585, 158.32056, 150.15668, 229.3372, 152.2099]
2026-01-23 01:11:10,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [107.0, 110.0, 137.0, 141.0, 130.0, 94.0, 104.0, 95.0, 138.0, 96.0]
2026-01-23 01:11:10,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (187.13) for latency DatasetOffice
2026-01-23 01:11:10,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 40 minutes, 7 seconds)
2026-01-23 01:12:54,772 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:00,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 564.63208 ± 378.385
2026-01-23 01:13:00,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [993.23584, 708.39056, 990.7295, 674.3236, 33.005466, 614.4155, 30.292927, 573.4644, 35.68635, 992.77637]
2026-01-23 01:13:00,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 738.0, 1000.0, 683.0, 33.0, 624.0, 35.0, 607.0, 46.0, 1000.0]
2026-01-23 01:13:00,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (564.63) for latency DatasetOffice
2026-01-23 01:13:00,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 41 minutes, 24 seconds)
2026-01-23 01:14:38,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:46,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 797.71454 ± 353.841
2026-01-23 01:14:46,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1018.4347, 230.72665, 1014.95355, 1023.4946, 1075.3876, 1022.00464, 1020.3136, 317.11893, 1025.9137, 228.79723]
2026-01-23 01:14:46,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 103.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 271.0, 1000.0, 106.0]
2026-01-23 01:14:46,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (797.71) for latency DatasetOffice
2026-01-23 01:14:46,960 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 43 minutes, 39 seconds)
2026-01-23 01:16:27,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:31,532 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 436.60449 ± 330.885
2026-01-23 01:16:31,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [856.95154, 211.7999, 42.47145, 1046.9683, 252.25992, 635.1232, 712.92145, 310.7301, 211.80345, 85.01564]
2026-01-23 01:16:31,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [824.0, 196.0, 38.0, 1000.0, 233.0, 492.0, 682.0, 286.0, 230.0, 87.0]
2026-01-23 01:16:31,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 43 minutes, 55 seconds)
2026-01-23 01:18:15,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:17,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 349.83267 ± 48.530
2026-01-23 01:18:17,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [414.03424, 371.4649, 323.40027, 371.20093, 356.72028, 357.82504, 396.71072, 270.26318, 258.41776, 378.28925]
2026-01-23 01:18:17,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [234.0, 195.0, 152.0, 198.0, 183.0, 189.0, 221.0, 125.0, 122.0, 197.0]
2026-01-23 01:18:17,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 43 minutes, 18 seconds)
2026-01-23 01:20:00,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:02,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 314.09839 ± 60.047
2026-01-23 01:20:02,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [349.77594, 336.6388, 330.97263, 357.58044, 267.26413, 327.4887, 340.3178, 332.81613, 148.49008, 349.63937]
2026-01-23 01:20:02,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [173.0, 132.0, 158.0, 178.0, 107.0, 129.0, 163.0, 130.0, 72.0, 172.0]
2026-01-23 01:20:02,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 41 minutes, 14 seconds)
2026-01-23 01:21:44,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:45,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 363.42468 ± 21.878
2026-01-23 01:21:45,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [326.6869, 337.35156, 367.31464, 387.29395, 371.3991, 398.67468, 344.6845, 383.73523, 363.79355, 353.3127]
2026-01-23 01:21:45,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [137.0, 140.0, 187.0, 198.0, 187.0, 208.0, 159.0, 195.0, 177.0, 164.0]
2026-01-23 01:21:45,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 37 minutes, 43 seconds)
2026-01-23 01:23:29,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:30,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 313.47430 ± 10.494
2026-01-23 01:23:30,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [311.64807, 308.33203, 331.50314, 317.17993, 304.36746, 315.53812, 299.97855, 308.3637, 332.66333, 305.16855]
2026-01-23 01:23:30,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [127.0, 126.0, 132.0, 126.0, 125.0, 127.0, 124.0, 126.0, 134.0, 125.0]
2026-01-23 01:23:30,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 35 minutes, 15 seconds)
2026-01-23 01:25:18,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:19,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 318.41324 ± 11.038
2026-01-23 01:25:19,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [308.7614, 325.7342, 323.52164, 323.95477, 330.31134, 324.2579, 326.64417, 300.06772, 298.03552, 322.84387]
2026-01-23 01:25:19,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [136.0, 144.0, 143.0, 136.0, 141.0, 137.0, 144.0, 134.0, 134.0, 141.0]
2026-01-23 01:25:19,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 34 minutes, 51 seconds)
2026-01-23 01:27:04,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:05,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 172.41037 ± 152.132
2026-01-23 01:27:05,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [43.23368, 394.8611, 323.1469, 64.983444, 120.04784, 218.13516, 26.053007, 27.621498, 64.8652, 441.1558]
2026-01-23 01:27:05,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [50.0, 181.0, 136.0, 38.0, 87.0, 99.0, 31.0, 30.0, 73.0, 179.0]
2026-01-23 01:27:05,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 33 minutes, 11 seconds)
2026-01-23 01:28:45,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:46,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 395.30560 ± 138.263
2026-01-23 01:28:46,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [496.4114, 430.0941, 501.19943, 365.73297, 406.711, 9.564105, 493.9217, 487.87598, 379.59985, 381.94547]
2026-01-23 01:28:46,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [174.0, 151.0, 175.0, 139.0, 160.0, 11.0, 172.0, 173.0, 144.0, 144.0]
2026-01-23 01:28:46,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 30 minutes, 17 seconds)
2026-01-23 01:30:28,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:29,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 402.81671 ± 35.356
2026-01-23 01:30:29,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [390.1902, 462.77365, 467.66846, 426.67188, 392.83585, 367.0692, 394.27286, 363.11856, 387.26416, 376.30237]
2026-01-23 01:30:29,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [143.0, 160.0, 163.0, 150.0, 142.0, 138.0, 144.0, 137.0, 143.0, 140.0]
2026-01-23 01:30:29,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 28 minutes, 28 seconds)
2026-01-23 01:32:13,884 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:15,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 477.66693 ± 32.302
2026-01-23 01:32:15,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [460.5429, 505.39124, 524.1139, 465.92184, 498.76712, 402.17792, 505.87067, 471.8783, 473.78302, 468.22214]
2026-01-23 01:32:15,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [167.0, 168.0, 187.0, 168.0, 176.0, 144.0, 180.0, 167.0, 168.0, 178.0]
2026-01-23 01:32:15,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 26 minutes, 58 seconds)
2026-01-23 01:33:59,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:00,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 490.19586 ± 150.747
2026-01-23 01:34:00,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [549.2706, 498.99142, 558.5653, 553.00775, 580.10913, 560.9376, 552.04706, 570.3635, 420.79266, 57.873306]
2026-01-23 01:34:00,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [195.0, 173.0, 191.0, 195.0, 197.0, 197.0, 193.0, 196.0, 162.0, 39.0]
2026-01-23 01:34:01,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 24 minutes, 17 seconds)
2026-01-23 01:35:43,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:45,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 526.01184 ± 48.665
2026-01-23 01:35:45,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [563.3539, 529.9735, 528.5882, 545.3868, 535.02246, 550.47186, 582.90643, 393.20612, 507.8421, 523.3671]
2026-01-23 01:35:45,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [191.0, 174.0, 177.0, 180.0, 179.0, 182.0, 193.0, 142.0, 169.0, 177.0]
2026-01-23 01:35:45,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 22 minutes, 7 seconds)
2026-01-23 01:37:26,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:27,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 492.69360 ± 60.877
2026-01-23 01:37:27,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [485.9701, 521.15076, 546.8283, 563.36847, 392.6397, 421.71075, 532.3036, 410.8483, 563.04865, 489.0673]
2026-01-23 01:37:27,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [166.0, 178.0, 184.0, 188.0, 144.0, 152.0, 181.0, 148.0, 188.0, 167.0]
2026-01-23 01:37:27,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 20 minutes, 41 seconds)
2026-01-23 01:39:10,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:12,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 647.63342 ± 59.556
2026-01-23 01:39:12,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [636.1234, 638.879, 611.6261, 694.9627, 651.6559, 618.30035, 610.86536, 807.3698, 598.5755, 607.9761]
2026-01-23 01:39:12,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [195.0, 195.0, 188.0, 209.0, 200.0, 190.0, 187.0, 237.0, 185.0, 189.0]
2026-01-23 01:39:12,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 19 minutes, 17 seconds)
2026-01-23 01:40:49,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:51,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 408.84717 ± 365.305
2026-01-23 01:40:51,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [754.7916, 713.3107, 713.3458, 804.17865, 866.6933, 134.43605, 30.031174, 16.068558, 30.321758, 25.294312]
2026-01-23 01:40:51,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [236.0, 231.0, 225.0, 266.0, 292.0, 68.0, 28.0, 14.0, 39.0, 21.0]
2026-01-23 01:40:51,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 15 minutes, 51 seconds)
2026-01-23 01:42:28,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:30,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 739.34564 ± 72.782
2026-01-23 01:42:30,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [833.36426, 704.3879, 706.3898, 682.9369, 676.0709, 707.4593, 684.158, 874.965, 687.1318, 836.59247]
2026-01-23 01:42:30,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [280.0, 239.0, 242.0, 213.0, 228.0, 240.0, 230.0, 280.0, 235.0, 267.0]
2026-01-23 01:42:30,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 12 minutes, 34 seconds)
2026-01-23 01:44:10,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:12,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 667.09875 ± 231.942
2026-01-23 01:44:12,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [703.07715, 721.11365, 704.4634, 680.54926, 304.2085, 711.22534, 982.64923, 699.2515, 205.70561, 958.7441]
2026-01-23 01:44:12,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [231.0, 234.0, 227.0, 220.0, 127.0, 232.0, 305.0, 228.0, 93.0, 302.0]
2026-01-23 01:44:12,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 10 minutes, 4 seconds)
2026-01-23 01:45:51,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:53,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 609.61432 ± 210.811
2026-01-23 01:45:53,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [703.51215, 671.81476, 658.89856, 663.1456, 677.2831, 604.11804, 885.3232, 524.7646, 31.48781, 675.79535]
2026-01-23 01:45:53,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [227.0, 220.0, 208.0, 212.0, 213.0, 194.0, 294.0, 180.0, 39.0, 220.0]
2026-01-23 01:45:53,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 8 minutes, 8 seconds)
2026-01-23 01:47:32,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:35,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 907.01868 ± 178.317
2026-01-23 01:47:35,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [773.4867, 1420.9058, 853.4032, 861.557, 847.7096, 815.3656, 892.66614, 873.90094, 780.5549, 950.6363]
2026-01-23 01:47:35,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [280.0, 481.0, 285.0, 305.0, 284.0, 260.0, 299.0, 293.0, 285.0, 324.0]
2026-01-23 01:47:35,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (907.02) for latency DatasetOffice
2026-01-23 01:47:35,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 5 minutes, 43 seconds)
2026-01-23 01:49:13,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:15,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 742.35901 ± 409.604
2026-01-23 01:49:15,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [101.03688, 898.6287, 387.19217, 1117.894, 27.99361, 1058.0629, 958.215, 1245.3147, 648.9757, 980.27655]
2026-01-23 01:49:15,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [59.0, 325.0, 149.0, 362.0, 24.0, 370.0, 308.0, 423.0, 225.0, 312.0]
2026-01-23 01:49:15,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 4 minutes, 25 seconds)
2026-01-23 01:50:57,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:00,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 782.34766 ± 46.113
2026-01-23 01:51:00,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [790.2524, 793.7713, 756.50446, 672.5542, 803.5759, 822.89966, 815.69855, 732.3505, 810.5749, 825.2947]
2026-01-23 01:51:00,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [252.0, 255.0, 241.0, 216.0, 255.0, 263.0, 260.0, 234.0, 256.0, 266.0]
2026-01-23 01:51:00,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 3 minutes, 57 seconds)
2026-01-23 01:52:42,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:44,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 807.45160 ± 136.021
2026-01-23 01:52:44,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [790.1104, 886.44165, 631.2375, 1099.5419, 875.1635, 716.3471, 832.9872, 589.25635, 837.2103, 816.2204]
2026-01-23 01:52:44,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [251.0, 287.0, 215.0, 369.0, 285.0, 230.0, 258.0, 183.0, 270.0, 264.0]
2026-01-23 01:52:44,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 3 minutes, 1 second)
2026-01-23 01:54:26,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:28,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 663.06073 ± 276.675
2026-01-23 01:54:28,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [631.3282, 29.465595, 307.15594, 847.3249, 864.40356, 870.72424, 536.3912, 842.3623, 858.5831, 842.8679]
2026-01-23 01:54:28,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [217.0, 32.0, 127.0, 268.0, 295.0, 303.0, 166.0, 272.0, 281.0, 271.0]
2026-01-23 01:54:28,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 1 minute, 49 seconds)
2026-01-23 01:56:10,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:12,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 991.02307 ± 393.821
2026-01-23 01:56:12,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1014.4414, 1574.583, 830.1742, 12.390369, 1404.2277, 933.6944, 1007.9134, 882.0497, 1169.9711, 1080.7847]
2026-01-23 01:56:12,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [346.0, 534.0, 284.0, 12.0, 443.0, 308.0, 341.0, 273.0, 397.0, 356.0]
2026-01-23 01:56:12,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (991.02) for latency DatasetOffice
2026-01-23 01:56:12,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 47 seconds)
2026-01-23 01:57:47,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:50,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1181.87732 ± 505.315
2026-01-23 01:57:50,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [888.18274, 2161.986, 1941.6608, 1091.4662, 1172.7941, 1395.7076, 353.25162, 970.052, 856.34045, 987.33154]
2026-01-23 01:57:50,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [305.0, 709.0, 646.0, 351.0, 416.0, 472.0, 127.0, 331.0, 265.0, 330.0]
2026-01-23 01:57:50,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (1181.88) for latency DatasetOffice
2026-01-23 01:57:50,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 58 minutes, 28 seconds)
2026-01-23 01:59:27,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:29,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 863.32306 ± 341.667
2026-01-23 01:59:29,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [958.7383, 1436.1284, 896.9903, 911.6617, 875.79236, 1026.0256, 387.37628, 1052.0173, 135.40544, 953.095]
2026-01-23 01:59:29,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [315.0, 461.0, 279.0, 305.0, 296.0, 338.0, 150.0, 339.0, 69.0, 311.0]
2026-01-23 01:59:29,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 55 minutes, 23 seconds)
2026-01-23 02:01:08,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:01:11,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1116.17615 ± 281.809
2026-01-23 02:01:11,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1103.0757, 991.8812, 1027.8369, 897.5164, 1426.7079, 1019.858, 991.39624, 962.6542, 896.70593, 1844.129]
2026-01-23 02:01:11,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [360.0, 330.0, 353.0, 314.0, 465.0, 342.0, 337.0, 326.0, 312.0, 584.0]
2026-01-23 02:01:11,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 53 minutes, 16 seconds)
2026-01-23 02:02:51,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:55,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1281.85962 ± 422.712
2026-01-23 02:02:55,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1313.0884, 681.7512, 1406.1035, 1198.9666, 1035.8512, 2135.7725, 781.6512, 1566.2113, 1719.4058, 979.7938]
2026-01-23 02:02:55,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [458.0, 243.0, 422.0, 383.0, 324.0, 663.0, 259.0, 506.0, 555.0, 294.0]
2026-01-23 02:02:55,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (1281.86) for latency DatasetOffice
2026-01-23 02:02:55,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 51 minutes, 32 seconds)
2026-01-23 02:04:33,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:35,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 996.67297 ± 150.660
2026-01-23 02:04:35,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1006.6123, 1246.6797, 962.22906, 980.97424, 1062.7789, 917.56726, 967.1563, 940.8913, 674.34875, 1207.4921]
2026-01-23 02:04:35,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [298.0, 373.0, 285.0, 289.0, 305.0, 269.0, 285.0, 278.0, 226.0, 351.0]
2026-01-23 02:04:35,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 48 minutes, 54 seconds)
2026-01-23 02:06:11,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:13,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 845.03027 ± 279.017
2026-01-23 02:06:13,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [955.8972, 914.5834, 923.8851, 891.27954, 917.9881, 932.1373, 864.67926, 489.27994, 230.38562, 1330.187]
2026-01-23 02:06:13,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [289.0, 274.0, 276.0, 263.0, 273.0, 277.0, 257.0, 174.0, 136.0, 382.0]
2026-01-23 02:06:13,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 47 minutes, 11 seconds)
2026-01-23 02:07:54,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:57,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1074.62903 ± 792.666
2026-01-23 02:07:57,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1658.931, 1855.6012, 796.99084, 1082.1212, 713.2109, 2748.685, 145.04674, 63.26865, 1244.7706, 437.66388]
2026-01-23 02:07:57,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [528.0, 587.0, 289.0, 378.0, 272.0, 1000.0, 71.0, 48.0, 413.0, 170.0]
2026-01-23 02:07:57,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 46 minutes, 39 seconds)
2026-01-23 02:09:35,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:09:39,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1450.44946 ± 660.360
2026-01-23 02:09:39,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1079.176, 1113.0914, 996.3629, 1410.9548, 1728.8972, 911.1619, 1951.9497, 1118.5509, 1016.71027, 3177.64]
2026-01-23 02:09:39,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [319.0, 330.0, 298.0, 417.0, 528.0, 265.0, 595.0, 327.0, 301.0, 1000.0]
2026-01-23 02:09:39,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (1450.45) for latency DatasetOffice
2026-01-23 02:09:39,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 44 minutes, 49 seconds)
2026-01-23 02:11:20,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:23,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1393.21912 ± 398.178
2026-01-23 02:11:23,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [526.8547, 1573.4718, 1753.2632, 960.7786, 1242.8063, 1286.5814, 1933.2172, 1410.2664, 1457.0315, 1787.9204]
2026-01-23 02:11:23,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [192.0, 502.0, 532.0, 312.0, 370.0, 394.0, 644.0, 445.0, 446.0, 574.0]
2026-01-23 02:11:23,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 43 minutes, 23 seconds)
2026-01-23 02:13:05,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:13:10,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2037.48657 ± 976.155
2026-01-23 02:13:10,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [475.14676, 2475.6284, 1514.0723, 1248.224, 2985.139, 3199.5312, 3143.7424, 1196.3668, 3021.769, 1115.2468]
2026-01-23 02:13:10,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [177.0, 768.0, 455.0, 372.0, 1000.0, 983.0, 1000.0, 356.0, 914.0, 330.0]
2026-01-23 02:13:10,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (2037.49) for latency DatasetOffice
2026-01-23 02:13:10,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 43 minutes, 3 seconds)
2026-01-23 02:14:51,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:54,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1318.92896 ± 763.562
2026-01-23 02:14:54,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1456.2374, 320.52847, 1519.7605, 3163.7346, 1394.5398, 1260.497, 1511.9613, 182.68364, 1182.1633, 1197.1831]
2026-01-23 02:14:54,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [419.0, 132.0, 454.0, 966.0, 405.0, 371.0, 446.0, 86.0, 346.0, 352.0]
2026-01-23 02:14:54,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 42 minutes, 36 seconds)
2026-01-23 02:16:42,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:45,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1261.83313 ± 497.515
2026-01-23 02:16:45,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [929.72894, 2551.503, 972.4778, 887.27484, 1094.011, 917.8044, 1426.4583, 1209.235, 1699.3822, 930.4565]
2026-01-23 02:16:45,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [272.0, 762.0, 286.0, 259.0, 318.0, 268.0, 415.0, 354.0, 502.0, 273.0]
2026-01-23 02:16:45,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 42 minutes, 3 seconds)
2026-01-23 02:18:20,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:23,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1065.63525 ± 108.847
2026-01-23 02:18:23,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1030.8369, 1213.1562, 1161.3885, 1052.9216, 943.54376, 1240.4434, 994.14996, 1041.6097, 1097.1368, 881.16473]
2026-01-23 02:18:23,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [308.0, 351.0, 339.0, 310.0, 276.0, 367.0, 296.0, 301.0, 321.0, 304.0]
2026-01-23 02:18:23,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 39 minutes, 32 seconds)
2026-01-23 02:20:10,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:13,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1242.51416 ± 1019.948
2026-01-23 02:20:13,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1846.4231, 2900.7612, 1105.8322, 25.796343, 3085.3975, 973.472, 968.75604, 309.72903, 74.59357, 1134.38]
2026-01-23 02:20:13,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [563.0, 881.0, 358.0, 21.0, 1000.0, 290.0, 293.0, 127.0, 53.0, 345.0]
2026-01-23 02:20:13,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 38 minutes, 52 seconds)
2026-01-23 02:21:56,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:22:00,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1384.80408 ± 983.725
2026-01-23 02:22:00,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1347.9844, 11.485256, 29.159733, 1516.8822, 2585.8232, 2695.945, 1963.5619, 272.8565, 1009.56793, 2414.774]
2026-01-23 02:22:00,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [432.0, 12.0, 24.0, 455.0, 779.0, 811.0, 574.0, 115.0, 304.0, 747.0]
2026-01-23 02:22:00,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 37 minutes, 12 seconds)
2026-01-23 02:23:34,025 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:37,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1462.18140 ± 740.567
2026-01-23 02:23:37,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1456.7952, 1249.4337, 963.0885, 936.43677, 641.26874, 3273.4026, 1236.3502, 1102.0393, 1419.5336, 2343.4646]
2026-01-23 02:23:37,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [437.0, 376.0, 286.0, 277.0, 200.0, 1000.0, 369.0, 327.0, 423.0, 708.0]
2026-01-23 02:23:37,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 34 minutes, 5 seconds)
2026-01-23 02:25:10,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:14,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1699.03101 ± 760.936
2026-01-23 02:25:14,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [958.1521, 1744.5419, 1399.9414, 3123.5513, 1112.4365, 1929.408, 3080.1414, 1484.836, 1090.9286, 1066.3723]
2026-01-23 02:25:14,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [292.0, 528.0, 426.0, 1000.0, 331.0, 606.0, 1000.0, 446.0, 329.0, 324.0]
2026-01-23 02:25:14,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 1 second)
2026-01-23 02:26:44,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:47,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1176.55078 ± 659.762
2026-01-23 02:26:47,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [398.50052, 1175.3502, 938.37573, 968.35443, 2022.6742, 1085.1959, 569.0236, 2233.1475, 339.67303, 2035.2126]
2026-01-23 02:26:47,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [152.0, 389.0, 282.0, 294.0, 630.0, 329.0, 202.0, 700.0, 137.0, 648.0]
2026-01-23 02:26:47,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 27 minutes, 30 seconds)
2026-01-23 02:28:17,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:21,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1575.25269 ± 876.740
2026-01-23 02:28:21,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [997.521, 973.74023, 3047.8594, 1228.0942, 976.7015, 987.9286, 2532.529, 952.9517, 3096.7043, 958.49634]
2026-01-23 02:28:21,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [302.0, 294.0, 1000.0, 379.0, 301.0, 299.0, 781.0, 281.0, 1000.0, 284.0]
2026-01-23 02:28:21,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 22 minutes, 58 seconds)
2026-01-23 02:29:53,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:55,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 899.37909 ± 27.759
2026-01-23 02:29:55,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [900.03406, 895.0658, 916.9158, 868.5283, 888.679, 917.95557, 904.0345, 872.1976, 963.7032, 866.6777]
2026-01-23 02:29:55,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [267.0, 266.0, 271.0, 258.0, 264.0, 271.0, 267.0, 259.0, 285.0, 257.0]
2026-01-23 02:29:55,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 19 minutes, 3 seconds)
2026-01-23 02:31:28,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:31,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1280.30688 ± 524.366
2026-01-23 02:31:31,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [948.9728, 1211.9028, 955.043, 1415.1799, 2590.0154, 1149.0199, 1158.5029, 523.52966, 1698.3021, 1152.6003]
2026-01-23 02:31:31,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [281.0, 357.0, 286.0, 424.0, 784.0, 339.0, 345.0, 185.0, 525.0, 342.0]
2026-01-23 02:31:31,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 17 minutes, 24 seconds)
2026-01-23 02:33:03,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:07,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1599.17896 ± 549.242
2026-01-23 02:33:07,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1264.4955, 1405.4393, 1274.1469, 1273.4174, 1227.4388, 1208.5524, 2082.247, 3040.5825, 1444.2969, 1771.1731]
2026-01-23 02:33:07,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [387.0, 476.0, 391.0, 380.0, 379.0, 366.0, 647.0, 1000.0, 439.0, 551.0]
2026-01-23 02:33:07,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 15 minutes, 39 seconds)
2026-01-23 02:34:43,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:49,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2004.39709 ± 836.446
2026-01-23 02:34:49,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1768.6674, 3175.8345, 1018.775, 3169.3591, 1469.0535, 1784.435, 1128.119, 3244.1418, 1231.2847, 2054.2996]
2026-01-23 02:34:49,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [552.0, 1000.0, 303.0, 1000.0, 440.0, 549.0, 336.0, 1000.0, 375.0, 621.0]
2026-01-23 02:34:49,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 15 minutes, 30 seconds)
2026-01-23 02:36:22,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:24,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 917.40320 ± 33.750
2026-01-23 02:36:24,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [984.2807, 915.5133, 881.051, 936.5468, 941.6589, 855.2477, 934.5661, 896.2082, 911.826, 917.1342]
2026-01-23 02:36:24,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [295.0, 274.0, 261.0, 278.0, 280.0, 254.0, 277.0, 266.0, 272.0, 272.0]
2026-01-23 02:36:24,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 14 minutes, 6 seconds)
2026-01-23 02:37:57,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:38:03,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2314.28369 ± 1040.856
2026-01-23 02:38:03,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3049.6316, 704.9354, 363.9448, 3137.5642, 3125.0337, 2985.7214, 2010.7664, 3145.9666, 1494.3264, 3124.9478]
2026-01-23 02:38:03,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [938.0, 250.0, 141.0, 1000.0, 1000.0, 969.0, 610.0, 980.0, 510.0, 1000.0]
2026-01-23 02:38:03,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (2314.28) for latency DatasetOffice
2026-01-23 02:38:03,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 13 minutes, 13 seconds)
2026-01-23 02:39:36,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:39,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 959.25916 ± 45.552
2026-01-23 02:39:39,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [945.93933, 954.15424, 963.56476, 1067.5857, 874.6986, 952.66974, 988.9948, 933.26404, 958.7191, 953.0024]
2026-01-23 02:39:39,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [283.0, 287.0, 286.0, 318.0, 259.0, 283.0, 299.0, 276.0, 286.0, 285.0]
2026-01-23 02:39:39,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 11 minutes, 34 seconds)
2026-01-23 02:41:15,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:41:20,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1381.69446 ± 875.020
2026-01-23 02:41:20,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3097.56, 970.6341, 2936.301, 994.59894, 1597.8407, 262.1651, 914.7351, 1199.08, 885.15607, 958.8738]
2026-01-23 02:41:20,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [959.0, 299.0, 891.0, 299.0, 485.0, 132.0, 286.0, 372.0, 302.0, 292.0]
2026-01-23 02:41:20,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 10 minutes, 35 seconds)
2026-01-23 02:42:56,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:59,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 955.41815 ± 902.513
2026-01-23 02:42:59,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1134.8096, 990.8022, 1556.6526, 2927.2566, 20.296547, 1784.7678, 40.864437, 896.67535, 92.980095, 109.07579]
2026-01-23 02:42:59,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [392.0, 296.0, 467.0, 956.0, 20.0, 607.0, 48.0, 322.0, 53.0, 61.0]
2026-01-23 02:42:59,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 8 minutes, 28 seconds)
2026-01-23 02:44:26,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:44:30,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1274.61414 ± 653.697
2026-01-23 02:44:30,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [594.6052, 968.2144, 1230.4774, 1289.5797, 1320.4285, 3126.7495, 1308.9431, 948.334, 991.5542, 967.2547]
2026-01-23 02:44:30,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [204.0, 293.0, 369.0, 389.0, 399.0, 1000.0, 388.0, 283.0, 303.0, 290.0]
2026-01-23 02:44:30,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 6 minutes, 24 seconds)
2026-01-23 02:46:10,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:12,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1095.80103 ± 142.644
2026-01-23 02:46:12,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [971.80176, 1323.8848, 1104.0325, 1027.3774, 1323.6735, 1021.6495, 976.3332, 968.5141, 975.08093, 1265.6616]
2026-01-23 02:46:12,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [294.0, 403.0, 334.0, 308.0, 400.0, 310.0, 296.0, 291.0, 297.0, 388.0]
2026-01-23 02:46:12,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 5 minutes, 14 seconds)
2026-01-23 02:47:48,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:47:50,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 905.65790 ± 97.208
2026-01-23 02:47:50,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [874.0797, 998.54144, 935.9647, 942.1818, 957.6977, 916.7177, 900.9218, 632.31976, 974.3807, 923.77356]
2026-01-23 02:47:50,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [262.0, 301.0, 278.0, 281.0, 284.0, 274.0, 267.0, 196.0, 295.0, 276.0]
2026-01-23 02:47:50,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 3 minutes, 53 seconds)
2026-01-23 02:49:22,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:49:24,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 865.93604 ± 28.160
2026-01-23 02:49:24,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [899.01434, 873.52167, 867.4481, 850.6927, 827.952, 884.11194, 881.3428, 805.5736, 894.1723, 875.5305]
2026-01-23 02:49:24,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [278.0, 258.0, 257.0, 252.0, 247.0, 261.0, 260.0, 241.0, 267.0, 261.0]
2026-01-23 02:49:24,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 1 minute, 24 seconds)
2026-01-23 02:50:57,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:51:04,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2745.31445 ± 782.230
2026-01-23 02:51:04,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3204.9849, 2962.3037, 3231.5847, 3218.7422, 3170.0913, 3221.499, 939.8006, 3215.6343, 1516.1404, 2772.3623]
2026-01-23 02:51:04,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 891.0, 1000.0, 1000.0, 1000.0, 1000.0, 283.0, 1000.0, 457.0, 832.0]
2026-01-23 02:51:04,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (2745.31) for latency DatasetOffice
2026-01-23 02:51:04,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 64/100 (estimated time remaining: 59 minutes, 53 seconds)
2026-01-23 02:52:39,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:52:41,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 964.48083 ± 39.363
2026-01-23 02:52:41,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [876.9004, 968.95197, 990.8915, 992.8929, 930.0697, 948.1804, 1029.603, 965.52216, 987.1088, 954.68713]
2026-01-23 02:52:41,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [263.0, 291.0, 300.0, 301.0, 274.0, 281.0, 300.0, 294.0, 295.0, 283.0]
2026-01-23 02:52:41,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 65/100 (estimated time remaining: 58 minutes, 53 seconds)
2026-01-23 02:54:10,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:54:12,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 774.79260 ± 539.281
2026-01-23 02:54:12,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [162.28664, 40.684177, 34.69178, 1911.5822, 906.9691, 964.003, 918.1499, 971.1776, 922.93713, 915.44434]
2026-01-23 02:54:12,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [77.0, 44.0, 48.0, 582.0, 271.0, 285.0, 273.0, 287.0, 276.0, 272.0]
2026-01-23 02:54:12,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 66/100 (estimated time remaining: 55 minutes, 57 seconds)
2026-01-23 02:55:49,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:55:51,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 991.66632 ± 509.517
2026-01-23 02:55:51,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1236.8108, 554.0617, 990.7013, 1162.2465, 1031.592, 667.7745, 2139.3174, 1040.8167, 1043.3193, 50.023006]
2026-01-23 02:55:51,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [364.0, 193.0, 294.0, 339.0, 306.0, 222.0, 658.0, 307.0, 314.0, 36.0]
2026-01-23 02:55:52,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 67/100 (estimated time remaining: 54 minutes, 31 seconds)
2026-01-23 02:57:23,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:57:28,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1942.29260 ± 1088.614
2026-01-23 02:57:28,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3112.7275, 1179.1243, 11.172396, 1073.3086, 3176.6636, 3182.6409, 1392.8185, 1232.6099, 1877.7079, 3184.1526]
2026-01-23 02:57:28,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 357.0, 11.0, 340.0, 1000.0, 1000.0, 417.0, 367.0, 568.0, 1000.0]
2026-01-23 02:57:28,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 68/100 (estimated time remaining: 53 minutes, 13 seconds)
2026-01-23 02:59:08,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:11,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1189.95776 ± 339.321
2026-01-23 02:59:11,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [972.0149, 612.6631, 1050.9819, 1445.8896, 974.93176, 1046.2478, 1800.3586, 1094.5828, 1675.6696, 1226.2378]
2026-01-23 02:59:11,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [297.0, 214.0, 319.0, 432.0, 295.0, 318.0, 552.0, 343.0, 508.0, 361.0]
2026-01-23 02:59:11,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 69/100 (estimated time remaining: 51 minutes, 57 seconds)
2026-01-23 03:00:46,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:00:53,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2483.98560 ± 974.488
2026-01-23 03:00:53,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1014.20776, 3196.697, 3124.7224, 3096.1826, 2375.042, 391.24158, 2078.7515, 3173.2034, 3219.958, 3169.848]
2026-01-23 03:00:53,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [341.0, 1000.0, 1000.0, 1000.0, 747.0, 148.0, 632.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:00:53,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 70/100 (estimated time remaining: 50 minutes, 50 seconds)
2026-01-23 03:02:19,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:02:24,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1968.92603 ± 892.557
2026-01-23 03:02:24,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3174.5798, 2043.7122, 2334.7341, 3156.0742, 3202.9094, 1012.11774, 1000.0737, 1202.629, 1116.4856, 1445.9432]
2026-01-23 03:02:24,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 621.0, 710.0, 1000.0, 1000.0, 301.0, 301.0, 355.0, 334.0, 434.0]
2026-01-23 03:02:24,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 71/100 (estimated time remaining: 49 minutes, 12 seconds)
2026-01-23 03:04:02,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:04:05,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1362.97559 ± 670.657
2026-01-23 03:04:05,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [975.93835, 1881.9283, 2557.5366, 974.2511, 1011.24054, 1957.5468, 2023.4183, 248.6434, 797.87115, 1201.3804]
2026-01-23 03:04:05,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [290.0, 588.0, 789.0, 295.0, 303.0, 630.0, 659.0, 114.0, 269.0, 358.0]
2026-01-23 03:04:05,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 72/100 (estimated time remaining: 47 minutes, 44 seconds)
2026-01-23 03:05:38,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:05:41,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1091.18018 ± 1009.952
2026-01-23 03:05:41,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1191.4683, 985.48755, 3178.897, 2234.7664, 1864.4043, 1013.1215, 76.67896, 327.28348, 9.227292, 30.466948]
2026-01-23 03:05:41,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [345.0, 323.0, 1000.0, 708.0, 652.0, 360.0, 52.0, 140.0, 15.0, 34.0]
2026-01-23 03:05:41,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 73/100 (estimated time remaining: 46 minutes, 2 seconds)
2026-01-23 03:07:16,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:07:23,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2557.71924 ± 922.775
2026-01-23 03:07:23,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1195.2115, 1215.1797, 3210.6206, 3142.9246, 3201.8801, 3191.2393, 3088.438, 3122.7253, 1041.584, 3167.3906]
2026-01-23 03:07:23,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [352.0, 360.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 342.0, 1000.0]
2026-01-23 03:07:24,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 74/100 (estimated time remaining: 44 minutes, 17 seconds)
2026-01-23 03:08:55,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:08:58,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 991.39630 ± 351.303
2026-01-23 03:08:58,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [972.91754, 1030.1505, 1471.438, 26.24369, 1087.6091, 1099.6453, 1009.5426, 1049.9127, 1203.1051, 963.3976]
2026-01-23 03:08:58,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [285.0, 303.0, 473.0, 34.0, 320.0, 326.0, 301.0, 309.0, 352.0, 283.0]
2026-01-23 03:08:58,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 75/100 (estimated time remaining: 42 minutes, 1 second)
2026-01-23 03:10:34,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:10:38,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1632.55835 ± 1117.428
2026-01-23 03:10:38,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3068.6628, 1266.1272, 3004.4336, 1242.3955, 1104.52, 1076.0157, 3094.8657, 2363.5825, 30.44913, 74.53002]
2026-01-23 03:10:38,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 390.0, 926.0, 373.0, 332.0, 324.0, 1000.0, 751.0, 27.0, 65.0]
2026-01-23 03:10:38,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 76/100 (estimated time remaining: 41 minutes, 12 seconds)
2026-01-23 03:12:12,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:12:18,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2038.76245 ± 1009.038
2026-01-23 03:12:18,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1262.076, 1681.5983, 1795.9219, 2859.1697, 3226.4463, 3217.3455, 1625.7048, 14.56523, 1499.651, 3205.1475]
2026-01-23 03:12:18,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [385.0, 505.0, 587.0, 921.0, 1000.0, 1000.0, 522.0, 17.0, 459.0, 1000.0]
2026-01-23 03:12:18,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 77/100 (estimated time remaining: 39 minutes, 26 seconds)
2026-01-23 03:13:51,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:13:59,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2729.01440 ± 862.850
2026-01-23 03:13:59,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [473.39627, 3058.4062, 3170.7664, 3153.9873, 3160.0015, 3165.256, 3133.4531, 1721.7252, 3089.9739, 3163.1768]
2026-01-23 03:13:59,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [177.0, 970.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 561.0, 1000.0, 1000.0]
2026-01-23 03:13:59,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 9 seconds)
2026-01-23 03:15:32,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:15:34,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 536.77844 ± 1011.014
2026-01-23 03:15:34,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3172.308, 1726.6793, 18.773355, 7.612177, 185.11404, 18.13854, 138.74281, 42.99368, 35.2943, 22.127851]
2026-01-23 03:15:34,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 560.0, 21.0, 10.0, 107.0, 18.0, 69.0, 47.0, 34.0, 22.0]
2026-01-23 03:15:34,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 56 seconds)
2026-01-23 03:17:03,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:17:09,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2107.22510 ± 914.664
2026-01-23 03:17:09,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2171.01, 3172.9038, 3150.0134, 2353.9395, 3173.7393, 902.55396, 1228.1849, 2063.3257, 2375.0886, 481.49167]
2026-01-23 03:17:09,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [702.0, 1000.0, 1000.0, 713.0, 993.0, 269.0, 400.0, 648.0, 774.0, 161.0]
2026-01-23 03:17:09,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 22 seconds)
2026-01-23 03:18:44,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:18:53,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 3005.32861 ± 519.044
2026-01-23 03:18:53,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3188.3572, 3211.3828, 3202.781, 3216.8342, 3202.189, 3179.1245, 3206.5085, 2977.5115, 3207.1147, 1461.481]
2026-01-23 03:18:53,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 932.0, 1000.0, 476.0]
2026-01-23 03:18:53,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (3005.33) for latency DatasetOffice
2026-01-23 03:18:53,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 56 seconds)
2026-01-23 03:20:31,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:20:37,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2110.11914 ± 1401.716
2026-01-23 03:20:37,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3179.3625, 3193.7065, 3178.7883, 3169.5164, 3174.1377, 3166.0828, 1899.9081, 53.076267, 45.308285, 41.304203]
2026-01-23 03:20:37,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 610.0, 38.0, 54.0, 24.0]
2026-01-23 03:20:37,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 34 seconds)
2026-01-23 03:22:04,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:22:11,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2528.02002 ± 755.536
2026-01-23 03:22:11,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1203.5764, 1647.8175, 3107.7444, 3162.922, 2895.3809, 3210.1392, 1883.804, 3179.7144, 3200.9666, 1788.1322]
2026-01-23 03:22:11,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [352.0, 498.0, 1000.0, 1000.0, 877.0, 1000.0, 606.0, 1000.0, 1000.0, 550.0]
2026-01-23 03:22:11,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 30 seconds)
2026-01-23 03:23:44,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:23:52,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2723.19482 ± 797.566
2026-01-23 03:23:52,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3171.2178, 3187.1343, 1809.6544, 3182.22, 2714.2832, 2881.77, 677.5166, 3207.348, 3202.8723, 3197.932]
2026-01-23 03:23:52,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 564.0, 1000.0, 835.0, 914.0, 214.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:23:52,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 13 seconds)
2026-01-23 03:25:23,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:25:25,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 847.59357 ± 1242.494
2026-01-23 03:25:25,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1507.4243, 11.065342, 4.8382134, 73.2847, 31.541166, 209.61014, 12.178626, 265.40347, 3188.7605, 3171.8296]
2026-01-23 03:25:25,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [496.0, 14.0, 8.0, 72.0, 24.0, 185.0, 12.0, 116.0, 1000.0, 1000.0]
2026-01-23 03:25:25,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 28 seconds)
2026-01-23 03:27:01,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:27:09,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2659.75366 ± 690.197
2026-01-23 03:27:09,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1839.6885, 3139.1414, 3188.2373, 1409.4364, 2464.8687, 1757.2603, 3198.7803, 3193.902, 3204.6528, 3201.5688]
2026-01-23 03:27:09,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [591.0, 1000.0, 1000.0, 480.0, 781.0, 550.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:27:09,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 47 seconds)
2026-01-23 03:28:40,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:28:47,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2642.50830 ± 1056.615
2026-01-23 03:28:47,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3216.311, 3225.7727, 490.05832, 3192.3218, 3150.1626, 3208.4434, 574.0447, 3161.0947, 3171.2666, 3035.6067]
2026-01-23 03:28:47,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 183.0, 1000.0, 1000.0, 1000.0, 210.0, 1000.0, 1000.0, 917.0]
2026-01-23 03:28:47,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 52 seconds)
2026-01-23 03:30:25,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:29,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1213.77844 ± 1257.801
2026-01-23 03:30:29,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3240.6345, 1654.7045, 3224.8198, 1918.1511, 1790.566, 27.655525, 32.411083, 56.95523, 45.46052, 146.42535]
2026-01-23 03:30:29,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 495.0, 1000.0, 569.0, 577.0, 23.0, 31.0, 58.0, 47.0, 78.0]
2026-01-23 03:30:29,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 35 seconds)
2026-01-23 03:31:54,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:32:01,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2414.73877 ± 1073.909
2026-01-23 03:32:01,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3236.8328, 3210.9736, 954.24084, 2304.8054, 3231.6155, 1257.3875, 3213.1003, 357.9487, 3206.1511, 3174.331]
2026-01-23 03:32:01,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 301.0, 733.0, 1000.0, 385.0, 1000.0, 139.0, 1000.0, 1000.0]
2026-01-23 03:32:01,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 34 seconds)
2026-01-23 03:33:38,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:33:43,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1783.37439 ± 1050.725
2026-01-23 03:33:43,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [6.2629952, 1493.2078, 1284.6863, 3174.759, 3199.8872, 2016.3796, 3174.3433, 778.455, 1697.2394, 1008.52344]
2026-01-23 03:33:43,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [9.0, 455.0, 396.0, 1000.0, 1000.0, 616.0, 1000.0, 232.0, 515.0, 346.0]
2026-01-23 03:33:43,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 14 seconds)
2026-01-23 03:35:14,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:35:18,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1244.16565 ± 924.049
2026-01-23 03:35:18,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1593.7797, 1413.5273, 1361.143, 1369.4829, 1591.221, 3187.6694, 1745.5688, 82.61539, 43.93329, 52.71488]
2026-01-23 03:35:18,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [513.0, 423.0, 409.0, 410.0, 510.0, 1000.0, 554.0, 98.0, 43.0, 36.0]
2026-01-23 03:35:18,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 18 seconds)
2026-01-23 03:36:48,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:36:55,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1982.56445 ± 1311.938
2026-01-23 03:36:55,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3208.108, 41.059715, 3254.479, 3237.654, 1341.9441, 329.32523, 3192.222, 436.04672, 1551.904, 3232.9016]
2026-01-23 03:36:55,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 28.0, 1000.0, 1000.0, 401.0, 133.0, 1000.0, 162.0, 479.0, 980.0]
2026-01-23 03:36:55,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 37 seconds)
2026-01-23 03:38:32,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:38:36,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1543.68860 ± 923.804
2026-01-23 03:38:36,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1800.0286, 343.09454, 2238.7769, 1992.7437, 1753.6735, 1889.1204, 1764.8956, 3171.173, 454.7114, 28.66841]
2026-01-23 03:38:36,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [533.0, 134.0, 687.0, 587.0, 542.0, 557.0, 518.0, 944.0, 168.0, 24.0]
2026-01-23 03:38:36,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 58 seconds)
2026-01-23 03:40:13,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:40:18,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1830.52661 ± 664.490
2026-01-23 03:40:18,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2216.809, 916.9203, 1589.3855, 1516.8785, 2098.4675, 2286.3484, 2157.344, 1598.4609, 779.09064, 3145.559]
2026-01-23 03:40:18,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [656.0, 313.0, 505.0, 455.0, 619.0, 697.0, 645.0, 488.0, 231.0, 1000.0]
2026-01-23 03:40:18,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 35 seconds)
2026-01-23 03:41:54,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:42:01,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2577.36621 ± 715.931
2026-01-23 03:42:01,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3167.663, 3236.1143, 2441.3965, 1483.2054, 1216.7593, 2529.982, 2164.1196, 3200.7122, 3266.0237, 3067.6863]
2026-01-23 03:42:01,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 772.0, 475.0, 390.0, 769.0, 646.0, 1000.0, 1000.0, 912.0]
2026-01-23 03:42:01,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 57 seconds)
2026-01-23 03:43:28,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:43:30,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 958.51532 ± 84.089
2026-01-23 03:43:30,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1073.6888, 965.7822, 1009.283, 1081.8125, 929.7665, 933.40283, 993.1015, 953.01715, 824.78754, 820.5113]
2026-01-23 03:43:30,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [319.0, 284.0, 301.0, 318.0, 277.0, 275.0, 299.0, 282.0, 248.0, 248.0]
2026-01-23 03:43:30,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 12 seconds)
2026-01-23 03:45:05,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:45:13,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2908.08691 ± 468.762
2026-01-23 03:45:13,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3190.0703, 3197.3, 3167.1807, 3206.6716, 2053.311, 1969.7325, 3208.473, 3180.2532, 3175.6062, 2732.2715]
2026-01-23 03:45:13,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 656.0, 629.0, 1000.0, 1000.0, 1000.0, 858.0]
2026-01-23 03:45:13,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 38 seconds)
2026-01-23 03:46:44,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:46:49,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1352.61206 ± 1342.219
2026-01-23 03:46:49,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [178.28299, 35.19121, 985.35486, 527.7392, 82.57355, 3120.7795, 3196.309, 3204.472, 2190.3252, 5.0929446]
2026-01-23 03:46:49,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [85.0, 44.0, 337.0, 212.0, 91.0, 1000.0, 1000.0, 1000.0, 692.0, 8.0]
2026-01-23 03:46:49,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 55 seconds)
2026-01-23 03:48:27,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:48:32,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1881.12891 ± 1097.664
2026-01-23 03:48:32,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1545.617, 2363.4563, 3263.8435, 2085.2227, 1491.4547, 3224.8962, 549.11035, 3229.8618, 1049.4943, 8.3327675]
2026-01-23 03:48:32,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [473.0, 714.0, 1000.0, 641.0, 450.0, 1000.0, 199.0, 1000.0, 345.0, 11.0]
2026-01-23 03:48:32,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 17 seconds)
2026-01-23 03:50:01,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:50:06,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1850.98901 ± 1272.254
2026-01-23 03:50:06,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [39.25109, 1336.8955, 3184.0525, 3212.4563, 563.65155, 1129.5275, 133.63533, 2987.406, 3210.4626, 2712.5535]
2026-01-23 03:50:06,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [37.0, 392.0, 1000.0, 1000.0, 180.0, 323.0, 67.0, 922.0, 1000.0, 854.0]
2026-01-23 03:50:06,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 37 seconds)
2026-01-23 03:51:40,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:51:41,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 330.09366 ± 605.538
2026-01-23 03:51:41,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [5.0759826, 1583.6248, 1496.8447, 8.822566, 38.811325, 33.828304, 10.123284, 42.14577, 40.28791, 41.372032]
2026-01-23 03:51:41,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [8.0, 461.0, 485.0, 11.0, 34.0, 40.0, 12.0, 45.0, 31.0, 31.0]
2026-01-23 03:51:41,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1299 [DEBUG]: Training session finished
