2026-01-23 01:10:07,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-hopper/DatasetOffice-sac-aug-mem2
2026-01-23 01:10:07,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-hopper/DatasetOffice-sac-aug-mem2
2026-01-23 01:10:07,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x1498430f9f50>}
2026-01-23 01:10:07,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1159 [DEBUG]: using device: cuda
2026-01-23 01:10:07,945 baseline-sac-noisy-hopper:77 [WARNING]: args.memorize_actions != args.horizon: 2 != 32
2026-01-23 01:10:08,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1181 [INFO]: Creating new trainer
2026-01-23 01:10:08,103 baseline-sac-noisy-hopper:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=17, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2026-01-23 01:10:08,103 baseline-sac-noisy-hopper:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=20, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:10:09,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1242 [DEBUG]: Starting training session...
2026-01-23 01:10:09,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 1/100
2026-01-23 01:11:31,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:31,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 9.02661 ± 0.505
2026-01-23 01:11:31,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [9.737859, 8.1250515, 9.209725, 9.453854, 9.199772, 8.513339, 9.316459, 9.522592, 8.512507, 8.674972]
2026-01-23 01:11:31,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [13.0, 12.0, 13.0, 13.0, 13.0, 12.0, 13.0, 13.0, 12.0, 12.0]
2026-01-23 01:11:31,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (9.03) for latency DatasetOffice
2026-01-23 01:11:31,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 15 minutes, 59 seconds)
2026-01-23 01:13:05,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:09,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 510.18790 ± 347.304
2026-01-23 01:13:09,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [915.6554, 190.38829, 200.24362, 216.48474, 188.72113, 1170.162, 506.78116, 920.25525, 506.86267, 286.3248]
2026-01-23 01:13:09,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [603.0, 143.0, 144.0, 145.0, 137.0, 850.0, 319.0, 605.0, 362.0, 195.0]
2026-01-23 01:13:09,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (510.19) for latency DatasetOffice
2026-01-23 01:13:09,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 26 minutes, 58 seconds)
2026-01-23 01:14:41,510 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:51,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1022.37512 ± 7.307
2026-01-23 01:14:51,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1022.5429, 1022.26135, 1026.7753, 1029.6276, 1003.7644, 1023.3319, 1023.3374, 1015.6647, 1027.8307, 1028.6146]
2026-01-23 01:14:51,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:14:51,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (1022.38) for latency DatasetOffice
2026-01-23 01:14:51,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 32 minutes, 18 seconds)
2026-01-23 01:16:17,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:23,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 704.60938 ± 208.901
2026-01-23 01:16:23,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [477.8667, 590.22955, 554.2811, 979.3926, 996.5819, 570.135, 495.85425, 788.1965, 1023.6144, 569.9421]
2026-01-23 01:16:23,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [356.0, 421.0, 380.0, 1000.0, 875.0, 447.0, 377.0, 611.0, 1000.0, 451.0]
2026-01-23 01:16:23,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 29 minutes, 44 seconds)
2026-01-23 01:17:53,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:54,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 217.23758 ± 33.381
2026-01-23 01:17:54,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [143.32872, 244.87277, 220.44797, 233.23985, 229.31134, 242.25877, 164.11644, 230.80994, 246.77528, 217.21483]
2026-01-23 01:17:54,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [76.0, 107.0, 97.0, 102.0, 100.0, 106.0, 81.0, 101.0, 109.0, 96.0]
2026-01-23 01:17:54,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 27 minutes, 16 seconds)
2026-01-23 01:19:29,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:35,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 565.22400 ± 392.602
2026-01-23 01:19:35,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1011.24146, 1012.3248, 729.3672, 1010.6894, 821.31445, 582.82623, 30.2906, 38.944263, 334.84485, 80.39656]
2026-01-23 01:19:35,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 717.0, 1000.0, 774.0, 579.0, 34.0, 40.0, 336.0, 74.0]
2026-01-23 01:19:35,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 31 minutes, 32 seconds)
2026-01-23 01:21:01,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:06,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 578.88635 ± 406.618
2026-01-23 01:21:06,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [340.6987, 16.244421, 17.793665, 1039.97, 1040.5059, 1041.3652, 391.5659, 550.57965, 1043.3684, 306.77194]
2026-01-23 01:21:06,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [213.0, 18.0, 16.0, 1000.0, 1000.0, 1000.0, 214.0, 365.0, 1000.0, 135.0]
2026-01-23 01:21:06,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 28 minutes, 5 seconds)
2026-01-23 01:22:36,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:37,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 187.58212 ± 9.238
2026-01-23 01:22:37,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [188.59479, 188.81721, 188.75484, 195.55226, 190.2969, 187.80356, 190.54102, 160.86098, 194.67041, 189.92926]
2026-01-23 01:22:37,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [93.0, 87.0, 91.0, 88.0, 87.0, 86.0, 91.0, 81.0, 88.0, 91.0]
2026-01-23 01:22:37,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 22 minutes, 49 seconds)
2026-01-23 01:24:06,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:08,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 308.21771 ± 36.966
2026-01-23 01:24:08,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [290.82397, 325.26587, 315.2215, 207.82405, 355.56378, 327.10196, 320.16696, 316.65836, 304.4933, 319.05734]
2026-01-23 01:24:08,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [136.0, 138.0, 135.0, 106.0, 143.0, 141.0, 139.0, 140.0, 136.0, 134.0]
2026-01-23 01:24:08,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 20 minutes, 57 seconds)
2026-01-23 01:25:38,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:38,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 73.11174 ± 105.616
2026-01-23 01:25:38,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [65.725426, 20.314259, 51.900627, 43.519524, 29.206865, 47.65931, 34.626896, 16.1164, 34.905552, 387.1426]
2026-01-23 01:25:38,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [49.0, 21.0, 32.0, 38.0, 24.0, 37.0, 23.0, 15.0, 64.0, 165.0]
2026-01-23 01:25:38,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 19 minutes, 21 seconds)
2026-01-23 01:27:08,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:09,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 353.38312 ± 6.566
2026-01-23 01:27:09,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [351.02023, 366.42874, 340.7157, 348.4168, 359.869, 352.5502, 350.34314, 357.14957, 352.2816, 355.05637]
2026-01-23 01:27:09,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [128.0, 132.0, 125.0, 127.0, 130.0, 129.0, 128.0, 130.0, 130.0, 129.0]
2026-01-23 01:27:09,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 14 minutes, 54 seconds)
2026-01-23 01:28:39,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:40,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 484.53607 ± 34.108
2026-01-23 01:28:40,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [528.4305, 490.44635, 484.65347, 426.68594, 520.48346, 479.42505, 507.88443, 519.018, 446.7286, 441.6049]
2026-01-23 01:28:40,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [173.0, 169.0, 175.0, 154.0, 172.0, 170.0, 181.0, 188.0, 163.0, 152.0]
2026-01-23 01:28:40,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 13 minutes, 11 seconds)
2026-01-23 01:30:10,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:12,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 449.72797 ± 71.276
2026-01-23 01:30:12,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [426.24994, 465.80444, 361.49612, 429.25967, 456.2736, 400.6348, 613.9063, 535.6008, 424.40634, 383.64777]
2026-01-23 01:30:12,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [167.0, 187.0, 153.0, 173.0, 177.0, 171.0, 230.0, 207.0, 178.0, 156.0]
2026-01-23 01:30:12,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 11 minutes, 56 seconds)
2026-01-23 01:31:42,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:43,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 405.77496 ± 110.485
2026-01-23 01:31:43,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [638.7192, 352.07364, 359.72507, 353.65894, 341.63715, 358.76956, 346.19568, 613.66064, 344.87888, 348.43076]
2026-01-23 01:31:43,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [202.0, 139.0, 139.0, 139.0, 134.0, 141.0, 137.0, 195.0, 137.0, 137.0]
2026-01-23 01:31:43,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 10 minutes, 39 seconds)
2026-01-23 01:33:15,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:16,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 575.34363 ± 61.431
2026-01-23 01:33:16,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [637.4293, 577.7248, 630.7779, 574.0257, 404.10907, 605.235, 589.607, 562.7077, 585.1543, 586.66626]
2026-01-23 01:33:16,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [201.0, 182.0, 201.0, 183.0, 147.0, 196.0, 187.0, 178.0, 190.0, 186.0]
2026-01-23 01:33:16,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 9 minutes, 49 seconds)
2026-01-23 01:34:46,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:47,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 423.83554 ± 200.813
2026-01-23 01:34:47,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [314.62195, 280.8448, 763.12915, 714.49805, 707.87427, 275.88367, 273.89145, 335.40692, 301.42844, 270.77664]
2026-01-23 01:34:47,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [134.0, 123.0, 232.0, 231.0, 217.0, 122.0, 121.0, 133.0, 129.0, 118.0]
2026-01-23 01:34:47,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 8 minutes, 11 seconds)
2026-01-23 01:36:18,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:20,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 802.44153 ± 278.773
2026-01-23 01:36:20,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [948.7089, 321.43958, 312.65, 929.6719, 917.39496, 939.0569, 1147.292, 561.3514, 994.7095, 952.14044]
2026-01-23 01:36:20,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [288.0, 135.0, 135.0, 281.0, 290.0, 291.0, 360.0, 217.0, 302.0, 304.0]
2026-01-23 01:36:20,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 7 minutes, 8 seconds)
2026-01-23 01:37:51,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:55,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1285.79822 ± 426.222
2026-01-23 01:37:55,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [560.33105, 1211.5248, 1660.425, 865.67896, 1078.4286, 1695.9786, 1806.1589, 1231.2799, 890.66223, 1857.5142]
2026-01-23 01:37:55,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [235.0, 459.0, 516.0, 314.0, 317.0, 492.0, 559.0, 393.0, 290.0, 559.0]
2026-01-23 01:37:55,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (1285.80) for latency DatasetOffice
2026-01-23 01:37:55,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 6 minutes, 35 seconds)
2026-01-23 01:39:27,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:32,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1722.33752 ± 871.755
2026-01-23 01:39:32,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [723.4324, 2784.3152, 2843.1228, 703.0768, 2917.438, 1911.3214, 951.4709, 2026.0194, 1652.576, 710.60144]
2026-01-23 01:39:32,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [226.0, 1000.0, 1000.0, 221.0, 930.0, 608.0, 289.0, 662.0, 542.0, 219.0]
2026-01-23 01:39:32,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (1722.34) for latency DatasetOffice
2026-01-23 01:39:32,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 6 minutes, 31 seconds)
2026-01-23 01:41:04,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:10,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1695.36108 ± 585.575
2026-01-23 01:41:10,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1794.6683, 2559.2942, 1567.0663, 2555.6152, 1771.1335, 985.67175, 1209.5148, 1514.588, 769.939, 2226.1194]
2026-01-23 01:41:10,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [696.0, 1000.0, 617.0, 1000.0, 695.0, 341.0, 466.0, 512.0, 237.0, 852.0]
2026-01-23 01:41:10,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 6 minutes, 21 seconds)
2026-01-23 01:42:44,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:49,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1434.66321 ± 1061.182
2026-01-23 01:42:49,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [111.79519, 1128.137, 901.40326, 2639.2112, 2669.7017, 1421.2717, 2628.3828, 2585.3638, 187.83832, 73.527695]
2026-01-23 01:42:49,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [79.0, 331.0, 279.0, 1000.0, 1000.0, 411.0, 1000.0, 1000.0, 89.0, 46.0]
2026-01-23 01:42:49,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 6 minutes, 50 seconds)
2026-01-23 01:44:18,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:22,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1252.59204 ± 1018.798
2026-01-23 01:44:22,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [71.288445, 25.050686, 54.009308, 1265.9412, 1213.4607, 1429.9948, 2665.7637, 2563.3193, 612.44946, 2624.643]
2026-01-23 01:44:22,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [61.0, 35.0, 35.0, 470.0, 455.0, 537.0, 1000.0, 1000.0, 218.0, 1000.0]
2026-01-23 01:44:22,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 5 minutes, 22 seconds)
2026-01-23 01:45:58,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:05,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2012.41541 ± 827.851
2026-01-23 01:46:05,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2657.327, 1767.4443, 991.5781, 2619.969, 1227.6198, 2664.415, 334.3864, 2609.3975, 2621.2556, 2630.761]
2026-01-23 01:46:05,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 659.0, 383.0, 1000.0, 475.0, 1000.0, 145.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:46:05,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (2012.42) for latency DatasetOffice
2026-01-23 01:46:05,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 5 minutes, 44 seconds)
2026-01-23 01:47:33,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:37,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1502.79065 ± 772.453
2026-01-23 01:47:37,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1019.04865, 1164.0929, 3019.374, 3041.843, 1314.1696, 1060.0785, 1196.1335, 1026.6333, 1268.8262, 917.70715]
2026-01-23 01:47:37,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [299.0, 376.0, 1000.0, 1000.0, 439.0, 331.0, 344.0, 304.0, 371.0, 269.0]
2026-01-23 01:47:37,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 2 minutes, 56 seconds)
2026-01-23 01:49:14,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:23,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2933.99463 ± 232.249
2026-01-23 01:49:23,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2998.8003, 2242.2502, 2993.9592, 3062.2925, 2982.767, 3031.2847, 3009.463, 2971.2532, 2996.361, 3051.515]
2026-01-23 01:49:23,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 751.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:49:23,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (2933.99) for latency DatasetOffice
2026-01-23 01:49:23,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 3 minutes, 14 seconds)
2026-01-23 01:51:15,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:24,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2814.19092 ± 718.092
2026-01-23 01:51:24,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3228.5767, 3190.3835, 3165.6072, 3168.0244, 3204.8914, 1744.261, 3043.3823, 3189.4695, 1080.3887, 3126.9238]
2026-01-23 01:51:24,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 551.0, 971.0, 1000.0, 308.0, 1000.0]
2026-01-23 01:51:24,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 6 minutes, 57 seconds)
2026-01-23 01:52:52,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:59,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2074.43799 ± 1344.206
2026-01-23 01:52:59,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [43.80987, 12.727912, 3133.721, 3148.7458, 3152.2021, 3163.4263, 1558.9152, 395.74783, 2981.3984, 3153.6873]
2026-01-23 01:52:59,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [31.0, 13.0, 1000.0, 1000.0, 1000.0, 1000.0, 487.0, 157.0, 940.0, 1000.0]
2026-01-23 01:52:59,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 5 minutes, 40 seconds)
2026-01-23 01:54:32,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:36,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1631.42126 ± 941.232
2026-01-23 01:54:36,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1015.96576, 3274.7012, 3284.7322, 1575.5889, 970.3957, 2427.9404, 719.3278, 1036.9858, 994.04944, 1014.524]
2026-01-23 01:54:36,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [297.0, 993.0, 1000.0, 443.0, 277.0, 726.0, 220.0, 302.0, 284.0, 300.0]
2026-01-23 01:55:18,250 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 12 minutes, 40 seconds)
2026-01-23 01:56:49,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:56,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2424.43628 ± 1167.407
2026-01-23 01:56:56,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3159.6611, 1066.8909, 3192.873, 970.2742, 20.064697, 3168.4426, 3142.9004, 3169.0103, 3148.5642, 3205.682]
2026-01-23 01:56:56,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 345.0, 1000.0, 325.0, 20.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:56:56,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 12 minutes, 14 seconds)
2026-01-23 01:58:34,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:43,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2902.03442 ± 801.397
2026-01-23 01:58:43,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3173.0322, 3164.3264, 3201.971, 500.20538, 3196.206, 3144.625, 3101.4858, 3188.8105, 3124.209, 3225.4736]
2026-01-23 01:58:43,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 170.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:58:43,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 10 minutes, 42 seconds)
2026-01-23 02:00:10,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:18,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2564.75684 ± 990.913
2026-01-23 02:00:18,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3141.8435, 3161.762, 3165.6797, 3104.4524, 2034.5393, 3092.8372, 3089.5938, 1600.8539, 53.567497, 3202.4373]
2026-01-23 02:00:18,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 645.0, 1000.0, 1000.0, 526.0, 32.0, 1000.0]
2026-01-23 02:00:18,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 2 minutes, 55 seconds)
2026-01-23 02:01:56,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:03,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2457.75928 ± 1202.815
2026-01-23 02:02:03,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3016.5183, 3209.3384, 3322.009, 3250.0574, 3292.9333, 1004.88245, 3278.774, 792.81757, 151.34068, 3258.9224]
2026-01-23 02:02:03,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [949.0, 1000.0, 1000.0, 1000.0, 1000.0, 294.0, 1000.0, 264.0, 87.0, 1000.0]
2026-01-23 02:02:03,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 3 minutes, 23 seconds)
2026-01-23 02:03:34,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:38,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1155.13647 ± 1324.385
2026-01-23 02:03:38,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [338.75717, 2930.0535, 3090.013, 1745.8121, 19.906574, 59.707886, 63.010464, 121.70821, 91.06646, 3091.3306]
2026-01-23 02:03:38,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [150.0, 938.0, 1000.0, 581.0, 24.0, 50.0, 64.0, 80.0, 77.0, 1000.0]
2026-01-23 02:03:38,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 51 minutes, 39 seconds)
2026-01-23 02:05:11,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:17,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1948.78809 ± 920.612
2026-01-23 02:05:17,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3230.915, 990.05975, 3172.3108, 997.8856, 1423.202, 1004.4102, 1091.6836, 2035.7456, 2417.825, 3123.8423]
2026-01-23 02:05:17,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 304.0, 1000.0, 296.0, 461.0, 302.0, 358.0, 653.0, 766.0, 1000.0]
2026-01-23 02:05:17,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 50 minutes, 10 seconds)
2026-01-23 02:06:41,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:50,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2864.80469 ± 952.839
2026-01-23 02:06:50,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [9.375352, 3083.6702, 3199.876, 3223.965, 3178.7742, 3211.5852, 3234.9736, 3163.4744, 3128.7976, 3213.5554]
2026-01-23 02:06:50,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [10.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:06:50,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 45 minutes, 21 seconds)
2026-01-23 02:08:29,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:35,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2092.70630 ± 1361.797
2026-01-23 02:08:35,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3151.625, 1496.6011, 3179.42, 3156.2153, 3142.5278, 3209.0579, 3126.0435, 382.51553, 60.174587, 22.882233]
2026-01-23 02:08:35,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 496.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 152.0, 45.0, 17.0]
2026-01-23 02:08:35,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 46 minutes, 3 seconds)
2026-01-23 02:09:59,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:05,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2169.28711 ± 1184.717
2026-01-23 02:10:05,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3284.0464, 3251.9238, 3300.5935, 523.6104, 3202.8352, 861.40814, 1198.1558, 513.4401, 2316.4597, 3240.3994]
2026-01-23 02:10:05,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 188.0, 1000.0, 289.0, 357.0, 189.0, 688.0, 1000.0]
2026-01-23 02:10:05,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 41 minutes, 12 seconds)
2026-01-23 02:11:37,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:46,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2868.33325 ± 690.027
2026-01-23 02:11:46,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1306.323, 3182.654, 3333.755, 3183.4712, 3113.8281, 3232.739, 3196.2153, 1700.7256, 3220.3276, 3213.2922]
2026-01-23 02:11:46,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [437.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 542.0, 1000.0, 1000.0]
2026-01-23 02:11:46,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 40 minutes, 53 seconds)
2026-01-23 02:13:17,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:13:19,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 552.97058 ± 1057.620
2026-01-23 02:13:19,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3316.3723, 1791.8646, 12.760425, 8.182639, 17.812832, 54.43435, 81.86343, 132.52556, 56.284325, 57.606083]
2026-01-23 02:13:19,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 562.0, 13.0, 10.0, 20.0, 48.0, 70.0, 82.0, 52.0, 32.0]
2026-01-23 02:13:19,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 37 minutes, 56 seconds)
2026-01-23 02:14:52,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:58,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2335.67236 ± 1105.054
2026-01-23 02:14:58,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1106.9998, 3368.254, 1284.7416, 3334.165, 2961.5344, 3262.1497, 2190.616, 2461.9497, 46.729984, 3339.582]
2026-01-23 02:14:58,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [368.0, 1000.0, 434.0, 1000.0, 906.0, 1000.0, 681.0, 742.0, 27.0, 1000.0]
2026-01-23 02:14:58,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 37 minutes, 43 seconds)
2026-01-23 02:16:25,836 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:34,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 3010.81812 ± 695.492
2026-01-23 02:16:34,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3221.713, 1012.4475, 3347.2485, 3332.4097, 3293.9092, 3340.4946, 3270.2212, 3349.009, 3293.449, 2647.2795]
2026-01-23 02:16:34,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 305.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 807.0]
2026-01-23 02:16:34,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (3010.82) for latency DatasetOffice
2026-01-23 02:16:34,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 34 minutes, 10 seconds)
2026-01-23 02:18:09,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:15,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2221.08911 ± 1383.008
2026-01-23 02:18:15,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3357.6572, 1040.4207, 3360.334, 3380.9338, 3291.3765, 1144.9598, 3326.7368, 3187.5337, 31.789457, 89.1494]
2026-01-23 02:18:15,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 309.0, 1000.0, 1000.0, 1000.0, 327.0, 1000.0, 970.0, 31.0, 49.0]
2026-01-23 02:18:15,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 34 minutes, 48 seconds)
2026-01-23 02:19:49,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:19:55,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1889.40564 ± 1239.166
2026-01-23 02:19:55,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3294.9246, 575.96906, 3254.4966, 1038.0032, 2097.885, 1173.6008, 3319.0513, 251.21637, 570.2438, 3318.6658]
2026-01-23 02:19:55,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 203.0, 1000.0, 302.0, 651.0, 339.0, 1000.0, 109.0, 187.0, 1000.0]
2026-01-23 02:19:55,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 32 minutes, 50 seconds)
2026-01-23 02:21:23,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:28,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1838.03149 ± 1015.234
2026-01-23 02:21:28,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2385.4846, 3225.3738, 3282.4343, 1065.6927, 1034.4073, 1050.8107, 3288.7705, 989.1481, 1018.2852, 1039.9067]
2026-01-23 02:21:28,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [725.0, 971.0, 1000.0, 313.0, 307.0, 313.0, 1000.0, 285.0, 297.0, 309.0]
2026-01-23 02:21:28,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 31 minutes, 18 seconds)
2026-01-23 02:22:55,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:00,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1891.43481 ± 1295.908
2026-01-23 02:23:00,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3290.451, 1050.7838, 3216.0862, 2458.5503, 391.7436, 3099.6743, 3302.1775, 1884.5049, 44.366993, 176.0079]
2026-01-23 02:23:00,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 321.0, 1000.0, 765.0, 152.0, 930.0, 1000.0, 577.0, 26.0, 103.0]
2026-01-23 02:23:00,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 28 minutes, 20 seconds)
2026-01-23 02:24:34,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:40,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2005.29907 ± 1077.214
2026-01-23 02:24:40,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [843.28046, 954.79236, 3243.112, 889.6835, 1193.4016, 2321.1123, 3317.4436, 948.22614, 3340.759, 3001.1804]
2026-01-23 02:24:40,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [257.0, 296.0, 1000.0, 270.0, 373.0, 731.0, 1000.0, 294.0, 1000.0, 925.0]
2026-01-23 02:24:40,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 27 minutes, 24 seconds)
2026-01-23 02:26:09,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:17,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2739.87769 ± 859.789
2026-01-23 02:26:17,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1046.7299, 3193.1355, 3155.4155, 1189.8666, 3346.2363, 3265.1584, 3271.5244, 2319.4258, 3335.4797, 3275.8037]
2026-01-23 02:26:17,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [348.0, 1000.0, 1000.0, 350.0, 1000.0, 1000.0, 1000.0, 728.0, 1000.0, 1000.0]
2026-01-23 02:26:17,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 25 minutes, 2 seconds)
2026-01-23 02:27:52,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:27:59,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2538.43188 ± 974.433
2026-01-23 02:27:59,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [483.53494, 3238.2947, 3261.9749, 3256.532, 1695.7886, 3256.6765, 3261.7668, 3279.7197, 1362.933, 2287.0981]
2026-01-23 02:27:59,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [177.0, 1000.0, 1000.0, 1000.0, 510.0, 1000.0, 1000.0, 1000.0, 418.0, 723.0]
2026-01-23 02:27:59,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 23 minutes, 57 seconds)
2026-01-23 02:29:27,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:33,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2082.35010 ± 1255.483
2026-01-23 02:29:33,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3306.781, 1479.3623, 3321.0933, 588.84705, 1155.9956, 3297.7083, 3289.7354, 549.6731, 513.973, 3320.333]
2026-01-23 02:29:33,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 422.0, 1000.0, 204.0, 325.0, 1000.0, 1000.0, 194.0, 173.0, 1000.0]
2026-01-23 02:29:33,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 22 minutes, 28 seconds)
2026-01-23 02:30:58,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:06,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2726.47656 ± 987.971
2026-01-23 02:31:06,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3214.963, 2644.7666, 3260.8193, 3190.895, 3267.6208, 3304.8699, 3197.0435, 1808.1417, 80.86779, 3294.7786]
2026-01-23 02:31:06,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 833.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 569.0, 50.0, 1000.0]
2026-01-23 02:31:06,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes)
2026-01-23 02:32:34,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:32:36,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 905.15234 ± 1248.341
2026-01-23 02:32:36,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [384.60855, 3046.4097, 3255.1296, 1926.9009, 125.5523, 22.894318, 52.133774, 82.122696, 52.034897, 103.73595]
2026-01-23 02:32:36,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [164.0, 927.0, 1000.0, 585.0, 79.0, 36.0, 35.0, 50.0, 32.0, 79.0]
2026-01-23 02:32:36,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 17 minutes, 53 seconds)
2026-01-23 02:34:04,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:11,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2437.63721 ± 978.440
2026-01-23 02:34:11,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1009.5897, 880.2616, 3348.489, 3383.8142, 1868.7175, 3342.6138, 3308.444, 1573.7365, 2367.7136, 3292.9934]
2026-01-23 02:34:11,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [316.0, 303.0, 1000.0, 1000.0, 592.0, 1000.0, 1000.0, 488.0, 731.0, 1000.0]
2026-01-23 02:34:11,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 15 minutes, 56 seconds)
2026-01-23 02:35:46,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:35:55,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 3336.84619 ± 28.307
2026-01-23 02:35:55,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3288.052, 3352.9324, 3294.4128, 3373.56, 3361.1045, 3353.457, 3338.3047, 3319.186, 3321.9119, 3365.54]
2026-01-23 02:35:55,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:35:55,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (3336.85) for latency DatasetOffice
2026-01-23 02:35:55,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 14 minutes, 34 seconds)
2026-01-23 02:37:24,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:37:29,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1706.96875 ± 1649.918
2026-01-23 02:37:29,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3381.9155, 3370.9893, 3327.3564, 3332.0571, 3370.508, 38.55894, 22.495733, 70.171616, 25.185669, 130.4485]
2026-01-23 02:37:29,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 29.0, 19.0, 44.0, 21.0, 101.0]
2026-01-23 02:37:29,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 12 minutes, 59 seconds)
2026-01-23 02:38:55,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:00,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2168.27734 ± 1210.954
2026-01-23 02:39:00,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3297.3843, 1152.4761, 3394.5713, 1198.2307, 3302.5017, 3310.8386, 1516.498, 21.699812, 1177.4003, 3311.1738]
2026-01-23 02:39:00,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 324.0, 1000.0, 357.0, 1000.0, 1000.0, 470.0, 19.0, 332.0, 1000.0]
2026-01-23 02:39:00,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 11 minutes, 9 seconds)
2026-01-23 02:40:25,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:40:33,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 3096.32422 ± 535.420
2026-01-23 02:40:33,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3333.0276, 2285.7278, 3370.5906, 3366.6064, 3356.1375, 3332.7788, 3389.7854, 1809.0757, 3364.374, 3355.1365]
2026-01-23 02:40:33,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 688.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 559.0, 1000.0, 1000.0]
2026-01-23 02:40:33,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 9 minutes, 57 seconds)
2026-01-23 02:42:05,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:07,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 763.13672 ± 1273.974
2026-01-23 02:42:07,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3344.336, 640.5287, 24.503717, 21.585253, 197.71367, 73.512634, 50.127815, 38.486317, 14.236812, 3226.3364]
2026-01-23 02:42:07,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 226.0, 24.0, 26.0, 100.0, 47.0, 54.0, 60.0, 13.0, 1000.0]
2026-01-23 02:42:07,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 8 minutes, 14 seconds)
2026-01-23 02:43:41,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:43:46,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1598.55859 ± 1147.554
2026-01-23 02:43:46,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3355.879, 1714.0645, 3351.6108, 1373.86, 983.92145, 2708.207, 1025.6354, 1430.6228, 20.501951, 21.282803]
2026-01-23 02:43:46,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 540.0, 1000.0, 380.0, 288.0, 809.0, 306.0, 443.0, 17.0, 18.0]
2026-01-23 02:43:46,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 5 minutes, 54 seconds)
2026-01-23 02:45:12,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:45:20,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 3148.37305 ± 455.757
2026-01-23 02:45:20,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1786.9873, 3288.0125, 3246.6055, 3315.988, 3329.9597, 3333.455, 3201.2646, 3308.4473, 3329.662, 3343.3467]
2026-01-23 02:45:20,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [562.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:45:20,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 4 minutes, 26 seconds)
2026-01-23 02:46:46,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:53,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2608.17139 ± 1193.290
2026-01-23 02:46:53,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [817.86694, 3327.036, 3299.9507, 3328.96, 3370.9521, 3360.359, 3394.447, 3342.3298, 1789.977, 49.83602]
2026-01-23 02:46:53,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [258.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 546.0, 30.0]
2026-01-23 02:46:53,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 3 minutes, 3 seconds)
2026-01-23 02:48:25,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:48:31,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2224.43994 ± 1183.162
2026-01-23 02:48:31,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3368.7075, 1216.8663, 2309.2544, 3282.1582, 539.21576, 1026.9546, 3341.758, 570.57, 3313.003, 3275.9126]
2026-01-23 02:48:31,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 371.0, 718.0, 1000.0, 176.0, 311.0, 1000.0, 164.0, 1000.0, 1000.0]
2026-01-23 02:48:31,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 2 minutes, 7 seconds)
2026-01-23 02:50:02,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:50:08,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2267.58496 ± 1361.267
2026-01-23 02:50:08,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3300.9285, 3370.809, 3297.235, 953.5846, 3331.8403, 3320.4446, 1672.161, 104.21404, 11.520888, 3313.111]
2026-01-23 02:50:08,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 310.0, 1000.0, 1000.0, 521.0, 58.0, 12.0, 1000.0]
2026-01-23 02:50:08,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 52 seconds)
2026-01-23 02:51:35,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:51:38,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 915.57190 ± 1180.524
2026-01-23 02:51:38,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [831.12476, 2665.625, 3351.5205, 1793.5098, 16.782948, 27.920748, 136.0598, 215.98637, 71.18601, 46.003407]
2026-01-23 02:51:38,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [249.0, 834.0, 1000.0, 554.0, 17.0, 19.0, 87.0, 97.0, 48.0, 33.0]
2026-01-23 02:51:38,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 64/100 (estimated time remaining: 58 minutes, 13 seconds)
2026-01-23 02:53:04,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:53:10,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2188.45654 ± 1321.053
2026-01-23 02:53:10,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3338.7458, 3334.996, 3306.7788, 1851.3286, 3345.9756, 21.267181, 3250.565, 938.6971, 2463.5762, 32.634212]
2026-01-23 02:53:10,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 539.0, 1000.0, 17.0, 1000.0, 310.0, 759.0, 24.0]
2026-01-23 02:53:10,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 65/100 (estimated time remaining: 56 minutes, 23 seconds)
2026-01-23 02:54:38,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:54:45,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2421.08984 ± 1210.007
2026-01-23 02:54:45,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1018.4354, 12.699812, 3250.5867, 2382.7754, 3344.5518, 3325.495, 954.61975, 3269.517, 3374.3328, 3277.8857]
2026-01-23 02:54:45,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [332.0, 12.0, 1000.0, 687.0, 1000.0, 1000.0, 314.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:54:45,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 66/100 (estimated time remaining: 55 minutes, 1 second)
2026-01-23 02:56:14,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:56:21,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2730.16772 ± 1016.295
2026-01-23 02:56:21,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2497.286, 3307.7512, 82.79479, 3284.9443, 1694.9324, 3315.9639, 3284.5347, 3298.98, 3237.9456, 3296.5442]
2026-01-23 02:56:21,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [783.0, 1000.0, 49.0, 1000.0, 530.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:56:21,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 67/100 (estimated time remaining: 53 minutes, 17 seconds)
2026-01-23 02:57:50,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:57:56,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2334.50684 ± 1289.513
2026-01-23 02:57:56,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3271.0554, 3313.461, 3318.1167, 2113.1558, 3250.5757, 139.29166, 3269.5508, 1345.2246, 24.99441, 3299.6428]
2026-01-23 02:57:56,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 657.0, 1000.0, 70.0, 1000.0, 442.0, 23.0, 1000.0]
2026-01-23 02:57:56,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 68/100 (estimated time remaining: 51 minutes, 30 seconds)
2026-01-23 02:59:25,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:33,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 3004.86963 ± 930.725
2026-01-23 02:59:33,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3344.5466, 3289.6948, 3339.4832, 3339.6892, 3295.1313, 3283.9827, 3283.7947, 213.5926, 3322.321, 3336.461]
2026-01-23 02:59:33,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 93.0, 1000.0, 1000.0]
2026-01-23 02:59:33,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 69/100 (estimated time remaining: 50 minutes, 44 seconds)
2026-01-23 03:01:02,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:01:04,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 556.80725 ± 1048.189
2026-01-23 03:01:04,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3305.3435, 1755.4248, 22.124687, 31.069023, 266.14154, 62.857143, 18.74643, 25.35399, 37.809845, 43.200836]
2026-01-23 03:01:04,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 547.0, 21.0, 30.0, 126.0, 40.0, 24.0, 29.0, 30.0, 28.0]
2026-01-23 03:01:04,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 70/100 (estimated time remaining: 48 minutes, 54 seconds)
2026-01-23 03:02:31,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:02:39,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2599.02026 ± 680.945
2026-01-23 03:02:39,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2317.332, 2250.189, 3321.52, 3380.135, 2165.4622, 3316.4998, 2258.8962, 2513.5737, 1182.0038, 3284.591]
2026-01-23 03:02:39,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [709.0, 664.0, 1000.0, 1000.0, 653.0, 1000.0, 687.0, 766.0, 345.0, 1000.0]
2026-01-23 03:02:39,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 71/100 (estimated time remaining: 47 minutes, 20 seconds)
2026-01-23 03:04:06,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:04:15,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 3151.07764 ± 397.605
2026-01-23 03:04:15,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3331.5098, 3350.6665, 3317.9902, 3375.271, 3374.145, 3321.6316, 3368.1028, 2652.8154, 2132.3152, 3286.328]
2026-01-23 03:04:15,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 769.0, 652.0, 1000.0]
2026-01-23 03:04:15,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 72/100 (estimated time remaining: 45 minutes, 45 seconds)
2026-01-23 03:05:50,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:05:55,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1780.25659 ± 1437.537
2026-01-23 03:05:55,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3351.2507, 1033.6837, 3281.7363, 2650.8706, 3265.1511, 3317.7556, 765.4041, 73.30935, 28.096125, 35.31106]
2026-01-23 03:05:55,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 315.0, 1000.0, 785.0, 1000.0, 1000.0, 252.0, 43.0, 35.0, 28.0]
2026-01-23 03:05:55,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 73/100 (estimated time remaining: 44 minutes, 41 seconds)
2026-01-23 03:07:24,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:07:30,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2308.33374 ± 1313.636
2026-01-23 03:07:30,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1203.764, 3372.4048, 3225.6252, 954.13293, 3386.5361, 3420.8586, 772.625, 19.11815, 3353.6375, 3374.6355]
2026-01-23 03:07:30,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [349.0, 1000.0, 962.0, 274.0, 1000.0, 1000.0, 259.0, 16.0, 1000.0, 1000.0]
2026-01-23 03:07:30,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 74/100 (estimated time remaining: 42 minutes, 53 seconds)
2026-01-23 03:09:00,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:09:04,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1337.69922 ± 1031.384
2026-01-23 03:09:04,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [76.63591, 974.0892, 893.8237, 1075.3627, 3302.1511, 960.60614, 883.90454, 938.11084, 909.9292, 3362.3784]
2026-01-23 03:09:04,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [47.0, 284.0, 262.0, 317.0, 1000.0, 280.0, 259.0, 278.0, 268.0, 1000.0]
2026-01-23 03:09:04,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 36 seconds)
2026-01-23 03:10:30,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:10:32,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 971.56067 ± 105.370
2026-01-23 03:10:32,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [894.04395, 845.6198, 926.07385, 940.5474, 927.2704, 943.05194, 961.8511, 946.4216, 1226.6313, 1104.095]
2026-01-23 03:10:32,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [257.0, 249.0, 275.0, 275.0, 268.0, 271.0, 278.0, 280.0, 357.0, 315.0]
2026-01-23 03:10:32,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 76/100 (estimated time remaining: 39 minutes, 27 seconds)
2026-01-23 03:11:56,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:11:57,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 649.67383 ± 1034.051
2026-01-23 03:11:57,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [20.940779, 32.541847, 1264.5776, 227.98012, 74.09952, 40.392433, 15.422387, 43.35683, 3347.7766, 1429.6505]
2026-01-23 03:11:57,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [26.0, 34.0, 425.0, 112.0, 49.0, 42.0, 14.0, 27.0, 1000.0, 413.0]
2026-01-23 03:11:57,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 77/100 (estimated time remaining: 37 minutes)
2026-01-23 03:13:27,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:13:36,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 3002.60547 ± 696.367
2026-01-23 03:13:36,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3344.4734, 3384.629, 3290.1987, 3271.4001, 3268.119, 3370.8215, 2616.172, 3283.771, 1011.66864, 3184.8003]
2026-01-23 03:13:36,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 792.0, 1000.0, 347.0, 1000.0]
2026-01-23 03:13:36,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 78/100 (estimated time remaining: 35 minutes, 19 seconds)
2026-01-23 03:15:11,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:15:17,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2328.77393 ± 1205.578
2026-01-23 03:15:17,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1532.0017, 422.94363, 280.4518, 3381.8586, 3409.3467, 2532.0146, 1620.8889, 3322.1572, 3379.2827, 3406.79]
2026-01-23 03:15:17,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [470.0, 157.0, 120.0, 1000.0, 1000.0, 762.0, 494.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:15:17,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 79/100 (estimated time remaining: 34 minutes, 15 seconds)
2026-01-23 03:16:38,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:16:46,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2903.59033 ± 791.980
2026-01-23 03:16:46,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1029.0083, 1791.9694, 2681.0251, 3357.815, 3377.6692, 3363.289, 3370.1455, 3356.9248, 3325.157, 3382.8994]
2026-01-23 03:16:46,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [309.0, 562.0, 810.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:16:46,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 80/100 (estimated time remaining: 32 minutes, 22 seconds)
2026-01-23 03:18:15,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:18:23,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2803.56519 ± 929.992
2026-01-23 03:18:23,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3348.7, 1219.8538, 3351.7998, 3349.692, 2632.8682, 3338.8877, 3358.0293, 3288.8809, 785.0663, 3361.8745]
2026-01-23 03:18:23,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 360.0, 1000.0, 1000.0, 807.0, 1000.0, 1000.0, 1000.0, 269.0, 1000.0]
2026-01-23 03:18:23,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 81/100 (estimated time remaining: 31 minutes, 24 seconds)
2026-01-23 03:19:51,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:19:57,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2234.06299 ± 1384.789
2026-01-23 03:19:57,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3357.1807, 901.5979, 3382.8604, 3376.9438, 3383.8684, 3398.0562, 1078.847, 91.369415, 217.30212, 3152.605]
2026-01-23 03:19:57,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 268.0, 1000.0, 1000.0, 1000.0, 1000.0, 343.0, 65.0, 97.0, 942.0]
2026-01-23 03:19:57,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes, 21 seconds)
2026-01-23 03:21:26,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:21:33,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2688.97241 ± 1071.993
2026-01-23 03:21:33,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3261.4775, 3361.3047, 1071.3168, 3339.3323, 3365.4485, 1939.167, 3390.6465, 406.0818, 3365.689, 3389.2603]
2026-01-23 03:21:33,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 346.0, 1000.0, 1000.0, 595.0, 1000.0, 152.0, 1000.0, 1000.0]
2026-01-23 03:21:33,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 83/100 (estimated time remaining: 28 minutes, 39 seconds)
2026-01-23 03:23:04,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:23:09,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1910.12622 ± 1142.876
2026-01-23 03:23:09,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [327.65338, 2907.3625, 3415.974, 1099.253, 1313.1923, 1024.5045, 3343.81, 985.52594, 1289.8896, 3394.0977]
2026-01-23 03:23:09,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [132.0, 889.0, 1000.0, 317.0, 378.0, 301.0, 1000.0, 286.0, 365.0, 1000.0]
2026-01-23 03:23:09,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 84/100 (estimated time remaining: 26 minutes, 44 seconds)
2026-01-23 03:24:35,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:24:41,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2301.30615 ± 1324.303
2026-01-23 03:24:41,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1456.3695, 3217.8396, 3363.2144, 3394.0544, 422.77512, 37.854214, 1029.3142, 3379.466, 3308.8408, 3403.334]
2026-01-23 03:24:41,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [459.0, 1000.0, 1000.0, 1000.0, 157.0, 33.0, 306.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:24:41,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 20 seconds)
2026-01-23 03:26:11,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:26:17,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2332.42236 ± 1167.853
2026-01-23 03:26:17,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3398.4307, 344.46085, 1287.6827, 3407.1646, 3402.478, 3409.3835, 3401.5088, 2340.3904, 1427.7275, 904.9971]
2026-01-23 03:26:17,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 139.0, 363.0, 1000.0, 1000.0, 1000.0, 1000.0, 698.0, 442.0, 265.0]
2026-01-23 03:26:17,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 86/100 (estimated time remaining: 23 minutes, 42 seconds)
2026-01-23 03:27:48,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:27:56,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2887.98120 ± 1011.457
2026-01-23 03:27:56,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3394.3103, 360.21545, 3350.25, 3357.9863, 1501.7205, 3379.2908, 3397.2378, 3343.5457, 3412.1196, 3383.1367]
2026-01-23 03:27:56,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 142.0, 1000.0, 1000.0, 461.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:27:56,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 22 seconds)
2026-01-23 03:29:24,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:29:29,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1707.94604 ± 1297.226
2026-01-23 03:29:29,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3354.356, 2964.7378, 1909.522, 552.12585, 3011.5942, 1009.1311, 3370.77, 732.3956, 40.263268, 134.56403]
2026-01-23 03:29:29,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 904.0, 590.0, 188.0, 892.0, 302.0, 1000.0, 245.0, 41.0, 75.0]
2026-01-23 03:29:29,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 36 seconds)
2026-01-23 03:31:01,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:31:06,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1843.17346 ± 1010.933
2026-01-23 03:31:06,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1020.07666, 3315.6672, 901.63715, 3386.4302, 1726.6403, 1031.6572, 1154.283, 1138.2595, 3350.4348, 1406.6487]
2026-01-23 03:31:06,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [302.0, 1000.0, 262.0, 1000.0, 504.0, 305.0, 363.0, 374.0, 1000.0, 410.0]
2026-01-23 03:31:06,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 5 seconds)
2026-01-23 03:32:28,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:32:33,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1902.89331 ± 1235.919
2026-01-23 03:32:33,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1018.9411, 1177.7903, 3168.741, 17.161919, 367.43256, 3401.1243, 3348.4817, 1756.1097, 1484.3701, 3288.7812]
2026-01-23 03:32:33,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [327.0, 384.0, 960.0, 17.0, 163.0, 1000.0, 1000.0, 554.0, 471.0, 1000.0]
2026-01-23 03:32:33,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 18 seconds)
2026-01-23 03:34:06,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:34:13,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2364.19360 ± 1062.244
2026-01-23 03:34:13,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1601.4493, 1616.4657, 3344.285, 3310.8914, 3350.145, 499.84256, 1020.7978, 3367.5225, 2171.587, 3358.9504]
2026-01-23 03:34:13,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [511.0, 499.0, 1000.0, 1000.0, 1000.0, 174.0, 308.0, 1000.0, 662.0, 1000.0]
2026-01-23 03:34:13,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 91/100 (estimated time remaining: 15 minutes, 50 seconds)
2026-01-23 03:35:41,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:35:49,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 3003.16089 ± 559.564
2026-01-23 03:35:49,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1591.2694, 2445.7231, 3368.7139, 3344.4753, 3340.6567, 3319.18, 3342.897, 2712.7173, 3225.2522, 3340.7246]
2026-01-23 03:35:49,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [508.0, 742.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 826.0, 1000.0, 1000.0]
2026-01-23 03:35:49,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 11 seconds)
2026-01-23 03:37:16,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:37:20,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1456.80933 ± 989.875
2026-01-23 03:37:20,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3343.0657, 1037.7173, 1052.7258, 3397.8125, 1035.1641, 1262.2101, 269.46924, 1122.8713, 919.2552, 1127.803]
2026-01-23 03:37:20,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 304.0, 307.0, 1000.0, 303.0, 368.0, 111.0, 330.0, 304.0, 330.0]
2026-01-23 03:37:20,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 33 seconds)
2026-01-23 03:38:51,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:38:58,224 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2294.69946 ± 1061.137
2026-01-23 03:38:58,224 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1029.9319, 3356.198, 1871.3893, 2977.094, 3329.0217, 1276.8182, 3235.2866, 364.96173, 2139.1433, 3367.1497]
2026-01-23 03:38:58,224 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [307.0, 1000.0, 577.0, 897.0, 1000.0, 400.0, 962.0, 158.0, 670.0, 1000.0]
2026-01-23 03:38:58,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes)
2026-01-23 03:40:29,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:40:37,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2759.28662 ± 1042.321
2026-01-23 03:40:37,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3376.4465, 3374.4636, 3397.9912, 3169.635, 3397.0012, 1034.5352, 3410.1562, 443.26355, 2641.9756, 3347.3965]
2026-01-23 03:40:37,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 948.0, 1000.0, 306.0, 1000.0, 162.0, 777.0, 1000.0]
2026-01-23 03:40:37,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 40 seconds)
2026-01-23 03:41:59,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:42:06,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2712.83472 ± 1013.061
2026-01-23 03:42:06,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3316.2456, 3411.5676, 3387.547, 883.5958, 3361.1382, 3360.2493, 3389.531, 1148.2439, 3359.6868, 1510.5409]
2026-01-23 03:42:06,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 301.0, 1000.0, 1000.0, 1000.0, 341.0, 1000.0, 467.0]
2026-01-23 03:42:06,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 96/100 (estimated time remaining: 7 minutes, 53 seconds)
2026-01-23 03:43:40,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:43:47,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2516.02490 ± 1307.197
2026-01-23 03:43:47,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [389.84787, 45.42113, 3385.735, 3307.6152, 3356.8755, 3367.2827, 3364.7627, 3319.2861, 1263.0549, 3360.368]
2026-01-23 03:43:47,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [151.0, 40.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 403.0, 1000.0]
2026-01-23 03:43:47,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 22 seconds)
2026-01-23 03:45:11,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:45:14,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1181.44983 ± 782.129
2026-01-23 03:45:14,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [884.6705, 919.2909, 998.1734, 949.3974, 1084.6257, 3368.247, 1098.3275, 1370.2948, 187.96028, 953.5112]
2026-01-23 03:45:14,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [256.0, 269.0, 290.0, 274.0, 312.0, 1000.0, 321.0, 402.0, 87.0, 276.0]
2026-01-23 03:45:14,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 44 seconds)
2026-01-23 03:46:43,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:46:50,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2720.87256 ± 924.992
2026-01-23 03:46:50,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3383.7695, 928.51306, 3372.8684, 3387.691, 3370.03, 1528.1951, 2901.7534, 1582.1245, 3364.6519, 3389.129]
2026-01-23 03:46:50,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 272.0, 1000.0, 1000.0, 1000.0, 474.0, 879.0, 486.0, 1000.0, 1000.0]
2026-01-23 03:46:50,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 9 seconds)
2026-01-23 03:48:18,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:48:21,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1424.72388 ± 946.734
2026-01-23 03:48:21,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3301.2527, 1011.62384, 3317.0735, 1023.29395, 972.6716, 1018.3568, 806.3304, 1043.9053, 1000.06976, 752.66223]
2026-01-23 03:48:21,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 298.0, 1000.0, 311.0, 287.0, 308.0, 245.0, 320.0, 298.0, 254.0]
2026-01-23 03:48:21,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 32 seconds)
2026-01-23 03:49:57,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:50:03,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2206.33032 ± 1016.433
2026-01-23 03:50:03,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3347.705, 2240.3015, 1096.127, 1371.7455, 1048.3553, 3379.9812, 1185.602, 3402.7432, 3428.0134, 1562.7289]
2026-01-23 03:50:03,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 682.0, 316.0, 395.0, 301.0, 1000.0, 337.0, 1000.0, 1000.0, 452.0]
2026-01-23 03:50:03,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1299 [DEBUG]: Training session finished
