2026-01-23 01:08:36,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-hopper/DatasetOffice-sac-aug-mem5 
2026-01-23 01:08:36,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-hopper/DatasetOffice-sac-aug-mem5 
2026-01-23 01:08:36,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x145d14e602d0>}
2026-01-23 01:08:36,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1159 [DEBUG]: using device: cuda
2026-01-23 01:08:36,150 baseline-sac-noisy-hopper:77 [WARNING]: args.memorize_actions != args.horizon: 5 != 32
2026-01-23 01:08:36,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1181 [INFO]: Creating new trainer
2026-01-23 01:08:36,350 baseline-sac-noisy-hopper:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=26, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2026-01-23 01:08:36,350 baseline-sac-noisy-hopper:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=29, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:08:37,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1242 [DEBUG]: Starting training session...
2026-01-23 01:08:37,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 1/100
2026-01-23 01:10:01,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:01,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 43.68493 ± 1.010
2026-01-23 01:10:01,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [45.3349, 44.527607, 42.619965, 43.425053, 43.115547, 42.968784, 42.909107, 42.37828, 44.72991, 44.84012]
2026-01-23 01:10:01,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [26.0, 26.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 26.0, 26.0]
2026-01-23 01:10:01,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (43.68) for latency DatasetOffice
2026-01-23 01:10:01,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 19 minutes, 21 seconds)
2026-01-23 01:11:33,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:34,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 291.44662 ± 100.170
2026-01-23 01:11:34,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [374.2318, 359.75745, 328.3969, 255.676, 325.40295, 269.09628, 331.59265, 331.7997, 8.908324, 329.6045]
2026-01-23 01:11:34,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [251.0, 236.0, 205.0, 167.0, 204.0, 177.0, 208.0, 209.0, 15.0, 207.0]
2026-01-23 01:11:34,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (291.45) for latency DatasetOffice
2026-01-23 01:11:34,772 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 24 minutes, 55 seconds)
2026-01-23 01:13:06,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:08,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 355.58481 ± 14.937
2026-01-23 01:13:08,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [368.54333, 344.61703, 363.7023, 377.81927, 348.3186, 363.4547, 354.9692, 339.5872, 368.3748, 326.4617]
2026-01-23 01:13:08,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [220.0, 219.0, 236.0, 253.0, 227.0, 187.0, 228.0, 209.0, 240.0, 203.0]
2026-01-23 01:13:08,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (355.58) for latency DatasetOffice
2026-01-23 01:13:08,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 26 minutes, 12 seconds)
2026-01-23 01:14:39,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:43,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 611.64148 ± 194.173
2026-01-23 01:14:43,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [678.68066, 929.655, 728.59503, 656.13403, 543.49567, 764.6247, 592.7727, 293.2715, 667.98364, 261.2015]
2026-01-23 01:14:43,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [514.0, 742.0, 558.0, 481.0, 416.0, 575.0, 541.0, 228.0, 493.0, 185.0]
2026-01-23 01:14:43,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (611.64) for latency DatasetOffice
2026-01-23 01:14:43,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 26 minutes, 33 seconds)
2026-01-23 01:16:17,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:18,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 327.61844 ± 39.453
2026-01-23 01:16:18,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [341.92926, 333.86185, 361.01382, 330.8489, 329.47418, 349.8911, 327.66803, 340.22272, 213.09937, 348.1751]
2026-01-23 01:16:18,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [147.0, 143.0, 147.0, 146.0, 142.0, 150.0, 144.0, 146.0, 102.0, 149.0]
2026-01-23 01:16:18,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 26 minutes, 3 seconds)
2026-01-23 01:17:49,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:50,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 332.30722 ± 10.609
2026-01-23 01:17:50,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [345.61478, 330.79996, 330.01718, 331.08286, 316.4727, 341.31113, 317.60907, 332.82925, 326.20892, 351.12653]
2026-01-23 01:17:50,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [128.0, 123.0, 123.0, 122.0, 119.0, 125.0, 121.0, 124.0, 122.0, 128.0]
2026-01-23 01:17:50,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 26 minutes, 54 seconds)
2026-01-23 01:19:21,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:23,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 407.74472 ± 13.121
2026-01-23 01:19:23,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [423.13153, 426.5708, 421.41486, 385.7789, 410.84384, 400.10446, 406.37292, 405.22415, 409.77582, 388.22986]
2026-01-23 01:19:23,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [153.0, 157.0, 154.0, 146.0, 150.0, 149.0, 154.0, 152.0, 153.0, 146.0]
2026-01-23 01:19:23,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 25 minutes, 9 seconds)
2026-01-23 01:20:53,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:54,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 388.61606 ± 18.998
2026-01-23 01:20:54,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [395.11227, 415.7798, 383.8073, 398.27548, 396.57083, 392.9256, 346.74292, 361.94446, 395.4496, 399.55234]
2026-01-23 01:20:54,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [145.0, 144.0, 140.0, 144.0, 143.0, 143.0, 132.0, 133.0, 147.0, 147.0]
2026-01-23 01:20:54,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 22 minutes, 55 seconds)
2026-01-23 01:22:25,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:27,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 570.33234 ± 163.367
2026-01-23 01:22:27,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [748.3419, 638.8824, 284.72537, 445.2786, 652.3887, 277.49316, 651.5379, 631.0253, 727.9515, 645.69824]
2026-01-23 01:22:27,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [245.0, 222.0, 132.0, 166.0, 220.0, 124.0, 225.0, 227.0, 244.0, 221.0]
2026-01-23 01:22:27,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 20 minutes, 44 seconds)
2026-01-23 01:23:59,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:02,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 903.48425 ± 313.213
2026-01-23 01:24:02,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1122.8353, 302.4089, 1060.8484, 780.3521, 722.76605, 794.213, 1330.6975, 542.2131, 1262.5217, 1115.9865]
2026-01-23 01:24:02,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [369.0, 137.0, 332.0, 263.0, 251.0, 263.0, 426.0, 217.0, 398.0, 363.0]
2026-01-23 01:24:02,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (903.48) for latency DatasetOffice
2026-01-23 01:24:02,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 19 minutes, 11 seconds)
2026-01-23 01:25:34,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:36,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 735.62634 ± 44.777
2026-01-23 01:25:36,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [801.68353, 716.12115, 669.6016, 788.3462, 730.78644, 807.3503, 706.00867, 715.03625, 725.69806, 695.63104]
2026-01-23 01:25:36,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [252.0, 225.0, 211.0, 243.0, 229.0, 248.0, 226.0, 227.0, 230.0, 219.0]
2026-01-23 01:25:36,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 18 minutes, 14 seconds)
2026-01-23 01:27:06,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:10,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1629.40210 ± 597.904
2026-01-23 01:27:10,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [670.6937, 2309.2803, 1679.2955, 2042.1104, 2010.7703, 1261.027, 2717.7708, 1151.6539, 1246.3353, 1205.0836]
2026-01-23 01:27:10,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [248.0, 750.0, 525.0, 650.0, 655.0, 399.0, 869.0, 349.0, 393.0, 409.0]
2026-01-23 01:27:10,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (1629.40) for latency DatasetOffice
2026-01-23 01:27:10,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 17 minutes, 14 seconds)
2026-01-23 01:28:43,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:46,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 997.03162 ± 86.257
2026-01-23 01:28:46,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [886.2613, 956.4265, 917.50433, 1105.3292, 939.8732, 962.596, 984.19165, 964.83496, 1166.494, 1086.8057]
2026-01-23 01:28:46,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [296.0, 305.0, 289.0, 348.0, 308.0, 304.0, 321.0, 312.0, 362.0, 337.0]
2026-01-23 01:28:46,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 16 minutes, 54 seconds)
2026-01-23 01:30:18,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:20,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 823.58411 ± 150.095
2026-01-23 01:30:20,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [869.8653, 976.928, 764.20953, 683.34924, 1111.2606, 703.94037, 993.37354, 708.4313, 791.23846, 633.2444]
2026-01-23 01:30:20,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [264.0, 306.0, 238.0, 214.0, 330.0, 221.0, 305.0, 225.0, 240.0, 238.0]
2026-01-23 01:30:20,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 15 minutes, 36 seconds)
2026-01-23 01:31:52,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:55,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1019.83704 ± 279.696
2026-01-23 01:31:55,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [806.54926, 731.058, 1376.4054, 1634.3053, 1141.1875, 1128.1849, 849.8944, 853.1866, 834.823, 842.775]
2026-01-23 01:31:55,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [256.0, 251.0, 432.0, 504.0, 362.0, 356.0, 271.0, 265.0, 268.0, 263.0]
2026-01-23 01:31:55,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 14 minutes, 5 seconds)
2026-01-23 01:33:27,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:29,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 961.03271 ± 312.415
2026-01-23 01:33:29,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1545.3811, 1145.5516, 1112.883, 845.78766, 794.1131, 761.2923, 1335.673, 775.9974, 896.1939, 397.45267]
2026-01-23 01:33:29,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [469.0, 359.0, 332.0, 261.0, 252.0, 244.0, 415.0, 243.0, 282.0, 154.0]
2026-01-23 01:33:29,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 12 minutes, 33 seconds)
2026-01-23 01:35:00,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:01,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 346.69727 ± 418.973
2026-01-23 01:35:01,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [931.71594, 814.63324, 1172.6569, 54.237896, 69.654076, 68.96155, 114.45364, 138.80235, 65.47463, 36.382614]
2026-01-23 01:35:01,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [292.0, 251.0, 375.0, 37.0, 42.0, 44.0, 61.0, 71.0, 44.0, 37.0]
2026-01-23 01:35:01,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 10 minutes, 20 seconds)
2026-01-23 01:36:32,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:36,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1233.63940 ± 255.650
2026-01-23 01:36:36,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1407.5231, 1250.1213, 1467.5188, 1236.5214, 1062.0847, 610.6462, 1362.1973, 1101.577, 1264.1163, 1574.0887]
2026-01-23 01:36:36,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [439.0, 392.0, 457.0, 389.0, 322.0, 221.0, 430.0, 349.0, 395.0, 485.0]
2026-01-23 01:36:36,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 8 minutes, 23 seconds)
2026-01-23 01:38:09,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:12,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1184.83301 ± 248.828
2026-01-23 01:38:12,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [881.3369, 847.64984, 1783.9778, 1021.65857, 1175.6943, 1267.2018, 1260.2123, 1246.864, 1257.1465, 1106.5881]
2026-01-23 01:38:12,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [274.0, 260.0, 556.0, 307.0, 363.0, 383.0, 383.0, 403.0, 421.0, 353.0]
2026-01-23 01:38:12,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 7 minutes, 22 seconds)
2026-01-23 01:39:44,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:50,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1823.03162 ± 946.637
2026-01-23 01:39:50,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2075.8406, 683.53955, 1970.2158, 89.15527, 1411.3218, 1160.0521, 3146.0771, 2955.1365, 2777.2178, 1961.7605]
2026-01-23 01:39:50,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [637.0, 258.0, 636.0, 50.0, 451.0, 372.0, 1000.0, 1000.0, 1000.0, 661.0]
2026-01-23 01:39:50,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (1823.03) for latency DatasetOffice
2026-01-23 01:39:50,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 6 minutes, 38 seconds)
2026-01-23 01:41:22,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:25,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1379.29749 ± 372.426
2026-01-23 01:41:25,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [824.90576, 1576.4989, 1720.8462, 2011.0281, 1463.3114, 1491.6053, 1529.7452, 926.21564, 852.8321, 1395.9851]
2026-01-23 01:41:25,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [252.0, 491.0, 544.0, 636.0, 449.0, 456.0, 477.0, 286.0, 262.0, 424.0]
2026-01-23 01:41:25,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 5 minutes, 20 seconds)
2026-01-23 01:43:01,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:04,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 956.14911 ± 174.260
2026-01-23 01:43:04,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [977.4611, 798.86206, 833.26825, 731.0376, 1274.2498, 847.88574, 911.60516, 1019.3252, 1258.7876, 909.0089]
2026-01-23 01:43:04,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [300.0, 251.0, 260.0, 235.0, 393.0, 261.0, 283.0, 310.0, 395.0, 277.0]
2026-01-23 01:43:04,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 5 minutes, 27 seconds)
2026-01-23 01:44:33,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:38,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1810.43237 ± 761.996
2026-01-23 01:44:38,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2306.0679, 1805.46, 1752.6215, 949.7482, 1418.3517, 940.0421, 3053.364, 2173.0881, 797.97076, 2907.61]
2026-01-23 01:44:38,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [746.0, 548.0, 615.0, 295.0, 436.0, 290.0, 1000.0, 700.0, 272.0, 1000.0]
2026-01-23 01:44:38,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 3 minutes, 44 seconds)
2026-01-23 01:46:14,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:17,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 860.50262 ± 1174.710
2026-01-23 01:46:17,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2126.5073, 2921.3735, 2837.3762, 223.14426, 47.581005, 48.77393, 103.09751, 116.64773, 121.50008, 59.024845]
2026-01-23 01:46:17,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [652.0, 1000.0, 907.0, 104.0, 37.0, 28.0, 55.0, 73.0, 64.0, 78.0]
2026-01-23 01:46:17,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 2 minutes, 50 seconds)
2026-01-23 01:47:47,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:52,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1723.31091 ± 739.747
2026-01-23 01:47:52,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [897.201, 1116.3121, 2793.9087, 1927.8743, 1067.3647, 1925.3558, 1221.0247, 940.64996, 2449.8135, 2893.6035]
2026-01-23 01:47:52,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [280.0, 339.0, 877.0, 595.0, 323.0, 590.0, 376.0, 290.0, 807.0, 1000.0]
2026-01-23 01:47:52,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 22 seconds)
2026-01-23 01:49:25,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:34,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2489.26099 ± 771.907
2026-01-23 01:49:34,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3012.027, 2716.1995, 2753.6343, 2947.9104, 2934.5713, 2747.1572, 1022.5043, 2909.8362, 894.3076, 2954.462]
2026-01-23 01:49:34,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 966.0, 845.0, 1000.0, 1000.0, 1000.0, 317.0, 1000.0, 306.0, 1000.0]
2026-01-23 01:49:34,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (2489.26) for latency DatasetOffice
2026-01-23 01:49:34,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 24 seconds)
2026-01-23 01:51:05,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:08,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 803.07898 ± 802.113
2026-01-23 01:51:08,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [393.51758, 1164.4225, 973.6892, 782.40753, 2338.153, 2077.5454, 36.31324, 49.72606, 74.471016, 140.5445]
2026-01-23 01:51:08,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [165.0, 360.0, 300.0, 268.0, 899.0, 662.0, 27.0, 29.0, 44.0, 73.0]
2026-01-23 01:51:08,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 57 minutes, 39 seconds)
2026-01-23 01:52:38,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:41,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1028.34351 ± 150.179
2026-01-23 01:52:41,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1423.6023, 884.09955, 1088.2432, 914.2142, 950.6822, 960.7998, 1093.1007, 898.2162, 1029.2472, 1041.2294]
2026-01-23 01:52:41,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [432.0, 271.0, 335.0, 283.0, 290.0, 296.0, 333.0, 275.0, 312.0, 310.0]
2026-01-23 01:52:41,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 55 minutes, 46 seconds)
2026-01-23 01:54:13,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:20,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2220.41064 ± 830.896
2026-01-23 01:54:20,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2666.6465, 1315.6104, 3157.0054, 2887.9094, 1784.395, 1131.9932, 2939.8738, 830.62994, 3052.448, 2437.5977]
2026-01-23 01:54:20,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [832.0, 393.0, 1000.0, 1000.0, 585.0, 351.0, 1000.0, 247.0, 1000.0, 778.0]
2026-01-23 01:54:20,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 54 minutes, 19 seconds)
2026-01-23 01:55:54,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:56,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1060.84802 ± 218.037
2026-01-23 01:55:56,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1008.0444, 953.37286, 942.3293, 983.56964, 887.32837, 951.90985, 1674.5659, 1155.7528, 1099.8171, 951.7902]
2026-01-23 01:55:56,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [307.0, 293.0, 293.0, 303.0, 269.0, 296.0, 506.0, 351.0, 331.0, 292.0]
2026-01-23 01:55:56,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 53 minutes, 6 seconds)
2026-01-23 01:57:28,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:30,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 841.86212 ± 500.360
2026-01-23 01:57:30,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [105.58231, 94.92155, 280.02563, 976.4698, 801.9296, 1312.3121, 1341.75, 994.5402, 1595.7003, 915.3895]
2026-01-23 01:57:30,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [59.0, 68.0, 118.0, 332.0, 244.0, 392.0, 403.0, 304.0, 482.0, 280.0]
2026-01-23 01:57:30,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 49 minutes, 37 seconds)
2026-01-23 01:59:06,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:10,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1384.96191 ± 599.621
2026-01-23 01:59:10,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1114.3182, 1779.301, 1317.5801, 1220.8193, 707.08026, 909.25977, 996.53107, 1264.5862, 2947.2705, 1592.8728]
2026-01-23 01:59:10,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [340.0, 542.0, 389.0, 378.0, 245.0, 279.0, 304.0, 379.0, 902.0, 484.0]
2026-01-23 01:59:10,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 49 minutes, 22 seconds)
2026-01-23 02:00:43,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:48,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1813.23572 ± 790.577
2026-01-23 02:00:48,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [661.56647, 2675.384, 989.01733, 2384.663, 1244.8999, 2747.0186, 1333.6538, 3059.289, 1626.107, 1410.7583]
2026-01-23 02:00:48,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [230.0, 821.0, 303.0, 724.0, 386.0, 845.0, 402.0, 1000.0, 481.0, 433.0]
2026-01-23 02:00:48,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 48 minutes, 50 seconds)
2026-01-23 02:02:23,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:27,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1086.02490 ± 1039.943
2026-01-23 02:02:27,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2330.1746, 374.88214, 2828.4202, 2396.06, 1442.7786, 1082.6583, 87.749725, 80.560326, 87.03195, 149.93237]
2026-01-23 02:02:27,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [767.0, 156.0, 945.0, 736.0, 435.0, 377.0, 51.0, 47.0, 57.0, 85.0]
2026-01-23 02:02:27,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 47 minutes, 7 seconds)
2026-01-23 02:04:02,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:05,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1312.16052 ± 346.575
2026-01-23 02:04:05,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1312.702, 1178.7388, 1978.804, 853.73486, 1103.6456, 1355.1202, 1855.4319, 897.0102, 1380.4857, 1205.9325]
2026-01-23 02:04:05,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [401.0, 355.0, 604.0, 254.0, 328.0, 407.0, 564.0, 268.0, 419.0, 366.0]
2026-01-23 02:04:05,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 45 minutes, 56 seconds)
2026-01-23 02:05:38,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:43,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1637.16443 ± 845.456
2026-01-23 02:05:43,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2948.6545, 930.11566, 975.8611, 1519.9236, 1097.0216, 713.8238, 2586.0632, 1104.5316, 1410.3475, 3085.3015]
2026-01-23 02:05:43,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [887.0, 284.0, 301.0, 465.0, 328.0, 244.0, 806.0, 331.0, 424.0, 1000.0]
2026-01-23 02:05:43,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 45 minutes, 3 seconds)
2026-01-23 02:07:12,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:18,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2113.86206 ± 973.801
2026-01-23 02:07:18,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2770.9045, 3141.1946, 1136.2427, 3186.3599, 915.449, 3067.033, 3187.9744, 1336.7466, 962.6465, 1434.0696]
2026-01-23 02:07:18,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [860.0, 1000.0, 346.0, 1000.0, 330.0, 1000.0, 1000.0, 399.0, 294.0, 435.0]
2026-01-23 02:07:18,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 42 minutes, 26 seconds)
2026-01-23 02:08:51,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:56,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1715.69604 ± 679.584
2026-01-23 02:08:56,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1160.5109, 3114.5586, 2222.0366, 910.88586, 1384.2955, 2516.4885, 1923.1667, 1601.3864, 1042.5918, 1281.0394]
2026-01-23 02:08:56,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [357.0, 1000.0, 683.0, 274.0, 412.0, 778.0, 584.0, 483.0, 316.0, 382.0]
2026-01-23 02:08:56,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 40 minutes, 52 seconds)
2026-01-23 02:10:25,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:31,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1948.74731 ± 736.324
2026-01-23 02:10:31,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1760.5284, 1215.1842, 1439.9734, 3035.8054, 1760.1405, 1140.5651, 3014.0977, 1722.8378, 1360.279, 3038.0605]
2026-01-23 02:10:31,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [553.0, 367.0, 434.0, 1000.0, 531.0, 338.0, 1000.0, 537.0, 414.0, 1000.0]
2026-01-23 02:10:31,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 38 minutes, 24 seconds)
2026-01-23 02:12:07,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:12,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1767.44922 ± 819.610
2026-01-23 02:12:12,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [942.10864, 3026.0825, 1151.525, 1899.2596, 1162.9508, 2714.0042, 3084.1501, 1590.3754, 1103.9669, 1000.0676]
2026-01-23 02:12:12,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [292.0, 1000.0, 339.0, 573.0, 348.0, 816.0, 1000.0, 541.0, 348.0, 337.0]
2026-01-23 02:12:12,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 37 minutes, 24 seconds)
2026-01-23 02:13:39,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:13:43,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1369.36853 ± 860.553
2026-01-23 02:13:43,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3050.0518, 1379.8896, 1620.2352, 1405.9955, 1309.2255, 2516.4219, 298.60178, 1166.0939, 923.2281, 23.94147]
2026-01-23 02:13:43,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 420.0, 531.0, 426.0, 389.0, 764.0, 129.0, 353.0, 296.0, 27.0]
2026-01-23 02:13:43,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 34 minutes, 26 seconds)
2026-01-23 02:15:17,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:20,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1248.34631 ± 420.728
2026-01-23 02:15:20,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [884.2469, 1060.0187, 953.0116, 1925.0067, 2001.0706, 1098.7705, 1700.4498, 993.99677, 950.86, 916.0299]
2026-01-23 02:15:20,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [265.0, 330.0, 290.0, 586.0, 609.0, 327.0, 513.0, 303.0, 315.0, 274.0]
2026-01-23 02:15:20,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 33 minutes, 15 seconds)
2026-01-23 02:16:53,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:59,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1955.18091 ± 932.107
2026-01-23 02:16:59,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2947.1052, 1726.0702, 2949.3247, 979.57965, 1656.3665, 2968.9246, 1438.3748, 3045.4985, 214.2985, 1626.2661]
2026-01-23 02:16:59,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 541.0, 1000.0, 297.0, 503.0, 1000.0, 436.0, 1000.0, 95.0, 505.0]
2026-01-23 02:16:59,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 31 minutes, 42 seconds)
2026-01-23 02:18:38,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:44,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1868.33667 ± 932.603
2026-01-23 02:18:44,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [930.32715, 2667.8953, 1615.1252, 1429.3447, 1417.5885, 2659.6575, 3055.0203, 3118.561, 1677.8009, 112.047554]
2026-01-23 02:18:44,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [279.0, 811.0, 495.0, 439.0, 433.0, 809.0, 1000.0, 1000.0, 555.0, 59.0]
2026-01-23 02:18:44,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 31 minutes, 59 seconds)
2026-01-23 02:20:08,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:12,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1321.70032 ± 362.707
2026-01-23 02:20:12,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [955.4201, 1173.053, 1149.5787, 934.3598, 1953.4868, 1005.7691, 1391.1414, 1996.2771, 1430.6853, 1227.2313]
2026-01-23 02:20:12,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [293.0, 347.0, 343.0, 280.0, 590.0, 309.0, 425.0, 606.0, 450.0, 382.0]
2026-01-23 02:20:12,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 27 minutes, 53 seconds)
2026-01-23 02:21:43,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:50,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2138.19580 ± 796.800
2026-01-23 02:21:50,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [950.9525, 3072.2188, 3046.8716, 2298.875, 2416.027, 1118.1027, 1063.7753, 2469.1863, 1920.8953, 3025.0547]
2026-01-23 02:21:50,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [291.0, 1000.0, 1000.0, 697.0, 720.0, 333.0, 358.0, 755.0, 583.0, 1000.0]
2026-01-23 02:21:50,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 27 minutes, 38 seconds)
2026-01-23 02:23:26,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:32,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1974.94116 ± 1010.545
2026-01-23 02:23:32,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3109.5288, 2992.6016, 2505.4731, 2756.9268, 367.49484, 421.44534, 2911.3638, 937.6252, 1760.8073, 1986.1467]
2026-01-23 02:23:32,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 783.0, 839.0, 149.0, 165.0, 1000.0, 285.0, 529.0, 679.0]
2026-01-23 02:23:32,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 26 minutes, 48 seconds)
2026-01-23 02:25:03,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:06,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1112.79260 ± 373.548
2026-01-23 02:25:06,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1005.6806, 937.1518, 961.6112, 950.0522, 2195.423, 964.38354, 931.75854, 962.07587, 1274.3633, 945.4257]
2026-01-23 02:25:06,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [315.0, 280.0, 290.0, 295.0, 664.0, 296.0, 281.0, 298.0, 382.0, 286.0]
2026-01-23 02:25:06,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 25 seconds)
2026-01-23 02:26:37,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:41,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1200.60840 ± 465.433
2026-01-23 02:26:41,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [937.975, 2471.2795, 933.9878, 956.13446, 1450.6371, 1428.2737, 960.75287, 955.39923, 956.8915, 954.753]
2026-01-23 02:26:41,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [279.0, 758.0, 285.0, 294.0, 447.0, 431.0, 294.0, 288.0, 294.0, 292.0]
2026-01-23 02:26:41,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 21 minutes, 4 seconds)
2026-01-23 02:28:12,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:19,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2385.45947 ± 706.852
2026-01-23 02:28:19,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1545.7754, 3114.0442, 3014.4714, 1123.4955, 1914.4618, 2995.757, 3038.7126, 1815.0209, 3026.9778, 2265.8784]
2026-01-23 02:28:19,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [468.0, 1000.0, 959.0, 336.0, 583.0, 1000.0, 1000.0, 612.0, 1000.0, 691.0]
2026-01-23 02:28:19,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes, 12 seconds)
2026-01-23 02:29:51,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:57,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1915.59802 ± 1109.897
2026-01-23 02:29:57,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3014.3772, 3000.7346, 116.968544, 938.7037, 968.99927, 3012.187, 970.96814, 3124.1846, 1205.325, 2803.5325]
2026-01-23 02:29:57,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 62.0, 290.0, 296.0, 1000.0, 296.0, 1000.0, 371.0, 844.0]
2026-01-23 02:29:57,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 19 minutes, 35 seconds)
2026-01-23 02:31:28,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:34,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2322.23828 ± 677.355
2026-01-23 02:31:34,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1464.255, 1704.6461, 2392.6814, 2227.5767, 3167.2036, 3060.0044, 1584.3975, 3046.1736, 1536.8943, 3038.5508]
2026-01-23 02:31:34,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [450.0, 517.0, 722.0, 669.0, 1000.0, 1000.0, 477.0, 1000.0, 459.0, 1000.0]
2026-01-23 02:31:34,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 17 minutes, 12 seconds)
2026-01-23 02:33:12,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:20,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2561.45142 ± 753.915
2026-01-23 02:33:20,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3043.956, 2998.4983, 942.22675, 3204.6682, 1477.6152, 2020.3373, 2801.905, 3084.143, 2974.1265, 3067.0369]
2026-01-23 02:33:20,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 288.0, 1000.0, 501.0, 683.0, 857.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:33:20,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (2561.45) for latency DatasetOffice
2026-01-23 02:33:20,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 17 minutes, 22 seconds)
2026-01-23 02:34:47,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:54,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2075.66016 ± 1081.223
2026-01-23 02:34:54,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [674.8118, 843.8719, 654.1627, 840.941, 2940.8252, 2981.5503, 2977.826, 2937.2593, 2922.3232, 2983.0322]
2026-01-23 02:34:54,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [241.0, 369.0, 229.0, 255.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:34:54,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 15 minutes, 36 seconds)
2026-01-23 02:36:29,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:33,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1417.38379 ± 674.419
2026-01-23 02:36:33,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2059.9956, 870.16187, 3120.0325, 923.94855, 1194.8256, 1185.8854, 924.05176, 1224.3864, 1720.1705, 950.37946]
2026-01-23 02:36:33,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [624.0, 262.0, 1000.0, 275.0, 357.0, 351.0, 274.0, 389.0, 513.0, 285.0]
2026-01-23 02:36:33,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 14 minutes, 6 seconds)
2026-01-23 02:38:06,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:38:09,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1172.89185 ± 360.585
2026-01-23 02:38:09,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [957.0474, 949.79236, 912.7007, 1756.9136, 1017.5788, 883.2457, 946.5789, 1131.1827, 1968.5353, 1205.343]
2026-01-23 02:38:09,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [289.0, 285.0, 301.0, 525.0, 305.0, 261.0, 289.0, 336.0, 588.0, 372.0]
2026-01-23 02:38:09,250 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 12 minutes, 5 seconds)
2026-01-23 02:39:39,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:44,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1785.31519 ± 1386.186
2026-01-23 02:39:44,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3072.2742, 2557.929, 144.4566, 56.147846, 183.01685, 52.89143, 2459.6897, 3098.6592, 3122.2966, 3105.7903]
2026-01-23 02:39:44,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 837.0, 131.0, 38.0, 96.0, 37.0, 759.0, 1000.0, 999.0, 1000.0]
2026-01-23 02:39:44,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 10 minutes, 13 seconds)
2026-01-23 02:41:13,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:41:21,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2416.63916 ± 1015.172
2026-01-23 02:41:21,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2933.5396, 802.94574, 877.5354, 3213.7551, 3031.353, 3153.4858, 3001.683, 3152.4763, 932.5086, 3067.111]
2026-01-23 02:41:21,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [887.0, 280.0, 265.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 303.0, 959.0]
2026-01-23 02:41:21,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 7 minutes, 19 seconds)
2026-01-23 02:42:54,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:43:01,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2396.40625 ± 833.655
2026-01-23 02:43:01,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3142.2983, 1172.7336, 3080.3464, 3150.3877, 948.53, 3121.6804, 2482.312, 3161.5435, 2013.5593, 1690.671]
2026-01-23 02:43:01,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 387.0, 1000.0, 1000.0, 291.0, 1000.0, 744.0, 1000.0, 660.0, 517.0]
2026-01-23 02:43:01,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 6 minutes, 35 seconds)
2026-01-23 02:44:28,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:44:33,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1589.39343 ± 1157.062
2026-01-23 02:44:33,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2994.3723, 2951.9033, 2602.8755, 2927.7615, 32.5088, 1766.9487, 1243.4816, 235.31339, 216.03818, 922.73236]
2026-01-23 02:44:33,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 988.0, 777.0, 951.0, 25.0, 606.0, 469.0, 104.0, 98.0, 275.0]
2026-01-23 02:44:33,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 3 minutes, 55 seconds)
2026-01-23 02:46:08,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:15,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2599.04736 ± 752.689
2026-01-23 02:46:15,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1242.0735, 3063.885, 3044.9314, 3120.211, 3072.251, 1364.2792, 3085.0208, 1795.7006, 3059.91, 3142.2097]
2026-01-23 02:46:15,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [376.0, 1000.0, 1000.0, 1000.0, 1000.0, 452.0, 1000.0, 558.0, 1000.0, 1000.0]
2026-01-23 02:46:15,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (2599.05) for latency DatasetOffice
2026-01-23 02:46:15,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 3 minutes, 16 seconds)
2026-01-23 02:47:49,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:47:53,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1506.18347 ± 676.736
2026-01-23 02:47:53,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1183.9636, 2290.669, 919.3742, 1007.3053, 1433.4961, 1369.5106, 1099.2771, 1421.8032, 1120.8375, 3215.5986]
2026-01-23 02:47:53,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [362.0, 691.0, 277.0, 305.0, 432.0, 409.0, 330.0, 430.0, 337.0, 1000.0]
2026-01-23 02:47:53,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 1 minute, 52 seconds)
2026-01-23 02:49:25,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:49:29,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1162.38879 ± 816.730
2026-01-23 02:49:29,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1956.4559, 1146.0309, 917.9563, 1141.8275, 2450.7595, 99.16556, 276.989, 45.956806, 1370.1162, 2218.6296]
2026-01-23 02:49:29,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [660.0, 347.0, 275.0, 344.0, 791.0, 58.0, 182.0, 64.0, 433.0, 680.0]
2026-01-23 02:49:29,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 11 seconds)
2026-01-23 02:50:58,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:51:01,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1216.43982 ± 681.710
2026-01-23 02:51:01,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [946.5275, 1471.6825, 890.35297, 720.721, 1078.2656, 953.2265, 939.7534, 1225.6088, 3164.7722, 773.4871]
2026-01-23 02:51:01,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [283.0, 439.0, 263.0, 254.0, 335.0, 285.0, 282.0, 370.0, 1000.0, 248.0]
2026-01-23 02:51:01,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 65/100 (estimated time remaining: 57 minutes, 38 seconds)
2026-01-23 02:52:29,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:52:32,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 978.22919 ± 73.527
2026-01-23 02:52:32,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1098.1531, 954.42615, 946.5384, 940.51166, 869.12964, 1090.2803, 912.34955, 1011.1724, 1038.1642, 921.5665]
2026-01-23 02:52:32,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [328.0, 293.0, 293.0, 288.0, 256.0, 327.0, 277.0, 307.0, 308.0, 279.0]
2026-01-23 02:52:32,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 66/100 (estimated time remaining: 55 minutes, 54 seconds)
2026-01-23 02:54:04,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:54:09,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1621.51758 ± 845.655
2026-01-23 02:54:09,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [960.56445, 963.4669, 951.3316, 1071.3145, 983.4589, 1816.3998, 3136.1104, 947.9147, 2544.749, 2839.8662]
2026-01-23 02:54:09,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [291.0, 296.0, 288.0, 362.0, 315.0, 549.0, 1000.0, 286.0, 769.0, 866.0]
2026-01-23 02:54:09,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 67/100 (estimated time remaining: 53 minutes, 36 seconds)
2026-01-23 02:55:39,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:55:43,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1332.32568 ± 1218.027
2026-01-23 02:55:43,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [46.173912, 1695.6707, 47.531456, 960.5498, 314.5442, 193.24954, 3011.0688, 938.89594, 3059.6638, 3055.9084]
2026-01-23 02:55:43,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [63.0, 581.0, 51.0, 315.0, 216.0, 87.0, 1000.0, 308.0, 1000.0, 1000.0]
2026-01-23 02:55:43,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 68/100 (estimated time remaining: 51 minutes, 46 seconds)
2026-01-23 02:57:21,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:57:24,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1199.60706 ± 502.879
2026-01-23 02:57:24,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1037.5087, 1220.0894, 904.4784, 1160.3948, 884.44226, 992.79517, 2676.6086, 1115.9264, 953.2383, 1050.5874]
2026-01-23 02:57:24,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [312.0, 364.0, 267.0, 345.0, 263.0, 297.0, 833.0, 330.0, 286.0, 316.0]
2026-01-23 02:57:24,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 69/100 (estimated time remaining: 50 minutes, 44 seconds)
2026-01-23 02:58:57,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:04,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2231.91162 ± 948.823
2026-01-23 02:59:04,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3184.7217, 1337.1644, 1071.063, 2019.6997, 3051.195, 1134.262, 3138.5798, 1035.7476, 3181.1848, 3165.4983]
2026-01-23 02:59:04,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 453.0, 321.0, 612.0, 923.0, 338.0, 1000.0, 312.0, 1000.0, 1000.0]
2026-01-23 02:59:04,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 70/100 (estimated time remaining: 49 minutes, 50 seconds)
2026-01-23 03:00:31,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:00:35,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1315.98010 ± 1214.548
2026-01-23 03:00:35,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3025.2969, 3069.715, 145.28336, 82.21828, 187.3447, 155.47127, 1008.4152, 3082.2417, 1189.1066, 1214.7079]
2026-01-23 03:00:35,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 73.0, 88.0, 101.0, 77.0, 341.0, 1000.0, 364.0, 377.0]
2026-01-23 03:00:35,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 71/100 (estimated time remaining: 48 minutes, 16 seconds)
2026-01-23 03:02:03,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:02:07,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1523.70105 ± 693.540
2026-01-23 03:02:07,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1024.0653, 1730.5283, 933.6774, 3125.7065, 955.95807, 1896.8124, 1241.3372, 926.3365, 1134.8059, 2267.7832]
2026-01-23 03:02:07,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [308.0, 526.0, 278.0, 1000.0, 288.0, 599.0, 380.0, 274.0, 340.0, 687.0]
2026-01-23 03:02:07,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 72/100 (estimated time remaining: 46 minutes, 16 seconds)
2026-01-23 03:03:43,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:03:48,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1855.80396 ± 820.485
2026-01-23 03:03:48,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1480.7872, 1206.135, 2346.047, 972.7921, 3123.965, 3191.4983, 2188.045, 972.2206, 962.0225, 2114.5244]
2026-01-23 03:03:48,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [462.0, 362.0, 729.0, 296.0, 1000.0, 1000.0, 719.0, 298.0, 293.0, 652.0]
2026-01-23 03:03:48,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 73/100 (estimated time remaining: 45 minutes, 13 seconds)
2026-01-23 03:05:15,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:05:19,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1359.51917 ± 897.566
2026-01-23 03:05:19,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1467.9818, 3124.0022, 319.34964, 937.5167, 1211.5417, 1005.4058, 328.37143, 1103.8754, 1198.414, 2898.7322]
2026-01-23 03:05:19,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [440.0, 1000.0, 135.0, 281.0, 359.0, 308.0, 136.0, 328.0, 356.0, 899.0]
2026-01-23 03:05:19,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 74/100 (estimated time remaining: 42 minutes, 41 seconds)
2026-01-23 03:06:57,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:07:01,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1320.93994 ± 550.008
2026-01-23 03:07:01,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [546.5248, 967.767, 2494.3037, 1941.6287, 959.6306, 1105.5975, 1781.683, 957.8922, 1224.2219, 1230.1503]
2026-01-23 03:07:01,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [185.0, 289.0, 739.0, 579.0, 291.0, 327.0, 539.0, 284.0, 371.0, 364.0]
2026-01-23 03:07:01,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 21 seconds)
2026-01-23 03:08:28,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:08:35,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2501.96191 ± 925.977
2026-01-23 03:08:35,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1972.8651, 3168.6746, 3120.2134, 50.565685, 3127.6606, 2326.634, 2681.5076, 2213.8845, 3130.8533, 3226.759]
2026-01-23 03:08:35,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [605.0, 1000.0, 1000.0, 30.0, 1000.0, 697.0, 797.0, 666.0, 1000.0, 1000.0]
2026-01-23 03:08:35,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 76/100 (estimated time remaining: 40 minutes, 3 seconds)
2026-01-23 03:10:08,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:10:14,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1874.43384 ± 1015.806
2026-01-23 03:10:14,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1050.4636, 3031.6133, 951.3358, 880.9228, 1169.1013, 1022.2239, 3011.8557, 3213.0273, 1224.2808, 3189.5137]
2026-01-23 03:10:14,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [358.0, 1000.0, 288.0, 261.0, 342.0, 308.0, 957.0, 1000.0, 375.0, 1000.0]
2026-01-23 03:10:14,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 77/100 (estimated time remaining: 38 minutes, 54 seconds)
2026-01-23 03:11:48,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:11:53,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1983.48218 ± 863.967
2026-01-23 03:11:53,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3266.8186, 970.1502, 1979.905, 1018.4296, 3111.8508, 1036.5399, 1744.7404, 1483.1177, 2067.1921, 3156.0774]
2026-01-23 03:11:53,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 317.0, 596.0, 310.0, 1000.0, 313.0, 526.0, 449.0, 628.0, 1000.0]
2026-01-23 03:11:53,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 78/100 (estimated time remaining: 37 minutes, 13 seconds)
2026-01-23 03:13:17,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:13:24,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2475.16284 ± 875.352
2026-01-23 03:13:24,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3125.6008, 3076.646, 3097.3835, 1096.3414, 3076.9028, 3057.9045, 3096.0757, 2783.6292, 1209.0931, 1132.0499]
2026-01-23 03:13:24,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 329.0, 1000.0, 1000.0, 1000.0, 912.0, 365.0, 349.0]
2026-01-23 03:13:24,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 36 seconds)
2026-01-23 03:14:59,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:15:06,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2316.50781 ± 914.660
2026-01-23 03:15:06,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [521.22546, 2556.0847, 1222.7115, 3073.1304, 3038.7434, 3149.2197, 1344.9486, 3079.2246, 2128.0781, 3051.71]
2026-01-23 03:15:06,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [189.0, 797.0, 363.0, 1000.0, 1000.0, 1000.0, 408.0, 1000.0, 701.0, 1000.0]
2026-01-23 03:15:06,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 80/100 (estimated time remaining: 33 minutes, 58 seconds)
2026-01-23 03:16:35,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:16:39,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1244.84351 ± 632.517
2026-01-23 03:16:39,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [947.8991, 966.3716, 968.76434, 3123.8835, 1084.9866, 969.72455, 1176.0083, 1060.5563, 1198.5856, 951.6549]
2026-01-23 03:16:39,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [280.0, 296.0, 295.0, 1000.0, 336.0, 288.0, 350.0, 319.0, 369.0, 280.0]
2026-01-23 03:16:39,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 12 seconds)
2026-01-23 03:18:16,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:18:23,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2383.42188 ± 920.758
2026-01-23 03:18:23,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1356.2949, 3135.8416, 1035.5771, 3093.1663, 1827.8536, 3071.2485, 3134.6843, 3081.3513, 935.63556, 3162.5657]
2026-01-23 03:18:23,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [404.0, 1000.0, 311.0, 1000.0, 593.0, 1000.0, 1000.0, 1000.0, 285.0, 1000.0]
2026-01-23 03:18:23,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes)
2026-01-23 03:19:50,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:19:57,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2291.80811 ± 1111.298
2026-01-23 03:19:57,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3094.4902, 3129.3992, 3147.168, 3090.1362, 1030.7277, 2231.382, 63.339508, 3091.816, 950.47156, 3089.1533]
2026-01-23 03:19:57,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 310.0, 719.0, 37.0, 1000.0, 292.0, 1000.0]
2026-01-23 03:19:57,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes)
2026-01-23 03:21:33,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:21:41,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2825.12524 ± 427.942
2026-01-23 03:21:41,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3045.7048, 3012.8452, 2985.7236, 2969.7996, 1644.8066, 3011.2014, 3052.3752, 2459.653, 3035.2192, 3033.924]
2026-01-23 03:21:41,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 517.0, 1000.0, 1000.0, 822.0, 1000.0, 1000.0]
2026-01-23 03:21:41,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (2825.13) for latency DatasetOffice
2026-01-23 03:21:41,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 9 seconds)
2026-01-23 03:23:09,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:23:13,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1469.44836 ± 694.932
2026-01-23 03:23:13,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1637.3973, 992.5037, 1073.6731, 1103.2611, 1199.119, 2227.2104, 1081.7144, 1985.4729, 448.99377, 2945.1377]
2026-01-23 03:23:13,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [490.0, 301.0, 322.0, 326.0, 358.0, 673.0, 322.0, 599.0, 170.0, 905.0]
2026-01-23 03:23:13,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 55 seconds)
2026-01-23 03:24:47,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:24:54,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2283.57861 ± 976.296
2026-01-23 03:24:54,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1286.6409, 3155.9766, 1086.7163, 3085.719, 1983.0146, 3118.391, 2642.5518, 385.6713, 2942.356, 3148.749]
2026-01-23 03:24:54,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [413.0, 1000.0, 323.0, 1000.0, 592.0, 1000.0, 849.0, 153.0, 936.0, 1000.0]
2026-01-23 03:24:54,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 47 seconds)
2026-01-23 03:26:23,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:26:28,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1687.16138 ± 963.338
2026-01-23 03:26:28,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1069.5566, 983.4714, 3167.9526, 1183.7917, 1006.1868, 3147.342, 1227.4602, 950.9441, 3144.2598, 990.6479]
2026-01-23 03:26:28,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [320.0, 301.0, 1000.0, 362.0, 305.0, 1000.0, 366.0, 289.0, 1000.0, 302.0]
2026-01-23 03:26:28,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 38 seconds)
2026-01-23 03:28:04,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:28:12,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2529.30518 ± 1011.760
2026-01-23 03:28:12,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3143.1272, 3158.2869, 90.3659, 3139.074, 3160.7202, 1753.032, 3121.5222, 3125.2078, 1484.6261, 3117.0896]
2026-01-23 03:28:12,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 52.0, 1000.0, 1000.0, 523.0, 1000.0, 1000.0, 454.0, 1000.0]
2026-01-23 03:28:12,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 26 seconds)
2026-01-23 03:29:36,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:29:44,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2633.75220 ± 905.816
2026-01-23 03:29:44,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1354.6759, 3051.7947, 3083.6016, 3088.3342, 395.37088, 3020.9211, 3098.6055, 3084.2146, 3142.0088, 3017.9937]
2026-01-23 03:29:44,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [460.0, 1000.0, 1000.0, 1000.0, 154.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:29:44,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 19 seconds)
2026-01-23 03:31:21,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:31:26,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1598.29187 ± 787.932
2026-01-23 03:31:26,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3185.6619, 1556.837, 2061.9749, 986.69904, 1020.7362, 2842.3645, 1148.1818, 984.57935, 836.5743, 1359.3098]
2026-01-23 03:31:26,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 475.0, 617.0, 304.0, 310.0, 865.0, 344.0, 304.0, 298.0, 406.0]
2026-01-23 03:31:26,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 5 seconds)
2026-01-23 03:32:52,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:33:00,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2695.10938 ± 717.006
2026-01-23 03:33:00,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1241.4326, 3188.7122, 3121.3892, 3063.7734, 3161.514, 1718.9956, 3133.2842, 3159.5242, 1924.5082, 3237.9607]
2026-01-23 03:33:00,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [377.0, 1000.0, 1000.0, 939.0, 1000.0, 511.0, 1000.0, 1000.0, 574.0, 1000.0]
2026-01-23 03:33:00,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 10 seconds)
2026-01-23 03:34:31,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:34:37,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1915.68591 ± 1190.955
2026-01-23 03:34:37,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1362.5062, 2558.572, 3161.641, 3127.6482, 360.56677, 411.5813, 2889.7166, 3248.6082, 1862.2191, 173.80083]
2026-01-23 03:34:37,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [403.0, 783.0, 1000.0, 1000.0, 146.0, 161.0, 933.0, 1000.0, 578.0, 81.0]
2026-01-23 03:34:37,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 39 seconds)
2026-01-23 03:36:14,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:36:22,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2616.07959 ± 817.014
2026-01-23 03:36:22,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3134.4124, 3171.5193, 3147.98, 3127.344, 3151.6807, 1734.1581, 1505.1384, 3099.6672, 956.4659, 3132.4282]
2026-01-23 03:36:22,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 522.0, 489.0, 1000.0, 293.0, 1000.0]
2026-01-23 03:36:22,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 3 seconds)
2026-01-23 03:37:53,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:38:00,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2364.04028 ± 812.338
2026-01-23 03:38:00,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3150.4956, 3125.8347, 1656.0671, 2824.361, 1478.1112, 2655.575, 3116.5107, 3145.4304, 1500.3143, 987.7038]
2026-01-23 03:38:00,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 513.0, 855.0, 447.0, 804.0, 1000.0, 1000.0, 458.0, 330.0]
2026-01-23 03:38:00,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 33 seconds)
2026-01-23 03:39:26,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:39:33,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2371.91162 ± 1200.766
2026-01-23 03:39:33,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [3076.4949, 3147.6228, 3055.5745, 3066.4783, 3128.8396, 3097.8708, 3086.616, 1907.8717, 95.272156, 56.478336]
2026-01-23 03:39:33,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 630.0, 61.0, 53.0]
2026-01-23 03:39:33,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 44 seconds)
2026-01-23 03:41:04,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:41:09,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1950.01050 ± 795.111
2026-01-23 03:41:09,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2002.9735, 1229.1941, 2365.4355, 2308.2441, 1646.0369, 1132.5898, 3026.3367, 2194.4119, 479.6055, 3115.2764]
2026-01-23 03:41:09,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [608.0, 374.0, 730.0, 713.0, 487.0, 335.0, 909.0, 655.0, 177.0, 1000.0]
2026-01-23 03:41:09,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 9 seconds)
2026-01-23 03:42:44,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:42:52,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2711.50049 ± 788.048
2026-01-23 03:42:52,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2572.0698, 3144.091, 3190.2183, 3166.9731, 3166.383, 3167.2466, 2706.5532, 527.95544, 3174.4783, 2299.0354]
2026-01-23 03:42:52,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [783.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 822.0, 193.0, 1000.0, 703.0]
2026-01-23 03:42:52,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 36 seconds)
2026-01-23 03:44:27,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:44:30,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 845.84534 ± 1092.192
2026-01-23 03:44:30,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1768.517, 2230.301, 686.45306, 61.587692, 121.88961, 87.726555, 94.79744, 97.30998, 75.72113, 3234.1501]
2026-01-23 03:44:30,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [545.0, 660.0, 240.0, 46.0, 83.0, 51.0, 54.0, 54.0, 45.0, 1000.0]
2026-01-23 03:44:30,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 52 seconds)
2026-01-23 03:46:00,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:46:05,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 1773.08655 ± 845.359
2026-01-23 03:46:05,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1171.4546, 2800.371, 1760.0974, 1073.3969, 2760.2986, 3231.8145, 2048.708, 724.99677, 1104.8513, 1054.8743]
2026-01-23 03:46:05,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [348.0, 849.0, 533.0, 320.0, 829.0, 1000.0, 623.0, 250.0, 333.0, 315.0]
2026-01-23 03:46:05,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 14 seconds)
2026-01-23 03:47:34,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:47:43,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 3025.58643 ± 278.389
2026-01-23 03:47:43,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [2197.4575, 3053.3767, 3121.5952, 3137.3726, 3086.6992, 3107.7556, 3187.249, 3160.6853, 3112.3518, 3091.3215]
2026-01-23 03:47:43,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [704.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:47:43,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (3025.59) for latency DatasetOffice
2026-01-23 03:47:43,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 37 seconds)
2026-01-23 03:49:13,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:49:20,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 2477.16748 ± 568.267
2026-01-23 03:49:20,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [1800.7507, 2912.146, 2014.1354, 3230.3892, 2401.6436, 3159.13, 1777.2522, 3169.207, 2428.918, 1878.1046]
2026-01-23 03:49:20,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [541.0, 904.0, 602.0, 1000.0, 724.0, 1000.0, 536.0, 1000.0, 741.0, 608.0]
2026-01-23 03:49:20,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1299 [DEBUG]: Training session finished
