2026-01-23 00:54:40,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-sac-aug-mem5 
2026-01-23 00:54:40,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-sac-aug-mem5 
2026-01-23 00:54:40,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14c4dd0e6b50>}
2026-01-23 00:54:40,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1159 [DEBUG]: using device: cuda
2026-01-23 00:54:40,977 baseline-sac-noisy-halfcheetah:77 [WARNING]: args.memorize_actions != args.horizon: 5 != 32
2026-01-23 00:54:41,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1181 [INFO]: Creating new trainer
2026-01-23 00:54:41,143 baseline-sac-noisy-halfcheetah:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=47, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 00:54:41,143 baseline-sac-noisy-halfcheetah:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=53, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 00:54:42,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1242 [DEBUG]: Starting training session...
2026-01-23 00:54:42,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 1/100
2026-01-23 00:56:09,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:18,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -409.97675 ± 77.610
2026-01-23 00:56:18,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-357.07648, -502.19998, -310.31653, -284.71292, -484.9601, -501.19623, -418.49783, -384.90665, -364.64468, -491.25632]
2026-01-23 00:56:18,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:56:18,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-409.98) for latency DatasetOffice
2026-01-23 00:56:18,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 38 minutes, 34 seconds)
2026-01-23 00:57:50,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:59,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -327.76752 ± 64.450
2026-01-23 00:57:59,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-250.5623, -271.78973, -358.23682, -236.48643, -337.3924, -342.88458, -464.4597, -389.04742, -311.16907, -315.64703]
2026-01-23 00:57:59,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:57:59,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-327.77) for latency DatasetOffice
2026-01-23 00:57:59,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 40 minutes, 56 seconds)
2026-01-23 00:59:31,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:40,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -194.29178 ± 60.958
2026-01-23 00:59:40,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-127.43157, -314.16592, -212.80444, -263.45322, -196.88968, -203.27443, -221.29355, -162.53798, -136.15118, -104.91583]
2026-01-23 00:59:40,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:59:40,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-194.29) for latency DatasetOffice
2026-01-23 00:59:40,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 40 minutes, 43 seconds)
2026-01-23 01:01:12,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:21,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -137.35416 ± 59.159
2026-01-23 01:01:21,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-87.030426, -127.786766, -16.5365, -132.7953, -143.1513, -216.96442, -158.62805, -211.52437, -191.62775, -87.49661]
2026-01-23 01:01:21,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:01:21,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-137.35) for latency DatasetOffice
2026-01-23 01:01:21,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 39 minutes, 42 seconds)
2026-01-23 01:02:53,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:03:02,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -147.71524 ± 81.227
2026-01-23 01:03:02,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-193.55135, -167.70102, -202.82478, -19.0202, -187.77744, -66.82739, -204.6715, -67.097534, -75.72619, -291.95505]
2026-01-23 01:03:02,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:03:02,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 38 minutes, 27 seconds)
2026-01-23 01:04:34,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:43,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -144.30936 ± 74.011
2026-01-23 01:04:43,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-119.45305, -125.167404, -110.55452, -264.22897, -227.17737, -138.72244, -118.823235, -211.21878, 17.69836, -145.44612]
2026-01-23 01:04:43,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:04:43,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 38 minutes, 19 seconds)
2026-01-23 01:06:15,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:24,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -68.13995 ± 100.767
2026-01-23 01:06:24,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-55.51045, -54.641026, 58.00368, -197.31348, -128.17007, 44.950993, -173.03438, -220.04639, -7.9283133, 52.289932]
2026-01-23 01:06:24,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:06:24,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-68.14) for latency DatasetOffice
2026-01-23 01:06:24,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 36 minutes, 43 seconds)
2026-01-23 01:07:56,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:05,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -62.11047 ± 68.289
2026-01-23 01:08:05,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-33.742126, -41.13291, -170.42332, -149.69061, 10.132898, -159.07053, 15.70918, -3.0134008, -69.71746, -20.156347]
2026-01-23 01:08:05,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:08:05,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-62.11) for latency DatasetOffice
2026-01-23 01:08:05,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 35 minutes, 3 seconds)
2026-01-23 01:09:38,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:47,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -100.61305 ± 46.085
2026-01-23 01:09:47,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-132.26059, -83.30395, -178.7782, -45.945415, -105.02355, -161.24304, -97.43239, -40.941795, -43.37361, -117.827965]
2026-01-23 01:09:47,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:09:47,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 33 minutes, 26 seconds)
2026-01-23 01:11:19,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:28,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -73.76028 ± 122.163
2026-01-23 01:11:28,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-89.48403, -42.23145, -13.728657, -6.071189, -83.400734, 167.97885, -307.33148, -103.236626, -31.260866, -228.83664]
2026-01-23 01:11:28,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:11:28,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 31 minutes, 48 seconds)
2026-01-23 01:13:00,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:09,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 51.45919 ± 78.304
2026-01-23 01:13:09,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-2.9927566, 217.85988, 43.051746, 43.808723, 111.976746, 141.7179, -21.939535, -12.228723, 37.1341, -43.796173]
2026-01-23 01:13:09,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:13:09,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (51.46) for latency DatasetOffice
2026-01-23 01:13:09,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 30 minutes, 10 seconds)
2026-01-23 01:14:41,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:50,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 40.68455 ± 175.902
2026-01-23 01:14:50,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [115.64404, -29.614176, 110.37428, -175.21782, -195.39687, 148.69772, -168.26749, 62.37846, 394.58835, 143.659]
2026-01-23 01:14:50,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:14:50,885 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 28 minutes, 29 seconds)
2026-01-23 01:16:23,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:32,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 26.71080 ± 148.384
2026-01-23 01:16:32,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-236.9859, 54.160522, 122.47309, -45.820072, 202.26321, 137.38737, -104.11602, -176.40767, 160.79468, 153.35876]
2026-01-23 01:16:32,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:16:32,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 26 minutes, 48 seconds)
2026-01-23 01:18:04,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:13,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 207.35100 ± 179.145
2026-01-23 01:18:13,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [254.88254, 306.6729, 228.05533, 150.43706, 370.8246, -116.4662, 144.32207, 403.71118, 418.05164, -86.98113]
2026-01-23 01:18:13,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:18:13,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (207.35) for latency DatasetOffice
2026-01-23 01:18:13,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 25 minutes, 6 seconds)
2026-01-23 01:19:45,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:54,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 251.61246 ± 284.328
2026-01-23 01:19:54,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [28.473333, 199.80963, 258.76764, -417.76733, 761.21405, 395.7961, 271.5466, 316.4968, 311.99908, 389.78882]
2026-01-23 01:19:54,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:19:54,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (251.61) for latency DatasetOffice
2026-01-23 01:19:54,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 23 minutes, 24 seconds)
2026-01-23 01:21:26,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:35,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 168.22546 ± 335.329
2026-01-23 01:21:35,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [48.89481, -164.02333, 171.54411, 508.5874, 113.16649, 513.70624, 859.8394, -180.30853, -203.31422, 14.162301]
2026-01-23 01:21:35,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:21:35,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 21 minutes, 40 seconds)
2026-01-23 01:23:07,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:16,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 198.34526 ± 208.791
2026-01-23 01:23:16,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [51.602047, 161.09378, 580.51263, -58.181736, 175.25035, 219.43475, 338.2532, 510.26218, -55.2285, 60.45394]
2026-01-23 01:23:16,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:23:16,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 19 minutes, 59 seconds)
2026-01-23 01:24:49,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:58,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 250.30974 ± 333.819
2026-01-23 01:24:58,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [216.03978, 601.5976, 291.12567, -12.725187, -92.72002, 855.57666, -116.47433, 454.4226, -193.1551, 499.40988]
2026-01-23 01:24:58,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:24:58,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 18 minutes, 16 seconds)
2026-01-23 01:26:30,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:39,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 754.81360 ± 408.264
2026-01-23 01:26:39,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [903.47565, -4.1476436, 982.7218, 852.38324, 1157.657, 1097.9011, 722.4648, 829.4098, 1048.0771, -41.807266]
2026-01-23 01:26:39,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:26:39,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (754.81) for latency DatasetOffice
2026-01-23 01:26:39,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 16 minutes, 34 seconds)
2026-01-23 01:28:11,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:20,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 787.12244 ± 261.410
2026-01-23 01:28:20,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [413.23087, 1219.7021, 950.8167, 910.5279, 844.6538, 671.91144, 742.02924, 697.1206, 1082.0883, 339.14297]
2026-01-23 01:28:20,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:28:20,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (787.12) for latency DatasetOffice
2026-01-23 01:28:20,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 14 minutes, 53 seconds)
2026-01-23 01:29:52,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:01,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1139.31348 ± 250.364
2026-01-23 01:30:01,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [974.02313, 1031.3826, 981.4617, 791.68097, 1567.8282, 1221.3948, 806.8332, 1320.2738, 1450.0996, 1248.1567]
2026-01-23 01:30:01,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:30:01,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (1139.31) for latency DatasetOffice
2026-01-23 01:30:01,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 13 minutes, 8 seconds)
2026-01-23 01:31:32,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:41,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 834.82013 ± 372.359
2026-01-23 01:31:41,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1244.4954, 1222.353, 878.3632, 620.22876, 546.3249, 233.91582, 708.8855, 1152.4655, 1348.5581, 392.61063]
2026-01-23 01:31:41,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:31:41,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 11 minutes, 7 seconds)
2026-01-23 01:33:12,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:20,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1052.47461 ± 462.821
2026-01-23 01:33:20,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [918.19226, -104.97061, 609.3178, 1358.18, 1373.4506, 1266.757, 1341.3845, 953.9682, 1453.6917, 1354.7742]
2026-01-23 01:33:20,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:33:20,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 9 minutes, 1 second)
2026-01-23 01:34:51,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:00,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1374.79126 ± 457.193
2026-01-23 01:35:00,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1834.7178, 1445.9509, 1539.8466, 1704.9154, 356.6535, 1636.5734, 1718.3564, 668.9377, 1344.2803, 1497.6809]
2026-01-23 01:35:00,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:35:00,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (1374.79) for latency DatasetOffice
2026-01-23 01:35:00,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 6 minutes, 56 seconds)
2026-01-23 01:36:30,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:39,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1012.39050 ± 641.743
2026-01-23 01:36:39,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-131.08795, 1439.7285, -159.25647, 1149.6575, 727.06683, 1225.0078, 1169.3711, 1256.1277, 1639.5308, 1807.7599]
2026-01-23 01:36:39,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:36:39,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 4 minutes, 40 seconds)
2026-01-23 01:38:09,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:18,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1621.30310 ± 668.464
2026-01-23 01:38:18,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1761.5908, 1878.9847, 2057.8494, -346.4411, 1914.8408, 1960.1179, 1928.4403, 1628.5006, 1655.2712, 1773.8768]
2026-01-23 01:38:18,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:38:18,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (1621.30) for latency DatasetOffice
2026-01-23 01:38:18,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 2 minutes, 32 seconds)
2026-01-23 01:39:48,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:56,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1832.60742 ± 165.521
2026-01-23 01:39:56,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2058.9976, 1602.9059, 1847.346, 1709.3698, 1541.9829, 1950.4602, 1898.567, 1965.067, 1999.3627, 1752.016]
2026-01-23 01:39:56,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:39:56,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (1832.61) for latency DatasetOffice
2026-01-23 01:39:56,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 38 seconds)
2026-01-23 01:41:26,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:35,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1789.11230 ± 192.912
2026-01-23 01:41:35,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2059.906, 2018.6941, 1825.1471, 1872.7382, 1383.7391, 1751.4733, 1542.5253, 1741.7557, 1880.687, 1814.4569]
2026-01-23 01:41:35,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:41:35,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 58 minutes, 47 seconds)
2026-01-23 01:43:05,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:13,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2187.31519 ± 128.892
2026-01-23 01:43:13,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2126.2312, 2343.1738, 1939.2717, 2360.264, 2069.2644, 2306.0923, 2236.6875, 2255.1719, 2151.0007, 2085.9963]
2026-01-23 01:43:13,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:43:13,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2187.32) for latency DatasetOffice
2026-01-23 01:43:13,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 56 minutes, 49 seconds)
2026-01-23 01:44:43,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:52,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2144.63135 ± 291.233
2026-01-23 01:44:52,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1510.3225, 2502.8357, 2465.0298, 2078.4204, 2138.6543, 2080.079, 2066.124, 1858.9669, 2288.528, 2457.3523]
2026-01-23 01:44:52,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:44:52,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 55 minutes)
2026-01-23 01:46:21,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:30,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2018.52966 ± 653.287
2026-01-23 01:46:30,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1288.3553, 2438.3997, 1325.2573, 2544.927, 2181.374, 2410.0, 2454.1729, 2468.9187, 597.6855, 2476.2046]
2026-01-23 01:46:30,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:46:30,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 53 minutes, 11 seconds)
2026-01-23 01:47:59,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:08,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2320.66724 ± 510.783
2026-01-23 01:48:08,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [829.01324, 2393.447, 2690.307, 2501.9453, 2474.7322, 2708.188, 2445.7205, 2337.1575, 2446.7947, 2379.3667]
2026-01-23 01:48:08,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:48:08,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2320.67) for latency DatasetOffice
2026-01-23 01:48:08,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 51 minutes, 24 seconds)
2026-01-23 01:49:38,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:46,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2273.26318 ± 469.444
2026-01-23 01:49:46,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2434.5278, 2521.275, 2379.4653, 2401.4612, 2529.7798, 2401.5032, 2565.3113, 2468.778, 2120.9304, 909.599]
2026-01-23 01:49:46,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:49:46,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 49 minutes, 40 seconds)
2026-01-23 01:51:16,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:24,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1047.23743 ± 824.878
2026-01-23 01:51:24,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1930.6224, 53.18632, 83.94338, 1965.9554, 312.88742, 205.96803, 916.9591, 2396.7056, 1148.48, 1457.6674]
2026-01-23 01:51:24,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:51:25,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 48 minutes, 2 seconds)
2026-01-23 01:52:55,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:04,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1971.62793 ± 524.886
2026-01-23 01:53:04,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1818.1395, 2328.1736, 2444.4377, 1950.5503, 1765.0486, 2363.3748, 1996.0358, 561.4421, 2092.7983, 2396.2798]
2026-01-23 01:53:04,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:53:04,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 46 minutes, 42 seconds)
2026-01-23 01:54:34,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:43,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2180.85791 ± 561.762
2026-01-23 01:54:43,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2492.4036, 2678.474, 2049.947, 2181.0115, 2360.0403, 635.9749, 2086.5032, 2116.6875, 2650.94, 2556.5972]
2026-01-23 01:54:43,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:54:43,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 45 minutes, 8 seconds)
2026-01-23 01:56:12,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:21,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2117.88647 ± 738.799
2026-01-23 01:56:21,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2578.3464, 2634.9294, 2488.0305, 2334.5535, 2437.6895, 2482.0679, 2188.731, 1309.5444, 187.68211, 2537.2908]
2026-01-23 01:56:21,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:56:21,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 43 minutes, 33 seconds)
2026-01-23 01:57:51,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:59,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2310.86279 ± 525.761
2026-01-23 01:57:59,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2429.876, 2585.335, 2365.3757, 2324.0415, 2585.6765, 815.4446, 2105.3025, 2570.7705, 2636.6838, 2690.124]
2026-01-23 01:57:59,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:57:59,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 41 minutes, 55 seconds)
2026-01-23 01:59:29,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:38,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1890.52930 ± 613.832
2026-01-23 01:59:38,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2726.7825, 2283.7327, 2077.0881, 647.0885, 1280.0603, 2313.299, 2513.1897, 1289.4198, 2014.9016, 1759.7312]
2026-01-23 01:59:38,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:59:38,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 40 minutes, 18 seconds)
2026-01-23 02:01:08,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:01:16,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2368.29980 ± 634.884
2026-01-23 02:01:16,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1810.9615, 2707.7327, 2913.0635, 2716.8345, 2703.7788, 2557.7937, 2330.3071, 677.4637, 2770.7937, 2494.2722]
2026-01-23 02:01:16,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:01:16,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2368.30) for latency DatasetOffice
2026-01-23 02:01:16,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 38 minutes, 26 seconds)
2026-01-23 02:02:46,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:55,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2304.14600 ± 672.514
2026-01-23 02:02:55,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2560.4539, 2664.6384, 2682.6306, 2437.9983, 2759.898, 1439.6748, 2785.5664, 2559.0903, 613.19604, 2538.3164]
2026-01-23 02:02:55,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:02:55,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 36 minutes, 47 seconds)
2026-01-23 02:04:24,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:33,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2447.05225 ± 764.127
2026-01-23 02:04:33,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2819.8945, 2719.6287, 2744.2036, 2381.1917, 2565.64, 2918.999, 2658.9954, 218.54086, 2462.7788, 2980.651]
2026-01-23 02:04:33,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:04:33,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2447.05) for latency DatasetOffice
2026-01-23 02:04:33,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 35 minutes, 7 seconds)
2026-01-23 02:06:03,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:11,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2502.07251 ± 417.839
2026-01-23 02:06:11,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2891.6914, 2466.5657, 2668.5222, 2638.6238, 2560.9348, 2816.984, 2753.796, 1338.0872, 2346.3596, 2539.162]
2026-01-23 02:06:11,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:06:11,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2502.07) for latency DatasetOffice
2026-01-23 02:06:11,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 33 minutes, 27 seconds)
2026-01-23 02:07:41,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:49,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2713.74561 ± 150.428
2026-01-23 02:07:49,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2570.4648, 2431.1545, 2582.4084, 2684.8525, 2973.5254, 2811.8005, 2868.8108, 2783.1277, 2685.265, 2746.0442]
2026-01-23 02:07:49,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:07:49,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2713.75) for latency DatasetOffice
2026-01-23 02:07:49,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 31 minutes, 46 seconds)
2026-01-23 02:09:19,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:09:28,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2251.57251 ± 726.429
2026-01-23 02:09:28,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1785.3688, 2901.219, 2519.9304, 2164.2104, 2518.863, 2967.1716, 2626.2944, 2240.905, 307.42282, 2484.3398]
2026-01-23 02:09:28,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:09:28,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 30 minutes, 7 seconds)
2026-01-23 02:10:57,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:06,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2663.46436 ± 504.487
2026-01-23 02:11:06,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2415.0027, 2735.4036, 1266.1628, 2850.0747, 2890.4773, 3014.2522, 3128.7947, 2578.57, 2861.3398, 2894.5652]
2026-01-23 02:11:06,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:11:06,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 28 minutes, 27 seconds)
2026-01-23 02:12:36,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:44,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2814.50952 ± 172.607
2026-01-23 02:12:44,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2966.0588, 2851.866, 2496.0898, 2776.181, 2775.9797, 2919.8564, 2618.3188, 3090.636, 2971.9717, 2678.1392]
2026-01-23 02:12:44,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:12:44,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2814.51) for latency DatasetOffice
2026-01-23 02:12:44,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 26 minutes, 47 seconds)
2026-01-23 02:14:14,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:23,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2474.65283 ± 657.552
2026-01-23 02:14:23,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2377.3796, 2946.1245, 2887.0415, 2553.4043, 1743.2776, 3047.2104, 2888.7363, 822.1867, 2671.8513, 2809.3188]
2026-01-23 02:14:23,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:14:23,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 25 minutes, 11 seconds)
2026-01-23 02:15:52,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:01,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2975.70972 ± 147.083
2026-01-23 02:16:01,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2880.2537, 2747.5508, 2869.415, 3236.8809, 2987.6755, 2871.56, 3212.1365, 3043.034, 2995.0293, 2913.5613]
2026-01-23 02:16:01,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:16:01,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2975.71) for latency DatasetOffice
2026-01-23 02:16:01,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 23 minutes, 34 seconds)
2026-01-23 02:17:31,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:17:39,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2395.38599 ± 776.921
2026-01-23 02:17:39,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [914.6386, 2719.1418, 3081.196, 2981.4893, 2911.6658, 3061.7952, 1819.4563, 1145.9353, 2357.5745, 2960.966]
2026-01-23 02:17:39,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:17:39,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes, 55 seconds)
2026-01-23 02:19:09,521 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:19:18,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3078.03101 ± 76.725
2026-01-23 02:19:18,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3074.8687, 3121.457, 3007.1501, 3244.096, 3118.8257, 3096.6392, 3044.463, 2939.6477, 3036.4104, 3096.7522]
2026-01-23 02:19:18,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:19:18,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (3078.03) for latency DatasetOffice
2026-01-23 02:19:18,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 20 minutes, 16 seconds)
2026-01-23 02:20:47,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:56,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2575.75635 ± 690.662
2026-01-23 02:20:56,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2984.0227, 1531.9164, 2983.1343, 2689.9597, 2817.7913, 3090.529, 2988.1536, 2964.4124, 2761.505, 946.1374]
2026-01-23 02:20:56,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:20:56,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 18 minutes, 38 seconds)
2026-01-23 02:22:25,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:22:34,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2521.65527 ± 916.374
2026-01-23 02:22:34,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1054.0479, 1718.0187, 3084.344, 3092.81, 2918.902, 3260.3, 3048.807, 3200.7092, 3099.7344, 738.8788]
2026-01-23 02:22:34,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:22:34,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 16 minutes, 59 seconds)
2026-01-23 02:24:04,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:13,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2899.39111 ± 277.003
2026-01-23 02:24:13,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3107.237, 2725.114, 2760.667, 2963.704, 2218.1665, 3217.0642, 2895.4631, 2870.0254, 3184.212, 3052.2583]
2026-01-23 02:24:13,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:24:13,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 15 minutes, 21 seconds)
2026-01-23 02:25:42,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:51,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2828.64087 ± 498.270
2026-01-23 02:25:51,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1922.6421, 3238.9434, 3237.8708, 3252.7483, 3110.9724, 2972.9148, 2962.4326, 2594.011, 1883.4708, 3110.4023]
2026-01-23 02:25:51,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:25:51,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 13 minutes, 41 seconds)
2026-01-23 02:27:20,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:27:29,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3236.21826 ± 83.013
2026-01-23 02:27:29,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3171.6128, 3247.6592, 3165.1062, 3191.3162, 3305.6196, 3347.7393, 3258.2227, 3068.853, 3340.8245, 3265.2312]
2026-01-23 02:27:29,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:27:29,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (3236.22) for latency DatasetOffice
2026-01-23 02:27:29,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 12 minutes, 3 seconds)
2026-01-23 02:28:58,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:07,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3123.89502 ± 89.625
2026-01-23 02:29:07,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3164.3582, 3138.1892, 3137.176, 2945.129, 3222.9958, 3176.3943, 2996.2717, 3102.2744, 3252.7363, 3103.4285]
2026-01-23 02:29:07,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:29:07,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 10 minutes, 25 seconds)
2026-01-23 02:30:37,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:45,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3211.53320 ± 171.530
2026-01-23 02:30:45,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3251.8113, 3344.344, 2764.0083, 3110.49, 3366.2065, 3298.7356, 3137.8853, 3364.5933, 3202.3838, 3274.8743]
2026-01-23 02:30:45,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:30:45,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 8 minutes, 46 seconds)
2026-01-23 02:32:15,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:32:24,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2950.60400 ± 446.001
2026-01-23 02:32:24,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3120.294, 3083.3904, 2333.7134, 3347.39, 2734.1802, 3236.2275, 3135.5054, 3289.5813, 1942.3884, 3283.3713]
2026-01-23 02:32:24,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:32:24,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 7 minutes, 6 seconds)
2026-01-23 02:33:53,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:02,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3067.12231 ± 365.525
2026-01-23 02:34:02,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2180.3977, 3216.503, 3245.7964, 3265.2852, 3069.8794, 3257.6516, 3010.17, 2625.4546, 3400.1047, 3399.9792]
2026-01-23 02:34:02,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:34:02,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 5 minutes, 28 seconds)
2026-01-23 02:35:31,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:35:40,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3132.40771 ± 125.061
2026-01-23 02:35:40,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3119.6003, 3205.517, 3233.8713, 3155.122, 3339.859, 2914.5137, 3093.9211, 3021.7473, 3253.2544, 2986.6716]
2026-01-23 02:35:40,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:35:40,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 3 minutes, 51 seconds)
2026-01-23 02:37:10,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:37:18,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3063.92432 ± 603.453
2026-01-23 02:37:18,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3493.8381, 3246.2507, 3199.328, 1484.6342, 3385.5972, 3449.0305, 2396.9824, 3345.0977, 3269.3918, 3369.0933]
2026-01-23 02:37:18,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:37:18,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 2 minutes, 12 seconds)
2026-01-23 02:38:48,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:38:57,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3087.06299 ± 528.483
2026-01-23 02:38:57,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3183.3474, 2961.7952, 3382.9226, 3315.1082, 3245.4785, 3417.8494, 1543.1233, 3215.2114, 3319.978, 3285.8142]
2026-01-23 02:38:57,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:38:57,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 33 seconds)
2026-01-23 02:40:26,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:40:35,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3006.46655 ± 905.332
2026-01-23 02:40:35,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3238.4072, 3387.9473, 3389.504, 3499.1199, 3354.1252, 3304.2898, 356.48175, 3388.561, 3405.2676, 2740.961]
2026-01-23 02:40:35,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:40:35,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 65/100 (estimated time remaining: 58 minutes, 56 seconds)
2026-01-23 02:42:04,836 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:13,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3042.70264 ± 586.500
2026-01-23 02:42:13,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2289.228, 3440.2122, 3154.898, 3512.8728, 3093.562, 3469.4358, 2953.9421, 1637.1919, 3446.1987, 3429.484]
2026-01-23 02:42:13,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:42:13,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 66/100 (estimated time remaining: 57 minutes, 18 seconds)
2026-01-23 02:43:43,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:43:51,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3407.06689 ± 133.142
2026-01-23 02:43:51,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3567.6606, 3595.2964, 3230.6948, 3227.6978, 3560.188, 3288.016, 3416.2593, 3378.7495, 3318.707, 3487.398]
2026-01-23 02:43:51,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:43:51,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (3407.07) for latency DatasetOffice
2026-01-23 02:43:51,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 67/100 (estimated time remaining: 55 minutes, 39 seconds)
2026-01-23 02:45:20,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:45:29,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3447.63989 ± 146.105
2026-01-23 02:45:29,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3309.326, 3624.1343, 3455.21, 3337.7988, 3315.0342, 3486.4058, 3222.967, 3433.852, 3682.8853, 3608.7854]
2026-01-23 02:45:29,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:45:29,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (3447.64) for latency DatasetOffice
2026-01-23 02:45:29,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 68/100 (estimated time remaining: 53 minutes, 55 seconds)
2026-01-23 02:46:57,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:47:06,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1825.07263 ± 1208.526
2026-01-23 02:47:06,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3407.1016, 2968.0928, 204.7111, 2691.4026, 2464.724, 566.95636, 393.29218, 2925.1162, 2222.0837, 407.2461]
2026-01-23 02:47:06,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:47:06,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 69/100 (estimated time remaining: 52 minutes, 10 seconds)
2026-01-23 02:48:34,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:48:43,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3248.99878 ± 593.616
2026-01-23 02:48:43,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3546.4697, 3386.8152, 1494.751, 3526.333, 3397.649, 3363.904, 3257.6228, 3419.1282, 3643.804, 3453.5115]
2026-01-23 02:48:43,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:48:43,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 70/100 (estimated time remaining: 50 minutes, 25 seconds)
2026-01-23 02:50:11,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:50:20,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3002.93042 ± 796.985
2026-01-23 02:50:20,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2067.0046, 3435.4539, 3535.0452, 3381.8252, 3298.9243, 3390.0696, 1000.71014, 2832.5786, 3516.1753, 3571.518]
2026-01-23 02:50:20,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:50:20,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 71/100 (estimated time remaining: 48 minutes, 43 seconds)
2026-01-23 02:51:49,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:51:58,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2497.83789 ± 1098.823
2026-01-23 02:51:58,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [953.84827, 3379.8655, 3102.6787, 3447.1052, 3043.0024, 3560.8872, 2934.693, 578.7908, 2944.2925, 1033.2155]
2026-01-23 02:51:58,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:51:58,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 72/100 (estimated time remaining: 47 minutes)
2026-01-23 02:53:26,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:53:35,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2726.09692 ± 842.950
2026-01-23 02:53:35,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3632.4287, 3293.5525, 1686.663, 3011.9678, 1276.0381, 2218.427, 3269.8906, 3606.0225, 1810.6956, 3455.2837]
2026-01-23 02:53:35,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:53:35,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 73/100 (estimated time remaining: 45 minutes, 22 seconds)
2026-01-23 02:55:03,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:55:12,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2036.22827 ± 974.573
2026-01-23 02:55:12,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1036.3307, 3201.7468, 2116.7727, 1362.621, 579.90015, 950.3861, 2880.1174, 3232.4136, 3135.2239, 1866.7704]
2026-01-23 02:55:12,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:55:12,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 74/100 (estimated time remaining: 43 minutes, 45 seconds)
2026-01-23 02:56:41,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:56:49,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2889.29077 ± 772.448
2026-01-23 02:56:49,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3440.0088, 3045.3623, 3393.4602, 1231.3242, 1567.435, 2927.1401, 3407.472, 3054.0957, 3559.05, 3267.5603]
2026-01-23 02:56:49,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:56:49,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 75/100 (estimated time remaining: 42 minutes, 9 seconds)
2026-01-23 02:58:18,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:58:26,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2869.04590 ± 906.592
2026-01-23 02:58:26,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2280.2195, 3606.8271, 1387.094, 1064.9548, 3325.9922, 3641.2617, 3376.3838, 3008.646, 3506.3015, 3492.7798]
2026-01-23 02:58:26,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:58:26,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 76/100 (estimated time remaining: 40 minutes, 31 seconds)
2026-01-23 02:59:55,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:00:04,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3394.51807 ± 175.099
2026-01-23 03:00:04,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3272.4346, 3465.9993, 3220.8062, 3475.202, 3679.7993, 3129.6775, 3208.8193, 3381.9385, 3477.6091, 3632.8948]
2026-01-23 03:00:04,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:00:04,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 77/100 (estimated time remaining: 38 minutes, 53 seconds)
2026-01-23 03:01:32,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:01:41,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3122.47754 ± 980.768
2026-01-23 03:01:41,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3520.3613, 3651.8303, 3225.0828, 3550.5344, 537.099, 3808.808, 3601.0977, 3595.7144, 3657.0928, 2077.1536]
2026-01-23 03:01:41,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:01:41,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 78/100 (estimated time remaining: 37 minutes, 17 seconds)
2026-01-23 03:03:10,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:03:18,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3612.67773 ± 151.017
2026-01-23 03:03:18,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3413.7288, 3794.046, 3478.5605, 3863.256, 3640.948, 3357.5579, 3615.448, 3612.0356, 3653.26, 3697.9368]
2026-01-23 03:03:18,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:03:18,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (3612.68) for latency DatasetOffice
2026-01-23 03:03:18,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 40 seconds)
2026-01-23 03:04:47,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:04:56,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3373.55151 ± 446.566
2026-01-23 03:04:56,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2361.0566, 3575.4377, 3220.671, 2736.1262, 3627.654, 3715.8567, 3447.3425, 3797.874, 3586.937, 3666.5579]
2026-01-23 03:04:56,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:04:56,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 4 seconds)
2026-01-23 03:06:25,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:06:33,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3177.33716 ± 417.058
2026-01-23 03:06:33,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2533.2136, 2921.4636, 3615.6287, 2517.4832, 3426.3696, 3746.6616, 3223.842, 2896.4202, 3545.543, 3346.745]
2026-01-23 03:06:33,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:06:33,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 27 seconds)
2026-01-23 03:08:02,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:08:11,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3786.26245 ± 161.609
2026-01-23 03:08:11,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3755.8518, 3619.3362, 4022.8123, 3912.8044, 3593.29, 3893.2126, 3846.8315, 3518.956, 3730.2815, 3969.2473]
2026-01-23 03:08:11,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:08:11,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (3786.26) for latency DatasetOffice
2026-01-23 03:08:11,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes, 50 seconds)
2026-01-23 03:09:40,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:09:48,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3050.36523 ± 1098.144
2026-01-23 03:09:48,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3670.2256, 3545.2458, 3769.709, 3509.2368, 3602.262, 184.16609, 3408.1726, 3535.9988, 3499.714, 1778.9193]
2026-01-23 03:09:48,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:09:48,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 13 seconds)
2026-01-23 03:11:17,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:11:26,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3500.78003 ± 402.852
2026-01-23 03:11:26,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3665.752, 3765.9065, 3551.536, 3816.9133, 3328.4265, 3740.5889, 2412.3445, 3661.6062, 3284.264, 3780.4646]
2026-01-23 03:11:26,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:11:26,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 36 seconds)
2026-01-23 03:12:54,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:13:03,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3541.75903 ± 207.070
2026-01-23 03:13:03,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3603.0654, 3139.926, 3189.7822, 3543.8174, 3792.539, 3547.8433, 3629.1023, 3787.0305, 3636.1106, 3548.372]
2026-01-23 03:13:03,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:13:03,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 58 seconds)
2026-01-23 03:14:32,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:14:40,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3373.21167 ± 380.698
2026-01-23 03:14:40,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2475.703, 3551.3147, 3601.2317, 3743.989, 3209.252, 3343.477, 3539.9827, 2960.6057, 3794.5125, 3512.0522]
2026-01-23 03:14:40,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:14:40,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 20 seconds)
2026-01-23 03:16:09,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:16:17,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3580.08936 ± 284.393
2026-01-23 03:16:17,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3684.6, 3749.8447, 3528.5305, 3683.395, 3842.865, 3760.1074, 3681.6177, 3430.1746, 3645.3088, 2794.451]
2026-01-23 03:16:17,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:16:17,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 42 seconds)
2026-01-23 03:17:46,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:17:55,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3644.10229 ± 130.476
2026-01-23 03:17:55,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3541.6096, 3704.1902, 3443.532, 3652.9033, 3805.3977, 3640.871, 3681.0125, 3577.8958, 3498.7002, 3894.9092]
2026-01-23 03:17:55,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:17:55,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 5 seconds)
2026-01-23 03:19:24,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:19:32,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3553.34766 ± 629.147
2026-01-23 03:19:32,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3817.726, 3811.3367, 3741.7554, 3873.644, 3430.104, 3876.5598, 3456.5247, 3832.44, 1733.5714, 3959.8135]
2026-01-23 03:19:32,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:19:32,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 27 seconds)
2026-01-23 03:21:01,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:21:09,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4048.50854 ± 145.336
2026-01-23 03:21:09,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4025.9739, 3887.3848, 3843.892, 3890.7842, 4143.03, 4139.181, 4047.3315, 4104.5356, 4361.033, 4041.9363]
2026-01-23 03:21:09,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:21:09,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4048.51) for latency DatasetOffice
2026-01-23 03:21:09,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 50 seconds)
2026-01-23 03:22:38,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:22:47,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3167.14502 ± 1033.011
2026-01-23 03:22:47,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [202.86914, 3585.1946, 3720.7031, 3560.3984, 3645.7087, 3833.3289, 3395.8923, 2708.004, 3312.5415, 3706.811]
2026-01-23 03:22:47,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:22:47,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 12 seconds)
2026-01-23 03:24:15,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:24:24,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3792.77686 ± 265.000
2026-01-23 03:24:24,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3759.6873, 3951.9744, 3928.546, 3900.0525, 4056.2434, 3937.9365, 3489.205, 3162.4438, 3706.9478, 4034.736]
2026-01-23 03:24:24,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:24:24,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 35 seconds)
2026-01-23 03:25:53,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:26:01,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3409.03442 ± 475.621
2026-01-23 03:26:01,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3153.9324, 3707.8608, 2776.6353, 3511.635, 3672.227, 2838.5994, 3794.1426, 3906.6396, 4036.179, 2692.494]
2026-01-23 03:26:01,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:26:01,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 58 seconds)
2026-01-23 03:27:30,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:27:39,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3089.36084 ± 1469.025
2026-01-23 03:27:39,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4139.6357, 1492.9049, 4071.3154, 4017.9468, 3983.2188, 4018.1067, 3994.1707, 179.55853, 1005.66113, 3991.089]
2026-01-23 03:27:39,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:27:39,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 21 seconds)
2026-01-23 03:29:08,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:29:16,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3420.23828 ± 1098.448
2026-01-23 03:29:16,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4271.1826, 3400.3337, 3932.0364, 1080.9479, 4133.9004, 1474.2034, 3859.2302, 4182.5996, 3866.7234, 4001.225]
2026-01-23 03:29:16,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:29:16,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 44 seconds)
2026-01-23 03:30:45,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:54,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3496.19263 ± 883.606
2026-01-23 03:30:54,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2693.583, 3947.1526, 3825.224, 4057.0625, 3546.252, 4218.3687, 3689.3748, 1130.2996, 3985.1191, 3869.491]
2026-01-23 03:30:54,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:30:54,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 6 seconds)
2026-01-23 03:32:22,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:32:31,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3545.74805 ± 1042.581
2026-01-23 03:32:31,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3902.6042, 3699.69, 4033.5989, 3827.889, 3939.91, 3857.0952, 3953.4062, 429.62735, 3830.5066, 3983.1538]
2026-01-23 03:32:31,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:32:31,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 29 seconds)
2026-01-23 03:34:00,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:34:08,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3730.26953 ± 147.662
2026-01-23 03:34:08,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3726.2153, 4083.8757, 3749.7488, 3601.65, 3750.066, 3566.1626, 3640.4272, 3886.6052, 3687.7817, 3610.1648]
2026-01-23 03:34:08,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:34:08,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 52 seconds)
2026-01-23 03:35:37,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:35:46,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4055.84058 ± 166.661
2026-01-23 03:35:46,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4184.4536, 4434.0234, 3913.8096, 4151.238, 4009.7996, 3896.3953, 4061.1572, 4087.4207, 4004.5212, 3815.5894]
2026-01-23 03:35:46,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:35:46,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4055.84) for latency DatasetOffice
2026-01-23 03:35:46,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 14 seconds)
2026-01-23 03:37:14,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:37:23,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3881.06909 ± 159.249
2026-01-23 03:37:23,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3703.408, 3708.2876, 3926.4988, 3830.0723, 4019.248, 3696.126, 4170.965, 3758.704, 4060.0962, 3937.2864]
2026-01-23 03:37:23,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:37:23,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 37 seconds)
2026-01-23 03:38:52,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:39:00,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3236.83740 ± 1097.087
2026-01-23 03:39:00,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2590.2651, 3937.434, 3770.2188, 185.85902, 3738.4473, 4003.65, 3452.4536, 3196.1284, 3997.9, 3496.018]
2026-01-23 03:39:00,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:39:00,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1299 [DEBUG]: Training session finished
