2026-01-23 00:32:03,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-sac-aug-mem1
2026-01-23 00:32:03,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-sac-aug-mem1
2026-01-23 00:32:03,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x15271a816090>}
2026-01-23 00:32:03,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1159 [DEBUG]: using device: cuda
2026-01-23 00:32:03,445 baseline-sac-noisy-halfcheetah:77 [WARNING]: args.memorize_actions != args.horizon: 1 != 32
2026-01-23 00:32:03,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1181 [INFO]: Creating new trainer
2026-01-23 00:32:03,604 baseline-sac-noisy-halfcheetah:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=23, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 00:32:03,605 baseline-sac-noisy-halfcheetah:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=29, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 00:32:04,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1242 [DEBUG]: Starting training session...
2026-01-23 00:32:04,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 1/100
2026-01-23 00:33:29,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:38,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -484.33121 ± 2.699
2026-01-23 00:33:38,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-487.89526, -480.65958, -488.48044, -481.65936, -485.97235, -483.88284, -483.05127, -487.27676, -481.95322, -482.48093]
2026-01-23 00:33:38,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:33:38,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-484.33) for latency DatasetOffice
2026-01-23 00:33:38,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 35 minutes, 24 seconds)
2026-01-23 00:35:08,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:17,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -13.60513 ± 162.061
2026-01-23 00:35:17,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [47.307262, 14.893871, 119.259865, -280.7806, 166.9608, 239.8313, -49.176826, -265.62305, -30.336294, -98.38761]
2026-01-23 00:35:17,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:35:17,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-13.61) for latency DatasetOffice
2026-01-23 00:35:17,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 37 minutes, 34 seconds)
2026-01-23 00:36:47,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:55,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 416.01440 ± 285.707
2026-01-23 00:36:55,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [172.44447, 716.4888, 446.57474, 636.60535, 373.24722, -162.46506, 626.6704, 271.36502, 843.5487, 235.66434]
2026-01-23 00:36:55,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:36:55,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (416.01) for latency DatasetOffice
2026-01-23 00:36:55,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 37 minutes, 6 seconds)
2026-01-23 00:38:26,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:34,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 711.03369 ± 562.901
2026-01-23 00:38:34,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1448.4661, 62.096603, 964.67053, 568.3541, 568.50696, -7.3993344, -40.278545, 1629.7802, 797.0029, 1119.138]
2026-01-23 00:38:34,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:38:34,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (711.03) for latency DatasetOffice
2026-01-23 00:38:34,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 36 minutes, 7 seconds)
2026-01-23 00:40:04,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:13,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1833.40454 ± 195.086
2026-01-23 00:40:13,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1959.3563, 2165.725, 1948.8424, 1603.2366, 1577.0278, 1989.0173, 2007.7152, 1646.5625, 1774.1107, 1662.4487]
2026-01-23 00:40:13,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:40:13,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (1833.40) for latency DatasetOffice
2026-01-23 00:40:13,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 34 minutes, 47 seconds)
2026-01-23 00:41:43,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:52,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1576.98315 ± 828.582
2026-01-23 00:41:52,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1413.2197, 1767.862, 571.9219, 2302.4004, 725.82605, 2269.699, -45.487324, 2090.7817, 2409.1382, 2264.4702]
2026-01-23 00:41:52,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:41:52,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 34 minutes, 41 seconds)
2026-01-23 00:43:22,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:43:31,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2285.77930 ± 808.227
2026-01-23 00:43:31,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2861.573, 1768.5172, 2577.9226, 2639.1465, 2908.8992, 1782.412, 3125.5962, 221.38203, 2355.8, 2616.5432]
2026-01-23 00:43:31,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:43:31,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2285.78) for latency DatasetOffice
2026-01-23 00:43:31,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 33 minutes, 11 seconds)
2026-01-23 00:45:02,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:10,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2618.40942 ± 1010.975
2026-01-23 00:45:10,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3160.7336, 1032.1044, 3824.1343, 3454.661, 2278.9636, 1348.6183, 3393.902, 3118.187, 1177.8145, 3394.9753]
2026-01-23 00:45:10,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:45:10,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2618.41) for latency DatasetOffice
2026-01-23 00:45:10,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 31 minutes, 43 seconds)
2026-01-23 00:46:41,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:46:49,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3525.93433 ± 978.881
2026-01-23 00:46:49,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3939.3235, 4216.821, 4026.5388, 4176.8037, 2333.5825, 3874.2937, 3865.1335, 1039.3076, 3622.1055, 4165.435]
2026-01-23 00:46:49,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:46:49,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (3525.93) for latency DatasetOffice
2026-01-23 00:46:49,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 30 minutes, 7 seconds)
2026-01-23 00:48:20,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:28,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3458.16357 ± 1140.529
2026-01-23 00:48:28,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4519.1895, 4101.7793, 4002.4868, 642.8492, 1932.6494, 3844.8132, 3997.9976, 3973.5774, 3802.7031, 3763.5894]
2026-01-23 00:48:28,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:48:28,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 28 minutes, 39 seconds)
2026-01-23 00:49:59,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:07,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3527.24023 ± 997.762
2026-01-23 00:50:07,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4191.839, 3913.4036, 4179.654, 4179.6274, 2186.0364, 1054.2722, 3783.7112, 3877.4424, 3799.0928, 4107.3237]
2026-01-23 00:50:07,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:50:07,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (3527.24) for latency DatasetOffice
2026-01-23 00:50:07,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 26 minutes, 56 seconds)
2026-01-23 00:51:38,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:51:46,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4249.14551 ± 208.475
2026-01-23 00:51:46,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4333.9966, 4071.9944, 4095.6125, 4054.7896, 4152.0693, 4358.4424, 4559.9297, 3917.127, 4526.6826, 4420.8086]
2026-01-23 00:51:46,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:51:46,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4249.15) for latency DatasetOffice
2026-01-23 00:51:46,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 25 minutes, 15 seconds)
2026-01-23 00:53:17,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:53:25,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4213.70898 ± 618.390
2026-01-23 00:53:25,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4460.543, 4358.9805, 4239.253, 4587.172, 2394.584, 4366.2217, 4474.334, 4195.908, 4552.8604, 4507.235]
2026-01-23 00:53:25,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:53:25,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 23 minutes, 34 seconds)
2026-01-23 00:54:56,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:55:04,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4609.93701 ± 84.621
2026-01-23 00:55:04,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4616.001, 4696.072, 4722.1694, 4651.18, 4611.0513, 4518.5, 4509.6997, 4695.4766, 4455.241, 4623.979]
2026-01-23 00:55:04,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:55:04,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4609.94) for latency DatasetOffice
2026-01-23 00:55:04,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 21 minutes, 56 seconds)
2026-01-23 00:56:35,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:43,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4485.17871 ± 284.173
2026-01-23 00:56:43,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4824.3647, 4747.508, 4666.103, 3897.8774, 4219.292, 4654.8623, 4776.3677, 4347.7886, 4284.8755, 4432.7515]
2026-01-23 00:56:43,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:56:43,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 20 minutes, 13 seconds)
2026-01-23 00:58:14,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:58:22,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4456.90674 ± 248.156
2026-01-23 00:58:22,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4902.424, 4652.7075, 4648.442, 4546.1924, 4119.7144, 4218.178, 4209.805, 4318.1953, 4273.5327, 4679.88]
2026-01-23 00:58:22,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:58:22,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 18 minutes, 39 seconds)
2026-01-23 00:59:53,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:00:01,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4967.67285 ± 253.556
2026-01-23 01:00:01,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5006.6934, 4692.0435, 4634.9155, 4911.9175, 4966.9863, 5203.415, 5415.7397, 4589.926, 5154.236, 5100.8545]
2026-01-23 01:00:01,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:00:01,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4967.67) for latency DatasetOffice
2026-01-23 01:00:01,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 17 minutes, 1 second)
2026-01-23 01:01:32,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:40,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4411.76904 ± 923.284
2026-01-23 01:01:40,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4949.6943, 4967.5283, 4763.6455, 5075.9624, 2862.711, 4850.981, 2348.1533, 4469.4365, 4936.4966, 4893.0854]
2026-01-23 01:01:40,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:01:40,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 15 minutes, 17 seconds)
2026-01-23 01:03:11,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:03:19,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3435.55933 ± 1699.389
2026-01-23 01:03:19,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2869.8872, 4942.5347, 4842.36, 1202.1954, 4162.295, 4466.8286, 4832.8916, 5198.394, 872.2252, 965.9826]
2026-01-23 01:03:19,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:03:19,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 13 minutes, 41 seconds)
2026-01-23 01:04:50,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:58,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3562.43701 ± 1821.431
2026-01-23 01:04:58,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2900.7324, 5348.408, 133.4139, 3734.1626, 4748.7114, 175.90102, 4886.027, 4722.861, 4291.3813, 4682.774]
2026-01-23 01:04:58,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:04:58,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 12 minutes, 4 seconds)
2026-01-23 01:06:29,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:37,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4862.73486 ± 239.343
2026-01-23 01:06:37,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5128.677, 4723.874, 4921.068, 5188.842, 4915.6694, 4476.3213, 4621.9644, 4635.8193, 4821.0483, 5194.0654]
2026-01-23 01:06:37,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:06:37,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 10 minutes, 19 seconds)
2026-01-23 01:08:08,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:16,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4780.58789 ± 1122.413
2026-01-23 01:08:16,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5320.6914, 4909.9575, 4898.8584, 5168.163, 5032.0986, 5348.6416, 5501.6143, 4850.6504, 1472.673, 5302.534]
2026-01-23 01:08:16,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:08:16,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 8 minutes, 38 seconds)
2026-01-23 01:09:47,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:55,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4984.54834 ± 657.811
2026-01-23 01:09:55,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5217.13, 5323.391, 5109.3555, 5379.43, 3072.43, 5287.9106, 5335.198, 4778.372, 5129.7847, 5212.479]
2026-01-23 01:09:55,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:09:55,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4984.55) for latency DatasetOffice
2026-01-23 01:09:55,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 7 minutes)
2026-01-23 01:11:25,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:34,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5332.96826 ± 54.746
2026-01-23 01:11:34,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5264.536, 5307.114, 5402.9453, 5281.403, 5235.025, 5342.3647, 5389.0894, 5357.4536, 5382.6553, 5367.0977]
2026-01-23 01:11:34,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:11:34,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (5332.97) for latency DatasetOffice
2026-01-23 01:11:34,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 5 minutes, 19 seconds)
2026-01-23 01:13:04,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:13,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5148.17285 ± 337.687
2026-01-23 01:13:13,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5390.423, 5252.882, 5344.65, 4389.862, 4843.945, 5426.5317, 5658.956, 5090.147, 5030.7515, 5053.58]
2026-01-23 01:13:13,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:13:13,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 3 minutes, 35 seconds)
2026-01-23 01:14:43,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:52,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4998.11426 ± 344.918
2026-01-23 01:14:52,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5465.221, 5221.691, 5304.1797, 5303.5625, 4692.02, 4296.4814, 4890.255, 4750.9014, 4836.103, 5220.725]
2026-01-23 01:14:52,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:14:52,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 1 minute, 57 seconds)
2026-01-23 01:16:22,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:31,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5250.39160 ± 208.703
2026-01-23 01:16:31,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5324.244, 5111.295, 5239.466, 5240.6494, 5177.5605, 5474.64, 5547.23, 4748.3433, 5370.436, 5270.0493]
2026-01-23 01:16:31,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:16:31,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 16 seconds)
2026-01-23 01:18:01,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:10,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4927.87402 ± 985.675
2026-01-23 01:18:10,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5227.03, 5370.368, 5339.296, 5450.851, 2002.2869, 5191.235, 5385.308, 4911.4507, 5222.164, 5178.75]
2026-01-23 01:18:10,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:18:10,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 58 minutes, 41 seconds)
2026-01-23 01:19:40,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:49,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5398.34717 ± 57.539
2026-01-23 01:19:49,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5335.9526, 5440.9004, 5531.5303, 5326.9365, 5420.0737, 5356.6885, 5385.6426, 5407.5786, 5420.234, 5357.935]
2026-01-23 01:19:49,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:19:49,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (5398.35) for latency DatasetOffice
2026-01-23 01:19:49,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 57 minutes, 3 seconds)
2026-01-23 01:21:19,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:27,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5075.72412 ± 310.690
2026-01-23 01:21:27,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5347.0435, 5314.9976, 5234.53, 4309.2095, 4995.804, 5264.898, 5390.729, 5047.9116, 4794.3823, 5057.734]
2026-01-23 01:21:27,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:21:27,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 55 minutes, 24 seconds)
2026-01-23 01:22:58,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:06,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5094.81055 ± 302.317
2026-01-23 01:23:06,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5357.3096, 5200.2866, 5225.378, 5430.7793, 4984.871, 4407.561, 4751.679, 5037.324, 5148.8525, 5404.0674]
2026-01-23 01:23:06,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:23:06,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 53 minutes, 44 seconds)
2026-01-23 01:24:36,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:45,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5123.44629 ± 168.772
2026-01-23 01:24:45,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5319.003, 5078.3213, 4965.8994, 5028.2817, 5044.0127, 5297.4688, 5399.722, 4814.49, 5116.2896, 5170.968]
2026-01-23 01:24:45,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:24:45,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 52 minutes, 4 seconds)
2026-01-23 01:26:15,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:24,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5180.95508 ± 637.581
2026-01-23 01:26:24,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5514.739, 5385.819, 5393.564, 5505.322, 3328.3955, 5343.3843, 5411.756, 4943.5205, 5505.43, 5477.622]
2026-01-23 01:26:24,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:26:24,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 50 minutes, 21 seconds)
2026-01-23 01:27:54,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:03,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5511.36621 ± 68.001
2026-01-23 01:28:03,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5514.468, 5438.0303, 5678.974, 5527.741, 5486.828, 5407.072, 5531.7104, 5521.0625, 5489.852, 5517.9204]
2026-01-23 01:28:03,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:28:03,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (5511.37) for latency DatasetOffice
2026-01-23 01:28:03,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 48 minutes, 42 seconds)
2026-01-23 01:29:33,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:41,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5405.67725 ± 360.436
2026-01-23 01:29:41,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5803.575, 5727.9414, 5731.2046, 4647.66, 4953.5728, 5624.437, 5643.819, 5410.126, 5216.356, 5298.08]
2026-01-23 01:29:41,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:29:41,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 47 minutes)
2026-01-23 01:31:12,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:20,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5161.84961 ± 251.476
2026-01-23 01:31:20,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5452.029, 5182.8174, 5245.776, 5495.4062, 5120.89, 4829.7373, 4725.21, 4937.5283, 5192.3477, 5436.757]
2026-01-23 01:31:20,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:31:20,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 45 minutes, 24 seconds)
2026-01-23 01:32:51,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:59,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5366.01416 ± 192.973
2026-01-23 01:32:59,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5367.4136, 5235.271, 5243.639, 5455.301, 5215.6616, 5391.983, 5789.59, 5038.303, 5506.325, 5416.654]
2026-01-23 01:32:59,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:32:59,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 43 minutes, 47 seconds)
2026-01-23 01:34:30,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:38,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5267.48486 ± 680.396
2026-01-23 01:34:38,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5556.313, 5574.9087, 5520.8745, 5781.208, 3298.0325, 5343.062, 5470.9873, 5052.055, 5494.34, 5583.067]
2026-01-23 01:34:38,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:34:38,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 42 minutes, 7 seconds)
2026-01-23 01:36:08,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:17,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5611.52344 ± 102.493
2026-01-23 01:36:17,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5542.6895, 5513.0674, 5736.5537, 5811.469, 5437.834, 5605.556, 5596.2075, 5573.301, 5637.9575, 5660.5977]
2026-01-23 01:36:17,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:36:17,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (5611.52) for latency DatasetOffice
2026-01-23 01:36:17,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 40 minutes, 29 seconds)
2026-01-23 01:37:47,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:56,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5406.70459 ± 376.053
2026-01-23 01:37:56,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5725.206, 5699.263, 5644.1577, 4498.621, 5298.971, 5616.9307, 5768.448, 5334.0234, 5018.1396, 5463.2793]
2026-01-23 01:37:56,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:37:56,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 38 minutes, 54 seconds)
2026-01-23 01:39:26,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:35,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5253.08594 ± 343.346
2026-01-23 01:39:35,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5743.5312, 5474.7183, 5552.093, 5521.2666, 5047.191, 4697.218, 4781.8784, 4941.879, 5279.4136, 5491.6704]
2026-01-23 01:39:35,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:39:35,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 37 minutes, 13 seconds)
2026-01-23 01:41:05,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:13,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5473.23535 ± 186.682
2026-01-23 01:41:13,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5676.664, 5360.0464, 5260.1387, 5448.402, 5320.063, 5535.8794, 5676.734, 5142.266, 5695.3047, 5616.857]
2026-01-23 01:41:13,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:41:13,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 35 minutes, 32 seconds)
2026-01-23 01:42:44,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:52,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5369.10840 ± 716.557
2026-01-23 01:42:52,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5502.405, 5680.3145, 5652.1597, 5919.945, 3292.775, 5541.55, 5597.421, 5162.336, 5575.6416, 5766.534]
2026-01-23 01:42:52,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:42:52,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 33 minutes, 55 seconds)
2026-01-23 01:44:23,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:31,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5757.02637 ± 93.641
2026-01-23 01:44:31,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5722.3477, 5699.6616, 5928.329, 5735.1284, 5773.8677, 5779.4604, 5724.6343, 5858.328, 5556.848, 5791.6597]
2026-01-23 01:44:31,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:44:31,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (5757.03) for latency DatasetOffice
2026-01-23 01:44:31,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 32 minutes, 13 seconds)
2026-01-23 01:46:01,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:10,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5451.14062 ± 337.298
2026-01-23 01:46:10,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5779.2363, 5803.03, 5530.186, 4610.417, 5313.122, 5588.861, 5769.8994, 5380.7603, 5236.149, 5499.7446]
2026-01-23 01:46:10,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:46:10,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 30 minutes, 35 seconds)
2026-01-23 01:47:40,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:49,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5367.84668 ± 280.535
2026-01-23 01:47:49,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5836.622, 5516.565, 5399.366, 5638.737, 5325.986, 4825.992, 5176.927, 5067.283, 5310.311, 5580.6753]
2026-01-23 01:47:49,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:47:49,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 28 minutes, 58 seconds)
2026-01-23 01:49:19,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:28,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5570.17529 ± 182.826
2026-01-23 01:49:28,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5595.2266, 5437.036, 5434.179, 5546.3545, 5550.338, 5699.7456, 5972.709, 5234.191, 5632.4136, 5599.559]
2026-01-23 01:49:28,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:49:28,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 27 minutes, 19 seconds)
2026-01-23 01:50:58,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:07,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5473.88184 ± 698.290
2026-01-23 01:51:07,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5679.1743, 5830.526, 5570.084, 5808.9443, 3451.1865, 5712.2163, 5844.629, 5213.543, 5804.4863, 5824.029]
2026-01-23 01:51:07,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:51:07,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 25 minutes, 39 seconds)
2026-01-23 01:52:37,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:45,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5831.50586 ± 78.261
2026-01-23 01:52:45,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5799.332, 5854.952, 5792.4155, 5814.924, 5845.4795, 5902.0366, 5879.1567, 5899.4585, 5628.385, 5898.9175]
2026-01-23 01:52:45,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:52:45,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (5831.51) for latency DatasetOffice
2026-01-23 01:52:45,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 24 minutes, 2 seconds)
2026-01-23 01:54:16,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:24,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5535.49072 ± 335.757
2026-01-23 01:54:24,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5811.511, 5940.835, 5685.3257, 4762.582, 5429.0713, 5827.795, 5814.2793, 5384.023, 5366.49, 5332.9917]
2026-01-23 01:54:24,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:54:24,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 22 minutes, 20 seconds)
2026-01-23 01:55:55,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:03,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5482.83105 ± 306.987
2026-01-23 01:56:03,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5912.863, 5642.09, 5686.724, 5686.9214, 5451.651, 4954.568, 5126.079, 5122.215, 5432.242, 5812.9556]
2026-01-23 01:56:03,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:56:03,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 20 minutes, 43 seconds)
2026-01-23 01:57:33,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:42,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5658.10254 ± 201.312
2026-01-23 01:57:42,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5845.888, 5563.9404, 5501.6313, 5568.4546, 5561.295, 5870.65, 5898.432, 5224.5576, 5742.23, 5803.947]
2026-01-23 01:57:42,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:57:42,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 19 minutes, 5 seconds)
2026-01-23 01:59:12,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:21,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5447.29785 ± 687.875
2026-01-23 01:59:21,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5746.652, 5714.3706, 5645.6616, 5895.6655, 3433.2722, 5792.9893, 5673.9517, 5296.1523, 5572.6855, 5701.582]
2026-01-23 01:59:21,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:59:21,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 17 minutes, 26 seconds)
2026-01-23 02:00:52,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:01:01,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5848.62646 ± 81.790
2026-01-23 02:01:01,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5747.17, 5743.148, 6015.685, 5899.8047, 5884.4316, 5830.3423, 5782.956, 5924.7456, 5864.033, 5793.955]
2026-01-23 02:01:01,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:01:01,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (5848.63) for latency DatasetOffice
2026-01-23 02:01:01,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 15 minutes, 55 seconds)
2026-01-23 02:02:32,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:41,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5568.57568 ± 366.732
2026-01-23 02:02:41,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5971.9165, 5783.7153, 5778.088, 4844.6074, 5207.6963, 5905.3774, 5868.9087, 5589.0903, 5091.3047, 5645.0522]
2026-01-23 02:02:41,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:02:41,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 14 minutes, 33 seconds)
2026-01-23 02:04:12,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:21,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5466.26025 ± 334.054
2026-01-23 02:04:21,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5866.9434, 5692.9956, 5767.6167, 5857.181, 5338.778, 4795.4424, 5090.8354, 5280.426, 5391.821, 5580.564]
2026-01-23 02:04:21,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:04:21,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 12 minutes, 58 seconds)
2026-01-23 02:05:51,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:59,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5638.27979 ± 213.204
2026-01-23 02:05:59,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5827.067, 5339.584, 5509.4775, 5847.722, 5380.3623, 5757.0996, 5933.981, 5374.137, 5811.651, 5601.7153]
2026-01-23 02:05:59,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:05:59,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 11 minutes, 15 seconds)
2026-01-23 02:07:29,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:37,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5480.94434 ± 666.218
2026-01-23 02:07:37,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5440.9673, 5803.1226, 5831.4287, 5924.082, 3558.6042, 5622.971, 5903.8667, 5329.877, 5658.393, 5736.128]
2026-01-23 02:07:37,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:07:37,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 9 minutes, 28 seconds)
2026-01-23 02:09:06,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:09:14,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5861.83984 ± 119.999
2026-01-23 02:09:14,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5815.288, 5868.1055, 6023.4204, 5869.7354, 5944.465, 5958.08, 5549.3853, 5877.447, 5827.9663, 5884.5054]
2026-01-23 02:09:14,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:09:14,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (5861.84) for latency DatasetOffice
2026-01-23 02:09:14,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 7 minutes, 26 seconds)
2026-01-23 02:10:43,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:51,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5617.58203 ± 345.118
2026-01-23 02:10:51,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5910.039, 5993.931, 5741.373, 4752.078, 5540.215, 5939.222, 5722.3345, 5534.336, 5357.5454, 5684.744]
2026-01-23 02:10:51,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:10:51,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 5 minutes, 21 seconds)
2026-01-23 02:12:20,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:28,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5573.64062 ± 276.906
2026-01-23 02:12:28,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5950.1724, 5545.683, 5762.2266, 5880.053, 5544.8896, 5003.295, 5282.6484, 5367.9927, 5629.4473, 5769.999]
2026-01-23 02:12:28,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:12:28,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 3 minutes, 23 seconds)
2026-01-23 02:13:56,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:05,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5638.76807 ± 229.069
2026-01-23 02:14:05,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5924.332, 5588.3027, 5383.7783, 5773.3774, 5535.458, 5867.788, 5846.0986, 5149.668, 5584.233, 5734.642]
2026-01-23 02:14:05,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:14:05,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 1 minute, 30 seconds)
2026-01-23 02:15:33,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:41,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5571.45068 ± 710.966
2026-01-23 02:15:41,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5797.1875, 5961.976, 5905.765, 5978.7217, 3504.0063, 5821.3315, 5749.484, 5325.5503, 5887.7686, 5782.714]
2026-01-23 02:15:41,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:15:41,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 64/100 (estimated time remaining: 59 minutes, 43 seconds)
2026-01-23 02:17:09,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:17:17,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5991.03955 ± 84.804
2026-01-23 02:17:17,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5982.8, 5989.1733, 6125.6543, 6126.4263, 5948.852, 5952.359, 5986.7705, 5987.191, 5812.093, 5999.0747]
2026-01-23 02:17:17,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:17:17,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (5991.04) for latency DatasetOffice
2026-01-23 02:17:17,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 65/100 (estimated time remaining: 57 minutes, 58 seconds)
2026-01-23 02:18:45,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:53,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5607.23340 ± 223.134
2026-01-23 02:18:53,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5659.5596, 5884.5337, 5838.7417, 5084.3145, 5619.5312, 5528.4966, 5764.548, 5562.102, 5400.094, 5730.41]
2026-01-23 02:18:53,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:18:53,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 66/100 (estimated time remaining: 56 minutes, 11 seconds)
2026-01-23 02:20:20,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:29,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5585.29932 ± 190.089
2026-01-23 02:20:29,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5824.573, 5556.696, 5646.1514, 5904.5845, 5462.5024, 5319.9443, 5306.7397, 5483.259, 5609.887, 5738.654]
2026-01-23 02:20:29,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:20:29,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 67/100 (estimated time remaining: 54 minutes, 25 seconds)
2026-01-23 02:21:56,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:22:05,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5775.00732 ± 241.873
2026-01-23 02:22:05,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5961.132, 5548.493, 5686.44, 5829.8174, 5713.916, 5887.0225, 6197.669, 5256.9204, 5928.809, 5739.857]
2026-01-23 02:22:05,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:22:05,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 68/100 (estimated time remaining: 52 minutes, 46 seconds)
2026-01-23 02:23:32,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:40,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5364.59326 ± 1071.516
2026-01-23 02:23:40,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5710.675, 5618.528, 5896.508, 5798.441, 2202.3289, 5617.81, 5778.923, 5225.013, 5899.746, 5897.9604]
2026-01-23 02:23:40,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:23:40,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 69/100 (estimated time remaining: 51 minutes, 6 seconds)
2026-01-23 02:25:08,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:17,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5983.54053 ± 88.668
2026-01-23 02:25:17,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5914.679, 5857.987, 6085.7085, 5941.014, 6000.03, 5909.531, 5997.9697, 6151.106, 5911.9478, 6065.4316]
2026-01-23 02:25:17,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:25:17,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 70/100 (estimated time remaining: 49 minutes, 31 seconds)
2026-01-23 02:26:43,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:51,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5748.08643 ± 355.293
2026-01-23 02:26:51,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6129.3574, 6035.5723, 6013.6245, 4965.2915, 5481.7305, 5851.27, 5988.351, 5879.7544, 5313.506, 5822.4014]
2026-01-23 02:26:51,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:26:51,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 71/100 (estimated time remaining: 47 minutes, 50 seconds)
2026-01-23 02:28:19,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:27,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5566.65479 ± 278.442
2026-01-23 02:28:27,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6004.4194, 5547.548, 5716.021, 5898.0684, 5313.956, 5195.5376, 5383.1416, 5224.9443, 5507.096, 5875.8105]
2026-01-23 02:28:27,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:28:27,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 72/100 (estimated time remaining: 46 minutes, 14 seconds)
2026-01-23 02:29:55,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:03,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5804.15186 ± 209.913
2026-01-23 02:30:03,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5891.934, 5554.1978, 5688.7007, 5926.5317, 5675.0103, 5916.063, 6125.8887, 5388.887, 5908.841, 5965.4697]
2026-01-23 02:30:03,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:30:03,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 73/100 (estimated time remaining: 44 minutes, 38 seconds)
2026-01-23 02:31:31,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:39,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5683.22217 ± 725.864
2026-01-23 02:31:39,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5740.266, 6030.8203, 5879.845, 6111.901, 3580.7803, 5807.945, 6014.604, 5481.3257, 6115.733, 6069.0024]
2026-01-23 02:31:39,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:31:39,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 74/100 (estimated time remaining: 43 minutes, 3 seconds)
2026-01-23 02:33:06,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:15,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6006.99316 ± 141.869
2026-01-23 02:33:15,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6114.042, 5915.147, 6290.2197, 6047.561, 6123.3076, 5759.9106, 5904.033, 6028.6694, 5997.383, 5889.6562]
2026-01-23 02:33:15,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:33:15,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (6006.99) for latency DatasetOffice
2026-01-23 02:33:15,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 26 seconds)
2026-01-23 02:34:42,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:50,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5712.86328 ± 400.184
2026-01-23 02:34:50,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6149.9233, 5955.214, 5965.0586, 4858.665, 5671.973, 5888.2056, 6101.1553, 5799.3735, 5112.5967, 5626.4673]
2026-01-23 02:34:50,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:34:50,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 76/100 (estimated time remaining: 39 minutes, 53 seconds)
2026-01-23 02:36:17,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:26,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5662.79688 ± 295.890
2026-01-23 02:36:26,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5960.0967, 5686.204, 5976.525, 6019.2925, 5536.924, 5163.968, 5348.099, 5343.8364, 5623.5107, 5969.5107]
2026-01-23 02:36:26,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:36:26,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 77/100 (estimated time remaining: 38 minutes, 17 seconds)
2026-01-23 02:37:53,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:38:01,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5782.77393 ± 207.401
2026-01-23 02:38:01,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5972.828, 5519.557, 5470.892, 5861.8174, 5789.441, 5857.1895, 6053.9927, 5479.4424, 5818.9717, 6003.6045]
2026-01-23 02:38:01,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:38:01,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 78/100 (estimated time remaining: 36 minutes, 40 seconds)
2026-01-23 02:39:29,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:37,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5630.43457 ± 704.239
2026-01-23 02:39:37,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5860.5757, 6058.891, 5864.8477, 6063.3765, 3558.1406, 5693.6675, 5732.1167, 5645.658, 5838.257, 5988.8174]
2026-01-23 02:39:37,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:39:37,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 4 seconds)
2026-01-23 02:41:04,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:41:13,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5929.82471 ± 109.345
2026-01-23 02:41:13,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5896.3496, 5954.1094, 6000.123, 5997.384, 5720.784, 5889.829, 5776.309, 6033.7817, 6099.161, 5930.419]
2026-01-23 02:41:13,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:41:13,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 80/100 (estimated time remaining: 33 minutes, 27 seconds)
2026-01-23 02:42:40,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:49,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5782.88672 ± 330.075
2026-01-23 02:42:49,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6064.457, 6025.236, 6018.0083, 4929.7046, 5730.4395, 5972.9795, 6076.952, 5650.679, 5633.557, 5726.8525]
2026-01-23 02:42:49,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:42:49,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 81/100 (estimated time remaining: 31 minutes, 54 seconds)
2026-01-23 02:44:16,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:44:25,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5695.60645 ± 306.390
2026-01-23 02:44:25,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6120.4185, 5642.9556, 6077.9316, 6029.801, 5591.497, 5196.8984, 5389.033, 5372.423, 5646.4844, 5888.6235]
2026-01-23 02:44:25,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:44:25,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes, 20 seconds)
2026-01-23 02:45:52,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:00,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5786.48828 ± 298.075
2026-01-23 02:46:00,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6100.221, 5446.4995, 5832.6807, 5649.664, 5596.765, 6060.4214, 6236.803, 5218.9604, 5918.315, 5804.5513]
2026-01-23 02:46:00,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:46:00,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 83/100 (estimated time remaining: 28 minutes, 43 seconds)
2026-01-23 02:47:28,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:47:36,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5728.40479 ± 744.585
2026-01-23 02:47:36,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5967.5923, 6176.768, 6077.215, 6107.241, 3602.2468, 5944.9033, 6006.9297, 5326.641, 5928.9414, 6145.566]
2026-01-23 02:47:36,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:47:36,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 8 seconds)
2026-01-23 02:49:03,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:49:12,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5972.35352 ± 95.336
2026-01-23 02:49:12,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6135.7275, 5969.5156, 6094.905, 6036.9824, 5854.869, 5865.8667, 6043.537, 5913.87, 5936.195, 5872.0664]
2026-01-23 02:49:12,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:49:12,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 32 seconds)
2026-01-23 02:50:39,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:50:47,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5835.87598 ± 320.990
2026-01-23 02:50:47,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6217.714, 6102.1606, 6135.318, 5226.7695, 5415.551, 5955.0156, 6128.506, 5704.176, 5596.288, 5877.2573]
2026-01-23 02:50:47,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:50:47,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 86/100 (estimated time remaining: 23 minutes, 55 seconds)
2026-01-23 02:52:15,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:52:23,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5718.89453 ± 323.026
2026-01-23 02:52:23,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6091.5337, 5883.2495, 5749.098, 6072.8643, 5279.3755, 5350.3555, 5453.358, 5297.2847, 5937.514, 6074.3115]
2026-01-23 02:52:23,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:52:23,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 20 seconds)
2026-01-23 02:53:51,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:53:59,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5972.39746 ± 206.314
2026-01-23 02:53:59,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6219.867, 5789.108, 5909.958, 5786.3394, 5931.2593, 5962.8613, 6395.869, 5661.4595, 6092.827, 5974.4243]
2026-01-23 02:53:59,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:53:59,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 45 seconds)
2026-01-23 02:55:26,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:55:34,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5661.75928 ± 1096.788
2026-01-23 02:55:34,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6171.342, 6148.637, 5972.479, 6237.26, 2397.8958, 5938.294, 5955.672, 5740.7373, 6115.215, 5940.0674]
2026-01-23 02:55:34,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:55:34,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 8 seconds)
2026-01-23 02:57:02,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:57:10,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6238.38672 ± 81.260
2026-01-23 02:57:10,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6301.241, 6282.2773, 6239.018, 6299.9106, 6349.0176, 6153.299, 6086.294, 6287.6387, 6132.847, 6252.3228]
2026-01-23 02:57:10,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:57:10,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (6238.39) for latency DatasetOffice
2026-01-23 02:57:10,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 32 seconds)
2026-01-23 02:58:38,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:58:46,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5785.68457 ± 392.837
2026-01-23 02:58:46,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6216.9834, 5877.4917, 6068.1323, 4883.3945, 5391.95, 5982.4185, 6232.017, 5806.146, 5546.6396, 5851.672]
2026-01-23 02:58:46,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:58:46,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 91/100 (estimated time remaining: 15 minutes, 57 seconds)
2026-01-23 03:00:14,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:00:22,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5799.61670 ± 304.342
2026-01-23 03:00:22,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6147.4946, 5886.5796, 6053.1562, 6122.8457, 5769.7793, 5316.548, 5264.6206, 5566.2466, 5840.5967, 6028.2954]
2026-01-23 03:00:22,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:00:22,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 22 seconds)
2026-01-23 03:01:50,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:01:58,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6030.70801 ± 177.542
2026-01-23 03:01:58,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6159.1455, 5843.2246, 5858.618, 6052.0015, 5856.996, 6149.206, 6273.6787, 5748.3306, 6137.3433, 6228.535]
2026-01-23 03:01:58,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:01:58,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 45 seconds)
2026-01-23 03:03:25,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:03:34,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5848.34180 ± 756.372
2026-01-23 03:03:34,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6029.958, 6257.79, 6144.107, 6181.1377, 3629.765, 5882.697, 6158.217, 5741.127, 6252.3853, 6206.234]
2026-01-23 03:03:34,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:03:34,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 11 seconds)
2026-01-23 03:05:01,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:05:09,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6294.22363 ± 92.659
2026-01-23 03:05:09,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6244.3423, 6146.7505, 6433.0293, 6403.5137, 6148.023, 6311.4224, 6270.41, 6315.1763, 6381.908, 6287.6553]
2026-01-23 03:05:09,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:05:09,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (6294.22) for latency DatasetOffice
2026-01-23 03:05:09,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 34 seconds)
2026-01-23 03:06:37,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:06:45,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5877.66992 ± 350.596
2026-01-23 03:06:45,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6208.003, 6214.8145, 6074.602, 5068.3413, 5763.2026, 6069.0654, 6226.989, 5564.872, 5673.4277, 5913.3794]
2026-01-23 03:06:45,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:06:45,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 96/100 (estimated time remaining: 7 minutes, 58 seconds)
2026-01-23 03:08:13,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:08:21,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5863.99658 ± 261.074
2026-01-23 03:08:21,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6233.595, 6022.6934, 5940.199, 6152.5635, 5640.6, 5460.5303, 5524.546, 5684.689, 5851.042, 6129.5083]
2026-01-23 03:08:21,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:08:21,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 23 seconds)
2026-01-23 03:09:49,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:09:57,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5899.39746 ± 236.801
2026-01-23 03:09:57,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6161.818, 5593.74, 5717.0107, 5904.8647, 5739.4136, 6053.3027, 6334.115, 5609.6704, 5799.2007, 6080.835]
2026-01-23 03:09:57,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:09:57,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 47 seconds)
2026-01-23 03:11:24,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:11:32,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5803.16309 ± 718.763
2026-01-23 03:11:32,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5947.6255, 6210.337, 5941.055, 6298.52, 3689.5696, 6001.3853, 6135.1904, 5761.9775, 6022.1304, 6023.8384]
2026-01-23 03:11:32,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:11:32,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 11 seconds)
2026-01-23 03:13:00,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:13:08,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6326.20410 ± 86.050
2026-01-23 03:13:08,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6248.541, 6405.9272, 6441.4717, 6249.287, 6288.4473, 6285.4746, 6292.2715, 6430.62, 6194.26, 6425.7407]
2026-01-23 03:13:08,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:13:08,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (6326.20) for latency DatasetOffice
2026-01-23 03:13:08,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 35 seconds)
2026-01-23 03:14:36,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:14:44,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6022.93213 ± 384.927
2026-01-23 03:14:44,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6288.445, 6434.2817, 6183.266, 4982.536, 6033.352, 6197.8525, 6267.932, 5949.4917, 5832.2437, 6059.9233]
2026-01-23 03:14:44,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:14:44,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1299 [DEBUG]: Training session finished
