2026-01-23 01:04:27,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-sac-aug-mem2
2026-01-23 01:04:27,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-sac-aug-mem2
2026-01-23 01:04:27,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14f435fb9f90>}
2026-01-23 01:04:27,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1159 [DEBUG]: using device: cuda
2026-01-23 01:04:27,426 baseline-sac-noisy-halfcheetah:77 [WARNING]: args.memorize_actions != args.horizon: 2 != 32
2026-01-23 01:04:27,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1181 [INFO]: Creating new trainer
2026-01-23 01:04:27,588 baseline-sac-noisy-halfcheetah:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=29, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:04:27,588 baseline-sac-noisy-halfcheetah:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:04:28,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1242 [DEBUG]: Starting training session...
2026-01-23 01:04:28,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 1/100
2026-01-23 01:05:55,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:03,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -452.96161 ± 36.193
2026-01-23 01:06:03,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-446.57117, -391.15448, -467.21255, -505.29294, -481.22504, -492.93933, -458.35995, -465.2467, -419.89877, -401.71515]
2026-01-23 01:06:03,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:06:03,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-452.96) for latency DatasetOffice
2026-01-23 01:06:03,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 37 minutes, 33 seconds)
2026-01-23 01:07:35,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:44,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -137.88089 ± 80.544
2026-01-23 01:07:44,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-34.008076, -147.74709, -221.28752, -242.1628, -82.973694, -29.066591, -53.56022, -201.70258, -128.2987, -238.00156]
2026-01-23 01:07:44,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:07:44,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-137.88) for latency DatasetOffice
2026-01-23 01:07:44,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 39 minutes, 48 seconds)
2026-01-23 01:09:15,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:24,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -4.00427 ± 49.245
2026-01-23 01:09:24,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [0.2480405, 2.9906485, 43.01432, 23.258219, -3.7773445, 13.338718, -111.9612, 21.533436, 50.494965, -79.182526]
2026-01-23 01:09:24,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:09:24,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (-4.00) for latency DatasetOffice
2026-01-23 01:09:24,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 39 minutes, 26 seconds)
2026-01-23 01:10:55,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:04,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -6.68832 ± 145.984
2026-01-23 01:11:04,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-177.93848, 100.82897, -100.99281, 47.615353, -137.93987, 123.27105, -243.60548, 163.95386, -35.81579, 193.73999]
2026-01-23 01:11:04,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:11:04,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 38 minutes, 22 seconds)
2026-01-23 01:12:36,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:44,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 433.20859 ± 164.682
2026-01-23 01:12:44,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [244.13342, 230.06967, 543.9527, 571.67834, 289.62537, 479.63303, 604.2105, 678.0152, 478.3846, 212.38263]
2026-01-23 01:12:44,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:12:44,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (433.21) for latency DatasetOffice
2026-01-23 01:12:44,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 37 minutes, 6 seconds)
2026-01-23 01:14:16,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:24,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 653.02942 ± 207.434
2026-01-23 01:14:24,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [580.3922, 358.97717, 299.53073, 690.47, 617.8667, 664.24207, 968.114, 743.852, 973.8002, 633.04865]
2026-01-23 01:14:24,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:14:24,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (653.03) for latency DatasetOffice
2026-01-23 01:14:24,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 36 minutes, 55 seconds)
2026-01-23 01:15:56,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:04,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1185.39368 ± 459.487
2026-01-23 01:16:04,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1331.5311, 1290.4066, 1328.8832, -179.85121, 1407.033, 1263.7628, 1390.2589, 1338.545, 1453.2085, 1230.1593]
2026-01-23 01:16:04,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:16:04,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (1185.39) for latency DatasetOffice
2026-01-23 01:16:04,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 35 minutes, 11 seconds)
2026-01-23 01:17:36,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:45,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1424.89233 ± 648.993
2026-01-23 01:17:45,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1782.5961, 1684.8282, 1722.283, 507.19928, 1682.7919, 1771.3333, 1712.2596, 1766.9937, 1794.6587, -176.0198]
2026-01-23 01:17:45,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:17:45,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (1424.89) for latency DatasetOffice
2026-01-23 01:17:45,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 33 minutes, 33 seconds)
2026-01-23 01:19:16,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:25,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1586.97998 ± 676.163
2026-01-23 01:19:25,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [934.8849, 1874.7996, 1970.7292, 1894.6049, 1925.1122, 1982.1351, 1905.4166, -240.51627, 1881.0344, 1741.5994]
2026-01-23 01:19:25,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:19:25,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (1586.98) for latency DatasetOffice
2026-01-23 01:19:25,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 31 minutes, 55 seconds)
2026-01-23 01:20:56,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:05,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2211.10400 ± 311.994
2026-01-23 01:21:05,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2293.3777, 2437.6182, 1335.0771, 2415.231, 2208.7183, 2108.6394, 2375.3345, 2374.831, 2154.3132, 2407.8987]
2026-01-23 01:21:05,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:21:05,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2211.10) for latency DatasetOffice
2026-01-23 01:21:05,521 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 30 minutes, 17 seconds)
2026-01-23 01:22:37,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:45,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2211.81787 ± 793.816
2026-01-23 01:22:45,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [10.58346, 2606.1355, 1578.8297, 2486.7424, 2602.8115, 2413.6104, 2680.7954, 2616.1953, 2522.3792, 2600.097]
2026-01-23 01:22:45,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:22:45,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2211.82) for latency DatasetOffice
2026-01-23 01:22:45,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 28 minutes, 36 seconds)
2026-01-23 01:24:17,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:25,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2141.73486 ± 1023.956
2026-01-23 01:24:25,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2709.7043, 2575.2732, 2592.7212, 2637.8333, 501.16846, 2708.9338, -216.89072, 2802.7153, 2847.4268, 2258.4602]
2026-01-23 01:24:25,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:24:25,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 26 minutes, 59 seconds)
2026-01-23 01:25:57,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:06,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2431.60889 ± 879.843
2026-01-23 01:26:06,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2947.535, 2286.858, 2777.893, 3028.5828, 2926.735, 2900.853, 2469.6104, 2607.032, 2484.4683, -113.47674]
2026-01-23 01:26:06,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:26:06,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2431.61) for latency DatasetOffice
2026-01-23 01:26:06,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 25 minutes, 20 seconds)
2026-01-23 01:27:37,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:46,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2983.56812 ± 96.262
2026-01-23 01:27:46,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3019.5947, 3024.8896, 2937.6174, 2951.907, 2916.6865, 2760.0195, 3012.1292, 3111.0005, 2996.2786, 3105.558]
2026-01-23 01:27:46,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:27:46,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (2983.57) for latency DatasetOffice
2026-01-23 01:27:46,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 23 minutes, 36 seconds)
2026-01-23 01:29:17,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:26,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3236.19116 ± 108.047
2026-01-23 01:29:26,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3260.0718, 3277.0984, 3208.1482, 3285.7856, 3233.8416, 3027.1536, 3293.2131, 3219.5703, 3104.675, 3452.3538]
2026-01-23 01:29:26,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:29:26,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (3236.19) for latency DatasetOffice
2026-01-23 01:29:26,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 21 minutes, 52 seconds)
2026-01-23 01:30:57,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:05,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2958.67700 ± 733.081
2026-01-23 01:31:05,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3554.955, 3527.2993, 2015.0427, 3393.5525, 2327.9202, 3348.5027, 3372.8152, 1352.3109, 3424.7056, 3269.6667]
2026-01-23 01:31:05,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:31:05,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 19 minutes, 59 seconds)
2026-01-23 01:32:36,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:44,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3573.06787 ± 82.412
2026-01-23 01:32:44,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3710.907, 3519.252, 3510.7678, 3568.8223, 3673.0486, 3622.0183, 3500.6335, 3511.673, 3657.236, 3456.3237]
2026-01-23 01:32:44,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:32:44,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (3573.07) for latency DatasetOffice
2026-01-23 01:32:44,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 18 minutes)
2026-01-23 01:34:15,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:23,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3517.58081 ± 428.710
2026-01-23 01:34:23,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3666.4531, 3302.1091, 3912.8557, 3721.122, 3815.9531, 3566.2737, 3296.7292, 3662.7153, 3853.2136, 2378.3833]
2026-01-23 01:34:23,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:34:23,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 15 minutes, 56 seconds)
2026-01-23 01:35:53,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:02,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3864.73633 ± 62.365
2026-01-23 01:36:02,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3831.7522, 3951.6199, 3818.342, 3855.7283, 3867.2224, 3792.956, 3884.0867, 3898.7954, 3770.5217, 3976.337]
2026-01-23 01:36:02,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:36:02,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (3864.74) for latency DatasetOffice
2026-01-23 01:36:02,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 13 minutes, 52 seconds)
2026-01-23 01:37:31,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:40,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3654.67383 ± 782.891
2026-01-23 01:37:40,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3955.1194, 4021.8223, 4088.7134, 3887.2642, 3879.298, 1320.7095, 3799.0977, 3897.724, 3782.4846, 3914.5044]
2026-01-23 01:37:40,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:37:40,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 11 minutes, 44 seconds)
2026-01-23 01:39:10,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:18,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3861.54810 ± 471.991
2026-01-23 01:39:18,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4121.8584, 4058.7612, 2469.267, 4115.165, 3858.4663, 3891.7058, 3990.1008, 4051.9849, 3963.9363, 4094.2341]
2026-01-23 01:39:18,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:39:18,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 9 minutes, 51 seconds)
2026-01-23 01:40:48,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:56,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4060.89502 ± 52.030
2026-01-23 01:40:56,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4103.7505, 4135.178, 4060.0771, 4023.8958, 4060.03, 4064.3784, 4080.4, 4085.8398, 3929.3455, 4066.053]
2026-01-23 01:40:56,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:40:56,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4060.90) for latency DatasetOffice
2026-01-23 01:40:56,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 7 minutes, 55 seconds)
2026-01-23 01:42:26,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:34,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3804.09253 ± 488.635
2026-01-23 01:42:34,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4197.7754, 3402.7183, 3934.3772, 4118.1973, 4049.967, 4085.8784, 3725.366, 4030.9792, 4002.6633, 2493.0034]
2026-01-23 01:42:34,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:42:34,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 5 minutes, 56 seconds)
2026-01-23 01:44:03,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:11,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4062.67725 ± 96.179
2026-01-23 01:44:11,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4053.468, 4108.3994, 4058.2131, 4132.805, 3950.7903, 3830.9265, 4151.3726, 4097.5415, 4161.1973, 4082.0637]
2026-01-23 01:44:11,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:44:11,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4062.68) for latency DatasetOffice
2026-01-23 01:44:11,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 4 minutes, 4 seconds)
2026-01-23 01:45:41,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:49,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3992.27881 ± 445.656
2026-01-23 01:45:49,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4270.1997, 4261.725, 4330.0996, 4120.737, 4246.66, 3937.2449, 2764.0625, 4200.408, 3712.183, 4079.4705]
2026-01-23 01:45:49,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:45:49,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 2 minutes, 18 seconds)
2026-01-23 01:47:18,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:27,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3986.28955 ± 455.502
2026-01-23 01:47:27,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4221.798, 4186.244, 2641.0354, 4212.03, 4214.2227, 4023.7598, 4186.777, 3981.0786, 4088.583, 4107.3667]
2026-01-23 01:47:27,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:47:27,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 29 seconds)
2026-01-23 01:48:56,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:04,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4255.40381 ± 74.621
2026-01-23 01:49:04,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4295.436, 4374.1, 4217.429, 4179.085, 4140.21, 4244.8877, 4308.298, 4330.529, 4302.0674, 4161.9946]
2026-01-23 01:49:04,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:49:04,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4255.40) for latency DatasetOffice
2026-01-23 01:49:04,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 58 minutes, 47 seconds)
2026-01-23 01:50:34,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:50:42,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3773.67065 ± 1242.007
2026-01-23 01:50:42,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4307.7275, 3849.3628, 4169.303, 4283.792, 4137.2983, 4348.5464, 4059.3772, 4182.324, 4326.7686, 72.20263]
2026-01-23 01:50:42,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:50:42,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 57 minutes, 11 seconds)
2026-01-23 01:52:11,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:20,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4249.64697 ± 87.229
2026-01-23 01:52:20,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4259.4136, 4338.721, 4098.1494, 4116.933, 4298.7925, 4177.273, 4313.162, 4244.484, 4372.175, 4277.362]
2026-01-23 01:52:20,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:52:20,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 55 minutes, 35 seconds)
2026-01-23 01:53:49,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:57,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4244.69678 ± 109.700
2026-01-23 01:53:57,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4281.97, 4460.247, 4124.7046, 4270.1665, 4373.664, 4127.334, 4287.8257, 4205.6577, 4092.4487, 4222.9497]
2026-01-23 01:53:57,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:53:57,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 53 minutes, 58 seconds)
2026-01-23 01:55:30,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:38,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4132.09863 ± 521.636
2026-01-23 01:55:38,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4217.6914, 4398.776, 2611.8535, 4366.0356, 4425.746, 4076.1985, 4311.8535, 4498.123, 4137.4946, 4277.216]
2026-01-23 01:55:38,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:55:38,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 53 minutes, 2 seconds)
2026-01-23 01:57:07,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:16,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4328.88379 ± 81.168
2026-01-23 01:57:16,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4430.972, 4374.8647, 4356.273, 4374.6543, 4276.88, 4420.4956, 4311.472, 4277.938, 4327.4614, 4137.826]
2026-01-23 01:57:16,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:57:16,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4328.88) for latency DatasetOffice
2026-01-23 01:57:16,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 51 minutes, 22 seconds)
2026-01-23 01:58:45,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:53,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4092.33154 ± 452.354
2026-01-23 01:58:53,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4449.814, 3893.1821, 4247.3735, 4387.448, 4338.1064, 4178.918, 4009.4636, 4237.6177, 4355.06, 2826.3342]
2026-01-23 01:58:53,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:58:53,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 49 minutes, 44 seconds)
2026-01-23 02:00:23,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:31,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4348.36914 ± 149.716
2026-01-23 02:00:31,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4514.1763, 4478.119, 4323.3384, 4207.048, 4381.15, 3977.459, 4374.473, 4369.997, 4485.689, 4372.241]
2026-01-23 02:00:31,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:00:31,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4348.37) for latency DatasetOffice
2026-01-23 02:00:31,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 48 minutes, 4 seconds)
2026-01-23 02:02:00,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:09,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4314.59229 ± 87.066
2026-01-23 02:02:09,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4323.047, 4352.9907, 4291.9463, 4311.9766, 4474.9854, 4113.5938, 4252.607, 4362.4927, 4309.5874, 4352.6973]
2026-01-23 02:02:09,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:02:09,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 46 minutes, 25 seconds)
2026-01-23 02:03:38,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:46,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4048.95898 ± 715.273
2026-01-23 02:03:46,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4342.079, 4386.236, 1922.8909, 4125.9272, 4262.5273, 4239.528, 4434.0703, 4125.081, 4350.8066, 4300.443]
2026-01-23 02:03:46,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:03:46,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 44 minutes, 9 seconds)
2026-01-23 02:05:16,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:24,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4415.45605 ± 62.318
2026-01-23 02:05:24,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4348.4214, 4386.9316, 4351.0664, 4335.6533, 4538.916, 4407.469, 4457.7373, 4436.271, 4488.557, 4403.5356]
2026-01-23 02:05:24,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:05:24,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4415.46) for latency DatasetOffice
2026-01-23 02:05:24,923 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 42 minutes, 37 seconds)
2026-01-23 02:06:54,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:02,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4073.72266 ± 535.082
2026-01-23 02:07:02,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4606.988, 3823.0632, 4348.232, 4370.3193, 4426.085, 4177.0854, 3954.57, 4089.0933, 4330.122, 2611.6738]
2026-01-23 02:07:02,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:07:02,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 41 minutes, 1 second)
2026-01-23 02:08:32,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:40,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4409.69092 ± 93.118
2026-01-23 02:08:40,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4409.1855, 4516.2393, 4443.0503, 4231.1006, 4325.6807, 4279.9067, 4486.7383, 4434.67, 4500.8667, 4469.473]
2026-01-23 02:08:40,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:08:40,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 39 minutes, 27 seconds)
2026-01-23 02:10:10,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:18,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4440.05371 ± 128.851
2026-01-23 02:10:18,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4540.7026, 4496.5522, 4389.971, 4433.9507, 4609.9487, 4155.682, 4494.8545, 4548.908, 4280.4155, 4449.5513]
2026-01-23 02:10:18,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:10:18,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4440.05) for latency DatasetOffice
2026-01-23 02:10:18,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 37 minutes, 51 seconds)
2026-01-23 02:11:47,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:56,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4249.91357 ± 535.511
2026-01-23 02:11:56,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4470.8384, 4495.76, 2662.3772, 4442.9146, 4257.1865, 4346.053, 4557.891, 4379.734, 4392.008, 4494.3745]
2026-01-23 02:11:56,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:11:56,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 36 minutes, 12 seconds)
2026-01-23 02:13:25,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:13:34,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4518.59473 ± 53.349
2026-01-23 02:13:34,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4562.398, 4581.027, 4440.711, 4525.6455, 4540.605, 4581.145, 4497.5303, 4428.0435, 4555.9766, 4472.867]
2026-01-23 02:13:34,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:13:34,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4518.59) for latency DatasetOffice
2026-01-23 02:13:34,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 34 minutes, 33 seconds)
2026-01-23 02:15:03,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:11,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4221.68799 ± 531.911
2026-01-23 02:15:11,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4428.86, 3970.0608, 4381.2886, 4516.1143, 4453.5713, 4599.4478, 4277.255, 4391.3984, 4496.1445, 2702.7375]
2026-01-23 02:15:11,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:15:11,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 32 minutes, 55 seconds)
2026-01-23 02:16:41,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:49,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4468.25488 ± 96.424
2026-01-23 02:16:49,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4426.022, 4608.264, 4435.399, 4463.7046, 4468.0703, 4256.4854, 4484.3594, 4512.1455, 4415.905, 4612.187]
2026-01-23 02:16:49,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:16:49,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 31 minutes, 13 seconds)
2026-01-23 02:18:18,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:27,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4503.44922 ± 123.082
2026-01-23 02:18:27,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4597.0024, 4641.513, 4555.036, 4530.477, 4595.858, 4279.3013, 4520.1206, 4358.8125, 4338.4883, 4617.878]
2026-01-23 02:18:27,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:18:27,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 29 minutes, 36 seconds)
2026-01-23 02:19:56,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:04,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4349.86572 ± 490.336
2026-01-23 02:20:04,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4526.251, 4531.035, 2887.4814, 4615.02, 4471.2524, 4441.486, 4582.774, 4490.9004, 4442.4653, 4509.9937]
2026-01-23 02:20:04,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:20:04,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 27 minutes, 58 seconds)
2026-01-23 02:21:34,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:42,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4557.20361 ± 58.968
2026-01-23 02:21:42,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4668.073, 4593.989, 4558.7783, 4631.5933, 4467.746, 4543.3433, 4554.0957, 4513.261, 4557.804, 4483.348]
2026-01-23 02:21:42,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:21:42,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4557.20) for latency DatasetOffice
2026-01-23 02:21:42,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 26 minutes, 19 seconds)
2026-01-23 02:23:12,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:20,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4310.71240 ± 558.246
2026-01-23 02:23:20,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4660.3257, 4039.2615, 4696.7485, 4592.288, 4562.745, 4591.496, 4171.1465, 4495.243, 4549.041, 2748.83]
2026-01-23 02:23:20,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:23:20,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 39 seconds)
2026-01-23 02:24:49,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:57,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4609.62256 ± 140.899
2026-01-23 02:24:57,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4734.256, 4638.052, 4586.136, 4524.6587, 4535.307, 4260.8115, 4676.379, 4697.388, 4650.9976, 4792.24]
2026-01-23 02:24:57,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:24:57,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4609.62) for latency DatasetOffice
2026-01-23 02:24:57,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 23 minutes, 2 seconds)
2026-01-23 02:26:27,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:35,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4593.26318 ± 122.716
2026-01-23 02:26:35,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4612.688, 4712.337, 4581.0645, 4703.169, 4634.109, 4310.6455, 4656.4307, 4615.238, 4421.896, 4685.059]
2026-01-23 02:26:35,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:26:35,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes, 25 seconds)
2026-01-23 02:28:05,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:13,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4319.27686 ± 661.219
2026-01-23 02:28:13,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4605.077, 4706.4277, 2826.1213, 4721.9697, 3201.233, 4550.3164, 4538.875, 4637.313, 4670.7983, 4734.639]
2026-01-23 02:28:13,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:28:13,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 19 minutes, 47 seconds)
2026-01-23 02:29:42,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:51,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4733.96094 ± 75.369
2026-01-23 02:29:51,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4657.433, 4853.529, 4654.9795, 4743.2573, 4774.901, 4693.8867, 4741.2544, 4793.681, 4819.587, 4607.1006]
2026-01-23 02:29:51,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:29:51,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4733.96) for latency DatasetOffice
2026-01-23 02:29:51,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 18 minutes, 9 seconds)
2026-01-23 02:31:20,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:28,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4085.35229 ± 1504.597
2026-01-23 02:31:28,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4807.026, 4082.4644, 4676.0684, 4780.0273, 4617.9604, 4602.597, 4388.476, 4542.3994, 4743.375, -386.87152]
2026-01-23 02:31:28,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:31:28,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 16 minutes, 31 seconds)
2026-01-23 02:32:57,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:06,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4648.10596 ± 81.477
2026-01-23 02:33:06,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4733.107, 4711.7783, 4496.5684, 4612.633, 4674.8125, 4528.0693, 4676.91, 4630.28, 4647.9043, 4768.9946]
2026-01-23 02:33:06,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:33:06,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 14 minutes, 52 seconds)
2026-01-23 02:34:35,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:43,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4634.73584 ± 125.439
2026-01-23 02:34:43,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4763.777, 4699.9614, 4440.521, 4776.6587, 4700.054, 4525.8926, 4725.5615, 4676.7803, 4405.727, 4632.4277]
2026-01-23 02:34:43,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:34:43,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 13 minutes, 13 seconds)
2026-01-23 02:36:13,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:36:21,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4541.76123 ± 543.296
2026-01-23 02:36:21,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4770.0737, 4711.0264, 2929.7234, 4659.167, 4899.1157, 4605.1094, 4704.1226, 4803.32, 4675.883, 4660.0757]
2026-01-23 02:36:21,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:36:21,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 11 minutes, 35 seconds)
2026-01-23 02:37:50,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:37:59,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4790.28369 ± 54.920
2026-01-23 02:37:59,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4801.3354, 4785.7124, 4667.7686, 4817.6465, 4872.2407, 4828.637, 4813.5845, 4824.1772, 4731.111, 4760.6265]
2026-01-23 02:37:59,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:37:59,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4790.28) for latency DatasetOffice
2026-01-23 02:37:59,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 9 minutes, 56 seconds)
2026-01-23 02:39:28,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:36,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4441.09229 ± 558.453
2026-01-23 02:39:36,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4745.554, 4181.962, 4597.188, 4833.3364, 4716.8228, 4703.689, 4344.763, 4606.035, 4809.915, 2871.6562]
2026-01-23 02:39:36,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:39:36,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 8 minutes, 18 seconds)
2026-01-23 02:41:05,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:41:14,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4708.19629 ± 96.061
2026-01-23 02:41:14,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4664.21, 4725.8154, 4731.1855, 4836.249, 4600.251, 4498.939, 4775.735, 4799.304, 4679.1553, 4771.1196]
2026-01-23 02:41:14,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:41:14,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 6 minutes, 41 seconds)
2026-01-23 02:42:43,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:51,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4671.60205 ± 145.902
2026-01-23 02:42:51,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4790.629, 4741.2427, 4821.533, 4791.1143, 4755.075, 4574.1636, 4689.844, 4558.644, 4314.5576, 4679.2144]
2026-01-23 02:42:51,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:42:51,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 5 minutes, 4 seconds)
2026-01-23 02:44:21,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:44:29,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4588.14746 ± 568.705
2026-01-23 02:44:29,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4867.8745, 4859.5903, 2898.0945, 4696.431, 4910.3696, 4648.52, 4721.3022, 4782.157, 4750.104, 4747.0312]
2026-01-23 02:44:29,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:44:29,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 3 minutes, 26 seconds)
2026-01-23 02:45:58,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:07,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4861.09277 ± 65.742
2026-01-23 02:46:07,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4889.247, 4901.6875, 4732.953, 4950.3545, 4893.1147, 4830.081, 4773.3423, 4818.9224, 4913.9126, 4907.311]
2026-01-23 02:46:07,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:46:07,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4861.09) for latency DatasetOffice
2026-01-23 02:46:07,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 1 minute, 49 seconds)
2026-01-23 02:47:36,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:47:44,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4509.66553 ± 571.064
2026-01-23 02:47:44,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4784.072, 4109.4355, 4933.046, 4793.9707, 4783.5, 4733.2944, 4352.054, 4784.588, 4866.387, 2956.3105]
2026-01-23 02:47:44,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:47:44,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 10 seconds)
2026-01-23 02:49:13,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:49:21,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4828.61816 ± 105.634
2026-01-23 02:49:21,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4873.898, 4862.585, 4786.4277, 4877.5747, 4787.4536, 4558.2925, 4847.243, 4906.761, 4807.252, 4978.699]
2026-01-23 02:49:21,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:49:21,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 65/100 (estimated time remaining: 58 minutes, 31 seconds)
2026-01-23 02:50:51,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:50:59,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4773.75244 ± 149.373
2026-01-23 02:50:59,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4908.4106, 4975.226, 4839.0312, 4836.234, 4837.7275, 4504.6416, 4703.839, 4834.063, 4507.4814, 4790.8726]
2026-01-23 02:50:59,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:50:59,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 66/100 (estimated time remaining: 56 minutes, 54 seconds)
2026-01-23 02:52:28,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:52:37,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4363.19287 ± 1029.217
2026-01-23 02:52:37,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4944.396, 4937.7764, 2959.2202, 4966.329, 4674.6123, 4812.969, 4841.839, 4899.7183, 4797.952, 1797.1154]
2026-01-23 02:52:37,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:52:37,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 67/100 (estimated time remaining: 55 minutes, 15 seconds)
2026-01-23 02:54:06,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:54:14,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4868.46387 ± 72.187
2026-01-23 02:54:14,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5020.7173, 4913.6084, 4798.0723, 4812.9185, 4899.116, 4915.107, 4802.5767, 4823.1816, 4915.387, 4783.9565]
2026-01-23 02:54:14,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:54:14,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4868.46) for latency DatasetOffice
2026-01-23 02:54:14,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 68/100 (estimated time remaining: 53 minutes, 37 seconds)
2026-01-23 02:55:43,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:55:52,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4582.85498 ± 573.955
2026-01-23 02:55:52,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4944.459, 4049.1763, 4807.2573, 4878.467, 4959.224, 5002.409, 4512.236, 4744.416, 4868.1685, 3062.7368]
2026-01-23 02:55:52,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:55:52,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 69/100 (estimated time remaining: 52 minutes)
2026-01-23 02:57:21,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:57:29,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4851.18896 ± 105.103
2026-01-23 02:57:29,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4978.8457, 4834.453, 4755.195, 4899.7905, 4827.9053, 4588.516, 4905.152, 4914.6533, 4916.372, 4891.0063]
2026-01-23 02:57:29,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:57:29,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 70/100 (estimated time remaining: 50 minutes, 24 seconds)
2026-01-23 02:58:59,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:07,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4798.75098 ± 88.684
2026-01-23 02:59:07,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4767.928, 4910.943, 4898.72, 4841.555, 4833.335, 4627.355, 4848.673, 4824.3223, 4661.1616, 4773.5195]
2026-01-23 02:59:07,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:59:07,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 71/100 (estimated time remaining: 48 minutes, 45 seconds)
2026-01-23 03:00:36,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:00:44,800 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4663.56885 ± 825.802
2026-01-23 03:00:44,800 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4927.1196, 4901.2983, 2192.1777, 5018.8916, 4925.386, 4851.5635, 5015.478, 5004.7686, 4854.684, 4944.32]
2026-01-23 03:00:44,800 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:00:44,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 72/100 (estimated time remaining: 47 minutes, 9 seconds)
2026-01-23 03:02:14,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:02:22,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4921.09326 ± 57.575
2026-01-23 03:02:22,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4977.4995, 4979.6934, 4854.431, 4993.695, 4875.9517, 4947.069, 4954.2866, 4878.511, 4933.729, 4816.0625]
2026-01-23 03:02:22,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:02:22,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4921.09) for latency DatasetOffice
2026-01-23 03:02:22,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 73/100 (estimated time remaining: 45 minutes, 32 seconds)
2026-01-23 03:03:51,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:04:00,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4660.78174 ± 651.179
2026-01-23 03:04:00,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5106.1177, 4359.8765, 4945.0605, 5062.3896, 5035.7686, 4935.983, 4407.81, 4894.0996, 5004.4893, 2856.2175]
2026-01-23 03:04:00,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:04:00,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 74/100 (estimated time remaining: 43 minutes, 54 seconds)
2026-01-23 03:05:29,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:05:37,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4859.22754 ± 123.086
2026-01-23 03:05:37,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4941.492, 4848.5605, 4861.277, 4727.671, 4681.157, 4704.8164, 4800.916, 5021.049, 5025.2637, 4980.0684]
2026-01-23 03:05:37,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:05:37,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 75/100 (estimated time remaining: 42 minutes, 16 seconds)
2026-01-23 03:07:06,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:07:15,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4879.32129 ± 137.712
2026-01-23 03:07:15,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5016.207, 4948.445, 4821.255, 4767.5244, 5088.323, 4647.209, 4947.108, 4944.695, 4677.488, 4934.9604]
2026-01-23 03:07:15,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:07:15,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 76/100 (estimated time remaining: 40 minutes, 39 seconds)
2026-01-23 03:08:44,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:08:52,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4675.41699 ± 802.981
2026-01-23 03:08:52,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5033.265, 5035.4556, 2275.7412, 4972.077, 5000.6206, 4803.053, 4904.3276, 4959.876, 4868.58, 4901.181]
2026-01-23 03:08:52,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:08:52,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 77/100 (estimated time remaining: 39 minutes, 2 seconds)
2026-01-23 03:10:22,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:10:30,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4943.88965 ± 53.790
2026-01-23 03:10:30,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4895.6094, 5034.9165, 4891.112, 4882.8003, 4958.7974, 5022.2456, 4984.79, 4963.204, 4902.6396, 4902.7773]
2026-01-23 03:10:30,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:10:30,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4943.89) for latency DatasetOffice
2026-01-23 03:10:30,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 78/100 (estimated time remaining: 37 minutes, 23 seconds)
2026-01-23 03:11:59,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:12:07,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4637.38721 ± 594.524
2026-01-23 03:12:07,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5070.712, 4274.6807, 4875.501, 5010.946, 4989.254, 4848.8203, 4529.5913, 4836.697, 4944.8555, 2992.814]
2026-01-23 03:12:07,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:12:07,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 46 seconds)
2026-01-23 03:13:37,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:13:45,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4987.58301 ± 86.544
2026-01-23 03:13:45,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4944.329, 5127.7197, 4999.2627, 4971.0215, 5000.1445, 4770.8857, 5009.9287, 5053.801, 4980.639, 5018.093]
2026-01-23 03:13:45,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:13:45,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (4987.58) for latency DatasetOffice
2026-01-23 03:13:45,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 9 seconds)
2026-01-23 03:15:15,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:15:23,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4904.06641 ± 75.144
2026-01-23 03:15:23,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4962.571, 4997.8276, 4861.3813, 4958.382, 4897.268, 4734.572, 4899.6206, 4928.5947, 4829.3184, 4971.125]
2026-01-23 03:15:23,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:15:23,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 33 seconds)
2026-01-23 03:16:52,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:17:01,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4755.30322 ± 612.206
2026-01-23 03:17:01,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4881.7124, 5029.7705, 2933.736, 5089.601, 5023.0283, 4815.7017, 4982.665, 4979.035, 4873.2803, 4944.501]
2026-01-23 03:17:01,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:17:01,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes, 56 seconds)
2026-01-23 03:18:30,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:18:38,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5019.58008 ± 70.698
2026-01-23 03:18:38,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5088.922, 5142.2295, 4999.237, 5092.7446, 4882.491, 5026.9805, 4985.6123, 5017.4893, 5004.862, 4955.2305]
2026-01-23 03:18:38,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:18:38,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (5019.58) for latency DatasetOffice
2026-01-23 03:18:38,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 19 seconds)
2026-01-23 03:20:08,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:20:16,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4680.06885 ± 532.754
2026-01-23 03:20:16,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4982.5396, 4392.301, 4887.1567, 4919.283, 4927.1987, 4926.8267, 4632.598, 4917.9644, 5036.7046, 3178.1155]
2026-01-23 03:20:16,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:20:16,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 41 seconds)
2026-01-23 03:21:46,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:21:54,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4950.60059 ± 93.627
2026-01-23 03:21:54,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4911.845, 4945.0767, 4919.6865, 4942.621, 4936.465, 4733.4434, 4982.3623, 4965.03, 5101.3057, 5068.17]
2026-01-23 03:21:54,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:21:54,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 3 seconds)
2026-01-23 03:23:23,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:23:31,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4958.43213 ± 84.742
2026-01-23 03:23:31,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4909.396, 5046.031, 4917.7188, 4899.555, 4993.8374, 4853.9766, 5152.383, 4981.0146, 4879.2734, 4951.138]
2026-01-23 03:23:31,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:23:31,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 24 seconds)
2026-01-23 03:25:01,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:25:09,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4791.09033 ± 595.887
2026-01-23 03:25:09,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4926.0034, 5031.08, 3022.8523, 4884.205, 5090.5674, 4841.033, 4987.7305, 5155.166, 4996.0947, 4976.1714]
2026-01-23 03:25:09,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:25:09,510 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 47 seconds)
2026-01-23 03:26:38,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:26:47,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5010.66113 ± 97.419
2026-01-23 03:26:47,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4835.884, 5164.966, 4930.113, 5109.6035, 5067.068, 5063.5747, 4984.6924, 4957.409, 5081.4565, 4911.8394]
2026-01-23 03:26:47,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:26:47,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 8 seconds)
2026-01-23 03:28:16,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:28:24,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4682.90479 ± 566.903
2026-01-23 03:28:24,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5051.5093, 4592.003, 4856.7144, 4991.236, 5010.559, 4880.048, 4588.275, 4872.2695, 4942.43, 3044.0046]
2026-01-23 03:28:24,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:28:24,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 31 seconds)
2026-01-23 03:29:54,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:02,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4880.52246 ± 169.889
2026-01-23 03:30:02,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4959.725, 5016.043, 4744.414, 4843.64, 4993.2905, 4555.916, 5037.905, 4625.9863, 4994.923, 5033.3813]
2026-01-23 03:30:02,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:30:02,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 53 seconds)
2026-01-23 03:31:31,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:31:39,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4953.02588 ± 144.016
2026-01-23 03:31:39,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4922.672, 5025.7646, 4962.9766, 4976.749, 5133.1436, 4796.8794, 5119.541, 5062.6284, 4636.064, 4893.838]
2026-01-23 03:31:39,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:31:39,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 16 seconds)
2026-01-23 03:33:09,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:33:17,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4744.79346 ± 842.465
2026-01-23 03:33:17,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5073.7515, 5072.5, 2224.3164, 5029.897, 4987.286, 4944.3765, 5104.1196, 5068.4478, 5050.28, 4892.9556]
2026-01-23 03:33:17,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:33:17,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 38 seconds)
2026-01-23 03:34:46,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:34:54,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5007.72363 ± 41.639
2026-01-23 03:34:54,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4996.2476, 5088.752, 4990.9463, 5024.8506, 4983.5493, 4936.7773, 5069.8345, 5003.888, 4995.0405, 4987.3545]
2026-01-23 03:34:54,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:34:54,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes)
2026-01-23 03:36:24,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:36:32,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4761.19189 ± 596.771
2026-01-23 03:36:32,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5071.9243, 4293.438, 5040.4097, 5158.2646, 5006.1147, 5153.2764, 4683.166, 4947.039, 5118.351, 3139.9343]
2026-01-23 03:36:32,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:36:32,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 23 seconds)
2026-01-23 03:38:01,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:38:09,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4988.57764 ± 106.556
2026-01-23 03:38:09,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5109.9536, 5128.814, 5000.8994, 4743.261, 4898.3394, 4921.462, 4997.999, 5006.664, 5056.856, 5021.53]
2026-01-23 03:38:09,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:38:10,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 45 seconds)
2026-01-23 03:39:39,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:39:47,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5001.12402 ± 126.830
2026-01-23 03:39:47,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5144.8906, 5100.083, 4956.081, 5090.4204, 5048.362, 4856.5723, 4979.3555, 5067.83, 4703.79, 5063.8613]
2026-01-23 03:39:47,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:39:47,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 7 seconds)
2026-01-23 03:41:17,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:41:25,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4763.43262 ± 822.763
2026-01-23 03:41:25,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5089.5806, 5028.03, 2304.957, 5078.4146, 4989.947, 4855.493, 5095.7935, 5089.665, 5111.9595, 4990.48]
2026-01-23 03:41:25,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:41:25,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 30 seconds)
2026-01-23 03:42:54,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:43:02,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5037.83594 ± 81.805
2026-01-23 03:43:02,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4993.0776, 5128.5845, 5086.4893, 4994.0684, 4972.128, 5180.6436, 4874.03, 5033.3716, 5054.5967, 5061.3696]
2026-01-23 03:43:02,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:43:02,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1274 [INFO]: New best (5037.84) for latency DatasetOffice
2026-01-23 03:43:02,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 52 seconds)
2026-01-23 03:44:32,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:44:40,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4699.52881 ± 643.895
2026-01-23 03:44:40,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5145.655, 4334.498, 4885.4727, 4968.2104, 5114.2397, 5114.79, 4488.7397, 4921.2563, 5088.569, 2933.8586]
2026-01-23 03:44:40,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:44:40,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 15 seconds)
2026-01-23 03:46:09,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:46:18,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4948.45459 ± 136.133
2026-01-23 03:46:18,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4810.799, 5188.6245, 4859.0967, 4831.218, 4888.4014, 4757.9966, 5025.391, 5006.55, 4975.5205, 5140.9487]
2026-01-23 03:46:18,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:46:18,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 37 seconds)
2026-01-23 03:47:47,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:47:55,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4997.92871 ± 132.408
2026-01-23 03:47:55,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5061.4443, 5109.7554, 4953.4756, 5129.2183, 5066.8506, 4746.6084, 5031.6045, 4902.9385, 4813.585, 5163.8066]
2026-01-23 03:47:55,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:47:55,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1299 [DEBUG]: Training session finished
