2026-01-22 23:14:22,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-bpql-mem5 
2026-01-22 23:14:22,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-bpql-mem5 
2026-01-22 23:14:22,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14b32299fc90>}
2026-01-22 23:14:22,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:22,799 baseline-bpql-noisy-halfcheetah:77 [WARNING]: args.assumed_delay != args.horizon: 5 != 32
2026-01-22 23:14:22,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1181 [INFO]: Creating new trainer
2026-01-22 23:14:22,816 baseline-bpql-noisy-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=47, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:14:22,816 baseline-bpql-noisy-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:23,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:23,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:56,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:16:05,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -453.60919 ± 54.670
2026-01-22 23:16:05,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-415.83496, -414.83194, -408.20612, -520.2809, -505.50516, -427.45047, -531.68585, -403.6203, -519.44196, -389.23413]
2026-01-22 23:16:05,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:16:05,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (-453.61) for latency DatasetOffice
2026-01-22 23:16:05,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 48 minutes, 11 seconds)
2026-01-22 23:17:43,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:52,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -201.26225 ± 96.620
2026-01-22 23:17:52,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-167.4695, -423.93015, -134.93256, -276.79276, -221.0555, -244.2476, -184.63206, -191.22276, -119.4403, -48.89929]
2026-01-22 23:17:52,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:17:52,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (-201.26) for latency DatasetOffice
2026-01-22 23:17:52,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 50 minutes, 23 seconds)
2026-01-22 23:19:30,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:19:39,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 727.01678 ± 193.089
2026-01-22 23:19:39,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [884.7577, 795.8904, 685.8994, 591.4741, 811.5929, 746.3903, 237.4792, 758.5436, 998.9373, 759.203]
2026-01-22 23:19:39,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:19:39,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (727.02) for latency DatasetOffice
2026-01-22 23:19:39,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 50 minutes)
2026-01-22 23:21:16,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:21:25,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 846.24902 ± 509.401
2026-01-22 23:21:25,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1734.2291, 1560.2086, 188.1109, 896.5508, 736.28217, 1081.9427, 427.65536, 1112.1484, 163.02638, 562.3354]
2026-01-22 23:21:25,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:21:25,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (846.25) for latency DatasetOffice
2026-01-22 23:21:25,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 48 minutes, 53 seconds)
2026-01-22 23:23:03,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:23:12,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1311.38232 ± 610.360
2026-01-22 23:23:12,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1351.569, 971.67755, 1950.6934, 749.38007, 1977.4204, 262.95975, 1462.7146, 1860.8141, 552.09344, 1974.5001]
2026-01-22 23:23:12,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:23:12,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (1311.38) for latency DatasetOffice
2026-01-22 23:23:12,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 47 minutes, 31 seconds)
2026-01-22 23:24:50,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:24:59,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2027.38281 ± 615.303
2026-01-22 23:24:59,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2144.9487, 2153.0576, 2533.653, 302.01648, 2255.271, 2367.6772, 1726.9788, 2026.9685, 2465.9138, 2297.3438]
2026-01-22 23:24:59,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:24:59,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (2027.38) for latency DatasetOffice
2026-01-22 23:24:59,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 47 minutes, 18 seconds)
2026-01-22 23:26:37,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:26:46,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1965.46936 ± 765.220
2026-01-22 23:26:46,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2496.7903, 1376.4951, 2570.8853, 2296.7903, 2431.6748, 48.57776, 1547.9103, 1818.4084, 2625.2952, 2441.865]
2026-01-22 23:26:46,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:26:46,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 45 minutes, 31 seconds)
2026-01-22 23:28:24,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:28:33,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1891.75757 ± 768.680
2026-01-22 23:28:33,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2302.6482, 1380.2876, 2875.9426, 2081.165, 1284.1011, 591.35406, 1160.927, 1580.4431, 2931.6055, 2729.1028]
2026-01-22 23:28:33,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:28:33,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 43 minutes, 42 seconds)
2026-01-22 23:30:10,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:30:19,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2967.98511 ± 146.999
2026-01-22 23:30:19,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3040.929, 2763.3755, 2791.0142, 2855.699, 2917.0056, 3137.2837, 3246.974, 2884.924, 3061.8745, 2980.7705]
2026-01-22 23:30:19,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:30:19,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (2967.99) for latency DatasetOffice
2026-01-22 23:30:19,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 41 minutes, 56 seconds)
2026-01-22 23:31:57,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:32:06,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3068.03101 ± 264.053
2026-01-22 23:32:06,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2752.2485, 2534.558, 2904.8118, 3230.408, 3410.7524, 3002.3345, 3315.4514, 3106.2017, 3081.1, 3342.446]
2026-01-22 23:32:06,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:32:06,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (3068.03) for latency DatasetOffice
2026-01-22 23:32:06,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 40 minutes, 8 seconds)
2026-01-22 23:33:44,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:33:53,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3213.85986 ± 173.184
2026-01-22 23:33:53,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3241.9697, 3097.37, 3058.4587, 2985.6506, 3182.907, 3283.2808, 3521.127, 3174.579, 3079.944, 3513.31]
2026-01-22 23:33:53,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:33:53,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (3213.86) for latency DatasetOffice
2026-01-22 23:33:53,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 38 minutes, 20 seconds)
2026-01-22 23:35:31,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:35:39,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3144.23047 ± 369.718
2026-01-22 23:35:39,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2783.2068, 3329.3726, 3524.6636, 2909.247, 3262.4656, 3183.0103, 3328.2031, 2259.0452, 3478.288, 3384.8008]
2026-01-22 23:35:39,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:35:39,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 36 minutes, 32 seconds)
2026-01-22 23:37:17,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:37:26,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3637.90039 ± 197.157
2026-01-22 23:37:26,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3674.679, 3725.6846, 3674.2104, 3388.4744, 3284.339, 3919.991, 3839.0107, 3725.907, 3739.2073, 3407.4995]
2026-01-22 23:37:26,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:37:26,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (3637.90) for latency DatasetOffice
2026-01-22 23:37:26,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 34 minutes, 47 seconds)
2026-01-22 23:39:04,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:39:13,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3547.49854 ± 632.855
2026-01-22 23:39:13,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3928.9302, 3773.496, 1747.2427, 3498.411, 3584.674, 3831.6995, 3355.3535, 4032.2302, 3945.0288, 3777.9167]
2026-01-22 23:39:13,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:39:13,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 32 minutes, 59 seconds)
2026-01-22 23:40:51,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:41:00,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3508.25928 ± 206.762
2026-01-22 23:41:00,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3386.8044, 3034.9292, 3728.6372, 3582.5713, 3576.2249, 3398.1511, 3398.3572, 3570.6372, 3589.2302, 3817.0522]
2026-01-22 23:41:00,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:41:00,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 31 minutes, 14 seconds)
2026-01-22 23:42:38,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:42:47,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3778.40039 ± 319.553
2026-01-22 23:42:47,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3734.6426, 4326.319, 4083.9363, 3720.657, 3151.319, 3938.8357, 3496.0955, 4019.3428, 3522.5383, 3790.3179]
2026-01-22 23:42:47,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:42:47,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (3778.40) for latency DatasetOffice
2026-01-22 23:42:47,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 29 minutes, 26 seconds)
2026-01-22 23:44:24,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:44:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3647.06982 ± 420.384
2026-01-22 23:44:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3818.3303, 3773.693, 3792.2463, 3815.5156, 3752.927, 3522.5132, 3979.0867, 2429.1606, 3725.7632, 3861.462]
2026-01-22 23:44:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:44:33,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 27 minutes, 40 seconds)
2026-01-22 23:46:11,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:46:20,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3876.14648 ± 145.985
2026-01-22 23:46:20,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3832.128, 4077.1272, 3933.6763, 3867.026, 3691.4946, 4066.4812, 3765.1672, 4066.0195, 3673.4407, 3788.9045]
2026-01-22 23:46:20,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:46:20,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (3876.15) for latency DatasetOffice
2026-01-22 23:46:20,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 25 minutes, 53 seconds)
2026-01-22 23:47:58,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:48:07,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3905.84497 ± 202.570
2026-01-22 23:48:07,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3488.6345, 4013.0615, 4175.653, 3820.411, 3717.4004, 4001.9563, 3738.2637, 4007.3745, 4150.851, 3944.8445]
2026-01-22 23:48:07,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:48:07,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (3905.84) for latency DatasetOffice
2026-01-22 23:48:07,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 24 minutes, 9 seconds)
2026-01-22 23:49:45,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:49:54,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3366.27539 ± 953.410
2026-01-22 23:49:54,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3544.135, 2998.0046, 3011.146, 3666.9114, 3586.5042, 749.6818, 4072.7778, 4088.8115, 3806.375, 4138.407]
2026-01-22 23:49:54,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:49:54,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 22 minutes, 20 seconds)
2026-01-22 23:51:31,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:40,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3773.01025 ± 970.082
2026-01-22 23:51:40,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4105.1436, 4041.25, 4032.6855, 4154.706, 4171.2847, 4103.5425, 3866.098, 4219.331, 876.45966, 4159.6006]
2026-01-22 23:51:40,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:51:40,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 20 minutes, 33 seconds)
2026-01-22 23:53:18,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:27,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3682.50732 ± 941.603
2026-01-22 23:53:27,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1837.4578, 2217.7478, 4546.116, 4231.2446, 3984.3074, 4213.7114, 4444.217, 2871.9126, 4379.1846, 4099.175]
2026-01-22 23:53:27,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:53:27,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 18 minutes, 46 seconds)
2026-01-22 23:55:05,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:14,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3541.70972 ± 1348.918
2026-01-22 23:55:14,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3909.2192, 4462.2515, 4223.9556, 4525.5786, 4527.171, 789.6093, 3804.6665, 3923.412, 4259.8447, 991.39185]
2026-01-22 23:55:14,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:55:14,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 16 minutes, 56 seconds)
2026-01-22 23:56:51,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:00,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4233.04932 ± 164.198
2026-01-22 23:57:00,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4377.1426, 4472.6646, 4222.314, 4099.8394, 4076.2896, 4266.439, 3881.6353, 4337.354, 4248.57, 4348.244]
2026-01-22 23:57:00,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:57:00,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4233.05) for latency DatasetOffice
2026-01-22 23:57:00,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 15 minutes, 7 seconds)
2026-01-22 23:58:38,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:47,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4296.52344 ± 441.363
2026-01-22 23:58:47,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3911.8093, 3182.2014, 4532.1763, 4471.9644, 4507.167, 4235.5454, 4593.103, 4399.9453, 4265.356, 4865.964]
2026-01-22 23:58:47,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:58:47,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4296.52) for latency DatasetOffice
2026-01-22 23:58:47,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 13 minutes, 20 seconds)
2026-01-23 00:00:25,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:34,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4375.44824 ± 153.355
2026-01-23 00:00:34,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4510.4946, 4412.0664, 4115.9805, 4216.5283, 4146.763, 4357.3115, 4532.2866, 4560.7285, 4452.388, 4449.9326]
2026-01-23 00:00:34,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:00:34,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4375.45) for latency DatasetOffice
2026-01-23 00:00:34,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 11 minutes, 32 seconds)
2026-01-23 00:02:11,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:20,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4376.22119 ± 578.666
2026-01-23 00:02:20,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4675.194, 4446.72, 4724.903, 4609.71, 4589.2227, 4444.0757, 4365.8594, 2670.8826, 4578.5005, 4657.14]
2026-01-23 00:02:20,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:02:20,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4376.22) for latency DatasetOffice
2026-01-23 00:02:20,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 9 minutes, 44 seconds)
2026-01-23 00:03:58,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:04:07,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3857.38794 ± 873.339
2026-01-23 00:04:07,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4180.934, 4298.2637, 4526.2495, 2595.2058, 3759.2842, 4006.2214, 1831.9227, 4252.412, 4612.9375, 4510.4507]
2026-01-23 00:04:07,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:04:07,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 7 minutes, 57 seconds)
2026-01-23 00:05:44,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:53,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4506.52197 ± 139.148
2026-01-23 00:05:53,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4560.7954, 4683.219, 4642.725, 4405.732, 4575.6533, 4627.715, 4222.346, 4490.232, 4333.4, 4523.4033]
2026-01-23 00:05:53,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:05:53,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4506.52) for latency DatasetOffice
2026-01-23 00:05:53,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 6 minutes, 10 seconds)
2026-01-23 00:07:31,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:07:40,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4306.82324 ± 414.920
2026-01-23 00:07:40,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3891.8547, 3324.2239, 4631.044, 4409.1304, 3981.6235, 4667.26, 4501.074, 4596.6436, 4466.0806, 4599.2964]
2026-01-23 00:07:40,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:07:40,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 4 minutes, 26 seconds)
2026-01-23 00:09:18,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:09:27,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4496.12744 ± 293.376
2026-01-23 00:09:27,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4095.299, 4890.161, 4553.509, 4497.898, 4882.395, 4797.1187, 4254.8843, 4022.106, 4580.0845, 4387.8164]
2026-01-23 00:09:27,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:09:27,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 2 minutes, 40 seconds)
2026-01-23 00:11:04,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:13,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3880.65747 ± 1188.205
2026-01-23 00:11:13,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1715.6858, 4517.0444, 4502.795, 1816.5438, 4935.976, 4637.375, 4495.2153, 2846.5693, 4661.41, 4677.9585]
2026-01-23 00:11:13,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:11:13,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 50 seconds)
2026-01-23 00:12:50,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:59,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4201.73779 ± 930.767
2026-01-23 00:12:59,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4516.23, 4305.154, 4812.7617, 4576.484, 1614.2854, 4538.824, 4642.519, 4867.9404, 4600.436, 3542.7446]
2026-01-23 00:12:59,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:12:59,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 58 minutes, 47 seconds)
2026-01-23 00:14:35,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:14:44,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4630.75830 ± 516.924
2026-01-23 00:14:44,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4926.503, 4811.125, 4869.007, 3161.024, 4745.626, 4653.681, 4448.4956, 5002.8945, 5015.388, 4673.8394]
2026-01-23 00:14:44,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:14:44,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4630.76) for latency DatasetOffice
2026-01-23 00:14:44,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 56 minutes, 42 seconds)
2026-01-23 00:16:20,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:29,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4587.33691 ± 385.540
2026-01-23 00:16:29,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4507.1455, 3539.5825, 4843.1055, 4508.3184, 4760.625, 4574.092, 5042.1133, 4551.0415, 4736.6577, 4810.69]
2026-01-23 00:16:29,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:16:29,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 54 minutes, 33 seconds)
2026-01-23 00:18:04,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:13,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4471.90430 ± 897.463
2026-01-23 00:18:13,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4662.8135, 4881.628, 4759.1265, 4750.3833, 1819.377, 4737.152, 4804.098, 4837.048, 4415.118, 5052.301]
2026-01-23 00:18:13,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:18:13,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 52 minutes, 17 seconds)
2026-01-23 00:19:49,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:19:58,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4453.22363 ± 610.154
2026-01-23 00:19:58,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4626.817, 4530.002, 4777.4043, 4915.155, 4203.604, 4440.729, 4648.3887, 2727.5684, 4876.4033, 4786.163]
2026-01-23 00:19:58,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:19:58,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 50 minutes, 7 seconds)
2026-01-23 00:21:33,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:42,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4624.04541 ± 183.222
2026-01-23 00:21:42,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4540.709, 4681.4756, 4856.793, 4605.1147, 4307.29, 4365.6914, 4904.481, 4573.475, 4631.045, 4774.374]
2026-01-23 00:21:42,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:21:42,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 48 minutes, 12 seconds)
2026-01-23 00:23:18,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:27,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4719.77197 ± 228.162
2026-01-23 00:23:27,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4862.72, 4943.2646, 4728.476, 4382.457, 4754.073, 4799.816, 4199.7607, 4756.123, 4917.7725, 4853.256]
2026-01-23 00:23:27,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:23:27,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4719.77) for latency DatasetOffice
2026-01-23 00:23:27,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 46 minutes, 17 seconds)
2026-01-23 00:25:02,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:11,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4662.24756 ± 366.540
2026-01-23 00:25:11,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4502.5103, 3664.3884, 5020.4155, 4684.544, 4850.5527, 4805.577, 4580.315, 4703.7, 4831.163, 4979.3145]
2026-01-23 00:25:11,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:25:11,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 44 minutes, 24 seconds)
2026-01-23 00:26:46,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:55,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4890.43164 ± 124.311
2026-01-23 00:26:55,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4989.772, 4881.4487, 4784.905, 4789.17, 5041.021, 5085.8257, 4902.163, 4960.4375, 4668.2725, 4801.301]
2026-01-23 00:26:55,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:26:55,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4890.43) for latency DatasetOffice
2026-01-23 00:26:55,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 42 minutes, 33 seconds)
2026-01-23 00:28:30,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:38,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4626.06201 ± 538.786
2026-01-23 00:28:38,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4755.329, 4977.4243, 4870.1714, 4697.8066, 4827.8184, 4671.463, 5092.6006, 3142.2727, 4956.082, 4269.653]
2026-01-23 00:28:38,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:28:38,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 40 minutes, 39 seconds)
2026-01-23 00:30:13,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:30:22,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4518.81787 ± 1003.573
2026-01-23 00:30:22,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4850.308, 4874.7876, 4866.1943, 5059.374, 1529.0013, 5021.976, 4697.6455, 4869.541, 4742.1406, 4677.21]
2026-01-23 00:30:22,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:30:22,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 38 minutes, 45 seconds)
2026-01-23 00:31:57,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:32:06,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4968.50000 ± 240.058
2026-01-23 00:32:06,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5288.7905, 5001.6626, 4985.159, 4462.422, 4830.656, 5167.1255, 4736.7324, 5287.4688, 4929.2, 4995.7866]
2026-01-23 00:32:06,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:32:06,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4968.50) for latency DatasetOffice
2026-01-23 00:32:06,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 36 minutes, 52 seconds)
2026-01-23 00:33:41,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4326.63086 ± 718.437
2026-01-23 00:33:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4316.632, 3697.7576, 2378.952, 4631.578, 4621.947, 4733.3677, 4702.3477, 4715.082, 4768.7417, 4699.901]
2026-01-23 00:33:49,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:33:49,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 35 minutes, 2 seconds)
2026-01-23 00:35:24,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:33,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4951.83350 ± 245.033
2026-01-23 00:35:33,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4600.7676, 5156.083, 4812.2266, 4831.989, 4858.5986, 5235.268, 5314.3916, 4954.8203, 4585.6025, 5168.591]
2026-01-23 00:35:33,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:35:33,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 33 minutes, 16 seconds)
2026-01-23 00:37:08,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:37:16,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4636.18652 ± 460.546
2026-01-23 00:37:16,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4753.671, 4603.9546, 4785.3765, 4929.872, 4762.7686, 4678.539, 4893.7803, 3282.342, 4857.3765, 4814.185]
2026-01-23 00:37:16,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:37:16,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 31 minutes, 31 seconds)
2026-01-23 00:38:51,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:39:00,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4972.97949 ± 169.427
2026-01-23 00:39:00,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5099.2085, 5088.568, 4972.754, 4773.436, 4694.026, 5092.265, 5266.8916, 5038.3423, 4907.1123, 4797.188]
2026-01-23 00:39:00,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:39:00,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4972.98) for latency DatasetOffice
2026-01-23 00:39:00,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 29 minutes, 45 seconds)
2026-01-23 00:40:35,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:43,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5067.03613 ± 156.625
2026-01-23 00:40:43,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5044.3257, 5247.989, 5164.0166, 4869.114, 5135.575, 5273.6284, 4811.975, 4862.6245, 5106.183, 5154.9243]
2026-01-23 00:40:43,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:40:43,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5067.04) for latency DatasetOffice
2026-01-23 00:40:43,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 28 minutes, 2 seconds)
2026-01-23 00:42:18,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:27,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4862.41992 ± 412.953
2026-01-23 00:42:27,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4604.04, 3784.5781, 5210.6543, 4806.5854, 5083.4556, 5268.054, 4940.1367, 4898.1406, 4807.412, 5221.1357]
2026-01-23 00:42:27,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:42:27,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 26 minutes, 17 seconds)
2026-01-23 00:44:02,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:11,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5116.22998 ± 151.070
2026-01-23 00:44:11,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4964.4985, 5048.891, 5117.3774, 5246.033, 5303.319, 5159.9155, 5218.1606, 5237.533, 5098.915, 4767.661]
2026-01-23 00:44:11,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:44:11,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5116.23) for latency DatasetOffice
2026-01-23 00:44:11,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 24 minutes, 32 seconds)
2026-01-23 00:45:45,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:54,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4867.21777 ± 504.730
2026-01-23 00:45:54,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4814.958, 5259.375, 4990.376, 5143.998, 4899.495, 4732.6455, 5091.533, 3447.737, 5318.312, 4973.75]
2026-01-23 00:45:54,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:45:54,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 22 minutes, 49 seconds)
2026-01-23 00:47:29,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:47:38,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4641.70703 ± 1022.608
2026-01-23 00:47:38,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1754.0583, 5163.7944, 5221.7144, 5039.252, 5063.101, 5190.3936, 5019.1855, 3960.7776, 5060.5283, 4944.264]
2026-01-23 00:47:38,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:47:38,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 21 minutes, 9 seconds)
2026-01-23 00:49:13,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:21,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4754.73584 ± 542.092
2026-01-23 00:49:21,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5076.522, 5108.0864, 4995.035, 4399.7275, 4926.049, 4967.5815, 4680.2837, 5142.5967, 3258.3982, 4993.0767]
2026-01-23 00:49:21,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:49:21,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 19 minutes, 24 seconds)
2026-01-23 00:50:56,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:51:05,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4947.21973 ± 341.168
2026-01-23 00:51:05,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4538.0015, 4157.5547, 5283.531, 5106.1787, 5198.275, 4931.436, 4920.1, 5065.4287, 5342.497, 4929.193]
2026-01-23 00:51:05,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:51:05,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 17 minutes, 41 seconds)
2026-01-23 00:52:40,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:48,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5096.46777 ± 130.295
2026-01-23 00:52:48,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5087.3613, 5063.744, 5050.547, 5151.3486, 5281.855, 5127.572, 4938.966, 5342.7866, 4907.489, 5013.0103]
2026-01-23 00:52:48,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:52:48,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 15 minutes, 57 seconds)
2026-01-23 00:54:23,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:32,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4970.73340 ± 565.065
2026-01-23 00:54:32,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5131.1216, 5337.0063, 5211.7505, 4865.794, 4812.489, 4842.57, 5416.616, 3410.8096, 5435.11, 5244.071]
2026-01-23 00:54:32,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:54:32,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 14 minutes, 14 seconds)
2026-01-23 00:56:07,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:16,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5114.82666 ± 151.993
2026-01-23 00:56:16,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5048.2183, 5218.5254, 5042.8354, 5348.8403, 4937.963, 5063.528, 5090.3164, 5402.6465, 4923.2275, 5072.1646]
2026-01-23 00:56:16,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:56:16,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 12 minutes, 29 seconds)
2026-01-23 00:57:50,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:59,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4915.09766 ± 235.244
2026-01-23 00:57:59,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5022.6426, 5161.536, 4900.7646, 4390.929, 5180.429, 5088.881, 4670.89, 4799.7456, 4851.1973, 5083.9614]
2026-01-23 00:57:59,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:57:59,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 10 minutes, 47 seconds)
2026-01-23 00:59:34,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:43,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4955.34766 ± 417.023
2026-01-23 00:59:43,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4511.8164, 3896.6494, 5188.2144, 5101.1797, 5138.1245, 4883.279, 5263.4263, 5223.5415, 5324.1226, 5023.123]
2026-01-23 00:59:43,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:59:43,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 9 minutes, 1 second)
2026-01-23 01:01:17,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:26,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5089.29834 ± 179.133
2026-01-23 01:01:26,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5275.368, 5092.712, 5138.6665, 5201.056, 4933.977, 4755.523, 5129.361, 5047.478, 5407.015, 4911.8286]
2026-01-23 01:01:26,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:01:26,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 7 minutes, 17 seconds)
2026-01-23 01:03:01,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:03:10,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4729.71582 ± 999.879
2026-01-23 01:03:10,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2352.8503, 5092.6343, 5432.5293, 5537.602, 4672.446, 5189.791, 5394.956, 3309.6426, 5279.62, 5035.0854]
2026-01-23 01:03:10,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:03:10,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 5 minutes, 33 seconds)
2026-01-23 01:04:44,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:53,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5175.91357 ± 204.985
2026-01-23 01:04:53,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5170.386, 5477.2334, 4881.9297, 5236.7656, 4839.2847, 5209.3364, 5151.3486, 5466.1216, 5304.182, 5022.5474]
2026-01-23 01:04:53,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:04:53,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5175.91) for latency DatasetOffice
2026-01-23 01:04:53,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 3 minutes, 49 seconds)
2026-01-23 01:06:28,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:37,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5213.44287 ± 118.366
2026-01-23 01:06:37,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5166.1333, 5470.1724, 5118.745, 5144.3486, 5319.079, 5267.6074, 5081.4146, 5061.786, 5270.2954, 5234.8496]
2026-01-23 01:06:37,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:06:37,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5213.44) for latency DatasetOffice
2026-01-23 01:06:37,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 2 minutes, 6 seconds)
2026-01-23 01:08:12,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:20,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5011.81641 ± 449.901
2026-01-23 01:08:20,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4680.211, 3949.547, 5102.8916, 5509.824, 5528.614, 5236.0425, 4654.9863, 5229.0576, 5021.6514, 5205.342]
2026-01-23 01:08:20,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:08:20,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 24 seconds)
2026-01-23 01:09:55,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:04,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4965.45312 ± 139.312
2026-01-23 01:10:04,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5074.5405, 4946.675, 4861.5347, 4945.71, 4938.44, 4937.937, 5172.1035, 5138.3706, 4655.2876, 4983.9277]
2026-01-23 01:10:04,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:10:04,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 67/100 (estimated time remaining: 58 minutes, 41 seconds)
2026-01-23 01:11:39,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:48,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4603.67480 ± 978.181
2026-01-23 01:11:48,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2303.2727, 4843.5786, 5149.9956, 4853.8677, 4727.7114, 5289.746, 5029.471, 3172.0144, 5527.784, 5139.308]
2026-01-23 01:11:48,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:11:48,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 58 seconds)
2026-01-23 01:13:22,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:31,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5025.64453 ± 380.048
2026-01-23 01:13:31,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4966.2817, 5308.2285, 5184.0073, 4005.059, 5144.0596, 4939.217, 5255.2188, 4824.609, 5236.182, 5393.581]
2026-01-23 01:13:31,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:13:31,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 69/100 (estimated time remaining: 55 minutes, 14 seconds)
2026-01-23 01:15:06,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:15:15,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5229.75879 ± 180.160
2026-01-23 01:15:15,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5149.203, 5454.1626, 5309.1606, 5189.262, 5326.0215, 5291.6855, 4887.2715, 5293.067, 5447.3384, 4950.409]
2026-01-23 01:15:15,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:15:15,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5229.76) for latency DatasetOffice
2026-01-23 01:15:15,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 70/100 (estimated time remaining: 53 minutes, 29 seconds)
2026-01-23 01:16:49,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:58,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4972.49805 ± 369.914
2026-01-23 01:16:58,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4869.839, 3948.0322, 5211.4106, 4766.245, 5142.759, 5194.2095, 5121.3657, 5188.3877, 5189.992, 5092.7393]
2026-01-23 01:16:58,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:16:58,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 45 seconds)
2026-01-23 01:18:33,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:42,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5243.08203 ± 138.433
2026-01-23 01:18:42,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5386.335, 5521.3823, 5066.989, 5133.459, 5236.1216, 5046.419, 5255.708, 5314.6855, 5177.89, 5291.8335]
2026-01-23 01:18:42,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:18:42,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5243.08) for latency DatasetOffice
2026-01-23 01:18:42,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 72/100 (estimated time remaining: 50 minutes, 2 seconds)
2026-01-23 01:20:16,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:25,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4920.70703 ± 515.850
2026-01-23 01:20:25,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4907.7803, 5010.557, 4927.9907, 5093.41, 5124.618, 4796.0513, 5188.1597, 3467.443, 5448.7817, 5242.277]
2026-01-23 01:20:25,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:20:25,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 18 seconds)
2026-01-23 01:22:00,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:09,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5135.07227 ± 223.807
2026-01-23 01:22:09,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5113.348, 4847.9185, 5266.7104, 4714.269, 4987.533, 5308.6606, 5452.5137, 5094.187, 5395.88, 5169.7046]
2026-01-23 01:22:09,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:22:09,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 35 seconds)
2026-01-23 01:23:43,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:52,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5183.36084 ± 172.407
2026-01-23 01:23:52,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5237.3125, 5287.1655, 5097.7925, 5104.2725, 5438.646, 5285.334, 4835.9644, 4981.7124, 5201.082, 5364.3296]
2026-01-23 01:23:52,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:23:52,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 52 seconds)
2026-01-23 01:25:27,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:36,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5116.98730 ± 487.636
2026-01-23 01:25:36,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4987.0205, 3794.7139, 5490.7017, 5262.62, 5390.5176, 5465.3833, 5294.226, 5532.2207, 5053.888, 4898.58]
2026-01-23 01:25:36,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:25:36,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 9 seconds)
2026-01-23 01:27:11,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:19,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5209.48730 ± 96.162
2026-01-23 01:27:19,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5077.0264, 5291.0186, 5293.2764, 5120.0996, 5110.2383, 5255.9185, 5327.019, 5206.269, 5089.9077, 5324.1035]
2026-01-23 01:27:19,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:27:19,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 77/100 (estimated time remaining: 41 minutes, 25 seconds)
2026-01-23 01:28:54,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:03,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4969.40771 ± 590.862
2026-01-23 01:29:03,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5065.3237, 5144.4507, 5360.8926, 5290.0796, 4810.6016, 4734.994, 5469.1143, 3327.5295, 5338.0356, 5153.059]
2026-01-23 01:29:03,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:29:03,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 42 seconds)
2026-01-23 01:30:38,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:46,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5162.63916 ± 175.565
2026-01-23 01:30:46,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5403.443, 5177.807, 5135.8096, 5372.0176, 4837.093, 5113.7407, 5212.939, 5346.203, 5104.421, 4922.914]
2026-01-23 01:30:46,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:30:46,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 57 seconds)
2026-01-23 01:32:21,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:30,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5126.65918 ± 248.210
2026-01-23 01:32:30,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5357.016, 4755.8755, 5460.8774, 4810.982, 5422.8613, 4954.305, 4930.0454, 5331.8965, 5032.189, 5210.5444]
2026-01-23 01:32:30,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:32:30,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 13 seconds)
2026-01-23 01:34:05,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:13,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5043.82666 ± 494.792
2026-01-23 01:34:13,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4960.3647, 3606.035, 5017.246, 5159.4863, 5318.6406, 5199.9927, 5336.7476, 5347.3477, 5237.537, 5254.8657]
2026-01-23 01:34:13,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:34:13,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 30 seconds)
2026-01-23 01:35:48,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:57,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5198.14941 ± 185.665
2026-01-23 01:35:57,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5082.759, 5302.2646, 5484.8867, 4971.751, 5429.9287, 5387.363, 5206.717, 4922.803, 5119.011, 5074.011]
2026-01-23 01:35:57,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:35:57,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 46 seconds)
2026-01-23 01:37:32,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:41,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4272.14746 ± 1525.945
2026-01-23 01:37:41,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1526.7787, 5098.0566, 5191.4966, 5513.3403, 5136.716, 5082.257, 5349.7964, 2905.7117, 1597.9646, 5319.357]
2026-01-23 01:37:41,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:37:41,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 3 seconds)
2026-01-23 01:39:16,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:24,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5097.27686 ± 228.818
2026-01-23 01:39:24,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5091.9434, 4704.308, 5325.6206, 4872.746, 4836.0903, 5212.2456, 5402.6416, 4961.4756, 5235.0425, 5330.655]
2026-01-23 01:39:24,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:39:24,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 20 seconds)
2026-01-23 01:40:59,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:08,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5309.24121 ± 247.745
2026-01-23 01:41:08,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5384.2954, 5554.44, 5563.017, 4743.887, 5149.221, 5320.728, 5311.254, 5594.123, 5083.603, 5387.844]
2026-01-23 01:41:08,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:41:08,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5309.24) for latency DatasetOffice
2026-01-23 01:41:08,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 36 seconds)
2026-01-23 01:42:42,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:51,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4955.01904 ± 431.769
2026-01-23 01:42:51,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4586.322, 3799.649, 5312.5337, 4981.527, 5221.133, 5113.5317, 5045.739, 5089.3257, 5306.1797, 5094.253]
2026-01-23 01:42:51,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:42:51,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 53 seconds)
2026-01-23 01:44:26,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:35,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5303.13623 ± 206.404
2026-01-23 01:44:35,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5633.2104, 5427.644, 5204.1997, 5262.384, 4811.6523, 5367.107, 5462.431, 5316.6323, 5365.541, 5180.558]
2026-01-23 01:44:35,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:44:35,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 9 seconds)
2026-01-23 01:46:09,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:18,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5105.30859 ± 557.268
2026-01-23 01:46:18,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5174.1714, 5375.756, 5586.69, 5059.9077, 5389.335, 4970.3325, 5493.47, 3552.719, 5454.363, 4996.342]
2026-01-23 01:46:18,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:46:18,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 25 seconds)
2026-01-23 01:47:53,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:01,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5290.75244 ± 175.269
2026-01-23 01:48:01,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5249.501, 5405.671, 5252.8467, 5068.5503, 5330.094, 4932.213, 5466.0073, 5552.4575, 5395.1396, 5255.045]
2026-01-23 01:48:01,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:48:01,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 41 seconds)
2026-01-23 01:49:36,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:45,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5242.62451 ± 225.046
2026-01-23 01:49:45,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5496.908, 5497.074, 5038.058, 5420.1787, 5333.6646, 4755.506, 5154.324, 5327.727, 5060.5703, 5342.238]
2026-01-23 01:49:45,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:49:45,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 58 seconds)
2026-01-23 01:51:20,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:28,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5127.17725 ± 452.979
2026-01-23 01:51:28,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4529.6343, 4140.2905, 5270.867, 5052.722, 5418.394, 5408.957, 5667.3657, 5310.5674, 5537.461, 4935.5137]
2026-01-23 01:51:28,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:51:28,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 14 seconds)
2026-01-23 01:53:03,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:12,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5015.78857 ± 119.240
2026-01-23 01:53:12,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5267.794, 5069.9453, 4854.7666, 5077.7104, 4996.67, 5045.784, 4947.796, 5023.0234, 4819.544, 5054.8555]
2026-01-23 01:53:12,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:53:12,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 30 seconds)
2026-01-23 01:54:47,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:56,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5124.97900 ± 569.479
2026-01-23 01:54:56,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5284.789, 5168.7783, 5332.8823, 5398.304, 5180.0527, 5182.4497, 5430.1655, 3444.9087, 5480.085, 5347.373]
2026-01-23 01:54:56,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:54:56,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 48 seconds)
2026-01-23 01:56:31,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:39,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5090.06348 ± 184.534
2026-01-23 01:56:39,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5037.612, 5062.655, 5080.216, 5261.5835, 4702.2925, 4880.2207, 5264.4937, 5372.7524, 5142.2847, 5096.5205]
2026-01-23 01:56:39,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:56:39,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 4 seconds)
2026-01-23 01:58:14,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:23,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5116.81543 ± 492.976
2026-01-23 01:58:23,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5221.5415, 5335.3315, 5447.812, 4793.71, 5415.809, 5353.5996, 5259.167, 3730.9336, 5316.6606, 5293.593]
2026-01-23 01:58:23,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:58:23,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 21 seconds)
2026-01-23 01:59:58,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:07,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5181.81445 ± 389.802
2026-01-23 02:00:07,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5195.2065, 4221.903, 4935.3755, 5276.3853, 5453.8125, 5470.006, 5463.803, 5200.3647, 4937.9893, 5663.2954]
2026-01-23 02:00:07,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:00:07,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 38 seconds)
2026-01-23 02:01:41,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:01:50,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5348.47168 ± 110.534
2026-01-23 02:01:50,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5329.843, 5437.903, 5372.709, 5442.044, 5168.8037, 5268.771, 5466.542, 5395.4673, 5150.884, 5451.747]
2026-01-23 02:01:50,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:01:50,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5348.47) for latency DatasetOffice
2026-01-23 02:01:50,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 54 seconds)
2026-01-23 02:03:25,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:33,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4938.88574 ± 499.080
2026-01-23 02:03:33,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5039.516, 5038.9277, 4946.1396, 5166.079, 4905.682, 4924.562, 5467.214, 3520.4795, 5175.162, 5205.095]
2026-01-23 02:03:33,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:03:33,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 10 seconds)
2026-01-23 02:05:08,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:17,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5316.95898 ± 162.091
2026-01-23 02:05:17,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5395.2183, 5338.2188, 5344.4634, 5488.9604, 5125.104, 5537.108, 5326.0244, 5301.6123, 5367.0225, 4945.855]
2026-01-23 02:05:17,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:05:17,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 26 seconds)
2026-01-23 02:06:51,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:00,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5275.13477 ± 169.255
2026-01-23 02:07:00,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5100.429, 5586.8916, 5375.607, 5043.906, 5327.7954, 5438.9854, 5038.4316, 5256.619, 5357.219, 5225.4644]
2026-01-23 02:07:00,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:07:00,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 43 seconds)
2026-01-23 02:08:35,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:43,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4920.11572 ± 349.807
2026-01-23 02:08:43,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4603.769, 4059.9622, 5245.6753, 4845.393, 5289.1255, 5227.602, 5121.6646, 4933.4697, 4933.2793, 4941.213]
2026-01-23 02:08:43,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:08:43,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1299 [DEBUG]: Training session finished
