2026-01-22 23:14:27,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-bpql-mem2
2026-01-22 23:14:27,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-bpql-mem2
2026-01-22 23:14:27,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14c719c53110>}
2026-01-22 23:14:27,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:27,994 baseline-bpql-noisy-halfcheetah:77 [WARNING]: args.assumed_delay != args.horizon: 2 != 32
2026-01-22 23:14:27,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1181 [INFO]: Creating new trainer
2026-01-22 23:14:28,011 baseline-bpql-noisy-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=29, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:14:28,011 baseline-bpql-noisy-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:28,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:28,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 1/100
2026-01-22 23:16:00,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:16:08,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -182.95328 ± 24.028
2026-01-22 23:16:08,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-190.2386, -199.62088, -137.41045, -183.74965, -178.829, -203.62021, -169.93146, -172.4185, -163.26642, -230.44754]
2026-01-22 23:16:08,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:16:08,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (-182.95) for latency DatasetOffice
2026-01-22 23:16:08,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 45 minutes, 13 seconds)
2026-01-22 23:17:45,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:54,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 154.43098 ± 144.459
2026-01-22 23:17:54,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-82.23544, 101.44794, 191.7615, 85.6996, -0.7252329, 243.54607, 363.28558, 404.7467, 156.18465, 80.59835]
2026-01-22 23:17:54,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:17:54,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (154.43) for latency DatasetOffice
2026-01-22 23:17:54,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 47 minutes, 37 seconds)
2026-01-22 23:19:30,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:19:39,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 653.76904 ± 356.284
2026-01-22 23:19:39,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [535.6069, 908.48895, 680.7969, 1256.7997, 236.38962, 1011.22925, 556.1584, 332.4741, 79.75364, 939.9932]
2026-01-22 23:19:39,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:19:39,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (653.77) for latency DatasetOffice
2026-01-22 23:19:39,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 47 minutes, 13 seconds)
2026-01-22 23:21:15,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:21:24,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1166.91785 ± 1049.279
2026-01-22 23:21:24,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [44.301872, -289.96103, 2665.9453, 2797.2144, 272.00226, 1289.5725, 1157.8677, 615.93726, 753.02905, 2363.2698]
2026-01-22 23:21:24,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:21:24,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (1166.92) for latency DatasetOffice
2026-01-22 23:21:24,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 46 minutes, 9 seconds)
2026-01-22 23:23:00,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:23:09,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2794.68115 ± 218.717
2026-01-22 23:23:09,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2564.4797, 3044.516, 3052.0576, 2679.9756, 2493.3213, 2954.8855, 2481.2825, 2816.7014, 3056.0176, 2803.5725]
2026-01-22 23:23:09,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:23:09,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (2794.68) for latency DatasetOffice
2026-01-22 23:23:09,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 44 minutes, 49 seconds)
2026-01-22 23:24:45,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:24:54,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1783.65625 ± 1173.663
2026-01-22 23:24:54,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1142.3779, 2918.563, 2029.5012, 666.9109, 3338.6257, 1792.5695, 3378.9495, -90.998795, 2299.3594, 360.7032]
2026-01-22 23:24:54,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:24:54,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 44 minutes, 38 seconds)
2026-01-22 23:26:30,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:26:39,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1428.97314 ± 1279.581
2026-01-22 23:26:39,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [362.78076, 2601.2263, 2490.482, 1383.3632, 42.55895, 479.2808, 356.83105, 74.833786, 3636.3882, 2861.986]
2026-01-22 23:26:39,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:26:39,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 42 minutes, 53 seconds)
2026-01-22 23:28:15,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:28:24,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3123.45850 ± 606.037
2026-01-22 23:28:24,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2512.099, 3544.5098, 3575.6165, 3215.671, 2788.6702, 3638.7937, 3776.9612, 2165.8787, 3769.9534, 2246.4316]
2026-01-22 23:28:24,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:28:24,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (3123.46) for latency DatasetOffice
2026-01-22 23:28:24,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 41 minutes, 9 seconds)
2026-01-22 23:30:00,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:30:09,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4046.46045 ± 474.915
2026-01-22 23:30:09,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4099.821, 4328.587, 4139.5337, 4119.436, 4073.4587, 4419.344, 4031.111, 2669.6956, 4252.252, 4331.3647]
2026-01-22 23:30:09,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:30:09,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4046.46) for latency DatasetOffice
2026-01-22 23:30:09,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 39 minutes, 25 seconds)
2026-01-22 23:31:46,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:31:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4095.52271 ± 714.674
2026-01-22 23:31:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4150.3154, 4400.5737, 4449.1704, 4241.494, 2046.8986, 4393.1064, 3806.0886, 4412.023, 4632.9155, 4422.6455]
2026-01-22 23:31:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:31:54,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4095.52) for latency DatasetOffice
2026-01-22 23:31:54,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 37 minutes, 38 seconds)
2026-01-22 23:33:31,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:33:39,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4441.52441 ± 301.785
2026-01-22 23:33:39,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3844.5996, 3876.2068, 4647.3887, 4442.6094, 4664.38, 4660.7812, 4614.746, 4683.8994, 4465.774, 4514.863]
2026-01-22 23:33:39,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:33:39,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4441.52) for latency DatasetOffice
2026-01-22 23:33:39,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 35 minutes, 51 seconds)
2026-01-22 23:35:16,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:35:24,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5021.84473 ± 129.534
2026-01-22 23:35:24,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5160.599, 4915.83, 4789.161, 5140.7666, 5144.499, 5099.0176, 4913.552, 5028.6196, 4878.8657, 5147.536]
2026-01-22 23:35:24,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:35:24,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5021.84) for latency DatasetOffice
2026-01-22 23:35:24,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 34 minutes, 6 seconds)
2026-01-22 23:37:01,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:37:09,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4524.59082 ± 1435.004
2026-01-22 23:37:09,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4747.6797, 5094.151, 5255.7793, 5287.088, 687.2716, 4928.3384, 5496.163, 3111.6216, 5242.079, 5395.7383]
2026-01-22 23:37:09,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:37:09,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 32 minutes, 19 seconds)
2026-01-22 23:38:46,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:38:55,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5307.08350 ± 129.988
2026-01-22 23:38:55,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5379.609, 5422.655, 5228.058, 5126.4263, 5050.651, 5480.2163, 5412.463, 5372.8965, 5289.7603, 5308.0996]
2026-01-22 23:38:55,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:38:55,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5307.08) for latency DatasetOffice
2026-01-22 23:38:55,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 30 minutes, 34 seconds)
2026-01-22 23:40:31,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:40:40,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5317.01611 ± 190.314
2026-01-22 23:40:40,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5340.4595, 5575.393, 5588.918, 4948.201, 5362.227, 5398.921, 5072.099, 5388.465, 5275.0205, 5220.454]
2026-01-22 23:40:40,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:40:40,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5317.02) for latency DatasetOffice
2026-01-22 23:40:40,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 28 minutes, 50 seconds)
2026-01-22 23:42:16,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:42:25,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5363.67285 ± 398.569
2026-01-22 23:42:25,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4801.3403, 4520.401, 5669.264, 5294.9863, 5618.485, 5400.9873, 5890.4326, 5709.041, 5371.231, 5360.5615]
2026-01-22 23:42:25,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:42:25,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5363.67) for latency DatasetOffice
2026-01-22 23:42:25,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 27 minutes, 8 seconds)
2026-01-22 23:44:01,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:44:10,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5921.99854 ± 128.403
2026-01-22 23:44:10,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6017.731, 6172.715, 5930.577, 5851.5645, 5963.6865, 5864.0815, 5788.04, 5980.7456, 5679.3516, 5971.4937]
2026-01-22 23:44:10,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:44:10,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5922.00) for latency DatasetOffice
2026-01-22 23:44:10,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 25 minutes, 21 seconds)
2026-01-22 23:45:46,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:45:55,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5720.62012 ± 779.662
2026-01-22 23:45:55,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5478.03, 6167.4106, 6031.937, 5943.2905, 5618.3115, 6047.7563, 6169.911, 3482.1067, 6166.6113, 6100.834]
2026-01-22 23:45:55,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:45:55,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 23 minutes, 39 seconds)
2026-01-22 23:47:31,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:47:40,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6165.85156 ± 226.467
2026-01-22 23:47:40,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6247.6035, 6148.2544, 6248.228, 5998.795, 5588.427, 6430.8276, 6171.31, 6388.3613, 6297.4087, 6139.2993]
2026-01-22 23:47:40,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:47:40,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6165.85) for latency DatasetOffice
2026-01-22 23:47:40,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 21 minutes, 54 seconds)
2026-01-22 23:49:16,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:49:25,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6033.50049 ± 370.817
2026-01-22 23:49:25,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5848.1543, 6423.122, 6296.6978, 5076.498, 6202.5093, 6232.76, 5761.826, 6193.7607, 6216.696, 6082.979]
2026-01-22 23:49:25,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:49:25,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 20 minutes, 8 seconds)
2026-01-22 23:51:01,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:10,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5877.82471 ± 452.155
2026-01-22 23:51:10,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5084.031, 4934.697, 6244.043, 5913.752, 6159.5815, 6081.6665, 6147.489, 5851.875, 6103.699, 6257.4155]
2026-01-22 23:51:10,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:51:10,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 18 minutes, 21 seconds)
2026-01-22 23:52:46,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:52:55,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5902.00684 ± 1234.059
2026-01-22 23:52:55,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6271.9736, 6434.9175, 6139.066, 6620.4287, 6500.9785, 2246.8892, 6368.3013, 6106.471, 5921.5376, 6409.505]
2026-01-22 23:52:55,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:52:55,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 16 minutes, 36 seconds)
2026-01-22 23:54:31,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:54:40,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6011.86719 ± 673.916
2026-01-22 23:54:40,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6068.3613, 6179.9487, 6605.663, 6286.4595, 5833.5146, 6114.56, 6284.1904, 4077.542, 6244.09, 6424.3413]
2026-01-22 23:54:40,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:54:40,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 14 minutes, 48 seconds)
2026-01-22 23:56:16,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:56:25,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6183.05713 ± 206.163
2026-01-22 23:56:25,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6156.7363, 6416.791, 6077.6147, 5897.7856, 5781.137, 6344.3013, 6172.4966, 6294.1494, 6456.3813, 6233.1733]
2026-01-22 23:56:25,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:56:25,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6183.06) for latency DatasetOffice
2026-01-22 23:56:25,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 12 minutes, 57 seconds)
2026-01-22 23:58:00,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:09,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6308.35693 ± 115.378
2026-01-22 23:58:09,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6125.694, 6287.279, 6398.1455, 6200.8965, 6481.735, 6304.5347, 6160.4785, 6344.464, 6471.002, 6309.343]
2026-01-22 23:58:09,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:58:09,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6308.36) for latency DatasetOffice
2026-01-22 23:58:09,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 10 minutes, 53 seconds)
2026-01-22 23:59:44,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:59:52,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6098.59375 ± 490.324
2026-01-22 23:59:52,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5456.3613, 5167.062, 6406.4185, 6101.191, 6384.974, 6415.7056, 6577.6895, 6359.123, 6581.22, 5536.196]
2026-01-22 23:59:52,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:59:52,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 8 minutes, 46 seconds)
2026-01-23 00:01:27,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:01:36,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6074.60449 ± 1736.896
2026-01-23 00:01:36,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6848.8115, 6820.9546, 6547.7744, 6383.635, 6630.2827, 884.9818, 6850.92, 6637.651, 6429.5522, 6711.486]
2026-01-23 00:01:36,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:01:36,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 6 minutes, 38 seconds)
2026-01-23 00:03:10,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:19,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5858.83350 ± 1047.029
2026-01-23 00:03:19,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6054.288, 6420.3066, 5831.738, 6254.1978, 5781.9863, 6309.7036, 6448.85, 2797.3574, 6502.484, 6187.424]
2026-01-23 00:03:19,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:03:19,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 4 minutes, 24 seconds)
2026-01-23 00:04:53,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:01,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6538.74756 ± 307.006
2026-01-23 00:05:01,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6660.635, 6658.42, 6243.586, 6353.2173, 5906.06, 7027.84, 6675.6743, 6451.9233, 6517.0527, 6893.07]
2026-01-23 00:05:01,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:05:01,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6538.75) for latency DatasetOffice
2026-01-23 00:05:01,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 2 minutes, 11 seconds)
2026-01-23 00:06:36,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:06:44,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5888.79736 ± 1574.597
2026-01-23 00:06:44,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6128.7734, 6562.8066, 6225.3516, 6444.859, 6544.069, 1194.483, 6117.3857, 6594.065, 6552.6562, 6523.5195]
2026-01-23 00:06:44,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:06:44,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 17 seconds)
2026-01-23 00:08:18,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:27,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6333.91553 ± 537.990
2026-01-23 00:08:27,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5276.005, 5354.3916, 6764.9917, 6265.09, 6473.198, 6692.979, 6580.4507, 6386.837, 6891.5977, 6653.612]
2026-01-23 00:08:27,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:08:27,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 58 minutes, 24 seconds)
2026-01-23 00:10:01,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:09,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6663.02734 ± 167.256
2026-01-23 00:10:09,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6478.5493, 6867.019, 6487.9326, 6686.3936, 6875.177, 6686.906, 6751.277, 6691.625, 6335.4785, 6769.916]
2026-01-23 00:10:09,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:10:09,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6663.03) for latency DatasetOffice
2026-01-23 00:10:09,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 56 minutes, 26 seconds)
2026-01-23 00:11:43,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:51,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6304.34277 ± 771.042
2026-01-23 00:11:51,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6090.6104, 6672.285, 6604.1025, 6734.008, 6026.46, 6370.2583, 6925.809, 4149.9316, 6803.3877, 6666.577]
2026-01-23 00:11:51,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:11:51,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 54 minutes, 33 seconds)
2026-01-23 00:13:25,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:13:34,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6516.57471 ± 593.056
2026-01-23 00:13:34,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6845.63, 6530.3994, 6685.341, 6462.5996, 6425.7705, 6848.8906, 6842.054, 6795.4424, 4809.9834, 6919.6353]
2026-01-23 00:13:34,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:13:34,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 52 minutes, 42 seconds)
2026-01-23 00:15:07,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:16,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6675.16406 ± 206.097
2026-01-23 00:15:16,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6717.599, 6911.7783, 6851.8965, 6494.3833, 6773.407, 6553.4917, 6176.094, 6797.2993, 6679.724, 6795.969]
2026-01-23 00:15:16,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:15:16,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6675.16) for latency DatasetOffice
2026-01-23 00:15:16,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 50 minutes, 50 seconds)
2026-01-23 00:16:49,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:58,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6457.32959 ± 524.427
2026-01-23 00:16:58,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5457.144, 5409.912, 6853.896, 6646.693, 6826.398, 6699.8096, 6756.5796, 6590.0117, 6853.585, 6479.2637]
2026-01-23 00:16:58,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:16:58,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 49 minutes)
2026-01-23 00:18:32,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:40,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6818.29053 ± 152.818
2026-01-23 00:18:40,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6883.4033, 7016.102, 6610.316, 6810.1943, 7014.462, 6724.428, 6922.201, 6709.7295, 6563.6504, 6928.42]
2026-01-23 00:18:40,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:18:40,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6818.29) for latency DatasetOffice
2026-01-23 00:18:40,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 47 minutes, 13 seconds)
2026-01-23 00:20:14,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:22,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6469.12451 ± 752.306
2026-01-23 00:20:22,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6531.9907, 6714.9014, 6941.1714, 6925.916, 6286.5205, 6707.0312, 6999.8413, 4296.6636, 6567.0576, 6720.149]
2026-01-23 00:20:22,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:20:22,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 45 minutes, 30 seconds)
2026-01-23 00:21:56,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:22:04,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6681.42090 ± 220.265
2026-01-23 00:22:04,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6741.978, 6799.9907, 6371.211, 6707.195, 6257.636, 6867.318, 6648.84, 6775.501, 6591.178, 7053.3594]
2026-01-23 00:22:04,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:22:04,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 43 minutes, 47 seconds)
2026-01-23 00:23:38,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:46,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6744.97754 ± 153.381
2026-01-23 00:23:46,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6641.802, 6843.762, 6887.1475, 6508.752, 6797.8433, 6844.747, 6499.9346, 6650.656, 6978.3535, 6796.772]
2026-01-23 00:23:46,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:23:46,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 42 minutes, 5 seconds)
2026-01-23 00:25:20,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:28,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6508.11084 ± 517.126
2026-01-23 00:25:28,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5502.425, 5497.842, 6821.669, 6694.829, 6831.4565, 6582.8433, 6893.9956, 6917.5825, 6546.082, 6792.383]
2026-01-23 00:25:28,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:25:28,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 40 minutes, 21 seconds)
2026-01-23 00:27:02,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:27:10,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6777.27881 ± 139.001
2026-01-23 00:27:10,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6822.8403, 6927.3813, 6686.313, 6924.6465, 6886.6636, 6826.9565, 6846.7983, 6670.8604, 6452.324, 6728.007]
2026-01-23 00:27:10,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:27:10,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 38 minutes, 39 seconds)
2026-01-23 00:28:44,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:53,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5963.84863 ± 1921.233
2026-01-23 00:28:53,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6542.271, 6937.1763, 6834.053, 6889.361, 748.4772, 6716.572, 7029.713, 4138.7446, 7013.076, 6789.0396]
2026-01-23 00:28:53,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:28:53,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 37 minutes)
2026-01-23 00:30:26,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:30:35,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6823.23291 ± 196.484
2026-01-23 00:30:35,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6836.0117, 6821.0107, 6715.5166, 6773.372, 6451.0044, 7052.783, 6619.6895, 7178.7515, 6932.402, 6851.782]
2026-01-23 00:30:35,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:30:35,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6823.23) for latency DatasetOffice
2026-01-23 00:30:35,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 35 minutes, 18 seconds)
2026-01-23 00:32:08,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:32:17,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6836.36182 ± 184.489
2026-01-23 00:32:17,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6962.8584, 6893.8535, 6843.0845, 6616.92, 7020.161, 6596.145, 6491.558, 6970.5625, 6965.0527, 7003.425]
2026-01-23 00:32:17,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:32:17,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6836.36) for latency DatasetOffice
2026-01-23 00:32:17,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 33 minutes, 33 seconds)
2026-01-23 00:33:50,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:59,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6677.89551 ± 465.369
2026-01-23 00:33:59,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6028.0605, 5672.3286, 7068.661, 6404.4707, 7183.4995, 6849.7583, 6818.9043, 6788.8403, 6949.166, 7015.268]
2026-01-23 00:33:59,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:33:59,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 31 minutes, 52 seconds)
2026-01-23 00:35:32,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:41,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7014.35791 ± 94.127
2026-01-23 00:35:41,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7015.707, 7051.7764, 7054.431, 7027.0156, 7078.353, 6867.6177, 7138.486, 6812.899, 7021.64, 7075.6504]
2026-01-23 00:35:41,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:35:41,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7014.36) for latency DatasetOffice
2026-01-23 00:35:41,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 10 seconds)
2026-01-23 00:37:14,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:37:23,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6594.64990 ± 816.074
2026-01-23 00:37:23,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6407.3833, 6979.0933, 7105.8843, 7050.726, 6184.5044, 6923.128, 7030.719, 4305.9707, 6920.745, 7038.3447]
2026-01-23 00:37:23,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:37:23,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 28 minutes, 25 seconds)
2026-01-23 00:38:56,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:39:05,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6218.42383 ± 1468.867
2026-01-23 00:39:05,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6908.4185, 6686.105, 6372.519, 1840.6106, 6541.5283, 6902.442, 6821.1226, 6614.6562, 6611.352, 6885.485]
2026-01-23 00:39:05,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:39:05,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 26 minutes, 43 seconds)
2026-01-23 00:40:38,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:47,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6773.58838 ± 199.276
2026-01-23 00:40:47,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6537.9126, 6838.6377, 6868.4395, 6493.1646, 6974.3315, 6947.8994, 6425.4185, 6942.9746, 6943.879, 6763.2256]
2026-01-23 00:40:47,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:40:47,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 25 minutes, 1 second)
2026-01-23 00:42:20,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:29,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6028.42480 ± 2034.654
2026-01-23 00:42:29,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5888.737, 16.64756, 6876.7476, 6666.9043, 6966.456, 6803.3706, 6895.9385, 6668.317, 7211.234, 6289.8965]
2026-01-23 00:42:29,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:42:29,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 21 seconds)
2026-01-23 00:44:03,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:11,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7032.04395 ± 143.491
2026-01-23 00:44:11,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6689.7324, 7032.288, 7148.882, 7117.0747, 6993.0396, 6983.5713, 7175.559, 7137.3247, 6894.8677, 7148.0947]
2026-01-23 00:44:11,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:44:11,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7032.04) for latency DatasetOffice
2026-01-23 00:44:11,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 21 minutes, 38 seconds)
2026-01-23 00:45:45,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:53,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6578.53760 ± 847.333
2026-01-23 00:45:53,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6464.1943, 6766.4966, 7056.39, 6884.5796, 6215.8594, 6926.302, 7216.1997, 4184.628, 7067.374, 7003.349]
2026-01-23 00:45:53,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:45:53,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 19 minutes, 57 seconds)
2026-01-23 00:47:27,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:47:35,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6791.36621 ± 830.382
2026-01-23 00:47:35,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7085.6777, 7124.7583, 6862.414, 6986.814, 4317.7417, 7188.1567, 7153.7856, 7035.091, 7182.4434, 6976.7783]
2026-01-23 00:47:35,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:47:35,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 18 minutes, 15 seconds)
2026-01-23 00:49:09,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:17,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6859.31104 ± 247.255
2026-01-23 00:49:17,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6614.3076, 7122.547, 6818.2324, 6564.9307, 7007.552, 6810.9033, 6427.2007, 7137.106, 7180.057, 6910.27]
2026-01-23 00:49:17,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:49:17,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 16 minutes, 35 seconds)
2026-01-23 00:50:51,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:59,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6693.82129 ± 425.236
2026-01-23 00:50:59,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5982.1353, 5753.0645, 7030.71, 6787.7715, 6941.7183, 6939.406, 6746.402, 7024.859, 6887.972, 6844.1753]
2026-01-23 00:50:59,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:50:59,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 14 minutes, 52 seconds)
2026-01-23 00:52:33,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:42,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7133.27637 ± 95.514
2026-01-23 00:52:42,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7159.171, 7163.857, 7200.2085, 7116.774, 7192.0493, 7275.87, 6983.1367, 7127.7427, 6941.0728, 7172.8823]
2026-01-23 00:52:42,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:52:42,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7133.28) for latency DatasetOffice
2026-01-23 00:52:42,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 13 minutes, 11 seconds)
2026-01-23 00:54:15,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:24,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6719.43604 ± 775.533
2026-01-23 00:54:24,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6640.6226, 6926.477, 6998.231, 7096.6665, 6597.2104, 6970.7563, 7296.0713, 4478.453, 6970.8506, 7219.018]
2026-01-23 00:54:24,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:54:24,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 28 seconds)
2026-01-23 00:55:57,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:06,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6762.67822 ± 219.048
2026-01-23 00:56:06,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6982.624, 6985.9736, 6675.288, 6468.1685, 6414.4165, 6863.636, 6801.1577, 7123.148, 6684.718, 6627.6445]
2026-01-23 00:56:06,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:56:06,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 46 seconds)
2026-01-23 00:57:39,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:48,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7007.24609 ± 133.254
2026-01-23 00:57:48,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7071.3, 7123.2036, 7126.2197, 6736.99, 7019.074, 7109.585, 6877.056, 6912.6646, 7176.5664, 6919.8037]
2026-01-23 00:57:48,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:57:48,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 8 minutes, 3 seconds)
2026-01-23 00:59:21,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:30,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6715.62500 ± 563.260
2026-01-23 00:59:30,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5821.7495, 5445.69, 6962.075, 6722.572, 7058.617, 6993.715, 7058.651, 6878.51, 6942.977, 7271.6963]
2026-01-23 00:59:30,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:59:30,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 6 minutes, 22 seconds)
2026-01-23 01:01:04,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:12,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7125.40527 ± 108.734
2026-01-23 01:01:12,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7122.93, 7242.468, 7280.946, 7248.1904, 7132.681, 7107.6777, 7066.0635, 7080.824, 6884.687, 7087.585]
2026-01-23 01:01:12,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:01:12,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 39 seconds)
2026-01-23 01:02:46,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:54,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6715.10303 ± 780.034
2026-01-23 01:02:54,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6505.025, 6981.987, 7167.222, 6777.5034, 6557.02, 6893.9736, 7326.4517, 4516.2993, 7207.872, 7217.6763]
2026-01-23 01:02:54,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:02:54,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 2 minutes, 57 seconds)
2026-01-23 01:04:28,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:36,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6458.73145 ± 211.082
2026-01-23 01:04:36,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6424.881, 6587.1494, 6241.638, 6547.391, 5943.7026, 6622.559, 6386.246, 6662.1895, 6554.9067, 6616.6484]
2026-01-23 01:04:36,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:04:36,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 15 seconds)
2026-01-23 01:06:10,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:18,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6508.64404 ± 941.951
2026-01-23 01:06:18,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6843.2036, 7039.351, 6927.536, 6507.712, 6964.603, 6865.9473, 6482.6333, 6859.484, 6865.301, 3730.6711]
2026-01-23 01:06:18,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:06:18,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 32 seconds)
2026-01-23 01:07:52,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:00,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6822.37207 ± 577.102
2026-01-23 01:08:00,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5762.5796, 5637.615, 7312.4785, 6808.9526, 7158.7188, 7240.0024, 7160.542, 7123.4297, 7026.3003, 6993.0967]
2026-01-23 01:08:00,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:08:00,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 67/100 (estimated time remaining: 57 minutes, 50 seconds)
2026-01-23 01:09:34,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:42,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7159.57715 ± 111.918
2026-01-23 01:09:42,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7140.103, 7235.6016, 7281.571, 7131.6973, 7078.2373, 7117.1504, 7280.453, 7267.2783, 6897.347, 7166.3315]
2026-01-23 01:09:42,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:09:42,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7159.58) for latency DatasetOffice
2026-01-23 01:09:42,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 7 seconds)
2026-01-23 01:11:16,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:24,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6048.94287 ± 1440.824
2026-01-23 01:11:24,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3966.7654, 5474.504, 6953.1787, 6935.5244, 6074.7397, 6862.9595, 7099.2065, 2835.492, 7173.2793, 7113.782]
2026-01-23 01:11:24,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:11:24,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 25 seconds)
2026-01-23 01:12:58,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:06,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7072.98193 ± 167.934
2026-01-23 01:13:06,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7148.711, 7272.6455, 6983.992, 7008.7173, 6739.4316, 7137.279, 7106.0327, 7366.9443, 7034.463, 6931.6006]
2026-01-23 01:13:06,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:13:06,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 43 seconds)
2026-01-23 01:14:40,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:48,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7019.73535 ± 210.101
2026-01-23 01:14:48,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7051.469, 7255.3066, 7062.3037, 6680.89, 7173.442, 7115.0493, 6666.229, 6936.557, 7328.3433, 6927.762]
2026-01-23 01:14:48,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:14:48,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 1 second)
2026-01-23 01:16:22,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:31,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6878.77979 ± 464.066
2026-01-23 01:16:31,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6096.385, 5864.4434, 7244.1523, 6970.185, 7140.187, 7256.283, 7208.729, 7056.7764, 6993.5894, 6957.067]
2026-01-23 01:16:31,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:16:31,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 19 seconds)
2026-01-23 01:18:04,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:13,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7258.59375 ± 119.751
2026-01-23 01:18:13,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7124.173, 7246.076, 7173.7646, 7385.6978, 7420.789, 7165.424, 7297.883, 7371.7944, 7049.085, 7351.251]
2026-01-23 01:18:13,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:18:13,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7258.59) for latency DatasetOffice
2026-01-23 01:18:13,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 38 seconds)
2026-01-23 01:19:46,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:55,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6237.14893 ± 2011.387
2026-01-23 01:19:55,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6628.1387, 7171.1807, 7392.1133, 7359.036, 752.5439, 7067.697, 7303.199, 4419.303, 7141.886, 7136.389]
2026-01-23 01:19:55,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:19:55,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 56 seconds)
2026-01-23 01:21:28,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:37,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6414.19531 ± 1924.321
2026-01-23 01:21:37,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6975.868, 7043.801, 6841.7104, 7061.479, 650.3436, 7232.1133, 6958.7827, 7158.7603, 7048.0776, 7171.0137]
2026-01-23 01:21:37,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:21:37,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 14 seconds)
2026-01-23 01:23:10,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:19,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7101.44824 ± 176.839
2026-01-23 01:23:19,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6903.199, 7160.512, 7180.5845, 6836.428, 7133.823, 7252.655, 6803.037, 7145.5513, 7289.461, 7309.2334]
2026-01-23 01:23:19,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:23:19,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 31 seconds)
2026-01-23 01:24:52,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:01,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6722.01855 ± 529.478
2026-01-23 01:25:01,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5954.6343, 5611.2095, 7231.0757, 6419.647, 6952.293, 6760.941, 7323.2554, 6978.8135, 6990.0894, 6998.2305]
2026-01-23 01:25:01,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:25:01,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 48 seconds)
2026-01-23 01:26:34,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:43,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7205.51172 ± 166.872
2026-01-23 01:26:43,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7284.627, 6889.8003, 7215.171, 7355.9536, 6879.4976, 7276.2134, 7278.976, 7244.4956, 7253.7725, 7376.608]
2026-01-23 01:26:43,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:26:43,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 5 seconds)
2026-01-23 01:28:16,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:25,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6259.22217 ± 2004.118
2026-01-23 01:28:25,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6843.7065, 6829.37, 7441.7295, 7346.694, 849.6337, 7241.4404, 7504.372, 4346.5527, 6887.8955, 7300.8276]
2026-01-23 01:28:25,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:28:25,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 23 seconds)
2026-01-23 01:29:58,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:07,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6846.45703 ± 654.292
2026-01-23 01:30:07,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7073.511, 7076.439, 6919.9946, 6986.3174, 6571.6626, 7239.196, 7121.659, 7369.9775, 4977.6025, 7128.208]
2026-01-23 01:30:07,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:30:07,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 41 seconds)
2026-01-23 01:31:40,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:49,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7084.77100 ± 167.346
2026-01-23 01:31:49,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7064.9473, 7341.191, 7070.3364, 6820.5723, 7142.6855, 7150.3203, 6847.377, 6962.64, 7338.493, 7109.146]
2026-01-23 01:31:49,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:31:49,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes)
2026-01-23 01:33:22,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:31,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6944.05859 ± 497.183
2026-01-23 01:33:31,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6205.9204, 5791.1787, 7388.9487, 6981.5044, 7195.5938, 7313.6577, 7265.3047, 6983.951, 7196.143, 7118.379]
2026-01-23 01:33:31,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:33:31,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 19 seconds)
2026-01-23 01:35:05,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6731.76562 ± 174.032
2026-01-23 01:35:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6618.1953, 7062.395, 6874.84, 6636.2163, 6502.3296, 6682.1562, 6624.516, 6878.9043, 6542.3027, 6895.7974]
2026-01-23 01:35:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:35:13,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 37 seconds)
2026-01-23 01:36:47,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:55,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6837.21338 ± 876.604
2026-01-23 01:36:55,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6808.3657, 7161.5005, 7240.9785, 7171.731, 7037.2637, 7043.476, 7118.1606, 4238.1143, 7253.254, 7299.288]
2026-01-23 01:36:55,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:36:55,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 55 seconds)
2026-01-23 01:38:29,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:37,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7036.43457 ± 182.240
2026-01-23 01:38:37,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7011.8667, 7029.8467, 6942.3594, 7105.021, 6560.93, 7157.9956, 7028.9165, 7193.9624, 7071.2983, 7262.1465]
2026-01-23 01:38:37,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:38:37,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 13 seconds)
2026-01-23 01:40:11,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:19,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6928.97021 ± 186.254
2026-01-23 01:40:19,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6943.2876, 7070.0513, 6963.2153, 6577.1914, 7166.7725, 6846.8066, 6665.1675, 7121.9683, 7087.512, 6847.7344]
2026-01-23 01:40:19,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:40:20,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 31 seconds)
2026-01-23 01:41:53,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:02,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6920.99316 ± 462.746
2026-01-23 01:42:02,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6259.4297, 5856.5005, 7088.2256, 6772.1777, 7173.7134, 7283.8794, 7302.7144, 7073.473, 7187.476, 7212.342]
2026-01-23 01:42:02,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:42:02,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 49 seconds)
2026-01-23 01:43:35,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:44,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7123.98682 ± 791.324
2026-01-23 01:43:44,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7451.183, 7427.8755, 7146.6953, 7369.723, 7365.3984, 7426.384, 7597.571, 7264.8843, 4774.006, 7416.145]
2026-01-23 01:43:44,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:43:44,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 7 seconds)
2026-01-23 01:45:17,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:26,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6756.48926 ± 808.492
2026-01-23 01:45:26,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6809.5166, 7028.8213, 7184.0107, 7019.7275, 6505.4556, 6718.2993, 7192.108, 4459.566, 7193.2886, 7454.0957]
2026-01-23 01:45:26,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:45:26,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 24 seconds)
2026-01-23 01:46:59,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:08,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7041.75391 ± 226.692
2026-01-23 01:47:08,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7157.717, 7116.5796, 6654.4253, 6844.1914, 6691.7314, 7062.4136, 7073.993, 7197.155, 7225.685, 7393.6465]
2026-01-23 01:47:08,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:47:08,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 42 seconds)
2026-01-23 01:48:41,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:50,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7046.94629 ± 546.067
2026-01-23 01:48:50,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7147.7383, 7463.4795, 7178.55, 5467.426, 7331.348, 7250.251, 6891.449, 7219.2676, 7162.062, 7357.894]
2026-01-23 01:48:50,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:48:50,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes)
2026-01-23 01:50:23,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:50:32,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6996.83057 ± 579.394
2026-01-23 01:50:32,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5874.6543, 5832.177, 7309.2446, 7059.869, 7376.4087, 7379.616, 7401.039, 7257.0415, 7222.0645, 7256.187]
2026-01-23 01:50:32,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:50:32,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 18 seconds)
2026-01-23 01:52:06,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:14,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7283.53223 ± 114.512
2026-01-23 01:52:14,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7261.9624, 7292.837, 7319.665, 7371.54, 7469.0903, 7081.2573, 7217.7847, 7391.1235, 7112.816, 7317.2466]
2026-01-23 01:52:14,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:52:14,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7283.53) for latency DatasetOffice
2026-01-23 01:52:14,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 36 seconds)
2026-01-23 01:53:48,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:56,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6405.91699 ± 1652.722
2026-01-23 01:53:56,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6743.353, 7321.0327, 7563.489, 7377.2744, 6740.844, 7193.803, 7313.699, 4334.8823, 7250.609, 2220.1797]
2026-01-23 01:53:56,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:53:56,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 54 seconds)
2026-01-23 01:55:30,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:39,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6627.33496 ± 1578.490
2026-01-23 01:55:39,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7144.6206, 7322.254, 6762.6987, 1940.9166, 6714.7363, 7304.834, 7177.918, 7291.926, 7148.051, 7465.3975]
2026-01-23 01:55:39,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:55:39,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 13 seconds)
2026-01-23 01:57:12,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:21,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7168.07275 ± 206.056
2026-01-23 01:57:21,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7065.8784, 7422.8643, 7320.7183, 6853.315, 7349.45, 6989.1963, 6822.9834, 7279.8896, 7260.799, 7315.6353]
2026-01-23 01:57:21,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:57:21,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 31 seconds)
2026-01-23 01:58:55,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:03,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6922.54297 ± 517.182
2026-01-23 01:59:03,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6155.315, 5714.691, 7237.6865, 6964.6465, 7215.515, 7338.3394, 7188.7573, 6943.713, 7271.216, 7195.5503]
2026-01-23 01:59:03,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:59:03,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 49 seconds)
2026-01-23 02:00:37,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:45,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7350.90625 ± 137.984
2026-01-23 02:00:45,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7440.824, 7424.5396, 7471.509, 7314.786, 7356.9062, 7325.4336, 7325.8706, 7367.5205, 6980.816, 7500.8584]
2026-01-23 02:00:45,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:00:45,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7350.91) for latency DatasetOffice
2026-01-23 02:00:45,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 6 seconds)
2026-01-23 02:02:19,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:28,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6535.24121 ± 1528.950
2026-01-23 02:02:28,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3995.9604, 7312.4434, 7310.059, 7371.1484, 6741.4375, 7381.5327, 7333.9917, 3063.1287, 7381.833, 7460.881]
2026-01-23 02:02:28,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:02:28,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 24 seconds)
2026-01-23 02:04:01,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:10,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7304.11035 ± 205.858
2026-01-23 02:04:10,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7305.991, 7268.5337, 7150.5615, 7265.2407, 6894.6924, 7454.7847, 7407.7085, 7588.6777, 7117.1836, 7587.7266]
2026-01-23 02:04:10,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:04:10,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 42 seconds)
2026-01-23 02:05:43,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:51,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7181.67822 ± 222.887
2026-01-23 02:05:51,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7435.712, 7164.8535, 7346.4233, 6951.0156, 7430.3364, 7299.383, 6689.478, 7307.0625, 7055.253, 7137.261]
2026-01-23 02:05:51,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:05:51,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1299 [DEBUG]: Training session finished
