2026-01-22 23:14:17,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-bpql-mem1
2026-01-22 23:14:17,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-bpql-mem1
2026-01-22 23:14:17,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14a2d717f4d0>}
2026-01-22 23:14:17,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:17,843 baseline-bpql-noisy-halfcheetah:77 [WARNING]: args.assumed_delay != args.horizon: 1 != 32
2026-01-22 23:14:17,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1181 [INFO]: Creating new trainer
2026-01-22 23:14:17,861 baseline-bpql-noisy-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=23, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:14:17,861 baseline-bpql-noisy-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:18,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:18,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:49,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:15:58,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -534.67450 ± 72.248
2026-01-22 23:15:58,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-466.54752, -547.9507, -546.5811, -370.3187, -512.4085, -533.8186, -538.51385, -587.45276, -591.3715, -651.78186]
2026-01-22 23:15:58,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:15:58,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (-534.67) for latency DatasetOffice
2026-01-22 23:15:58,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 44 minutes, 48 seconds)
2026-01-22 23:17:34,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:43,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 427.95639 ± 189.468
2026-01-22 23:17:43,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [740.62634, 458.20328, 397.02997, 369.58508, 28.316008, 447.97784, 545.2227, 656.1636, 263.84036, 372.59927]
2026-01-22 23:17:43,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:17:43,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (427.96) for latency DatasetOffice
2026-01-22 23:17:43,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 47 minutes, 25 seconds)
2026-01-22 23:19:19,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:19:28,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1222.31128 ± 672.408
2026-01-22 23:19:28,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1449.5048, 1842.3711, 750.8475, 204.00012, 1376.4766, 2140.5112, 941.7767, 1450.3271, 96.29691, 1971.0012]
2026-01-22 23:19:28,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:19:28,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (1222.31) for latency DatasetOffice
2026-01-22 23:19:28,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 47 minutes)
2026-01-22 23:21:04,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:21:13,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2012.57458 ± 812.348
2026-01-22 23:21:13,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-248.49731, 2451.1995, 1397.0145, 2210.049, 2327.3218, 2401.222, 2538.5767, 2283.2925, 2349.902, 2415.6658]
2026-01-22 23:21:13,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:21:13,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (2012.57) for latency DatasetOffice
2026-01-22 23:21:13,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 46 minutes)
2026-01-22 23:22:49,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:22:58,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3014.27759 ± 114.434
2026-01-22 23:22:58,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3126.364, 2930.7502, 2801.9172, 2975.6267, 2927.6863, 2960.9126, 3005.9336, 3213.4524, 3102.812, 3097.3206]
2026-01-22 23:22:58,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:22:58,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (3014.28) for latency DatasetOffice
2026-01-22 23:22:58,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 44 minutes, 39 seconds)
2026-01-22 23:24:34,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:24:43,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3075.33716 ± 623.627
2026-01-22 23:24:43,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3375.2356, 3430.3975, 3137.3418, 3154.797, 3419.6995, 3211.1545, 1264.951, 3393.3455, 3432.2214, 2934.2278]
2026-01-22 23:24:43,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:24:43,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (3075.34) for latency DatasetOffice
2026-01-22 23:24:43,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 44 minutes, 31 seconds)
2026-01-22 23:26:19,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:26:28,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3131.44800 ± 674.809
2026-01-22 23:26:28,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3854.846, 3068.689, 2809.69, 2689.3525, 2127.0103, 3907.8252, 3554.3877, 3963.7068, 3308.8, 2030.1727]
2026-01-22 23:26:28,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:26:28,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (3131.45) for latency DatasetOffice
2026-01-22 23:26:28,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 42 minutes, 44 seconds)
2026-01-22 23:28:04,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:28:13,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3478.36572 ± 92.140
2026-01-22 23:28:13,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3407.9617, 3456.9673, 3411.921, 3469.3738, 3376.4878, 3710.5535, 3506.906, 3544.418, 3493.451, 3405.6172]
2026-01-22 23:28:13,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:28:13,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (3478.37) for latency DatasetOffice
2026-01-22 23:28:13,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 40 minutes, 59 seconds)
2026-01-22 23:29:49,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:29:58,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3962.56055 ± 226.062
2026-01-22 23:29:58,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4105.159, 4094.4563, 3954.3228, 3417.183, 3744.0, 3873.0833, 4132.415, 4213.4775, 3960.9631, 4130.5415]
2026-01-22 23:29:58,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:29:58,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (3962.56) for latency DatasetOffice
2026-01-22 23:29:58,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 39 minutes, 11 seconds)
2026-01-22 23:31:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:31:43,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4042.20117 ± 135.232
2026-01-22 23:31:43,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4001.712, 3790.4373, 3908.9954, 4249.3735, 3964.6777, 4012.8486, 4008.6655, 4102.869, 4167.7563, 4214.6753]
2026-01-22 23:31:43,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:31:43,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4042.20) for latency DatasetOffice
2026-01-22 23:31:43,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 37 minutes, 29 seconds)
2026-01-22 23:33:19,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:33:28,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4250.37109 ± 208.602
2026-01-22 23:33:28,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4100.638, 4480.7173, 4057.8372, 4093.5972, 4215.3823, 4378.3687, 4405.854, 4479.204, 4447.811, 3844.3015]
2026-01-22 23:33:28,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:33:28,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4250.37) for latency DatasetOffice
2026-01-22 23:33:28,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 35 minutes, 43 seconds)
2026-01-22 23:35:04,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:35:13,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3812.62573 ± 501.227
2026-01-22 23:35:13,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3922.2615, 3927.5452, 4280.723, 3695.134, 2402.9224, 3938.151, 4083.3315, 4073.938, 3684.4607, 4117.7876]
2026-01-22 23:35:13,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:35:13,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 33 minutes, 56 seconds)
2026-01-22 23:36:49,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:36:58,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4169.33496 ± 170.702
2026-01-22 23:36:58,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4186.4194, 3778.7703, 4132.8643, 4337.6357, 4077.0125, 4021.914, 4196.574, 4324.9287, 4386.169, 4251.0615]
2026-01-22 23:36:58,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:36:58,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 32 minutes, 13 seconds)
2026-01-22 23:38:34,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:38:43,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4131.06006 ± 1029.324
2026-01-22 23:38:43,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4524.8574, 4529.205, 4436.342, 1077.7827, 4275.829, 4514.343, 4147.936, 4694.2456, 4657.369, 4452.692]
2026-01-22 23:38:43,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:38:43,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 30 minutes, 29 seconds)
2026-01-22 23:40:19,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:40:27,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4523.48096 ± 120.018
2026-01-22 23:40:27,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4536.472, 4339.9927, 4466.293, 4486.703, 4452.5933, 4418.258, 4459.088, 4682.9434, 4706.836, 4685.6294]
2026-01-22 23:40:27,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:40:27,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4523.48) for latency DatasetOffice
2026-01-22 23:40:27,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 28 minutes, 34 seconds)
2026-01-22 23:42:02,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:42:11,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4541.24414 ± 275.303
2026-01-22 23:42:11,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4483.7974, 4738.3906, 3847.922, 4453.7124, 4752.9795, 4674.229, 4647.6504, 4649.915, 4845.0137, 4318.831]
2026-01-22 23:42:11,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:42:11,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4541.24) for latency DatasetOffice
2026-01-22 23:42:11,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 26 minutes, 28 seconds)
2026-01-22 23:43:46,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:43:55,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4705.76465 ± 602.817
2026-01-22 23:43:55,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5283.8027, 4556.985, 5079.339, 4479.3823, 3070.4368, 4839.197, 4986.794, 5155.78, 4585.793, 5020.14]
2026-01-22 23:43:55,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:43:55,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4705.76) for latency DatasetOffice
2026-01-22 23:43:55,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 24 minutes, 21 seconds)
2026-01-22 23:45:29,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:45:38,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4871.11426 ± 175.292
2026-01-22 23:45:38,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4912.6978, 4662.6025, 4747.6533, 4906.0776, 4512.2007, 5142.876, 4931.0938, 5048.4087, 4889.4795, 4958.054]
2026-01-22 23:45:38,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:45:38,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (4871.11) for latency DatasetOffice
2026-01-22 23:45:38,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 22 minutes, 7 seconds)
2026-01-22 23:47:12,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:47:21,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5096.79785 ± 151.515
2026-01-22 23:47:21,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4909.1807, 5076.31, 5074.209, 4805.1533, 5209.557, 5175.027, 5028.434, 5158.103, 5156.798, 5375.211]
2026-01-22 23:47:21,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:47:21,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5096.80) for latency DatasetOffice
2026-01-22 23:47:21,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 19 minutes, 48 seconds)
2026-01-22 23:48:55,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:49:03,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5104.93018 ± 166.096
2026-01-22 23:49:03,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4912.877, 4860.873, 4992.878, 5112.9136, 5060.391, 5107.652, 5205.8438, 5241.1504, 5473.083, 5081.6396]
2026-01-22 23:49:03,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:49:03,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5104.93) for latency DatasetOffice
2026-01-22 23:49:03,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 17 minutes, 37 seconds)
2026-01-22 23:50:38,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:50:46,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5325.33301 ± 154.825
2026-01-22 23:50:46,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5436.2373, 5518.895, 5271.4556, 5222.445, 5238.1694, 5435.5444, 5347.9336, 5438.1323, 5392.0435, 4952.472]
2026-01-22 23:50:46,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:50:46,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5325.33) for latency DatasetOffice
2026-01-22 23:50:46,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 15 minutes, 36 seconds)
2026-01-22 23:52:20,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:52:29,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4974.01709 ± 732.958
2026-01-22 23:52:29,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5504.913, 5103.568, 5321.9756, 4768.87, 2861.6653, 5306.1006, 5441.6313, 5253.2007, 5002.3467, 5175.9014]
2026-01-22 23:52:29,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:52:29,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 13 minutes, 35 seconds)
2026-01-22 23:54:02,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:54:11,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5097.32520 ± 941.239
2026-01-22 23:54:11,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5414.281, 5150.6973, 5300.8345, 5354.025, 2298.9866, 5339.649, 5467.663, 5600.333, 5468.4526, 5578.329]
2026-01-22 23:54:11,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:54:11,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 11 minutes, 36 seconds)
2026-01-22 23:55:44,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:53,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5315.31201 ± 192.822
2026-01-22 23:55:53,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5109.262, 5164.252, 5355.328, 4932.072, 5182.779, 5444.8413, 5487.0127, 5491.074, 5495.93, 5490.569]
2026-01-22 23:55:53,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:55:53,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 9 minutes, 41 seconds)
2026-01-22 23:57:26,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:35,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5445.30078 ± 246.877
2026-01-22 23:57:35,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5444.168, 5162.8345, 5007.257, 5473.7437, 5198.25, 5364.0923, 5724.9624, 5664.294, 5649.495, 5763.9097]
2026-01-22 23:57:35,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:57:35,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5445.30) for latency DatasetOffice
2026-01-22 23:57:35,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 7 minutes, 50 seconds)
2026-01-22 23:59:09,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:59:17,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5540.53174 ± 156.368
2026-01-22 23:59:17,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5692.366, 5604.1284, 5474.709, 5440.713, 5384.735, 5709.499, 5685.3647, 5618.8613, 5600.257, 5194.682]
2026-01-22 23:59:17,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:59:17,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5540.53) for latency DatasetOffice
2026-01-22 23:59:17,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 6 minutes)
2026-01-23 00:00:51,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:59,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5261.07324 ± 703.155
2026-01-23 00:00:59,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5601.8613, 5219.054, 5673.259, 5136.1836, 3241.2385, 5412.5537, 5684.605, 5583.4546, 5293.6406, 5764.8804]
2026-01-23 00:00:59,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:00:59,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 4 minutes, 12 seconds)
2026-01-23 00:02:33,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:41,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5482.99170 ± 167.106
2026-01-23 00:02:41,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5635.005, 5393.719, 5432.041, 5201.383, 5266.249, 5770.175, 5477.832, 5643.046, 5433.885, 5576.583]
2026-01-23 00:02:41,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:02:41,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 2 minutes, 30 seconds)
2026-01-23 00:04:15,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:04:23,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5653.35840 ± 163.124
2026-01-23 00:04:23,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5556.9556, 5595.2334, 5763.277, 5234.2817, 5632.0596, 5681.1064, 5854.4595, 5729.4243, 5775.4155, 5711.3716]
2026-01-23 00:04:23,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:04:23,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5653.36) for latency DatasetOffice
2026-01-23 00:04:23,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 47 seconds)
2026-01-23 00:05:57,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:06:05,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5574.63379 ± 258.128
2026-01-23 00:06:05,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5468.504, 5245.3486, 5116.045, 5510.599, 5402.758, 5624.9185, 5818.007, 5871.421, 5904.819, 5783.918]
2026-01-23 00:06:05,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:06:05,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 59 minutes, 4 seconds)
2026-01-23 00:07:39,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:07:47,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5761.45361 ± 246.904
2026-01-23 00:07:47,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5790.8813, 6042.653, 5694.4155, 5332.857, 5775.664, 5961.7007, 5871.7397, 5969.4795, 5891.937, 5283.206]
2026-01-23 00:07:47,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:07:47,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5761.45) for latency DatasetOffice
2026-01-23 00:07:47,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 57 minutes, 23 seconds)
2026-01-23 00:09:21,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:09:30,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5475.54004 ± 749.891
2026-01-23 00:09:30,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5983.985, 5556.9204, 5786.8643, 5404.566, 3283.4592, 5583.9546, 5843.1025, 5901.181, 5613.244, 5798.129]
2026-01-23 00:09:30,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:09:30,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 55 minutes, 43 seconds)
2026-01-23 00:11:03,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:12,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5704.05371 ± 159.370
2026-01-23 00:11:12,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5638.302, 5509.2534, 5550.7363, 5774.1, 5540.728, 5683.397, 5952.028, 5959.4893, 5595.4907, 5837.014]
2026-01-23 00:11:12,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:11:12,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 54 minutes)
2026-01-23 00:12:45,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:54,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5887.80957 ± 161.900
2026-01-23 00:12:54,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5539.0586, 5928.8022, 5909.874, 5665.9233, 5833.143, 5878.688, 5992.9404, 6080.35, 6039.437, 6009.8765]
2026-01-23 00:12:54,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:12:54,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5887.81) for latency DatasetOffice
2026-01-23 00:12:54,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 52 minutes, 20 seconds)
2026-01-23 00:14:28,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:14:36,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5927.24072 ± 217.835
2026-01-23 00:14:36,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5997.3574, 5614.8833, 5576.0693, 5843.476, 5712.656, 5973.162, 6226.1787, 6093.224, 6072.5415, 6162.8574]
2026-01-23 00:14:36,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:14:36,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (5927.24) for latency DatasetOffice
2026-01-23 00:14:36,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 50 minutes, 39 seconds)
2026-01-23 00:16:10,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:18,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6096.39746 ± 198.891
2026-01-23 00:16:18,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6071.1353, 6285.5996, 6020.237, 6045.3735, 6116.3315, 6217.0303, 6343.243, 6182.966, 6102.6953, 5579.3677]
2026-01-23 00:16:18,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:16:18,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6096.40) for latency DatasetOffice
2026-01-23 00:16:18,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 48 minutes, 59 seconds)
2026-01-23 00:17:52,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:00,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5772.78418 ± 795.201
2026-01-23 00:18:00,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6417.734, 5829.2793, 6122.422, 5775.189, 3491.2605, 6035.9546, 6077.2876, 6223.7983, 5569.73, 6185.1846]
2026-01-23 00:18:00,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:18:00,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 47 minutes, 16 seconds)
2026-01-23 00:19:34,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:19:42,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5970.36865 ± 162.148
2026-01-23 00:19:42,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5843.78, 5634.8296, 5983.0366, 6018.029, 5812.4937, 6218.2324, 5930.143, 6128.813, 6072.4165, 6061.91]
2026-01-23 00:19:42,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:19:42,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 45 minutes, 33 seconds)
2026-01-23 00:21:16,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:25,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6094.96387 ± 174.078
2026-01-23 00:21:25,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5820.563, 6094.336, 6075.9316, 5881.142, 5912.5303, 6140.6157, 6100.8403, 6306.79, 6379.9854, 6236.901]
2026-01-23 00:21:25,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:21:25,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 43 minutes, 51 seconds)
2026-01-23 00:22:58,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:07,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6114.65332 ± 228.655
2026-01-23 00:23:07,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6030.255, 5814.506, 5854.8843, 6082.9697, 6089.4883, 5806.5405, 6323.476, 6383.6416, 6427.168, 6333.6064]
2026-01-23 00:23:07,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:23:07,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6114.65) for latency DatasetOffice
2026-01-23 00:23:07,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 42 minutes, 7 seconds)
2026-01-23 00:24:40,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:24:48,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6131.47949 ± 283.702
2026-01-23 00:24:48,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6222.0776, 6413.568, 5928.487, 5823.15, 6235.062, 6304.7656, 6333.9556, 6352.265, 6233.2725, 5468.192]
2026-01-23 00:24:48,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:24:48,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6131.48) for latency DatasetOffice
2026-01-23 00:24:48,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 40 minutes, 21 seconds)
2026-01-23 00:26:22,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:31,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5885.44287 ± 777.143
2026-01-23 00:26:31,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6415.934, 5955.9736, 5952.2544, 5595.555, 3686.163, 6217.641, 6364.694, 6503.011, 5960.347, 6202.8545]
2026-01-23 00:26:31,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:26:31,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 38 minutes, 40 seconds)
2026-01-23 00:28:04,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:13,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6051.83789 ± 213.358
2026-01-23 00:28:13,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6033.494, 5626.396, 5909.09, 6038.698, 5909.6543, 6205.618, 6109.694, 6501.018, 6069.2383, 6115.4775]
2026-01-23 00:28:13,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:28:13,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 37 minutes)
2026-01-23 00:29:47,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:29:55,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6292.90869 ± 180.358
2026-01-23 00:29:55,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6041.821, 6359.5996, 6451.741, 6030.757, 6100.6694, 6234.9756, 6401.6294, 6572.2334, 6473.8115, 6261.846]
2026-01-23 00:29:55,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:29:55,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6292.91) for latency DatasetOffice
2026-01-23 00:29:55,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 35 minutes, 16 seconds)
2026-01-23 00:31:29,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:37,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6354.16895 ± 169.152
2026-01-23 00:31:37,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6239.959, 6204.386, 6259.442, 6322.3223, 6205.387, 6142.986, 6425.397, 6680.4067, 6523.8984, 6537.503]
2026-01-23 00:31:37,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:31:37,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6354.17) for latency DatasetOffice
2026-01-23 00:31:37,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 33 minutes, 35 seconds)
2026-01-23 00:33:11,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:19,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6384.15527 ± 224.199
2026-01-23 00:33:19,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6394.6694, 6684.1665, 6297.127, 6225.651, 6138.9897, 6632.723, 6546.615, 6444.901, 6542.777, 5933.9297]
2026-01-23 00:33:19,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:33:19,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6384.16) for latency DatasetOffice
2026-01-23 00:33:19,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 31 minutes, 53 seconds)
2026-01-23 00:34:53,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:01,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6008.03906 ± 834.171
2026-01-23 00:35:01,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6427.9966, 6120.93, 6307.173, 5870.3853, 3616.7615, 6419.301, 6507.5713, 6627.207, 5839.8477, 6343.22]
2026-01-23 00:35:01,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:35:01,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 11 seconds)
2026-01-23 00:36:35,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:43,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6304.59668 ± 198.806
2026-01-23 00:36:43,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6410.4194, 5985.1167, 6568.602, 6179.3096, 6014.773, 6434.515, 6324.407, 6581.3057, 6182.001, 6365.5146]
2026-01-23 00:36:43,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:36:43,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 28 minutes, 26 seconds)
2026-01-23 00:38:17,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:25,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6379.78516 ± 194.472
2026-01-23 00:38:25,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6174.2773, 6392.629, 6409.623, 6167.93, 6271.5073, 6057.7227, 6495.844, 6651.6147, 6598.6714, 6578.0317]
2026-01-23 00:38:25,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:38:25,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 26 minutes, 43 seconds)
2026-01-23 00:39:59,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:07,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6454.95068 ± 222.590
2026-01-23 00:40:07,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6381.3467, 6250.394, 6135.559, 6602.379, 6169.6426, 6293.697, 6598.414, 6740.7695, 6635.987, 6741.3164]
2026-01-23 00:40:07,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:40:07,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6454.95) for latency DatasetOffice
2026-01-23 00:40:07,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 25 minutes)
2026-01-23 00:41:41,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:49,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6592.12402 ± 222.022
2026-01-23 00:41:49,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6596.3394, 6572.1997, 6667.229, 6348.21, 6509.9473, 6890.33, 6844.075, 6616.7173, 6769.8896, 6106.3022]
2026-01-23 00:41:49,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:41:49,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6592.12) for latency DatasetOffice
2026-01-23 00:41:49,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 19 seconds)
2026-01-23 00:43:23,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:43:31,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6251.58496 ± 826.264
2026-01-23 00:43:31,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6777.5986, 6125.0405, 6857.137, 6156.7153, 3911.9678, 6505.683, 6583.7236, 6681.067, 6112.059, 6804.8594]
2026-01-23 00:43:31,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:43:31,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 21 minutes, 36 seconds)
2026-01-23 00:45:05,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:13,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6515.48193 ± 214.031
2026-01-23 00:45:13,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6640.0234, 6227.8267, 6535.587, 6150.95, 6340.0864, 6589.0293, 6834.417, 6808.342, 6476.2734, 6552.2847]
2026-01-23 00:45:13,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:45:13,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 19 minutes, 54 seconds)
2026-01-23 00:46:47,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:46:55,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6538.95068 ± 207.531
2026-01-23 00:46:55,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6156.156, 6368.8853, 6591.418, 6219.883, 6643.848, 6520.532, 6727.356, 6777.2056, 6700.973, 6683.2554]
2026-01-23 00:46:55,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:46:55,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 18 minutes, 12 seconds)
2026-01-23 00:48:29,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:37,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6581.31738 ± 244.454
2026-01-23 00:48:37,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6549.2393, 6153.633, 6205.6143, 6681.6304, 6416.9663, 6616.252, 6648.7905, 6774.3833, 6953.8047, 6812.863]
2026-01-23 00:48:37,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:48:37,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 16 minutes, 30 seconds)
2026-01-23 00:50:11,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:19,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6704.13770 ± 256.641
2026-01-23 00:50:19,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6762.5366, 6974.042, 6440.322, 6500.5938, 6749.274, 6968.335, 6822.612, 6815.816, 6886.568, 6121.2734]
2026-01-23 00:50:19,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:50:19,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6704.14) for latency DatasetOffice
2026-01-23 00:50:19,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 14 minutes, 48 seconds)
2026-01-23 00:51:53,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:01,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6374.90771 ± 842.786
2026-01-23 00:52:01,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6805.682, 6453.8467, 6803.6987, 6222.694, 3908.5217, 6589.1157, 6753.3135, 6875.3906, 6600.7554, 6736.06]
2026-01-23 00:52:01,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:52:01,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 13 minutes, 8 seconds)
2026-01-23 00:53:35,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:53:44,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6535.94238 ± 279.340
2026-01-23 00:53:44,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6559.8213, 6221.1255, 6432.8184, 6482.228, 6081.9165, 6934.134, 6597.21, 6924.252, 6292.3887, 6833.527]
2026-01-23 00:53:44,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:53:44,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 27 seconds)
2026-01-23 00:55:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:55:26,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6784.89307 ± 235.780
2026-01-23 00:55:26,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6469.528, 6695.606, 6623.105, 6309.299, 6889.7275, 7015.095, 6946.6167, 6898.9585, 6997.792, 7003.1963]
2026-01-23 00:55:26,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:55:26,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6784.89) for latency DatasetOffice
2026-01-23 00:55:26,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 46 seconds)
2026-01-23 00:56:59,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:08,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6735.24365 ± 282.839
2026-01-23 00:57:08,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6804.1787, 6306.2056, 6311.201, 6753.4595, 6417.4736, 6757.2817, 6875.371, 7146.8315, 6917.8135, 7062.6196]
2026-01-23 00:57:08,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:57:08,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 8 minutes, 3 seconds)
2026-01-23 00:58:41,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:58:50,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6824.71729 ± 300.252
2026-01-23 00:58:50,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6965.017, 7073.054, 6777.442, 6466.7305, 6730.9185, 7100.736, 6942.1587, 7099.077, 6970.5015, 6121.5366]
2026-01-23 00:58:50,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:58:50,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6824.72) for latency DatasetOffice
2026-01-23 00:58:50,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 6 minutes, 23 seconds)
2026-01-23 01:00:23,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:00:32,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6461.05566 ± 932.145
2026-01-23 01:00:32,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7136.4697, 6612.3696, 6941.9766, 6359.7314, 3775.6548, 6482.999, 6817.9067, 7053.8423, 6441.918, 6987.6895]
2026-01-23 01:00:32,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:00:32,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 39 seconds)
2026-01-23 01:02:06,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:14,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6795.72754 ± 210.755
2026-01-23 01:02:14,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6837.87, 6560.0723, 6822.9087, 6641.8047, 6440.8354, 6939.926, 6973.395, 7045.089, 6606.2856, 7089.085]
2026-01-23 01:02:14,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:02:14,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 2 minutes, 58 seconds)
2026-01-23 01:03:48,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:03:56,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6957.85693 ± 213.786
2026-01-23 01:03:56,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6614.2734, 6897.5186, 7132.863, 6674.4556, 6771.6562, 6874.236, 7040.3765, 7111.5215, 7271.3975, 7190.2705]
2026-01-23 01:03:56,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:03:56,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (6957.86) for latency DatasetOffice
2026-01-23 01:03:56,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 15 seconds)
2026-01-23 01:05:30,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:05:38,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6944.80078 ± 262.354
2026-01-23 01:05:38,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6936.323, 6493.1206, 6778.4854, 7014.071, 6569.3965, 6875.8423, 7040.6416, 7246.3247, 7138.6533, 7355.15]
2026-01-23 01:05:38,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:05:38,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 33 seconds)
2026-01-23 01:07:12,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:20,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7033.13672 ± 254.602
2026-01-23 01:07:20,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6978.749, 7279.4307, 6919.735, 6771.037, 6873.317, 7222.587, 7309.082, 7233.939, 7241.244, 6502.252]
2026-01-23 01:07:20,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:07:20,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7033.14) for latency DatasetOffice
2026-01-23 01:07:20,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 67/100 (estimated time remaining: 57 minutes, 49 seconds)
2026-01-23 01:08:54,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:02,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6529.99658 ± 900.928
2026-01-23 01:09:02,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7148.05, 6628.068, 7106.679, 6261.502, 3945.6033, 6764.3853, 6914.58, 7006.7188, 6534.79, 6989.59]
2026-01-23 01:09:02,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:09:02,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 7 seconds)
2026-01-23 01:10:36,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:44,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6678.28809 ± 192.594
2026-01-23 01:10:44,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6695.3433, 6374.9707, 6548.0425, 6691.078, 6421.3286, 6700.52, 6947.757, 6979.023, 6590.7603, 6834.061]
2026-01-23 01:10:44,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:10:44,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 25 seconds)
2026-01-23 01:12:18,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:26,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6989.38574 ± 167.398
2026-01-23 01:12:26,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6608.225, 7132.596, 7016.724, 6745.394, 6989.8125, 7136.85, 7005.519, 7115.578, 7034.7935, 7108.37]
2026-01-23 01:12:26,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:12:26,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 44 seconds)
2026-01-23 01:14:00,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:09,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6993.19531 ± 258.125
2026-01-23 01:14:09,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6850.4307, 6725.652, 6680.348, 7048.1304, 6765.337, 6732.074, 7206.354, 7260.1875, 7392.585, 7270.8516]
2026-01-23 01:14:09,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:14:09,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 2 seconds)
2026-01-23 01:15:42,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:15:51,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6961.40771 ± 267.707
2026-01-23 01:15:51,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6858.9214, 7210.3687, 6872.547, 6737.6323, 6866.521, 7248.005, 7238.3438, 7092.504, 7136.5317, 6352.699]
2026-01-23 01:15:51,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:15:51,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 20 seconds)
2026-01-23 01:17:24,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:33,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6689.68604 ± 897.713
2026-01-23 01:17:33,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7337.5728, 6634.213, 7209.703, 6551.5337, 4141.268, 6909.89, 7305.786, 7233.8223, 6552.875, 7020.1978]
2026-01-23 01:17:33,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:17:33,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 38 seconds)
2026-01-23 01:19:06,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:15,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6905.64355 ± 246.665
2026-01-23 01:19:15,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6940.7524, 6625.7314, 6819.45, 6910.906, 6455.7085, 7203.917, 7073.194, 7157.9453, 6673.073, 7195.764]
2026-01-23 01:19:15,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:19:15,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 56 seconds)
2026-01-23 01:20:48,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:57,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7130.17480 ± 216.627
2026-01-23 01:20:57,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6842.731, 7158.753, 7050.0254, 6687.452, 7121.876, 7242.502, 7232.526, 7369.764, 7450.4897, 7145.6353]
2026-01-23 01:20:57,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:20:57,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7130.17) for latency DatasetOffice
2026-01-23 01:20:57,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 13 seconds)
2026-01-23 01:22:30,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:39,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6875.70996 ± 214.552
2026-01-23 01:22:39,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6839.9893, 6576.488, 6551.285, 6881.962, 6600.519, 6961.091, 7130.519, 7048.681, 7073.69, 7092.876]
2026-01-23 01:22:39,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:22:39,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 31 seconds)
2026-01-23 01:24:13,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:21,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7160.59619 ± 324.537
2026-01-23 01:24:21,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7130.708, 7562.3145, 6953.4043, 6956.8306, 6866.7324, 7407.7627, 7363.3096, 7410.9023, 7470.914, 6483.0786]
2026-01-23 01:24:21,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:24:21,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7160.60) for latency DatasetOffice
2026-01-23 01:24:21,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 50 seconds)
2026-01-23 01:25:55,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:03,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6835.40332 ± 1025.269
2026-01-23 01:26:03,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7552.6816, 6816.0376, 7266.666, 6668.389, 3876.5347, 7067.1436, 7390.174, 7517.137, 6923.2163, 7276.0566]
2026-01-23 01:26:03,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:26:03,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 7 seconds)
2026-01-23 01:27:37,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:45,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6995.08594 ± 278.009
2026-01-23 01:27:45,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7109.177, 6963.5195, 6708.419, 6781.1587, 6575.574, 7246.093, 7064.003, 7561.1177, 6789.749, 7152.044]
2026-01-23 01:27:45,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:27:45,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 25 seconds)
2026-01-23 01:29:19,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:27,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7123.67041 ± 163.040
2026-01-23 01:29:27,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6994.051, 7088.4883, 7019.491, 6851.605, 7147.715, 7175.9087, 6991.978, 7281.3657, 7445.85, 7240.2505]
2026-01-23 01:29:27,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:29:27,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 43 seconds)
2026-01-23 01:31:01,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:09,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7099.17285 ± 303.999
2026-01-23 01:31:09,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7162.7803, 6745.93, 6643.933, 7047.9233, 6661.879, 7224.174, 7157.0376, 7501.5005, 7382.6855, 7463.8813]
2026-01-23 01:31:09,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:31:09,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 2 seconds)
2026-01-23 01:32:43,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:52,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7223.06543 ± 235.047
2026-01-23 01:32:52,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7344.526, 7366.032, 6961.3657, 7148.77, 7178.8013, 7357.819, 7346.238, 7286.3677, 7557.9087, 6682.8286]
2026-01-23 01:32:52,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:32:52,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7223.07) for latency DatasetOffice
2026-01-23 01:32:52,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 20 seconds)
2026-01-23 01:34:25,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:34,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6954.29932 ± 990.238
2026-01-23 01:34:34,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7475.487, 7023.905, 7530.1777, 6632.2075, 4112.456, 7480.8213, 7533.708, 7487.0996, 6965.2188, 7301.9136]
2026-01-23 01:34:34,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:34:34,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 38 seconds)
2026-01-23 01:36:07,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:16,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7081.55859 ± 292.669
2026-01-23 01:36:16,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7206.9614, 6453.2676, 6789.1763, 7002.785, 6937.5054, 7331.2915, 7327.2017, 7509.103, 7020.5254, 7237.775]
2026-01-23 01:36:16,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:36:16,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 56 seconds)
2026-01-23 01:37:49,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:58,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7304.63135 ± 147.762
2026-01-23 01:37:58,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6995.4683, 7477.7974, 7260.3394, 7132.515, 7195.6514, 7354.2466, 7406.9844, 7353.502, 7430.5615, 7439.2407]
2026-01-23 01:37:58,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:37:58,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7304.63) for latency DatasetOffice
2026-01-23 01:37:58,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 14 seconds)
2026-01-23 01:39:32,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:40,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7239.59229 ± 250.190
2026-01-23 01:39:40,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7230.3804, 7083.569, 6829.7246, 7433.9927, 6919.82, 7090.427, 7446.2563, 7699.7373, 7324.851, 7337.164]
2026-01-23 01:39:40,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:39:40,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 31 seconds)
2026-01-23 01:41:14,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:22,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7314.78662 ± 235.433
2026-01-23 01:41:22,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7468.7666, 7564.0723, 6993.8516, 7073.3486, 7321.885, 7571.679, 7454.634, 7266.1157, 7532.9463, 6900.5674]
2026-01-23 01:41:22,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:41:22,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7314.79) for latency DatasetOffice
2026-01-23 01:41:22,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 49 seconds)
2026-01-23 01:42:56,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:04,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6974.25488 ± 981.680
2026-01-23 01:43:04,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7471.332, 6984.347, 7404.5884, 6716.711, 4149.6807, 7266.4463, 7510.7856, 7611.775, 7039.338, 7587.5464]
2026-01-23 01:43:04,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:43:04,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 7 seconds)
2026-01-23 01:44:38,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:46,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7118.20703 ± 332.829
2026-01-23 01:44:46,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7493.101, 6484.3447, 6936.9614, 6813.137, 6786.2935, 7361.035, 7424.4424, 7530.358, 7175.4585, 7176.942]
2026-01-23 01:44:46,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:44:46,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 24 seconds)
2026-01-23 01:46:20,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:28,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7333.05957 ± 159.987
2026-01-23 01:46:28,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7028.135, 7367.1836, 7377.0654, 7057.293, 7261.9756, 7481.912, 7362.825, 7513.0425, 7430.1523, 7451.0176]
2026-01-23 01:46:28,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:46:28,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7333.06) for latency DatasetOffice
2026-01-23 01:46:28,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 42 seconds)
2026-01-23 01:48:02,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:11,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7242.18115 ± 321.445
2026-01-23 01:48:11,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7019.598, 6903.322, 7025.432, 7045.518, 6847.226, 7093.5996, 7562.2573, 7636.694, 7753.7197, 7534.444]
2026-01-23 01:48:11,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:48:11,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 1 second)
2026-01-23 01:49:44,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:53,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7388.93604 ± 287.385
2026-01-23 01:49:53,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7485.8706, 7666.6104, 7150.6455, 7195.0884, 7147.5415, 7560.9043, 7574.825, 7762.159, 7553.627, 6792.0894]
2026-01-23 01:49:53,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:49:53,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7388.94) for latency DatasetOffice
2026-01-23 01:49:53,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 18 seconds)
2026-01-23 01:51:26,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:35,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7001.50391 ± 982.550
2026-01-23 01:51:35,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7567.285, 7007.763, 7275.013, 6904.844, 4148.112, 7317.5947, 7579.8975, 7591.4, 7053.5005, 7569.632]
2026-01-23 01:51:35,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:51:35,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 37 seconds)
2026-01-23 01:53:08,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:17,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7069.80078 ± 208.918
2026-01-23 01:53:17,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7176.4526, 6692.8545, 6994.139, 6923.8706, 6852.4536, 7285.052, 7189.4165, 7270.1016, 6941.392, 7372.28]
2026-01-23 01:53:17,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:53:17,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 54 seconds)
2026-01-23 01:54:51,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:00,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7363.17432 ± 199.954
2026-01-23 01:55:00,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7152.857, 7365.683, 7326.4526, 6895.381, 7406.2026, 7391.087, 7647.5645, 7558.7915, 7461.1914, 7426.529]
2026-01-23 01:55:00,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:55:00,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 13 seconds)
2026-01-23 01:56:33,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:42,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7266.21875 ± 285.446
2026-01-23 01:56:42,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7241.92, 7073.9795, 6806.2534, 7337.1743, 6798.7397, 7358.769, 7281.1167, 7574.7075, 7489.2393, 7700.2896]
2026-01-23 01:56:42,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:56:42,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 31 seconds)
2026-01-23 01:58:15,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:24,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7576.19678 ± 291.006
2026-01-23 01:58:24,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7702.3184, 7933.535, 7484.7036, 7293.73, 7470.5444, 7783.9106, 7595.072, 7827.9443, 7771.453, 6898.758]
2026-01-23 01:58:24,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:58:24,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1274 [INFO]: New best (7576.20) for latency DatasetOffice
2026-01-23 01:58:24,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 49 seconds)
2026-01-23 01:59:58,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:06,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7062.88916 ± 1006.463
2026-01-23 02:00:06,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7733.9233, 7177.8657, 7712.1655, 6817.6167, 4191.566, 7047.6523, 7651.604, 7679.6724, 7093.7485, 7523.0703]
2026-01-23 02:00:06,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:00:06,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 6 seconds)
2026-01-23 02:01:40,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:01:48,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7188.66162 ± 246.305
2026-01-23 02:01:48,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7138.17, 6690.212, 7171.0684, 7101.617, 6932.0635, 7521.5713, 7266.055, 7566.096, 7336.9434, 7162.8193]
2026-01-23 02:01:48,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:01:48,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 24 seconds)
2026-01-23 02:03:22,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:30,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7322.95166 ± 152.833
2026-01-23 02:03:30,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7118.6836, 7470.8667, 7307.32, 7023.916, 7205.635, 7311.669, 7403.6665, 7476.3027, 7431.9785, 7479.474]
2026-01-23 02:03:30,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:03:30,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 42 seconds)
2026-01-23 02:05:04,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:12,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7529.62109 ± 306.055
2026-01-23 02:05:12,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7500.6787, 7189.3896, 6889.039, 7742.36, 7381.5146, 7378.658, 7726.842, 7755.127, 7825.554, 7907.0474]
2026-01-23 02:05:12,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:05:12,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1299 [DEBUG]: Training session finished
