2026-01-23 01:56:44,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-bpql-mda-mem5
2026-01-23 01:56:44,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-bpql-mda-mem5
2026-01-23 01:56:44,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x150852f7b110>}
2026-01-23 01:56:44,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1159 [DEBUG]: using device: cuda
2026-01-23 01:56:44,488 baseline-bpql-mda-noisy-halfcheetah:91 [WARNING]: args.assumed_delay != args.horizon: 5 != 32
2026-01-23 01:56:44,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1181 [INFO]: Creating new trainer
2026-01-23 01:56:44,505 baseline-bpql-mda-noisy-halfcheetah:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:56:44,505 baseline-bpql-mda-noisy-halfcheetah:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:56:44,511 baseline-bpql-mda-noisy-halfcheetah:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2026-01-23 01:56:45,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1242 [DEBUG]: Starting training session...
2026-01-23 01:56:45,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 1/100
2026-01-23 02:00:19,028 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:33,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -322.27280 ± 20.399
2026-01-23 02:00:33,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-318.84247, -333.00848, -299.7034, -348.1084, -325.5101, -345.4433, -276.988, -336.78094, -315.32373, -323.01923]
2026-01-23 02:00:33,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:00:33,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (-322.27) for latency DatasetOffice
2026-01-23 02:00:33,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 15 minutes, 58 seconds)
2026-01-23 02:04:16,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:30,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 178.33411 ± 79.154
2026-01-23 02:04:30,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [253.30327, 285.81018, 85.35554, 172.74562, 209.96063, 226.83565, 88.27779, 198.38187, 30.036697, 232.63373]
2026-01-23 02:04:30,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:04:30,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (178.33) for latency DatasetOffice
2026-01-23 02:04:30,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 19 minutes, 42 seconds)
2026-01-23 02:08:07,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:21,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 605.78491 ± 62.323
2026-01-23 02:08:21,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [709.67896, 545.06616, 630.1284, 554.9685, 718.26953, 628.58704, 566.7929, 590.7147, 583.9387, 529.7045]
2026-01-23 02:08:21,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:08:21,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (605.78) for latency DatasetOffice
2026-01-23 02:08:21,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 15 minutes, 15 seconds)
2026-01-23 02:11:59,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:13,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1095.96545 ± 176.686
2026-01-23 02:12:13,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [714.9465, 1001.87506, 1173.5137, 1091.9543, 920.9897, 1346.9918, 1089.808, 1141.522, 1326.9015, 1151.1517]
2026-01-23 02:12:13,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:12:13,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (1095.97) for latency DatasetOffice
2026-01-23 02:12:13,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 11 minutes, 8 seconds)
2026-01-23 02:15:51,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:04,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1611.64099 ± 423.449
2026-01-23 02:16:04,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1985.1469, 2015.7985, 2040.0779, 1039.8643, 1251.5673, 2008.4496, 1981.4089, 1034.4094, 1139.7821, 1619.905]
2026-01-23 02:16:04,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:16:04,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (1611.64) for latency DatasetOffice
2026-01-23 02:16:04,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 7 minutes, 13 seconds)
2026-01-23 02:19:42,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:19:56,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1960.01294 ± 627.257
2026-01-23 02:19:56,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2158.2683, 995.3478, 1081.8446, 2537.8506, 2499.9219, 2677.6694, 1166.8625, 1771.0638, 2549.521, 2161.779]
2026-01-23 02:19:56,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:19:56,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (1960.01) for latency DatasetOffice
2026-01-23 02:19:56,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 4 minutes, 33 seconds)
2026-01-23 02:23:34,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:48,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2552.14771 ± 104.520
2026-01-23 02:23:48,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2688.7292, 2663.6843, 2463.5078, 2333.8142, 2547.618, 2526.5361, 2627.2861, 2597.9019, 2618.9597, 2453.439]
2026-01-23 02:23:48,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:23:48,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (2552.15) for latency DatasetOffice
2026-01-23 02:23:48,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 59 minutes, 2 seconds)
2026-01-23 02:27:26,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:27:40,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3562.22510 ± 98.678
2026-01-23 02:27:40,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3633.7698, 3570.8838, 3685.072, 3423.757, 3690.8638, 3499.5513, 3421.8613, 3592.4553, 3643.1643, 3460.87]
2026-01-23 02:27:40,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:27:40,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (3562.23) for latency DatasetOffice
2026-01-23 02:27:40,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 55 minutes, 19 seconds)
2026-01-23 02:31:17,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:31,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3867.45264 ± 686.129
2026-01-23 02:31:31,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2658.3489, 4346.2783, 4027.2375, 4377.15, 4491.4644, 4349.378, 4175.8228, 3829.4158, 3967.2788, 2452.1572]
2026-01-23 02:31:31,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:31:31,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (3867.45) for latency DatasetOffice
2026-01-23 02:31:31,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 51 minutes, 27 seconds)
2026-01-23 02:35:09,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:35:22,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3172.81128 ± 1316.270
2026-01-23 02:35:22,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1788.1527, 4081.1052, 1052.2087, 4441.182, 4458.7856, 1263.7926, 2801.9067, 4308.5015, 3042.3928, 4490.085]
2026-01-23 02:35:22,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:35:22,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 47 minutes, 20 seconds)
2026-01-23 02:39:00,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:14,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3586.56909 ± 955.311
2026-01-23 02:39:14,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2393.5293, 3934.0688, 1588.8579, 4241.773, 4045.2412, 4313.2144, 4223.9194, 2566.4688, 4371.375, 4187.244]
2026-01-23 02:39:14,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:39:14,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 43 minutes, 30 seconds)
2026-01-23 02:42:51,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:43:04,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3563.48706 ± 882.542
2026-01-23 02:43:04,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4084.3052, 3428.1155, 4167.0356, 3904.019, 4171.636, 4127.887, 3931.6497, 2067.945, 1648.6381, 4103.637]
2026-01-23 02:43:04,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:43:04,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 39 minutes, 13 seconds)
2026-01-23 02:46:42,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:56,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4058.28442 ± 426.888
2026-01-23 02:46:56,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3493.0845, 3044.1934, 4160.311, 4219.5483, 4404.6055, 4398.5273, 4232.579, 4186.5073, 4447.7144, 3995.7715]
2026-01-23 02:46:56,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:46:56,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4058.28) for latency DatasetOffice
2026-01-23 02:46:56,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 35 minutes, 18 seconds)
2026-01-23 02:50:33,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:50:47,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3495.26123 ± 1130.271
2026-01-23 02:50:47,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1154.5271, 4112.6562, 2266.6396, 4276.4097, 2054.6562, 4160.368, 4405.213, 3970.5276, 4326.2563, 4225.362]
2026-01-23 02:50:47,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:50:47,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 31 minutes, 11 seconds)
2026-01-23 02:54:23,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:54:36,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4356.33203 ± 869.398
2026-01-23 02:54:36,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4779.037, 4428.7964, 4629.4863, 4783.324, 4651.542, 4388.311, 4659.0566, 1793.5206, 4503.9526, 4946.294]
2026-01-23 02:54:36,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:54:36,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4356.33) for latency DatasetOffice
2026-01-23 02:54:36,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 26 minutes, 59 seconds)
2026-01-23 02:58:11,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:58:25,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4359.81445 ± 525.245
2026-01-23 02:58:25,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4352.265, 4896.179, 4504.521, 3615.109, 4611.9927, 4429.219, 4572.7515, 3145.713, 4849.0728, 4621.319]
2026-01-23 02:58:25,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:58:25,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4359.81) for latency DatasetOffice
2026-01-23 02:58:25,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 22 minutes, 16 seconds)
2026-01-23 03:01:59,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:02:13,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4685.93066 ± 182.803
2026-01-23 03:02:13,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4844.585, 4686.189, 4588.257, 4227.008, 4839.1963, 4706.575, 4642.431, 4826.4824, 4882.056, 4616.522]
2026-01-23 03:02:13,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:02:13,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4685.93) for latency DatasetOffice
2026-01-23 03:02:13,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 17 minutes, 47 seconds)
2026-01-23 03:05:47,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:06:01,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4281.85254 ± 850.733
2026-01-23 03:06:01,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4492.9575, 4836.637, 4646.434, 4659.0254, 4507.858, 4711.667, 1813.8369, 4629.9756, 4543.6875, 3976.4468]
2026-01-23 03:06:01,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:06:01,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 12 minutes, 57 seconds)
2026-01-23 03:09:35,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:09:48,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4293.05518 ± 1174.371
2026-01-23 03:09:48,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [809.7324, 4749.229, 4898.6006, 4480.6904, 4840.471, 4266.2188, 4700.5923, 4821.6035, 4683.776, 4679.6357]
2026-01-23 03:09:48,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:09:48,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 8 minutes, 14 seconds)
2026-01-23 03:13:22,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:13:35,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4682.82422 ± 244.417
2026-01-23 03:13:35,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4821.8296, 4769.5117, 4524.0684, 4913.6367, 4853.067, 4856.1055, 4689.74, 4083.4128, 4467.989, 4848.882]
2026-01-23 03:13:35,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:13:35,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 3 minutes, 43 seconds)
2026-01-23 03:17:09,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:17:22,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4360.89941 ± 900.189
2026-01-23 03:17:22,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4871.19, 5062.5806, 4923.594, 4928.1396, 4626.091, 4759.885, 4915.116, 3485.3713, 2068.5, 3968.529]
2026-01-23 03:17:22,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:17:22,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 59 minutes, 28 seconds)
2026-01-23 03:20:56,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:21:10,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4455.67090 ± 926.294
2026-01-23 03:21:10,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4946.6123, 4960.989, 4944.4624, 4894.8965, 4970.08, 4660.668, 1772.6989, 4487.406, 4189.2866, 4729.6143]
2026-01-23 03:21:10,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:21:10,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 55 minutes, 36 seconds)
2026-01-23 03:24:44,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:24:57,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4813.62939 ± 162.486
2026-01-23 03:24:57,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5040.008, 4888.668, 5038.263, 4551.3594, 4798.175, 4845.254, 4776.428, 4938.8916, 4599.097, 4660.1484]
2026-01-23 03:24:57,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:24:57,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4813.63) for latency DatasetOffice
2026-01-23 03:24:57,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 51 minutes, 35 seconds)
2026-01-23 03:28:31,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:28:44,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4513.14551 ± 722.685
2026-01-23 03:28:44,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3039.0112, 5024.362, 4793.471, 4611.951, 4961.486, 5211.6147, 3172.1873, 4830.5664, 4687.938, 4798.8687]
2026-01-23 03:28:44,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:28:44,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 47 minutes, 44 seconds)
2026-01-23 03:32:18,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:32:31,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4717.05713 ± 137.914
2026-01-23 03:32:31,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4522.9194, 4679.522, 4633.6885, 4984.312, 4911.426, 4627.896, 4704.2935, 4812.039, 4710.75, 4583.727]
2026-01-23 03:32:31,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:32:31,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 43 minutes, 54 seconds)
2026-01-23 03:36:05,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:36:18,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4750.29443 ± 518.031
2026-01-23 03:36:18,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5015.667, 5022.844, 5025.1797, 5056.4253, 4622.243, 4622.474, 5315.1484, 3344.8508, 4593.698, 4884.412]
2026-01-23 03:36:18,736 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:36:18,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 40 minutes, 11 seconds)
2026-01-23 03:39:51,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:40:04,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4865.01074 ± 171.897
2026-01-23 03:40:04,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4866.4663, 4701.399, 4523.6655, 4745.235, 5060.1377, 5065.994, 4855.1353, 4780.1104, 5037.263, 5014.702]
2026-01-23 03:40:04,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:40:04,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4865.01) for latency DatasetOffice
2026-01-23 03:40:04,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 36 minutes)
2026-01-23 03:43:37,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:43:51,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4918.87646 ± 146.229
2026-01-23 03:43:51,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5092.644, 5011.8506, 5023.109, 4716.7085, 4707.7363, 4991.652, 4999.8936, 4950.928, 5013.6855, 4680.5547]
2026-01-23 03:43:51,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:43:51,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4918.88) for latency DatasetOffice
2026-01-23 03:43:51,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 32 minutes, 7 seconds)
2026-01-23 03:47:24,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:47:38,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4737.69873 ± 509.514
2026-01-23 03:47:38,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3219.2393, 5012.5474, 4977.5557, 4895.573, 4880.6753, 4892.7607, 4866.1113, 4794.5303, 4953.4814, 4884.517]
2026-01-23 03:47:38,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:47:38,225 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 28 minutes, 17 seconds)
2026-01-23 03:51:11,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:51:24,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4886.03955 ± 98.081
2026-01-23 03:51:24,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4896.0205, 4752.856, 4781.4146, 4962.027, 4993.3516, 4863.834, 4843.919, 5076.7197, 4781.2886, 4908.9644]
2026-01-23 03:51:24,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:51:24,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 24 minutes, 28 seconds)
2026-01-23 03:54:58,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:55:12,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4750.31543 ± 474.228
2026-01-23 03:55:12,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4539.497, 4921.2666, 5037.2993, 5016.8677, 4705.2075, 4886.864, 5129.677, 3418.9934, 5034.8257, 4812.657]
2026-01-23 03:55:12,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:55:12,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 20 minutes, 45 seconds)
2026-01-23 03:58:45,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:58:59,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5012.09863 ± 127.181
2026-01-23 03:58:59,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5227.949, 5158.955, 4827.731, 4828.3945, 4953.5938, 5134.508, 5062.8774, 4955.1025, 4974.746, 4997.1323]
2026-01-23 03:58:59,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:58:59,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5012.10) for latency DatasetOffice
2026-01-23 03:58:59,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 17 minutes, 9 seconds)
2026-01-23 04:02:32,197 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:02:45,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5019.71973 ± 143.547
2026-01-23 04:02:45,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5251.391, 4935.605, 5023.5425, 4752.646, 5032.5044, 4919.3096, 5032.7383, 5174.6763, 5176.882, 4897.8994]
2026-01-23 04:02:45,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:02:45,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5019.72) for latency DatasetOffice
2026-01-23 04:02:45,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 13 minutes, 15 seconds)
2026-01-23 04:06:18,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:06:32,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4842.33301 ± 519.815
2026-01-23 04:06:32,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3345.1775, 4976.857, 5103.3364, 4622.739, 4965.0376, 5212.094, 5016.2935, 5066.849, 5033.628, 5081.3164]
2026-01-23 04:06:32,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:06:32,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 9 minutes, 29 seconds)
2026-01-23 04:10:05,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:10:19,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5140.29443 ± 109.293
2026-01-23 04:10:19,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5111.1914, 5223.619, 5127.8823, 5239.4897, 5351.417, 4957.49, 5123.0, 5117.7124, 4990.461, 5160.679]
2026-01-23 04:10:19,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:10:19,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5140.29) for latency DatasetOffice
2026-01-23 04:10:19,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 5 minutes, 49 seconds)
2026-01-23 04:13:52,468 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:14:06,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4728.53027 ± 1038.587
2026-01-23 04:14:06,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4774.7188, 5073.681, 5289.147, 5276.7607, 5037.87, 5002.886, 5168.178, 1643.302, 4939.09, 5079.67]
2026-01-23 04:14:06,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:14:06,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 1 minute, 48 seconds)
2026-01-23 04:17:38,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:17:51,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4762.68848 ± 1065.841
2026-01-23 04:17:51,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5067.727, 5188.8203, 5094.797, 1571.3541, 5165.0815, 5115.166, 4950.341, 5180.3594, 5147.551, 5145.684]
2026-01-23 04:17:51,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:17:51,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 57 minutes, 50 seconds)
2026-01-23 04:21:24,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:21:38,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4753.07275 ± 581.172
2026-01-23 04:21:38,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5084.1836, 5050.0903, 3075.8245, 4720.6074, 4759.642, 5027.51, 4738.0747, 5185.149, 5066.7456, 4822.9004]
2026-01-23 04:21:38,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:21:38,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 54 minutes, 4 seconds)
2026-01-23 04:25:11,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:25:24,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4888.15918 ± 598.155
2026-01-23 04:25:24,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3123.6624, 5011.138, 5023.9106, 4949.089, 5115.5684, 4936.9585, 5276.679, 5132.7324, 5253.2095, 5058.6377]
2026-01-23 04:25:24,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:25:24,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 50 minutes, 15 seconds)
2026-01-23 04:28:58,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:29:11,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5183.88477 ± 89.683
2026-01-23 04:29:11,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5117.694, 5148.187, 5148.8223, 5186.238, 5204.2104, 5341.8447, 5251.2534, 4996.707, 5168.5376, 5275.3516]
2026-01-23 04:29:11,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:29:11,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5183.88) for latency DatasetOffice
2026-01-23 04:29:11,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 46 minutes, 25 seconds)
2026-01-23 04:32:45,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:32:58,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4355.87598 ± 1183.482
2026-01-23 04:32:58,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1123.4253, 4875.996, 4909.698, 4999.6943, 4655.2227, 4771.536, 4885.189, 3291.8875, 5044.576, 5001.5337]
2026-01-23 04:32:58,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:32:58,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 42 minutes, 47 seconds)
2026-01-23 04:36:31,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:36:45,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5199.42871 ± 105.345
2026-01-23 04:36:45,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5392.2837, 5130.751, 5271.3423, 5151.2695, 5037.2793, 5052.7075, 5264.6743, 5220.412, 5180.8037, 5292.7646]
2026-01-23 04:36:45,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:36:45,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5199.43) for latency DatasetOffice
2026-01-23 04:36:45,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 39 minutes, 4 seconds)
2026-01-23 04:40:18,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:40:31,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4842.47363 ± 367.735
2026-01-23 04:40:31,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4905.634, 3939.969, 5218.2373, 4690.4062, 4698.9233, 5126.634, 4916.3765, 5296.305, 4982.407, 4649.843]
2026-01-23 04:40:31,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:40:31,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 35 minutes, 24 seconds)
2026-01-23 04:44:05,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:44:18,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5047.19629 ± 560.807
2026-01-23 04:44:18,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3389.322, 5202.8267, 5416.8525, 5061.947, 5124.5605, 5202.956, 5347.638, 5244.5664, 5217.252, 5264.0435]
2026-01-23 04:44:18,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:44:18,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 31 minutes, 44 seconds)
2026-01-23 04:47:51,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:48:05,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5138.10645 ± 145.722
2026-01-23 04:48:05,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5026.022, 5300.0312, 5159.151, 5047.171, 5189.9165, 5064.6807, 5202.9805, 5286.179, 4811.657, 5293.273]
2026-01-23 04:48:05,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:48:05,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 27 minutes, 51 seconds)
2026-01-23 04:51:38,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:51:52,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4873.70605 ± 494.963
2026-01-23 04:51:52,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4911.9473, 5197.5684, 5018.1357, 4961.4697, 4964.6064, 4849.7124, 5129.591, 3426.6128, 5194.023, 5083.399]
2026-01-23 04:51:52,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:51:52,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 23 minutes, 58 seconds)
2026-01-23 04:55:25,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:55:38,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5066.47412 ± 143.938
2026-01-23 04:55:38,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5199.4067, 4954.7554, 5019.742, 4700.9834, 5224.2456, 5164.191, 5123.6714, 5061.2573, 5104.1, 5112.3906]
2026-01-23 04:55:38,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:55:38,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 20 minutes, 19 seconds)
2026-01-23 04:59:11,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:59:25,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5173.10205 ± 138.911
2026-01-23 04:59:25,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5299.6, 5152.382, 5396.1997, 5003.7197, 5263.386, 5176.4937, 4901.0312, 5281.8315, 5124.5146, 5131.859]
2026-01-23 04:59:25,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:59:25,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 16 minutes, 26 seconds)
2026-01-23 05:02:58,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:03:11,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4938.12793 ± 544.112
2026-01-23 05:03:11,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3326.9155, 5258.741, 5142.0806, 5039.1807, 5154.002, 5227.8335, 5061.1274, 4955.764, 5164.461, 5051.1704]
2026-01-23 05:03:11,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:03:11,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 12 minutes, 32 seconds)
2026-01-23 05:06:44,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:06:58,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5230.75928 ± 88.805
2026-01-23 05:06:58,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5326.8945, 5151.6177, 5333.6167, 5176.1245, 5383.5967, 5094.7163, 5240.9536, 5151.805, 5196.7334, 5251.5356]
2026-01-23 05:06:58,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:06:58,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5230.76) for latency DatasetOffice
2026-01-23 05:06:58,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 8 minutes, 50 seconds)
2026-01-23 05:10:31,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:10:45,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4399.88623 ± 1292.600
2026-01-23 05:10:45,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2599.6677, 5357.536, 5014.5986, 5218.4395, 3323.842, 4953.326, 5232.4023, 1641.0933, 5391.205, 5266.7534]
2026-01-23 05:10:45,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:10:45,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 5 minutes, 5 seconds)
2026-01-23 05:14:17,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:14:31,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5244.51465 ± 139.968
2026-01-23 05:14:31,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5215.3574, 5154.3247, 5297.985, 4892.8975, 5296.749, 5292.708, 5383.913, 5297.856, 5419.321, 5194.037]
2026-01-23 05:14:31,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:14:31,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5244.51) for latency DatasetOffice
2026-01-23 05:14:31,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 1 minute, 11 seconds)
2026-01-23 05:18:04,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:18:18,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5113.79199 ± 143.336
2026-01-23 05:18:18,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5116.7764, 5152.6816, 5262.925, 5092.514, 5316.725, 5264.7446, 5165.8438, 4875.34, 4965.688, 4924.678]
2026-01-23 05:18:18,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:18:18,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 57 minutes, 33 seconds)
2026-01-23 05:21:51,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:22:04,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5046.36182 ± 489.503
2026-01-23 05:22:04,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3631.0386, 5116.101, 5360.283, 4978.695, 5270.8833, 5270.7363, 5128.481, 5428.956, 5222.5195, 5055.925]
2026-01-23 05:22:04,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:22:04,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 53 minutes, 45 seconds)
2026-01-23 05:25:37,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:25:51,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5196.48535 ± 117.604
2026-01-23 05:25:51,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5021.6724, 5252.5557, 5170.6177, 5032.4277, 5364.008, 5231.3677, 5362.394, 5065.659, 5215.281, 5248.868]
2026-01-23 05:25:51,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:25:51,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 49 minutes, 54 seconds)
2026-01-23 05:29:23,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:29:37,383 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4879.98535 ± 655.766
2026-01-23 05:29:37,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5136.7725, 5347.473, 5222.041, 5154.8765, 3657.3706, 5227.4585, 5123.241, 3498.3738, 5298.9146, 5133.333]
2026-01-23 05:29:37,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:29:37,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 46 minutes, 2 seconds)
2026-01-23 05:33:11,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:33:24,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5216.21631 ± 101.447
2026-01-23 05:33:24,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5198.7607, 5280.1553, 5217.7285, 5102.074, 5023.6973, 5383.7397, 5155.141, 5205.8374, 5338.178, 5256.85]
2026-01-23 05:33:24,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:33:24,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 42 minutes, 25 seconds)
2026-01-23 05:36:57,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:37:11,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5200.23340 ± 169.049
2026-01-23 05:37:11,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5157.1396, 5388.424, 5239.687, 5032.39, 5154.15, 5389.463, 4927.1196, 5324.5083, 5407.7373, 4981.714]
2026-01-23 05:37:11,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:37:11,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 38 minutes, 33 seconds)
2026-01-23 05:40:43,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:40:56,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5029.29688 ± 520.901
2026-01-23 05:40:56,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3585.193, 4791.7075, 5309.505, 5137.1523, 5153.0273, 5396.409, 5379.0615, 5441.3965, 5188.074, 4911.4424]
2026-01-23 05:40:56,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:40:56,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 34 minutes, 40 seconds)
2026-01-23 05:44:28,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:44:42,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5214.01953 ± 156.948
2026-01-23 05:44:42,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5258.621, 5216.705, 5194.658, 5232.158, 5073.7324, 5465.043, 5264.718, 5224.9062, 4845.6006, 5364.056]
2026-01-23 05:44:42,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:44:42,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 30 minutes, 50 seconds)
2026-01-23 05:48:15,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:48:29,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4840.89160 ± 1103.171
2026-01-23 05:48:29,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4873.583, 5350.495, 5345.1855, 5384.7026, 5054.9585, 5126.2866, 5408.2207, 1596.0469, 5457.3843, 4812.055]
2026-01-23 05:48:29,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:48:29,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 27 minutes, 8 seconds)
2026-01-23 05:52:02,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:52:15,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5254.15234 ± 197.868
2026-01-23 05:52:15,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5263.2983, 5432.171, 5282.5127, 4747.009, 5303.5337, 5172.3184, 5504.4854, 5153.966, 5389.9956, 5292.2363]
2026-01-23 05:52:15,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:52:15,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5254.15) for latency DatasetOffice
2026-01-23 05:52:15,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 23 minutes, 16 seconds)
2026-01-23 05:55:49,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:56:02,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5187.05420 ± 137.777
2026-01-23 05:56:02,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5356.3496, 5276.813, 5232.776, 4962.7334, 5320.185, 5128.2856, 5128.1, 5073.8223, 5373.322, 5018.1587]
2026-01-23 05:56:02,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:56:02,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 19 minutes, 36 seconds)
2026-01-23 05:59:36,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:59:49,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5059.91943 ± 525.348
2026-01-23 05:59:49,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3582.9495, 5226.708, 5382.2847, 5157.6294, 5336.2534, 5438.2515, 5270.253, 5288.021, 4733.9126, 5182.933]
2026-01-23 05:59:49,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:59:49,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 15 minutes, 55 seconds)
2026-01-23 06:03:22,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:03:35,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5266.48096 ± 121.936
2026-01-23 06:03:35,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5045.922, 5148.272, 5195.2466, 5359.119, 5277.1475, 5366.972, 5435.5244, 5342.26, 5130.7017, 5363.645]
2026-01-23 06:03:35,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:03:35,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5266.48) for latency DatasetOffice
2026-01-23 06:03:35,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 12 minutes, 12 seconds)
2026-01-23 06:07:08,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:07:22,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4923.84131 ± 508.958
2026-01-23 06:07:22,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4840.466, 5220.5386, 5101.23, 5042.061, 4955.475, 5055.9546, 5299.714, 3442.7668, 5100.722, 5179.4893]
2026-01-23 06:07:22,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:07:22,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 8 minutes, 24 seconds)
2026-01-23 06:10:55,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:11:08,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5185.45020 ± 190.734
2026-01-23 06:11:08,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5194.0938, 5238.0757, 5119.6377, 4708.287, 5351.7183, 5379.646, 5274.4136, 5112.299, 5088.579, 5387.753]
2026-01-23 06:11:08,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:11:08,879 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 4 minutes, 38 seconds)
2026-01-23 06:14:41,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:14:54,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5185.08691 ± 160.254
2026-01-23 06:14:54,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5275.43, 5336.332, 5034.4116, 5053.2188, 5061.595, 5357.2905, 5126.7896, 5408.29, 5289.3525, 4908.156]
2026-01-23 06:14:54,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:14:54,985 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 44 seconds)
2026-01-23 06:18:28,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:18:41,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5003.49365 ± 525.242
2026-01-23 06:18:41,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3464.128, 5373.3784, 5351.2607, 5171.1562, 5159.8066, 5217.063, 5065.7417, 5170.1235, 5036.5347, 5025.7476]
2026-01-23 06:18:41,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:18:41,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 57 minutes, 1 second)
2026-01-23 06:22:15,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:22:28,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5335.23535 ± 96.568
2026-01-23 06:22:28,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5286.9067, 5191.0693, 5403.7344, 5292.752, 5453.942, 5298.569, 5426.8354, 5479.5376, 5320.818, 5198.193]
2026-01-23 06:22:28,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:22:28,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5335.24) for latency DatasetOffice
2026-01-23 06:22:28,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 53 minutes, 18 seconds)
2026-01-23 06:26:01,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:26:14,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4916.16699 ± 571.134
2026-01-23 06:26:14,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4995.907, 5173.9995, 5078.3013, 5188.189, 5182.9883, 5047.337, 5176.205, 3223.063, 4922.289, 5173.392]
2026-01-23 06:26:14,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:26:14,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 49 minutes, 28 seconds)
2026-01-23 06:29:48,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:30:02,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5001.98535 ± 651.251
2026-01-23 06:30:02,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5068.8013, 5309.836, 3094.949, 5080.1953, 5321.89, 5307.0674, 5195.4907, 4930.1353, 5429.318, 5282.1704]
2026-01-23 06:30:02,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:30:02,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 45 minutes, 45 seconds)
2026-01-23 06:33:34,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:33:47,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5251.95361 ± 104.356
2026-01-23 06:33:47,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5242.8525, 5354.8223, 5313.901, 5104.8516, 5324.6606, 5199.4224, 5243.865, 5299.456, 5391.155, 5044.552]
2026-01-23 06:33:47,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:33:47,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 41 minutes, 58 seconds)
2026-01-23 06:37:21,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:37:34,715 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4878.47559 ± 508.717
2026-01-23 06:37:34,715 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3395.6873, 5155.8203, 5081.4756, 4739.732, 5011.2207, 5135.71, 4937.068, 5075.5083, 5087.2344, 5165.2954]
2026-01-23 06:37:34,715 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:37:34,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 38 minutes, 11 seconds)
2026-01-23 06:41:07,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:41:21,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5278.67578 ± 113.046
2026-01-23 06:41:21,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5272.855, 5340.1504, 5219.1333, 5159.674, 5455.087, 5228.1426, 5440.179, 5355.8193, 5233.264, 5082.4546]
2026-01-23 06:41:21,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:41:21,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 34 minutes, 23 seconds)
2026-01-23 06:44:54,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:45:08,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5151.24219 ± 611.580
2026-01-23 06:45:08,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5016.8374, 5395.644, 5535.134, 5265.461, 5224.033, 5369.718, 5489.097, 3368.0706, 5368.6196, 5479.8086]
2026-01-23 06:45:08,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:45:08,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 30 minutes, 40 seconds)
2026-01-23 06:48:41,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:48:54,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5074.27051 ± 117.913
2026-01-23 06:48:54,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4868.0205, 4988.2734, 5157.214, 4993.077, 5171.5693, 4953.4346, 5111.149, 5097.343, 5294.552, 5108.0723]
2026-01-23 06:48:54,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:48:54,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 26 minutes, 50 seconds)
2026-01-23 06:52:27,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:52:41,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5234.41309 ± 190.363
2026-01-23 06:52:41,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5451.442, 5316.1553, 5291.1655, 5120.9775, 5316.1484, 5433.6553, 5079.805, 5191.4805, 5360.963, 4782.3413]
2026-01-23 06:52:41,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:52:41,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 23 minutes, 5 seconds)
2026-01-23 06:56:13,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:56:27,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5018.32520 ± 559.175
2026-01-23 06:56:27,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3426.3828, 5307.9443, 5205.4233, 5157.242, 5169.387, 5300.8335, 4705.978, 5281.7764, 5244.5327, 5383.7534]
2026-01-23 06:56:27,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:56:27,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 19 minutes, 15 seconds)
2026-01-23 07:00:00,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:00:13,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5219.47119 ± 163.183
2026-01-23 07:00:13,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5188.507, 5372.3623, 5428.796, 5320.43, 5227.288, 5295.65, 5311.4697, 5199.871, 4924.1553, 4926.1816]
2026-01-23 07:00:13,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:00:13,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 15 minutes, 27 seconds)
2026-01-23 07:03:46,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:03:59,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4961.06152 ± 526.985
2026-01-23 07:03:59,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4820.229, 5141.42, 5125.2905, 5182.5537, 4951.9565, 5118.1504, 5255.4185, 3437.4985, 5267.3833, 5310.7163]
2026-01-23 07:03:59,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:03:59,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 11 minutes, 39 seconds)
2026-01-23 07:07:32,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:07:46,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5154.15186 ± 171.121
2026-01-23 07:07:46,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5154.617, 5220.3193, 5126.819, 4804.896, 5114.223, 5403.7695, 5225.219, 5196.7915, 5365.214, 4929.65]
2026-01-23 07:07:46,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:07:46,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 7 minutes, 52 seconds)
2026-01-23 07:11:18,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:11:32,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5060.94678 ± 184.975
2026-01-23 07:11:32,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5220.7104, 5113.4316, 5133.1665, 4704.113, 5229.7915, 5212.3413, 4899.8765, 5224.918, 5086.6016, 4784.5166]
2026-01-23 07:11:32,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:11:32,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 4 minutes, 5 seconds)
2026-01-23 07:15:04,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:15:17,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5239.59131 ± 531.838
2026-01-23 07:15:17,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3674.6223, 5517.689, 5363.2407, 5200.744, 5440.6353, 5456.429, 5548.585, 5283.323, 5394.722, 5515.923]
2026-01-23 07:15:17,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:15:17,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 18 seconds)
2026-01-23 07:18:50,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:19:04,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5309.76123 ± 53.248
2026-01-23 07:19:04,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5351.7505, 5290.593, 5381.022, 5368.305, 5294.0522, 5245.7036, 5331.82, 5236.2246, 5238.8853, 5359.255]
2026-01-23 07:19:04,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:19:04,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 86/100 (estimated time remaining: 56 minutes, 32 seconds)
2026-01-23 07:22:37,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:22:50,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5042.15527 ± 575.376
2026-01-23 07:22:50,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5124.6606, 5323.7866, 5355.2837, 5280.522, 4531.5635, 5247.824, 5428.6875, 3480.239, 5419.7793, 5229.1987]
2026-01-23 07:22:50,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:22:50,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 87/100 (estimated time remaining: 52 minutes, 47 seconds)
2026-01-23 07:26:23,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:26:37,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5219.70410 ± 130.716
2026-01-23 07:26:37,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5425.2183, 5144.506, 5257.95, 4997.3716, 5175.7534, 5110.163, 5240.1357, 5108.4346, 5390.2134, 5347.296]
2026-01-23 07:26:37,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:26:37,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 88/100 (estimated time remaining: 49 minutes, 1 second)
2026-01-23 07:30:10,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:30:23,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5173.92480 ± 220.570
2026-01-23 07:30:23,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5365.7207, 5135.047, 5284.746, 4662.5293, 5391.2666, 5449.35, 5086.8086, 5267.4375, 5089.1025, 5007.2417]
2026-01-23 07:30:23,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:30:23,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 89/100 (estimated time remaining: 45 minutes, 15 seconds)
2026-01-23 07:33:57,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:34:10,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5047.30566 ± 545.256
2026-01-23 07:34:10,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3427.0867, 5311.99, 5242.822, 5234.309, 5161.284, 5076.1514, 5240.3447, 5347.587, 5157.958, 5273.525]
2026-01-23 07:34:10,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:34:10,564 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 90/100 (estimated time remaining: 41 minutes, 32 seconds)
2026-01-23 07:37:42,692 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:37:55,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5270.47266 ± 115.556
2026-01-23 07:37:55,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5413.948, 5142.946, 5209.0366, 5269.241, 5263.9854, 5042.9985, 5319.057, 5461.4585, 5264.1885, 5317.8677]
2026-01-23 07:37:55,883 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:37:55,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 91/100 (estimated time remaining: 37 minutes, 43 seconds)
2026-01-23 07:41:28,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:41:41,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4502.52441 ± 1026.235
2026-01-23 07:41:41,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3146.9978, 2759.2783, 5415.638, 5357.9224, 3685.0188, 5288.587, 5238.882, 3520.642, 5359.2095, 5253.0684]
2026-01-23 07:41:41,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:41:41,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 92/100 (estimated time remaining: 33 minutes, 56 seconds)
2026-01-23 07:45:14,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:45:28,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5202.62207 ± 200.675
2026-01-23 07:45:28,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5372.1865, 5316.4062, 5331.2456, 4774.5747, 4974.1226, 5263.0327, 5412.4146, 5200.2637, 5016.22, 5365.753]
2026-01-23 07:45:28,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:45:28,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 93/100 (estimated time remaining: 30 minutes, 9 seconds)
2026-01-23 07:49:00,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:49:14,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5127.32324 ± 169.492
2026-01-23 07:49:14,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5362.874, 5054.2324, 5296.366, 4842.8296, 5147.4854, 5291.3, 4874.602, 5224.616, 5171.4443, 5007.484]
2026-01-23 07:49:14,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:49:14,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 94/100 (estimated time remaining: 26 minutes, 22 seconds)
2026-01-23 07:52:46,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:52:59,889 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5030.12402 ± 525.529
2026-01-23 07:52:59,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3494.8127, 5271.6655, 5317.6177, 4997.4604, 5337.648, 5254.51, 5238.3823, 5012.6787, 5078.7573, 5297.7114]
2026-01-23 07:52:59,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:52:59,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 95/100 (estimated time remaining: 22 minutes, 35 seconds)
2026-01-23 07:56:31,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:56:44,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5340.96436 ± 124.733
2026-01-23 07:56:44,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5350.6523, 5486.229, 5307.0776, 5214.905, 5377.304, 5081.8105, 5389.251, 5469.5493, 5246.634, 5486.229]
2026-01-23 07:56:44,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:56:44,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5340.96) for latency DatasetOffice
2026-01-23 07:56:44,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 96/100 (estimated time remaining: 18 minutes, 48 seconds)
2026-01-23 08:00:15,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:00:28,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5124.37402 ± 574.287
2026-01-23 08:00:28,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5087.973, 5426.547, 5234.3076, 5507.0312, 5137.1255, 5232.6914, 5498.793, 3454.7634, 5434.231, 5230.2754]
2026-01-23 08:00:28,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:00:28,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 97/100 (estimated time remaining: 15 minutes, 1 second)
2026-01-23 08:04:00,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:04:14,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5189.43848 ± 153.059
2026-01-23 08:04:14,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5278.883, 5144.9287, 5252.806, 4780.6973, 5239.2905, 5323.449, 5281.0723, 5082.9595, 5301.997, 5208.304]
2026-01-23 08:04:14,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:04:14,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 15 seconds)
2026-01-23 08:07:46,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:07:59,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5332.59375 ± 163.124
2026-01-23 08:07:59,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5386.8984, 5269.134, 5418.02, 4992.7964, 5553.854, 5376.3105, 5306.826, 5497.0415, 5417.395, 5107.6626]
2026-01-23 08:07:59,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:07:59,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 30 seconds)
2026-01-23 08:11:29,644 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:11:42,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5145.04199 ± 609.334
2026-01-23 08:11:42,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3330.1108, 5263.078, 5493.562, 5252.002, 5380.4224, 5380.2915, 5376.4175, 5240.087, 5363.751, 5370.694]
2026-01-23 08:11:42,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:11:42,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 44 seconds)
2026-01-23 08:15:09,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:15:21,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5308.86621 ± 115.626
2026-01-23 08:15:21,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5344.5537, 5344.6016, 5330.5244, 5450.03, 5207.473, 5335.993, 5119.515, 5425.348, 5112.809, 5417.8145]
2026-01-23 08:15:21,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:15:21,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1299 [DEBUG]: Training session finished
