2026-01-23 01:55:39,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-bpql-mda-mem1
2026-01-23 01:55:39,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-bpql-mda-mem1
2026-01-23 01:55:39,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x147673f5f110>}
2026-01-23 01:55:39,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1159 [DEBUG]: using device: cuda
2026-01-23 01:55:39,417 baseline-bpql-mda-noisy-halfcheetah:91 [WARNING]: args.assumed_delay != args.horizon: 1 != 32
2026-01-23 01:55:39,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1181 [INFO]: Creating new trainer
2026-01-23 01:55:39,434 baseline-bpql-mda-noisy-halfcheetah:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:55:39,434 baseline-bpql-mda-noisy-halfcheetah:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:55:39,440 baseline-bpql-mda-noisy-halfcheetah:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2026-01-23 01:55:40,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1242 [DEBUG]: Starting training session...
2026-01-23 01:55:40,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 1/100
2026-01-23 01:59:09,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:23,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -373.59723 ± 9.147
2026-01-23 01:59:23,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-385.69934, -362.87125, -365.55582, -366.47818, -380.32788, -385.0784, -372.83423, -384.06583, -372.55502, -360.50653]
2026-01-23 01:59:23,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:59:23,406 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (-373.60) for latency DatasetOffice
2026-01-23 01:59:23,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 7 minutes, 54 seconds)
2026-01-23 02:02:58,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:11,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -30.48671 ± 71.163
2026-01-23 02:03:11,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [35.634026, 58.32147, -56.6044, 58.25313, 46.183453, -120.543594, -73.61018, -120.629684, -25.077993, -106.79331]
2026-01-23 02:03:11,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:03:11,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (-30.49) for latency DatasetOffice
2026-01-23 02:03:11,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 8 minutes, 31 seconds)
2026-01-23 02:06:46,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:59,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 339.34802 ± 288.051
2026-01-23 02:06:59,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [454.44974, 363.8689, 348.85782, 561.1455, 567.21295, 136.4712, 902.73, -63.070747, 200.76897, -78.95419]
2026-01-23 02:06:59,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:06:59,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (339.35) for latency DatasetOffice
2026-01-23 02:06:59,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 6 minutes)
2026-01-23 02:10:34,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:47,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1975.77478 ± 115.600
2026-01-23 02:10:47,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1918.1063, 1989.9116, 1921.6826, 2030.5223, 2106.659, 1735.0884, 1885.2731, 2146.7314, 2079.337, 1944.4352]
2026-01-23 02:10:47,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:10:47,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (1975.77) for latency DatasetOffice
2026-01-23 02:10:47,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 2 minutes, 47 seconds)
2026-01-23 02:14:22,042 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:35,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2521.31396 ± 693.188
2026-01-23 02:14:35,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2824.143, 2787.6863, 2558.5562, 2578.1694, 2954.2517, 492.6943, 2561.0422, 2619.9846, 2983.732, 2852.8794]
2026-01-23 02:14:35,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:14:35,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (2521.31) for latency DatasetOffice
2026-01-23 02:14:35,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 59 minutes, 21 seconds)
2026-01-23 02:18:09,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:23,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3187.64917 ± 338.429
2026-01-23 02:18:23,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3293.6946, 3077.4956, 3048.6565, 3347.357, 3473.6238, 3607.1555, 2327.2578, 3432.1804, 3217.1653, 3051.9045]
2026-01-23 02:18:23,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:18:23,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (3187.65) for latency DatasetOffice
2026-01-23 02:18:23,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 57 minutes, 5 seconds)
2026-01-23 02:21:57,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:22:10,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3821.82812 ± 376.007
2026-01-23 02:22:10,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4225.533, 3364.6658, 3433.8105, 4475.2456, 3664.5388, 3530.9065, 4105.668, 3402.7148, 4129.482, 3885.7158]
2026-01-23 02:22:10,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:22:10,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (3821.83) for latency DatasetOffice
2026-01-23 02:22:10,735 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 53 minutes, 6 seconds)
2026-01-23 02:25:45,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:58,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3153.09717 ± 760.556
2026-01-23 02:25:58,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3550.6785, 3519.352, 3605.8362, 963.96783, 2850.7136, 3334.4653, 3359.6848, 3656.4082, 3322.0698, 3367.797]
2026-01-23 02:25:58,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:25:58,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 49 minutes, 13 seconds)
2026-01-23 02:29:32,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:45,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4499.33008 ± 137.298
2026-01-23 02:29:45,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4504.5073, 4626.3833, 4590.56, 4707.877, 4334.169, 4348.7017, 4496.32, 4364.7236, 4671.5825, 4348.477]
2026-01-23 02:29:45,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:29:45,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4499.33) for latency DatasetOffice
2026-01-23 02:29:45,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 45 minutes, 4 seconds)
2026-01-23 02:33:17,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:30,451 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3931.34180 ± 1023.793
2026-01-23 02:33:30,451 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4579.411, 4204.974, 4062.3162, 4041.7063, 4345.327, 899.00146, 4292.6987, 4298.2773, 4470.0024, 4119.7017]
2026-01-23 02:33:30,451 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:33:30,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 40 minutes, 33 seconds)
2026-01-23 02:37:02,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:37:15,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3546.67456 ± 910.228
2026-01-23 02:37:15,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3932.4434, 2699.6165, 1087.9099, 3985.3875, 3933.0027, 4109.452, 3906.3682, 4089.215, 4076.9475, 3646.403]
2026-01-23 02:37:15,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:37:15,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 36 minutes, 1 second)
2026-01-23 02:40:48,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:41:00,957 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4660.46045 ± 182.293
2026-01-23 02:41:00,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4971.4766, 4499.576, 4699.1167, 4779.3247, 4337.452, 4650.678, 4750.1406, 4420.3354, 4813.5327, 4682.973]
2026-01-23 02:41:00,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:41:00,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4660.46) for latency DatasetOffice
2026-01-23 02:41:00,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 31 minutes, 32 seconds)
2026-01-23 02:44:33,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:44:46,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4283.20215 ± 1210.364
2026-01-23 02:44:46,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5136.387, 4786.5347, 4772.774, 4960.0938, 4671.7534, 3873.8545, 775.971, 4625.432, 4528.9287, 4700.293]
2026-01-23 02:44:46,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:44:46,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 27 minutes, 9 seconds)
2026-01-23 02:48:18,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:48:31,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4959.60645 ± 110.758
2026-01-23 02:48:31,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5067.053, 5157.426, 5048.619, 5009.028, 4811.2505, 4959.4653, 4834.1245, 4996.7603, 4832.7603, 4879.575]
2026-01-23 02:48:31,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:48:31,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4959.61) for latency DatasetOffice
2026-01-23 02:48:31,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 23 minutes, 2 seconds)
2026-01-23 02:52:04,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:52:17,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5129.08447 ± 691.983
2026-01-23 02:52:17,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5564.4365, 5297.566, 5142.0, 5299.8135, 5365.9272, 3092.2195, 5320.621, 5587.4336, 5419.328, 5201.502]
2026-01-23 02:52:17,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:52:17,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5129.08) for latency DatasetOffice
2026-01-23 02:52:17,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 19 minutes, 17 seconds)
2026-01-23 02:55:49,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:56:02,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4788.02832 ± 1120.697
2026-01-23 02:56:02,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4941.359, 4748.51, 5099.3984, 5336.9316, 5356.7686, 5467.7817, 5277.2275, 1481.6162, 5119.2305, 5051.456]
2026-01-23 02:56:02,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:56:02,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 15 minutes, 35 seconds)
2026-01-23 02:59:35,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:59:48,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5595.73145 ± 349.689
2026-01-23 02:59:48,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5730.383, 5402.9204, 5775.1064, 6226.0283, 5020.562, 5326.354, 5856.7163, 5181.043, 5898.217, 5539.9863]
2026-01-23 02:59:48,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:59:48,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5595.73) for latency DatasetOffice
2026-01-23 02:59:48,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 11 minutes, 51 seconds)
2026-01-23 03:03:20,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:03:33,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6115.90088 ± 254.856
2026-01-23 03:03:33,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6152.68, 6002.172, 6472.0195, 6134.8633, 6571.571, 5606.902, 6032.778, 5931.1226, 6124.9175, 6129.98]
2026-01-23 03:03:33,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:03:33,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6115.90) for latency DatasetOffice
2026-01-23 03:03:33,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 8 minutes, 5 seconds)
2026-01-23 03:07:06,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:07:19,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6596.24512 ± 101.901
2026-01-23 03:07:19,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6514.101, 6693.1177, 6728.652, 6457.628, 6746.022, 6670.6816, 6482.228, 6500.951, 6602.2944, 6566.78]
2026-01-23 03:07:19,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:07:19,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6596.25) for latency DatasetOffice
2026-01-23 03:07:19,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 4 minutes, 27 seconds)
2026-01-23 03:10:51,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:11:05,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6234.28711 ± 864.265
2026-01-23 03:11:05,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6770.7686, 6521.5903, 5924.197, 6394.904, 6918.0527, 3756.0728, 6324.0166, 6591.048, 6625.504, 6516.719]
2026-01-23 03:11:05,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:11:05,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 42 seconds)
2026-01-23 03:14:37,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:14:50,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6434.88672 ± 276.931
2026-01-23 03:14:50,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6469.058, 5717.264, 6585.4463, 6636.6445, 6650.6934, 6704.8354, 6521.1, 6267.1973, 6520.2007, 6276.425]
2026-01-23 03:14:50,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:14:50,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 56 minutes, 59 seconds)
2026-01-23 03:18:23,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:18:36,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6674.38965 ± 368.439
2026-01-23 03:18:36,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7002.535, 6280.456, 6611.347, 7058.207, 6062.7285, 6877.426, 7055.489, 6286.3047, 7085.7754, 6423.632]
2026-01-23 03:18:36,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:18:36,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6674.39) for latency DatasetOffice
2026-01-23 03:18:36,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 53 minutes, 18 seconds)
2026-01-23 03:22:08,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:22:21,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6535.99707 ± 283.205
2026-01-23 03:22:21,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6817.4316, 6250.2617, 6714.0347, 6880.567, 6818.575, 6196.814, 6323.8696, 6825.8804, 6162.9663, 6369.57]
2026-01-23 03:22:21,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:22:21,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 49 minutes, 27 seconds)
2026-01-23 03:25:53,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:26:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7020.41797 ± 150.478
2026-01-23 03:26:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7267.006, 7047.7773, 6954.7437, 6753.6763, 7172.719, 6982.976, 6958.693, 7214.909, 6874.3867, 6977.293]
2026-01-23 03:26:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:26:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7020.42) for latency DatasetOffice
2026-01-23 03:26:07,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 45 minutes, 38 seconds)
2026-01-23 03:29:39,338 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:29:52,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6588.57275 ± 938.371
2026-01-23 03:29:52,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7038.985, 6975.7236, 6404.9727, 6699.524, 7283.837, 3872.792, 6942.3413, 7127.604, 6951.9707, 6587.9746]
2026-01-23 03:29:52,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:29:52,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 41 minutes, 49 seconds)
2026-01-23 03:33:24,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:33:37,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6864.60059 ± 301.588
2026-01-23 03:33:37,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7029.3794, 6375.568, 6948.1196, 7039.0103, 7315.859, 7225.9434, 6975.1646, 6677.382, 6520.3047, 6539.2764]
2026-01-23 03:33:37,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:33:37,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 38 minutes)
2026-01-23 03:37:09,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:37:22,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6843.93457 ± 272.475
2026-01-23 03:37:22,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7002.604, 6502.0015, 6886.663, 7308.784, 6364.71, 7017.492, 6980.317, 6604.363, 7032.9556, 6739.4517]
2026-01-23 03:37:22,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:37:22,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 34 minutes, 10 seconds)
2026-01-23 03:40:55,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:41:08,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6740.64600 ± 207.233
2026-01-23 03:41:08,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6829.831, 6831.093, 6767.3477, 7023.7393, 6923.6196, 6390.233, 6881.193, 6586.87, 6379.9546, 6792.584]
2026-01-23 03:41:08,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:41:08,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 30 minutes, 27 seconds)
2026-01-23 03:44:40,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:44:53,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7161.02588 ± 183.721
2026-01-23 03:44:53,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7349.2334, 7432.0693, 7325.1763, 7244.755, 6816.5767, 7029.4688, 7263.0317, 7078.0684, 6964.675, 7107.2065]
2026-01-23 03:44:53,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:44:53,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7161.03) for latency DatasetOffice
2026-01-23 03:44:53,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 26 minutes, 40 seconds)
2026-01-23 03:48:26,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:48:39,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6742.56934 ± 983.691
2026-01-23 03:48:39,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7334.159, 7071.8423, 6469.094, 7025.681, 7070.1367, 3877.3232, 7133.414, 7301.563, 7251.423, 6891.0566]
2026-01-23 03:48:39,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:48:39,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 22 minutes, 57 seconds)
2026-01-23 03:52:10,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:52:22,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6997.07959 ± 275.636
2026-01-23 03:52:22,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7117.5103, 6378.6475, 7208.4556, 7174.872, 7093.4326, 7288.192, 6979.277, 7089.557, 7066.5884, 6574.2656]
2026-01-23 03:52:22,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:52:22,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 18 minutes, 48 seconds)
2026-01-23 03:55:53,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:56:06,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7084.48145 ± 454.679
2026-01-23 03:56:06,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7440.3013, 6745.4116, 7454.3936, 7434.484, 6402.9775, 7097.6523, 7515.5757, 6257.205, 7546.5913, 6950.2217]
2026-01-23 03:56:06,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:56:06,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 14 minutes, 36 seconds)
2026-01-23 03:59:33,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:59:46,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7047.85791 ± 247.008
2026-01-23 03:59:46,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7310.082, 7148.9526, 7246.6763, 7382.052, 7182.1543, 6532.146, 6829.341, 6982.6924, 6839.4873, 7024.996]
2026-01-23 03:59:46,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:59:46,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 9 minutes, 38 seconds)
2026-01-23 04:03:12,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:03:25,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7370.70215 ± 113.059
2026-01-23 04:03:25,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7281.0767, 7394.1943, 7437.1284, 7334.4453, 7584.8994, 7311.9414, 7236.1377, 7203.49, 7489.1753, 7434.5415]
2026-01-23 04:03:25,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:03:25,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7370.70) for latency DatasetOffice
2026-01-23 04:03:25,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 4 minutes, 34 seconds)
2026-01-23 04:06:52,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:07:04,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6905.52246 ± 956.263
2026-01-23 04:07:04,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7543.241, 7164.779, 6746.1616, 6830.0156, 7614.436, 4183.351, 7168.6245, 7461.337, 7493.2134, 6850.0674]
2026-01-23 04:07:04,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:07:04,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 59 minutes, 30 seconds)
2026-01-23 04:10:31,889 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:10:44,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7085.92969 ± 333.728
2026-01-23 04:10:44,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7119.1245, 6422.651, 7037.2754, 7582.606, 7184.2007, 7455.1855, 7288.954, 7233.0806, 6847.9497, 6688.2676]
2026-01-23 04:10:44,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:10:44,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 54 minutes, 56 seconds)
2026-01-23 04:14:11,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:14:23,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7138.97266 ± 384.285
2026-01-23 04:14:23,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7398.852, 6736.887, 6887.9097, 7610.993, 6792.7427, 7366.538, 7260.6, 6409.1816, 7590.7905, 7335.234]
2026-01-23 04:14:23,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:14:23,804 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 50 minutes, 29 seconds)
2026-01-23 04:17:50,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:18:03,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7174.42822 ± 362.433
2026-01-23 04:18:03,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7582.6865, 7455.0103, 7427.4263, 7679.741, 7178.5024, 6383.548, 6922.903, 7013.684, 7008.8833, 7091.898]
2026-01-23 04:18:03,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:18:03,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 46 minutes, 45 seconds)
2026-01-23 04:21:30,129 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:21:42,561 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7413.49121 ± 135.138
2026-01-23 04:21:42,561 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7350.4585, 7402.622, 7359.5366, 7379.1387, 7586.8486, 7196.173, 7429.051, 7287.6777, 7696.2534, 7447.1577]
2026-01-23 04:21:42,561 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:21:42,561 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7413.49) for latency DatasetOffice
2026-01-23 04:21:42,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 43 minutes, 2 seconds)
2026-01-23 04:25:09,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:25:22,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7149.49756 ± 1015.078
2026-01-23 04:25:22,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7838.187, 7457.819, 6965.5693, 7310.2207, 7682.2646, 4208.3936, 7646.3413, 7596.9805, 7689.2476, 7099.953]
2026-01-23 04:25:22,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:25:22,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 39 minutes, 29 seconds)
2026-01-23 04:28:49,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:29:01,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7215.63525 ± 412.187
2026-01-23 04:29:01,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7328.9736, 6302.327, 7151.101, 7289.652, 7277.094, 7737.1743, 7716.3784, 7273.6304, 7394.1187, 6685.9004]
2026-01-23 04:29:01,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:29:01,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 35 minutes, 48 seconds)
2026-01-23 04:32:28,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:32:41,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7339.47803 ± 342.974
2026-01-23 04:32:41,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7544.457, 6993.6177, 7523.9644, 7873.0073, 6717.3066, 7369.554, 7527.0825, 6897.207, 7617.1797, 7331.4067]
2026-01-23 04:32:41,461 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:32:41,469 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 32 minutes, 12 seconds)
2026-01-23 04:36:08,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:36:21,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7340.24463 ± 273.132
2026-01-23 04:36:21,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7318.3003, 7223.7036, 7809.2705, 7818.978, 7430.292, 6924.204, 7115.554, 7391.703, 7171.451, 7198.9873]
2026-01-23 04:36:21,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:36:21,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 28 minutes, 36 seconds)
2026-01-23 04:39:48,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:40:00,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7443.95312 ± 232.019
2026-01-23 04:40:00,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7575.236, 7500.2056, 7622.8477, 7767.2686, 7623.6567, 7365.249, 6975.5034, 7283.4155, 7568.7476, 7157.397]
2026-01-23 04:40:00,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:40:00,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7443.95) for latency DatasetOffice
2026-01-23 04:40:00,674 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 24 minutes, 58 seconds)
2026-01-23 04:43:27,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:43:40,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6970.96387 ± 1008.436
2026-01-23 04:43:40,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7355.949, 7151.926, 6530.3403, 7326.9033, 7461.1274, 4084.754, 7588.458, 7538.6333, 7588.2007, 7083.349]
2026-01-23 04:43:40,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:43:40,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 21 minutes, 21 seconds)
2026-01-23 04:47:07,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:47:20,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7132.74316 ± 1208.468
2026-01-23 04:47:20,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7586.3047, 6666.2734, 7406.286, 7989.865, 7517.936, 7916.643, 7718.6553, 7437.4673, 7431.314, 3656.6833]
2026-01-23 04:47:20,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:47:20,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 17 minutes, 43 seconds)
2026-01-23 04:50:47,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:50:59,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7509.70410 ± 407.709
2026-01-23 04:50:59,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7651.47, 7048.8696, 7707.2783, 7777.547, 6761.128, 7972.419, 7792.514, 7126.3506, 8009.731, 7249.725]
2026-01-23 04:50:59,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:50:59,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7509.70) for latency DatasetOffice
2026-01-23 04:50:59,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 14 minutes, 3 seconds)
2026-01-23 04:54:27,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:54:39,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7416.57666 ± 269.376
2026-01-23 04:54:39,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7507.039, 7267.0244, 7809.8813, 7988.224, 7395.7617, 7063.183, 7306.1245, 7235.6113, 7381.572, 7211.3447]
2026-01-23 04:54:39,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:54:39,561 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 10 minutes, 22 seconds)
2026-01-23 04:58:06,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:58:19,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7867.07910 ± 191.664
2026-01-23 04:58:19,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7893.2827, 7740.125, 7962.705, 8218.712, 7970.0786, 7654.2227, 7647.4775, 7587.2935, 7978.177, 8018.7095]
2026-01-23 04:58:19,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:58:19,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7867.08) for latency DatasetOffice
2026-01-23 04:58:19,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 6 minutes, 46 seconds)
2026-01-23 05:01:46,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:01:59,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7024.80762 ± 1064.297
2026-01-23 05:01:59,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7537.888, 7139.592, 6796.555, 7241.0625, 7669.423, 3980.7126, 7292.7197, 7775.0923, 7821.2236, 6993.806]
2026-01-23 05:01:59,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:01:59,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 3 minutes, 6 seconds)
2026-01-23 05:05:26,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:05:39,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7421.14453 ± 299.845
2026-01-23 05:05:39,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7621.941, 6902.5864, 7264.4946, 7645.8906, 7508.943, 7856.7305, 7548.8184, 7638.4746, 7288.432, 6935.1333]
2026-01-23 05:05:39,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:05:39,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 59 minutes, 30 seconds)
2026-01-23 05:09:06,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:09:19,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7661.35840 ± 343.666
2026-01-23 05:09:19,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7721.8555, 7031.0386, 7593.3286, 8190.869, 7430.7026, 7896.723, 7884.7466, 7169.1436, 7959.275, 7735.909]
2026-01-23 05:09:19,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:09:19,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 55 minutes, 52 seconds)
2026-01-23 05:12:46,624 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:12:59,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7622.71582 ± 349.692
2026-01-23 05:12:59,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7708.398, 7412.6963, 7931.192, 8128.8, 7953.5127, 6818.4785, 7687.55, 7398.242, 7532.2344, 7656.0586]
2026-01-23 05:12:59,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:12:59,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 52 minutes, 14 seconds)
2026-01-23 05:16:26,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:16:38,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7968.47021 ± 124.073
2026-01-23 05:16:38,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7937.3657, 8265.792, 7947.801, 7868.42, 7922.9736, 7808.177, 7929.4434, 7948.572, 7934.947, 8121.207]
2026-01-23 05:16:38,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:16:38,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7968.47) for latency DatasetOffice
2026-01-23 05:16:38,769 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 48 minutes, 34 seconds)
2026-01-23 05:20:06,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:20:18,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7271.06885 ± 994.089
2026-01-23 05:20:18,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7729.6616, 7520.764, 7286.8115, 7684.4917, 7798.323, 4340.1094, 7614.4263, 7874.527, 7566.2583, 7295.3125]
2026-01-23 05:20:18,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:20:18,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 44 minutes, 57 seconds)
2026-01-23 05:23:46,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:23:58,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7720.73828 ± 344.109
2026-01-23 05:23:58,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7622.653, 7145.7974, 7541.578, 7940.621, 8005.8926, 8101.5864, 8046.88, 7840.6006, 7869.047, 7092.729]
2026-01-23 05:23:58,614 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:23:58,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 41 minutes, 15 seconds)
2026-01-23 05:27:26,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:27:38,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7657.49219 ± 461.428
2026-01-23 05:27:38,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8179.6147, 7460.4873, 7701.0737, 7987.005, 6807.5317, 7460.307, 8150.7183, 6936.755, 8038.0933, 7853.337]
2026-01-23 05:27:38,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:27:38,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 37 minutes, 34 seconds)
2026-01-23 05:31:05,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:31:17,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7077.92090 ± 2182.906
2026-01-23 05:31:17,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7776.3374, 7907.624, 8073.0566, 7898.5293, 7684.466, 539.239, 7642.804, 7766.6484, 7792.8076, 7697.7007]
2026-01-23 05:31:17,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:31:17,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 33 minutes, 46 seconds)
2026-01-23 05:34:43,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:34:55,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8132.11035 ± 136.004
2026-01-23 05:34:55,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8037.6265, 8416.046, 8089.892, 7955.987, 8078.0264, 8137.6465, 7951.8584, 8274.674, 8200.258, 8179.0825]
2026-01-23 05:34:55,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:34:55,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (8132.11) for latency DatasetOffice
2026-01-23 05:34:55,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 29 minutes, 54 seconds)
2026-01-23 05:38:21,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:38:34,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7408.84375 ± 1025.648
2026-01-23 05:38:34,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8034.1196, 7596.35, 7307.308, 7442.1484, 8083.39, 4431.1094, 7857.135, 7900.6157, 7969.0894, 7467.169]
2026-01-23 05:38:34,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:38:34,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 26 minutes, 1 second)
2026-01-23 05:42:00,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:42:12,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7166.36328 ± 2104.091
2026-01-23 05:42:12,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7476.3975, 943.6021, 7705.451, 8079.767, 8086.7837, 8374.608, 8102.503, 7892.1616, 7944.9263, 7057.4365]
2026-01-23 05:42:12,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:42:12,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 22 minutes, 11 seconds)
2026-01-23 05:45:38,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:45:50,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7285.67969 ± 1308.226
2026-01-23 05:45:50,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8086.301, 7408.2446, 7842.56, 8056.071, 7127.708, 8040.9497, 8094.626, 7010.1836, 3531.9526, 7658.202]
2026-01-23 05:45:50,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:45:50,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 18 minutes, 20 seconds)
2026-01-23 05:49:16,804 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:49:29,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7823.40088 ± 266.573
2026-01-23 05:49:29,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7980.2495, 7643.351, 8223.815, 8202.743, 8004.653, 7357.181, 7794.9316, 7748.228, 7751.2095, 7527.6396]
2026-01-23 05:49:29,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:49:29,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 14 minutes, 39 seconds)
2026-01-23 05:52:54,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:53:06,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8224.00879 ± 110.251
2026-01-23 05:53:06,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8332.125, 8458.6045, 8145.5415, 8198.3955, 8126.1885, 8147.0854, 8112.288, 8197.268, 8170.322, 8352.266]
2026-01-23 05:53:06,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:53:06,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (8224.01) for latency DatasetOffice
2026-01-23 05:53:06,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 10 minutes, 54 seconds)
2026-01-23 05:56:30,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:56:43,200 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7366.89160 ± 1382.925
2026-01-23 05:56:43,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8197.335, 7756.899, 4884.485, 7909.8457, 8275.389, 4462.9893, 8004.7026, 8560.069, 8225.274, 7391.929]
2026-01-23 05:56:43,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:56:43,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 7 minutes, 3 seconds)
2026-01-23 06:00:06,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:00:18,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7933.92432 ± 355.235
2026-01-23 06:00:18,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8075.5044, 7254.6265, 7601.358, 8266.533, 8137.5503, 8380.781, 8276.166, 8010.57, 7831.939, 7504.214]
2026-01-23 06:00:18,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:00:18,301 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 3 minutes, 3 seconds)
2026-01-23 06:03:41,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:03:53,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7830.03027 ± 712.025
2026-01-23 06:03:53,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8248.023, 7634.1763, 8259.8125, 8193.956, 7369.8193, 8190.485, 8247.091, 5878.3735, 8248.055, 8030.5093]
2026-01-23 06:03:53,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:03:53,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 59 minutes, 3 seconds)
2026-01-23 06:07:15,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:07:27,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7803.52881 ± 312.599
2026-01-23 06:07:27,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7891.218, 7906.998, 7953.2163, 8481.583, 7880.031, 7266.817, 7789.637, 7688.0806, 7392.613, 7785.096]
2026-01-23 06:07:27,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:07:27,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 55 minutes, 2 seconds)
2026-01-23 06:10:49,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:11:01,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8330.23438 ± 156.645
2026-01-23 06:11:01,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8315.187, 8377.889, 8354.465, 8419.9, 8643.315, 8253.567, 8224.419, 8024.383, 8232.363, 8456.861]
2026-01-23 06:11:01,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:11:01,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (8330.23) for latency DatasetOffice
2026-01-23 06:11:01,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 51 minutes, 7 seconds)
2026-01-23 06:14:24,457 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:14:36,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7712.60303 ± 1146.817
2026-01-23 06:14:36,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8391.803, 7824.8853, 7737.1855, 7861.612, 8326.986, 4339.2344, 8193.369, 8334.817, 8201.426, 7914.7153]
2026-01-23 06:14:36,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:14:36,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 47 minutes, 18 seconds)
2026-01-23 06:17:58,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:18:10,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7838.03369 ± 388.708
2026-01-23 06:18:10,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7977.145, 7181.1807, 7426.522, 8309.956, 8012.5635, 8425.868, 7967.2407, 7915.723, 7845.0376, 7319.1016]
2026-01-23 06:18:10,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:18:10,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 43 minutes, 40 seconds)
2026-01-23 06:21:33,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:21:45,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8065.20166 ± 445.608
2026-01-23 06:21:45,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8438.592, 7712.091, 8391.954, 8335.613, 7246.2134, 8406.693, 8292.226, 7344.9, 8488.356, 7995.379]
2026-01-23 06:21:45,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:21:45,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 40 minutes, 5 seconds)
2026-01-23 06:25:07,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:25:19,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8024.94385 ± 329.909
2026-01-23 06:25:19,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8373.134, 7868.321, 8464.262, 8379.479, 8196.675, 7452.3403, 8133.0967, 7565.7515, 7816.6733, 7999.7036]
2026-01-23 06:25:19,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:25:19,738 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 36 minutes, 29 seconds)
2026-01-23 06:28:42,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:28:54,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8180.58105 ± 155.432
2026-01-23 06:28:54,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8097.0923, 8311.139, 8247.061, 8165.5786, 8269.624, 7916.231, 8028.4673, 8005.904, 8377.59, 8387.121]
2026-01-23 06:28:54,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:28:54,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 32 minutes, 56 seconds)
2026-01-23 06:32:16,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:32:28,785 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7518.07666 ± 1987.329
2026-01-23 06:32:28,785 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8613.41, 7871.773, 7482.268, 7997.106, 8226.588, 1634.5321, 8240.363, 8522.378, 8496.038, 8096.306]
2026-01-23 06:32:28,785 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:32:28,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 29 minutes, 22 seconds)
2026-01-23 06:35:51,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:36:03,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7941.23145 ± 361.207
2026-01-23 06:36:03,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7861.8022, 7370.2275, 7928.5, 8130.3687, 8158.9976, 8679.764, 8276.255, 7592.896, 7703.315, 7710.19]
2026-01-23 06:36:03,480 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:36:03,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 25 minutes, 48 seconds)
2026-01-23 06:39:25,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:39:37,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8120.57959 ± 371.805
2026-01-23 06:39:37,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8181.009, 7769.4795, 8347.101, 8591.993, 7903.592, 8230.658, 8494.708, 7326.9473, 8450.157, 7910.1523]
2026-01-23 06:39:37,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:39:37,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 22 minutes, 11 seconds)
2026-01-23 06:42:59,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:43:12,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7975.32031 ± 323.987
2026-01-23 06:43:12,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8191.857, 7826.884, 8352.96, 8167.2183, 8305.702, 7156.577, 7979.549, 7934.774, 8001.851, 7835.835]
2026-01-23 06:43:12,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:43:12,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 18 minutes, 38 seconds)
2026-01-23 06:46:34,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:46:46,688 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8423.70410 ± 166.808
2026-01-23 06:46:46,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8524.895, 8294.194, 8543.61, 8457.856, 8445.636, 8431.297, 8031.804, 8673.997, 8513.899, 8319.85]
2026-01-23 06:46:46,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:46:46,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (8423.70) for latency DatasetOffice
2026-01-23 06:46:46,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 15 minutes, 3 seconds)
2026-01-23 06:50:09,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:50:21,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7862.02490 ± 1148.335
2026-01-23 06:50:21,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8591.923, 8243.11, 7510.176, 8177.63, 8582.171, 4565.652, 8305.821, 8299.014, 8572.453, 7772.295]
2026-01-23 06:50:21,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:50:21,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 11 minutes, 29 seconds)
2026-01-23 06:53:43,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:53:55,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8005.77637 ± 371.209
2026-01-23 06:53:55,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7994.9062, 7187.827, 7970.2764, 8184.416, 8162.1943, 8469.238, 8533.323, 7832.843, 8069.2715, 7653.466]
2026-01-23 06:53:55,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:53:55,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 7 minutes, 53 seconds)
2026-01-23 06:57:17,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:57:29,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8033.17188 ± 422.514
2026-01-23 06:57:29,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8355.912, 7588.3867, 8237.528, 8406.671, 7315.7637, 8083.637, 8364.468, 7512.815, 8635.817, 7830.717]
2026-01-23 06:57:29,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:57:29,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 4 minutes, 19 seconds)
2026-01-23 07:00:51,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:01:03,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8043.81250 ± 333.978
2026-01-23 07:01:03,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8119.5547, 8006.104, 8470.058, 8121.798, 8437.91, 7193.921, 7953.7188, 7894.3726, 8138.989, 8101.6987]
2026-01-23 07:01:03,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:01:03,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 43 seconds)
2026-01-23 07:04:26,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:04:37,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8351.11133 ± 145.070
2026-01-23 07:04:37,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8272.973, 8492.545, 8187.9595, 8415.102, 8468.911, 8128.859, 8341.423, 8306.361, 8634.63, 8262.35]
2026-01-23 07:04:37,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:04:37,910 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 85/100 (estimated time remaining: 57 minutes, 7 seconds)
2026-01-23 07:08:00,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:08:12,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7909.93848 ± 1185.408
2026-01-23 07:08:12,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8562.998, 8192.824, 7533.897, 8096.2314, 8712.693, 4515.864, 8207.264, 8575.176, 8728.534, 7973.904]
2026-01-23 07:08:12,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:08:12,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 86/100 (estimated time remaining: 53 minutes, 33 seconds)
2026-01-23 07:11:34,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:11:46,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8107.64209 ± 401.997
2026-01-23 07:11:46,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8091.697, 7378.753, 7868.0786, 8434.176, 8303.166, 8538.695, 8687.461, 8119.7734, 8145.846, 7508.779]
2026-01-23 07:11:46,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:11:46,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 87/100 (estimated time remaining: 49 minutes, 59 seconds)
2026-01-23 07:15:08,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:15:20,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8225.82324 ± 349.910
2026-01-23 07:15:20,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8482.523, 8141.581, 8308.614, 8604.326, 7927.331, 8270.92, 8457.048, 7494.908, 8682.55, 7888.4375]
2026-01-23 07:15:20,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:15:20,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 88/100 (estimated time remaining: 46 minutes, 25 seconds)
2026-01-23 07:18:43,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:18:55,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8125.22266 ± 329.445
2026-01-23 07:18:55,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8152.8193, 8015.276, 8292.747, 8540.0205, 8434.462, 7352.019, 8102.923, 7991.7095, 7908.177, 8462.069]
2026-01-23 07:18:55,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:18:55,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 89/100 (estimated time remaining: 42 minutes, 51 seconds)
2026-01-23 07:22:17,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:22:29,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8558.86621 ± 158.284
2026-01-23 07:22:29,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8541.805, 8714.565, 8578.294, 8667.397, 8612.933, 8465.994, 8275.028, 8517.681, 8847.6, 8367.366]
2026-01-23 07:22:29,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:22:29,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (8558.87) for latency DatasetOffice
2026-01-23 07:22:29,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 90/100 (estimated time remaining: 39 minutes, 17 seconds)
2026-01-23 07:25:51,674 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:26:03,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7811.72266 ± 1111.399
2026-01-23 07:26:03,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8440.561, 7820.643, 7731.322, 8164.4263, 8417.758, 4574.5815, 8233.694, 8310.604, 8552.577, 7871.0654]
2026-01-23 07:26:03,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:26:03,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 91/100 (estimated time remaining: 35 minutes, 42 seconds)
2026-01-23 07:29:26,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:29:38,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8085.06738 ± 408.056
2026-01-23 07:29:38,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8035.4287, 7410.861, 8096.157, 8635.516, 7952.307, 8609.173, 8437.428, 8100.3433, 8180.5957, 7392.871]
2026-01-23 07:29:38,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:29:38,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 92/100 (estimated time remaining: 32 minutes, 8 seconds)
2026-01-23 07:33:00,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:33:12,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8048.25928 ± 502.372
2026-01-23 07:33:12,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8524.709, 7650.3765, 8207.155, 8474.7, 7231.225, 8250.529, 8523.498, 7165.2036, 8504.126, 7951.074]
2026-01-23 07:33:12,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:33:12,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 93/100 (estimated time remaining: 28 minutes, 34 seconds)
2026-01-23 07:36:34,841 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:36:47,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7927.06543 ± 400.463
2026-01-23 07:36:47,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8213.6045, 7617.735, 8187.052, 8570.09, 8238.242, 7247.307, 8110.835, 7895.609, 7339.3877, 7850.799]
2026-01-23 07:36:47,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:36:47,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 94/100 (estimated time remaining: 25 minutes)
2026-01-23 07:40:09,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:40:21,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8544.82129 ± 196.601
2026-01-23 07:40:21,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8268.933, 8643.575, 8611.96, 8853.529, 8506.823, 8619.695, 8412.783, 8385.111, 8299.7, 8846.099]
2026-01-23 07:40:21,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:40:21,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 95/100 (estimated time remaining: 21 minutes, 26 seconds)
2026-01-23 07:43:43,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:43:55,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7971.19678 ± 1209.357
2026-01-23 07:43:55,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8471.237, 8010.032, 8175.578, 8386.142, 8853.87, 4463.8525, 8518.073, 8260.267, 8785.43, 7787.484]
2026-01-23 07:43:55,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:43:55,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 96/100 (estimated time remaining: 17 minutes, 51 seconds)
2026-01-23 07:47:18,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:47:30,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8219.28223 ± 401.050
2026-01-23 07:47:30,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8210.774, 7409.093, 7928.805, 8523.412, 8367.37, 8885.32, 8566.21, 8105.2646, 8369.287, 7827.28]
2026-01-23 07:47:30,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:47:30,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 97/100 (estimated time remaining: 14 minutes, 17 seconds)
2026-01-23 07:50:52,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:51:04,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8125.70068 ± 329.284
2026-01-23 07:51:04,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8383.279, 7838.911, 8214.405, 8528.471, 7732.635, 8030.923, 8390.35, 7560.329, 8570.973, 8006.724]
2026-01-23 07:51:04,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:51:04,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 98/100 (estimated time remaining: 10 minutes, 43 seconds)
2026-01-23 07:54:26,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:54:38,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8221.09473 ± 422.331
2026-01-23 07:54:38,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8609.739, 8502.087, 8280.957, 8671.527, 8805.474, 7479.0474, 7924.7627, 8336.458, 7798.474, 7802.411]
2026-01-23 07:54:38,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:54:38,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 8 seconds)
2026-01-23 07:58:01,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:58:13,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 8559.38379 ± 172.152
2026-01-23 07:58:13,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8317.997, 8663.594, 8844.572, 8615.075, 8681.666, 8339.965, 8319.722, 8672.813, 8501.362, 8637.067]
2026-01-23 07:58:13,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:58:13,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (8559.38) for latency DatasetOffice
2026-01-23 07:58:13,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 34 seconds)
2026-01-23 08:01:35,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:01:47,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7915.78613 ± 1186.098
2026-01-23 08:01:47,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [8530.995, 8131.8193, 7667.2427, 8225.883, 8666.623, 4514.2417, 8519.419, 8549.8545, 8649.815, 7701.9707]
2026-01-23 08:01:47,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:01:47,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1299 [DEBUG]: Training session finished
