2026-01-23 01:56:34,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-bpql-mda-mem2
2026-01-23 01:56:34,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-halfcheetah/DatasetOffice-bpql-mda-mem2
2026-01-23 01:56:34,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14e0740ef150>}
2026-01-23 01:56:34,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1159 [DEBUG]: using device: cuda
2026-01-23 01:56:34,845 baseline-bpql-mda-noisy-halfcheetah:91 [WARNING]: args.assumed_delay != args.horizon: 2 != 32
2026-01-23 01:56:34,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1181 [INFO]: Creating new trainer
2026-01-23 01:56:34,862 baseline-bpql-mda-noisy-halfcheetah:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:56:34,862 baseline-bpql-mda-noisy-halfcheetah:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:56:34,868 baseline-bpql-mda-noisy-halfcheetah:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2026-01-23 01:56:35,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1242 [DEBUG]: Starting training session...
2026-01-23 01:56:35,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 1/100
2026-01-23 02:00:08,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:21,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: -454.47324 ± 1.833
2026-01-23 02:00:21,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [-454.6308, -451.73013, -455.57706, -455.38617, -456.6871, -453.26215, -457.56552, -453.629, -451.76175, -454.50256]
2026-01-23 02:00:21,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:00:21,749 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (-454.47) for latency DatasetOffice
2026-01-23 02:00:21,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 12 minutes, 42 seconds)
2026-01-23 02:04:02,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:16,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 274.85950 ± 50.183
2026-01-23 02:04:16,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [275.49518, 298.7517, 316.23694, 226.24626, 304.15268, 234.73251, 162.86353, 335.78772, 317.38904, 276.93924]
2026-01-23 02:04:16,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:04:16,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (274.86) for latency DatasetOffice
2026-01-23 02:04:16,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 16 minutes)
2026-01-23 02:07:52,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:05,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 649.66010 ± 450.873
2026-01-23 02:08:05,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [19.413465, 65.1409, -30.610489, 1168.2905, 893.186, 948.1269, 921.94214, 492.59134, 1177.2214, 841.29944]
2026-01-23 02:08:05,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:08:05,382 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (649.66) for latency DatasetOffice
2026-01-23 02:08:05,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 11 minutes, 34 seconds)
2026-01-23 02:11:40,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:54,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 1478.93542 ± 545.066
2026-01-23 02:11:54,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [1324.6449, 1915.5782, 1340.9835, 712.9653, 1248.2753, 829.74884, 2287.4824, 2172.9233, 1998.3525, 958.4001]
2026-01-23 02:11:54,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:11:54,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (1478.94) for latency DatasetOffice
2026-01-23 02:11:54,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 7 minutes, 20 seconds)
2026-01-23 02:15:30,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:43,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2472.46997 ± 1062.844
2026-01-23 02:15:43,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [471.46902, 260.6931, 3052.841, 3223.2844, 3000.2734, 3005.7188, 3110.7852, 3052.9634, 2849.0713, 2697.6013]
2026-01-23 02:15:43,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:15:43,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (2472.47) for latency DatasetOffice
2026-01-23 02:15:43,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 3 minutes, 22 seconds)
2026-01-23 02:19:18,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:19:32,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2884.66650 ± 965.433
2026-01-23 02:19:32,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3815.944, 3246.9565, 407.11957, 3549.8264, 2189.6016, 3298.9436, 2889.582, 2387.7832, 3583.6853, 3477.2212]
2026-01-23 02:19:32,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:19:32,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (2884.67) for latency DatasetOffice
2026-01-23 02:19:32,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 27 seconds)
2026-01-23 02:23:06,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:19,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 2158.83594 ± 1197.692
2026-01-23 02:23:19,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2717.762, 924.13165, 926.0641, 450.62515, 3465.4114, 3015.8728, 3196.061, 566.48914, 3148.5566, 3177.3875]
2026-01-23 02:23:19,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:23:19,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 54 minutes, 25 seconds)
2026-01-23 02:26:52,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:27:05,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3634.30127 ± 347.353
2026-01-23 02:27:05,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [2649.2856, 3792.0664, 3668.0442, 3879.8762, 3952.1528, 3705.3909, 3569.1172, 3587.252, 3787.549, 3752.2756]
2026-01-23 02:27:05,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:27:05,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (3634.30) for latency DatasetOffice
2026-01-23 02:27:05,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 49 minutes, 42 seconds)
2026-01-23 02:30:38,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:51,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3440.45508 ± 404.448
2026-01-23 02:30:51,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3362.4958, 3520.217, 3493.4885, 3494.862, 3692.429, 2275.2644, 3644.3923, 3792.7617, 3566.9849, 3561.656]
2026-01-23 02:30:51,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:30:51,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 44 minutes, 52 seconds)
2026-01-23 02:34:23,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:36,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3890.41162 ± 98.123
2026-01-23 02:34:36,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [3762.2415, 3773.8938, 3886.581, 3976.4092, 3804.8113, 4049.851, 3824.6096, 3931.961, 3864.1619, 4029.5996]
2026-01-23 02:34:36,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:34:36,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (3890.41) for latency DatasetOffice
2026-01-23 02:34:36,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 39 minutes, 54 seconds)
2026-01-23 02:38:07,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:38:21,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4069.50928 ± 130.238
2026-01-23 02:38:21,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4018.793, 3933.9329, 4084.6506, 4098.155, 3893.3557, 4200.2925, 4291.1367, 3876.638, 4164.9316, 4133.2114]
2026-01-23 02:38:21,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:38:21,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4069.51) for latency DatasetOffice
2026-01-23 02:38:21,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 34 minutes, 54 seconds)
2026-01-23 02:41:52,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:05,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4253.56934 ± 143.717
2026-01-23 02:42:05,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4448.3843, 4064.4038, 4323.8394, 4277.2134, 4316.221, 4302.721, 4004.5369, 4184.5728, 4462.635, 4151.172]
2026-01-23 02:42:05,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:42:05,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4253.57) for latency DatasetOffice
2026-01-23 02:42:05,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 30 minutes, 18 seconds)
2026-01-23 02:45:37,425 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:45:50,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 3959.55908 ± 1242.950
2026-01-23 02:45:50,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4358.424, 4359.352, 4277.5435, 4492.3203, 4624.881, 3940.8994, 267.85883, 4435.2637, 4309.2993, 4529.7505]
2026-01-23 02:45:50,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:45:50,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 26 minutes, 10 seconds)
2026-01-23 02:49:22,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:49:35,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4538.07568 ± 553.230
2026-01-23 02:49:35,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4884.1772, 4756.7007, 4507.091, 4641.291, 4838.577, 2913.7898, 4750.079, 4702.869, 4820.901, 4565.2764]
2026-01-23 02:49:35,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:49:35,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4538.08) for latency DatasetOffice
2026-01-23 02:49:35,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 22 minutes, 18 seconds)
2026-01-23 02:53:07,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:53:20,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 4840.45996 ± 126.773
2026-01-23 02:53:20,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [4831.7676, 4547.3164, 4844.9443, 4996.8013, 4901.298, 4767.2495, 4995.003, 4925.869, 4741.2505, 4853.099]
2026-01-23 02:53:20,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:53:20,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (4840.46) for latency DatasetOffice
2026-01-23 02:53:20,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 18 minutes, 23 seconds)
2026-01-23 02:56:51,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:57:04,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5003.74854 ± 138.640
2026-01-23 02:57:04,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5065.287, 4995.048, 5068.066, 5028.648, 4731.4644, 4879.827, 5117.283, 4857.22, 5051.12, 5243.5215]
2026-01-23 02:57:04,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:57:04,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5003.75) for latency DatasetOffice
2026-01-23 02:57:04,963 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 14 minutes, 41 seconds)
2026-01-23 03:00:36,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:00:49,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5406.66309 ± 119.722
2026-01-23 03:00:49,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5547.016, 5365.0835, 5520.119, 5355.1016, 5422.248, 5163.7627, 5542.3687, 5249.691, 5440.3867, 5460.854]
2026-01-23 03:00:49,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:00:49,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5406.66) for latency DatasetOffice
2026-01-23 03:00:49,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 11 minutes, 1 second)
2026-01-23 03:04:21,496 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:04:34,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5603.10791 ± 60.406
2026-01-23 03:04:34,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5645.7056, 5599.883, 5508.9336, 5494.1025, 5616.301, 5617.869, 5578.6885, 5653.5474, 5704.6553, 5611.3887]
2026-01-23 03:04:34,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:04:34,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5603.11) for latency DatasetOffice
2026-01-23 03:04:34,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 7 minutes, 11 seconds)
2026-01-23 03:08:05,785 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:08:18,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5032.79199 ± 1230.920
2026-01-23 03:08:18,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5642.7827, 5766.639, 5152.0127, 5456.392, 5453.851, 1385.8582, 5507.0225, 5475.77, 5416.678, 5070.9116]
2026-01-23 03:08:18,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:08:18,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 3 minutes, 13 seconds)
2026-01-23 03:11:50,109 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:12:03,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5648.83057 ± 209.096
2026-01-23 03:12:03,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5525.549, 5071.401, 5699.235, 5813.632, 5652.0776, 5664.2485, 5763.741, 5818.732, 5739.6514, 5740.035]
2026-01-23 03:12:03,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:12:03,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5648.83) for latency DatasetOffice
2026-01-23 03:12:03,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 59 minutes, 28 seconds)
2026-01-23 03:15:34,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:15:47,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5736.78906 ± 221.693
2026-01-23 03:15:47,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5841.831, 5520.79, 6009.542, 5915.5264, 5390.53, 5793.9253, 5966.9155, 5355.455, 5808.29, 5765.085]
2026-01-23 03:15:47,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:15:47,356 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5736.79) for latency DatasetOffice
2026-01-23 03:15:47,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 55 minutes, 33 seconds)
2026-01-23 03:19:19,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:19:32,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5874.27441 ± 77.450
2026-01-23 03:19:32,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5857.154, 5782.7876, 5829.6143, 5922.259, 5952.4004, 5914.1157, 5808.246, 5938.822, 5992.66, 5744.6865]
2026-01-23 03:19:32,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:19:32,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5874.27) for latency DatasetOffice
2026-01-23 03:19:32,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 51 minutes, 48 seconds)
2026-01-23 03:23:03,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:23:16,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5884.93994 ± 143.423
2026-01-23 03:23:16,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5918.2056, 6157.4395, 5812.276, 5768.621, 6084.1963, 5797.9917, 5651.823, 5893.3027, 5812.618, 5952.931]
2026-01-23 03:23:16,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:23:16,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5884.94) for latency DatasetOffice
2026-01-23 03:23:16,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 47 minutes, 59 seconds)
2026-01-23 03:26:47,779 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:27:00,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5467.79004 ± 1401.276
2026-01-23 03:27:00,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6060.1187, 5945.723, 5876.179, 5826.39, 5920.1855, 1296.5038, 6073.8555, 5912.6606, 6234.246, 5532.0356]
2026-01-23 03:27:00,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:27:00,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 44 minutes, 16 seconds)
2026-01-23 03:30:32,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:45,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5863.33691 ± 229.129
2026-01-23 03:30:45,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5860.565, 5209.552, 5937.6694, 5954.123, 5981.245, 5887.023, 5940.301, 5990.7266, 6071.3447, 5800.823]
2026-01-23 03:30:45,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:30:45,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 40 minutes, 32 seconds)
2026-01-23 03:34:17,052 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:34:29,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5903.36426 ± 207.684
2026-01-23 03:34:29,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6076.1543, 5642.746, 6098.0312, 5948.4775, 5519.163, 5867.2935, 6073.768, 5662.227, 6068.3, 6077.4785]
2026-01-23 03:34:29,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:34:29,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5903.36) for latency DatasetOffice
2026-01-23 03:34:29,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 36 minutes, 54 seconds)
2026-01-23 03:38:01,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:38:14,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5984.01855 ± 115.129
2026-01-23 03:38:14,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [5942.3037, 5728.969, 6080.869, 5903.44, 6068.7275, 5933.233, 6182.1606, 5995.807, 6007.7134, 5996.959]
2026-01-23 03:38:14,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:38:14,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (5984.02) for latency DatasetOffice
2026-01-23 03:38:14,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 33 minutes, 5 seconds)
2026-01-23 03:41:45,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:41:58,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6288.33838 ± 123.667
2026-01-23 03:41:58,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6371.252, 6278.079, 6175.1416, 6294.4937, 6093.886, 6333.6787, 6141.286, 6392.0225, 6534.3027, 6269.244]
2026-01-23 03:41:58,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:41:58,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6288.34) for latency DatasetOffice
2026-01-23 03:41:58,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 29 minutes, 19 seconds)
2026-01-23 03:45:29,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:45:42,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5937.57275 ± 698.350
2026-01-23 03:45:42,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6222.95, 6143.7256, 5643.5386, 6233.547, 6307.6353, 3938.1343, 6197.6045, 6256.273, 6460.9326, 5971.3857]
2026-01-23 03:45:42,784 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:45:42,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 25 minutes, 33 seconds)
2026-01-23 03:49:14,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:49:27,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6353.32715 ± 269.171
2026-01-23 03:49:27,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6533.506, 5584.34, 6571.7393, 6390.5273, 6348.758, 6528.398, 6436.8057, 6354.387, 6318.446, 6466.368]
2026-01-23 03:49:27,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:49:27,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6353.33) for latency DatasetOffice
2026-01-23 03:49:27,223 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 21 minutes, 46 seconds)
2026-01-23 03:52:58,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:53:11,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6344.09229 ± 201.321
2026-01-23 03:53:11,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6420.681, 6174.482, 6592.0767, 6450.4214, 6050.4097, 6424.2134, 6603.325, 6013.011, 6474.6597, 6237.6455]
2026-01-23 03:53:11,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:53:11,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 17 minutes, 59 seconds)
2026-01-23 03:56:43,187 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:56:56,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6418.95361 ± 156.218
2026-01-23 03:56:56,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6453.2993, 6086.913, 6572.61, 6380.4746, 6328.521, 6431.0, 6237.488, 6583.0425, 6575.7285, 6540.4556]
2026-01-23 03:56:56,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:56:56,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6418.95) for latency DatasetOffice
2026-01-23 03:56:56,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 14 minutes, 14 seconds)
2026-01-23 04:00:27,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:00:40,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6439.62598 ± 96.296
2026-01-23 04:00:40,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6593.7324, 6279.441, 6320.1987, 6420.6304, 6416.58, 6458.706, 6357.072, 6553.1035, 6522.9067, 6473.892]
2026-01-23 04:00:40,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:00:40,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6439.63) for latency DatasetOffice
2026-01-23 04:00:40,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 10 minutes, 30 seconds)
2026-01-23 04:04:11,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:04:24,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 5960.97559 ± 1564.565
2026-01-23 04:04:24,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6452.9956, 6540.606, 6259.158, 6270.4746, 6698.976, 1292.953, 6616.3066, 6597.788, 6648.103, 6232.3994]
2026-01-23 04:04:24,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:04:24,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 6 minutes, 47 seconds)
2026-01-23 04:07:56,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:08:08,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6493.48975 ± 233.107
2026-01-23 04:08:08,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6428.7285, 5972.214, 6627.521, 6730.1323, 6491.0186, 6755.0576, 6476.1333, 6306.1064, 6769.827, 6378.1562]
2026-01-23 04:08:08,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:08:08,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6493.49) for latency DatasetOffice
2026-01-23 04:08:08,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 3 minutes, 2 seconds)
2026-01-23 04:11:40,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:11:53,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6414.16113 ± 276.310
2026-01-23 04:11:53,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6687.2144, 5936.132, 6528.6445, 6612.9854, 6062.957, 6737.897, 6476.0137, 6050.4897, 6622.4263, 6426.849]
2026-01-23 04:11:53,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:11:53,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 59 minutes, 15 seconds)
2026-01-23 04:15:24,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:15:36,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6685.34521 ± 124.493
2026-01-23 04:15:36,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6641.1045, 6383.969, 6720.176, 6740.64, 6666.33, 6747.91, 6875.755, 6594.622, 6779.3325, 6703.613]
2026-01-23 04:15:36,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:15:36,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6685.35) for latency DatasetOffice
2026-01-23 04:15:36,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 55 minutes, 21 seconds)
2026-01-23 04:19:08,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:19:21,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6687.31885 ± 91.659
2026-01-23 04:19:21,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6800.502, 6791.3105, 6567.745, 6784.2134, 6576.0366, 6736.315, 6571.0127, 6721.989, 6607.6724, 6716.3877]
2026-01-23 04:19:21,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:19:21,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6687.32) for latency DatasetOffice
2026-01-23 04:19:21,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 51 minutes, 38 seconds)
2026-01-23 04:22:52,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:23:05,230 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6430.46680 ± 798.941
2026-01-23 04:23:05,230 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7011.1367, 6718.106, 6357.8975, 6729.349, 6815.629, 4151.437, 6768.3535, 6727.768, 6906.335, 6118.6577]
2026-01-23 04:23:05,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:23:05,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 47 minutes, 51 seconds)
2026-01-23 04:26:36,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:26:49,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6710.79980 ± 203.504
2026-01-23 04:26:49,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6731.893, 6169.079, 6898.0605, 6690.677, 6597.979, 6816.6284, 6849.4287, 6692.9224, 6909.9116, 6751.414]
2026-01-23 04:26:49,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:26:49,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6710.80) for latency DatasetOffice
2026-01-23 04:26:49,113 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 44 minutes, 2 seconds)
2026-01-23 04:30:20,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:30:33,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6628.05566 ± 219.918
2026-01-23 04:30:33,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6784.881, 6418.4746, 6853.768, 6830.9014, 6266.1143, 6640.3916, 6755.41, 6242.419, 6746.1465, 6742.0503]
2026-01-23 04:30:33,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:30:33,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 40 minutes, 19 seconds)
2026-01-23 04:34:04,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:34:17,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6699.32031 ± 174.980
2026-01-23 04:34:17,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6678.5503, 6480.83, 6879.608, 6810.905, 6569.934, 6623.6074, 6965.067, 6397.5293, 6868.1235, 6719.047]
2026-01-23 04:34:17,617 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:34:17,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 36 minutes, 40 seconds)
2026-01-23 04:37:48,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:38:01,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6850.76416 ± 141.654
2026-01-23 04:38:01,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6637.084, 6985.762, 6846.737, 6913.847, 6720.9053, 6858.132, 6666.061, 6791.2866, 7086.308, 7001.518]
2026-01-23 04:38:01,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:38:01,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6850.76) for latency DatasetOffice
2026-01-23 04:38:01,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 32 minutes, 53 seconds)
2026-01-23 04:41:32,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:41:45,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6204.75488 ± 1593.897
2026-01-23 04:41:45,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7015.6694, 6816.0874, 6019.474, 6780.265, 6712.842, 1488.6888, 6873.265, 6830.6323, 6934.1357, 6576.4917]
2026-01-23 04:41:45,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:41:45,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 29 minutes, 7 seconds)
2026-01-23 04:45:16,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:45:29,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6816.99756 ± 212.250
2026-01-23 04:45:29,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6816.417, 6226.0312, 6954.1626, 6943.425, 6735.465, 6973.549, 6845.913, 6967.8057, 6920.8574, 6786.3486]
2026-01-23 04:45:29,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:45:29,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 25 minutes, 25 seconds)
2026-01-23 04:49:00,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:49:13,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6640.17480 ± 203.694
2026-01-23 04:49:13,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6737.4478, 6259.885, 6760.252, 6864.6006, 6380.4937, 6624.2075, 6799.9663, 6397.768, 6791.8174, 6785.3193]
2026-01-23 04:49:13,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:49:13,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 21 minutes, 40 seconds)
2026-01-23 04:52:45,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:52:57,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6694.09766 ± 79.607
2026-01-23 04:52:57,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6696.5234, 6594.8887, 6792.426, 6800.9746, 6712.2188, 6799.8926, 6592.146, 6630.9575, 6614.268, 6706.681]
2026-01-23 04:52:57,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:52:57,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 17 minutes, 54 seconds)
2026-01-23 04:56:29,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:56:41,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6888.00537 ± 117.328
2026-01-23 04:56:41,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7010.92, 7078.216, 6918.4688, 6974.8223, 6757.3037, 6942.706, 6767.247, 6816.0225, 6695.3726, 6918.9834]
2026-01-23 04:56:41,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:56:41,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (6888.01) for latency DatasetOffice
2026-01-23 04:56:41,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 14 minutes, 10 seconds)
2026-01-23 05:00:12,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:00:25,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6211.80859 ± 1597.966
2026-01-23 05:00:25,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6864.4507, 6963.766, 6507.5835, 6477.7383, 6877.0034, 1456.7698, 6691.721, 7069.4487, 6774.7075, 6434.8936]
2026-01-23 05:00:25,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:00:25,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 10 minutes, 25 seconds)
2026-01-23 05:03:57,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:04:09,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6577.10693 ± 194.891
2026-01-23 05:04:09,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6775.0576, 6120.959, 6745.6704, 6495.539, 6619.1963, 6384.1606, 6680.3667, 6627.323, 6786.1177, 6536.6787]
2026-01-23 05:04:09,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:04:09,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 6 minutes, 42 seconds)
2026-01-23 05:07:41,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:07:54,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6681.53906 ± 223.421
2026-01-23 05:07:54,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6758.1514, 6436.0024, 6918.579, 6944.4995, 6395.86, 6856.845, 6873.1426, 6286.279, 6639.5176, 6706.5186]
2026-01-23 05:07:54,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:07:54,175 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 2 minutes, 59 seconds)
2026-01-23 05:11:25,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:11:38,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7031.67480 ± 164.032
2026-01-23 05:11:38,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7023.2207, 6850.115, 7243.233, 7080.5933, 7149.4277, 7191.6914, 7062.594, 6674.1226, 7121.8105, 6919.9404]
2026-01-23 05:11:38,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:11:38,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7031.67) for latency DatasetOffice
2026-01-23 05:11:38,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 59 minutes, 15 seconds)
2026-01-23 05:15:09,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:15:22,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6906.79150 ± 119.667
2026-01-23 05:15:22,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7041.1016, 6942.245, 6724.8545, 6760.2207, 6878.703, 7002.4014, 6731.296, 6941.398, 7037.9575, 7007.739]
2026-01-23 05:15:22,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:15:22,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 55 minutes, 34 seconds)
2026-01-23 05:18:53,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:19:06,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6430.63232 ± 1636.653
2026-01-23 05:19:06,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7218.51, 6944.6904, 6674.19, 6917.191, 6970.8525, 1544.9915, 7072.7627, 7140.6484, 7092.2515, 6730.2383]
2026-01-23 05:19:06,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:19:06,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 51 minutes, 51 seconds)
2026-01-23 05:22:37,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:22:50,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6960.67041 ± 148.086
2026-01-23 05:22:50,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6994.5454, 6563.2905, 6882.925, 7069.3975, 6984.202, 7066.7876, 7063.0483, 6918.655, 6968.0674, 7095.7847]
2026-01-23 05:22:50,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:22:50,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 48 minutes, 5 seconds)
2026-01-23 05:26:21,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:26:34,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6944.77637 ± 251.995
2026-01-23 05:26:34,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7049.0757, 6801.9824, 7171.6543, 6986.0396, 6552.3657, 7053.0264, 7145.494, 6426.7046, 7190.031, 7071.3887]
2026-01-23 05:26:34,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:26:34,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 44 minutes, 18 seconds)
2026-01-23 05:30:05,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:30:18,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6908.76660 ± 101.771
2026-01-23 05:30:18,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6969.5815, 6712.9814, 6990.0474, 7066.9775, 6933.613, 6889.5728, 7018.206, 6820.5225, 6859.312, 6826.8516]
2026-01-23 05:30:18,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:30:18,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 40 minutes, 34 seconds)
2026-01-23 05:33:49,805 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:34:02,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7011.06885 ± 88.039
2026-01-23 05:34:02,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7010.8853, 7038.423, 7075.895, 6966.915, 7008.5464, 6930.162, 6848.613, 7176.4697, 6957.485, 7097.295]
2026-01-23 05:34:02,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:34:02,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 36 minutes, 48 seconds)
2026-01-23 05:37:33,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:37:46,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6747.47119 ± 850.496
2026-01-23 05:37:46,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6933.5615, 7018.8315, 6547.3696, 6963.8257, 7257.4346, 4271.691, 7211.813, 7222.406, 7189.377, 6858.4043]
2026-01-23 05:37:46,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:37:46,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 33 minutes, 5 seconds)
2026-01-23 05:41:17,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:41:30,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7034.71191 ± 225.369
2026-01-23 05:41:30,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7042.33, 6539.4434, 7065.5386, 7297.316, 7030.4126, 7311.9243, 7149.4424, 7144.0117, 6733.5894, 7033.1025]
2026-01-23 05:41:30,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:41:30,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7034.71) for latency DatasetOffice
2026-01-23 05:41:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 29 minutes, 19 seconds)
2026-01-23 05:45:01,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:45:14,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6800.13281 ± 267.121
2026-01-23 05:45:14,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [6989.444, 6603.366, 7110.34, 7025.576, 6137.779, 6786.557, 6865.427, 6651.842, 6906.677, 6924.3228]
2026-01-23 05:45:14,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:45:14,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 25 minutes, 35 seconds)
2026-01-23 05:48:45,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:48:58,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7050.09229 ± 128.786
2026-01-23 05:48:58,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7141.164, 6703.539, 7101.477, 7151.4375, 7020.1416, 7052.815, 7129.262, 7119.9434, 6962.0835, 7119.06]
2026-01-23 05:48:58,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:48:58,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7050.09) for latency DatasetOffice
2026-01-23 05:48:58,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 21 minutes, 51 seconds)
2026-01-23 05:52:29,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:52:42,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6990.23975 ± 129.661
2026-01-23 05:52:42,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7043.0537, 6893.3047, 6866.7266, 7293.1597, 6885.215, 6883.614, 7085.496, 6876.67, 7031.8804, 7043.28]
2026-01-23 05:52:42,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:52:42,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 18 minutes, 7 seconds)
2026-01-23 05:56:13,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:56:26,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6468.92969 ± 1684.950
2026-01-23 05:56:26,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7159.737, 6998.6313, 6526.5347, 6880.371, 7188.017, 1450.4231, 7253.883, 7175.2383, 7120.427, 6936.0347]
2026-01-23 05:56:26,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:56:26,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 14 minutes, 23 seconds)
2026-01-23 05:59:57,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:00:10,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7157.10059 ± 162.587
2026-01-23 06:00:10,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7370.4644, 6843.088, 7235.242, 7124.9253, 7167.0264, 7126.7207, 7076.9526, 7322.685, 7349.6387, 6954.2676]
2026-01-23 06:00:10,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:00:10,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7157.10) for latency DatasetOffice
2026-01-23 06:00:10,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 10 minutes, 41 seconds)
2026-01-23 06:03:41,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:03:54,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7048.22559 ± 230.957
2026-01-23 06:03:54,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7080.0913, 6852.694, 7079.5464, 7260.322, 6704.0386, 7277.2075, 7196.426, 6603.2783, 7181.3267, 7247.3306]
2026-01-23 06:03:54,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:03:54,542 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 6 minutes, 56 seconds)
2026-01-23 06:07:25,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:07:38,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7136.78760 ± 139.505
2026-01-23 06:07:38,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7112.021, 7048.6753, 7152.998, 7027.115, 7370.7925, 7133.211, 7261.7676, 6862.516, 7098.161, 7300.618]
2026-01-23 06:07:38,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:07:38,503 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 3 minutes, 12 seconds)
2026-01-23 06:11:09,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:11:22,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7257.82031 ± 166.918
2026-01-23 06:11:22,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7337.7544, 7284.159, 7149.0405, 7345.6255, 7136.9346, 7221.3994, 6953.4805, 7477.2944, 7132.597, 7539.918]
2026-01-23 06:11:22,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:11:22,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7257.82) for latency DatasetOffice
2026-01-23 06:11:22,371 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 59 minutes, 27 seconds)
2026-01-23 06:14:53,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:15:06,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6865.16016 ± 847.810
2026-01-23 06:15:06,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7406.7314, 7228.1543, 6416.3447, 7113.267, 7251.402, 4473.7266, 7250.3896, 7332.536, 7375.976, 6803.0723]
2026-01-23 06:15:06,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:15:06,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 55 minutes, 41 seconds)
2026-01-23 06:18:37,451 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:18:50,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7100.36621 ± 119.587
2026-01-23 06:18:50,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7093.7793, 6761.753, 7164.4995, 7075.389, 7126.227, 7121.1387, 7133.861, 7124.154, 7214.746, 7188.1143]
2026-01-23 06:18:50,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:18:50,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 51 minutes, 58 seconds)
2026-01-23 06:22:21,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:22:34,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7145.26660 ± 273.784
2026-01-23 06:22:34,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7253.2075, 6677.458, 7499.488, 7239.9756, 6749.6113, 7282.5913, 7356.6504, 6807.0376, 7349.504, 7237.144]
2026-01-23 06:22:34,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:22:34,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 48 minutes, 14 seconds)
2026-01-23 06:26:05,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:26:18,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7289.38135 ± 156.721
2026-01-23 06:26:18,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7301.356, 7115.516, 7443.8955, 7374.99, 7320.079, 7008.3203, 7159.677, 7279.593, 7584.1655, 7306.2207]
2026-01-23 06:26:18,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:26:18,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7289.38) for latency DatasetOffice
2026-01-23 06:26:18,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 44 minutes, 31 seconds)
2026-01-23 06:29:49,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:30:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7169.43457 ± 132.801
2026-01-23 06:30:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7082.063, 7311.9043, 7168.867, 7170.2573, 7281.4746, 7082.6016, 6881.2764, 7136.981, 7377.307, 7201.606]
2026-01-23 06:30:02,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:30:02,600 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 40 minutes, 49 seconds)
2026-01-23 06:33:34,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:33:46,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6911.04541 ± 918.506
2026-01-23 06:33:46,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7345.46, 7397.8633, 6548.368, 7227.472, 7463.527, 4257.443, 7261.758, 7325.377, 7266.8, 7016.384]
2026-01-23 06:33:46,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:33:46,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 37 minutes, 6 seconds)
2026-01-23 06:37:18,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:37:30,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7296.15869 ± 266.178
2026-01-23 06:37:30,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7336.061, 6542.9263, 7439.688, 7498.182, 7189.71, 7498.3477, 7396.2876, 7338.4604, 7307.501, 7414.424]
2026-01-23 06:37:30,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:37:30,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7296.16) for latency DatasetOffice
2026-01-23 06:37:30,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 33 minutes, 22 seconds)
2026-01-23 06:41:01,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:41:14,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7173.79932 ± 160.680
2026-01-23 06:41:14,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7377.1416, 7020.571, 7099.0283, 7314.1885, 6938.127, 7382.855, 7317.8623, 6954.823, 7165.525, 7167.869]
2026-01-23 06:41:14,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:41:14,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 29 minutes, 38 seconds)
2026-01-23 06:44:45,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:44:58,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7366.24854 ± 126.633
2026-01-23 06:44:58,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7208.235, 7223.5127, 7422.38, 7469.8237, 7259.5435, 7379.7188, 7269.1904, 7423.2573, 7363.286, 7643.543]
2026-01-23 06:44:58,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:44:58,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7366.25) for latency DatasetOffice
2026-01-23 06:44:58,815 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 25 minutes, 53 seconds)
2026-01-23 06:48:30,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:48:42,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7419.43457 ± 174.865
2026-01-23 06:48:42,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7284.3306, 7475.4, 7314.513, 7468.3, 7516.6025, 7366.86, 7038.2207, 7416.178, 7674.453, 7639.482]
2026-01-23 06:48:42,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:48:42,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7419.43) for latency DatasetOffice
2026-01-23 06:48:42,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 22 minutes, 9 seconds)
2026-01-23 06:52:13,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:52:26,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6887.77490 ± 864.549
2026-01-23 06:52:26,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7249.1465, 7312.345, 6779.75, 6973.936, 7403.8003, 4372.8545, 7317.3677, 7227.2725, 7390.0303, 6851.2466]
2026-01-23 06:52:26,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:52:26,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 18 minutes, 23 seconds)
2026-01-23 06:55:57,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:56:10,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7400.97119 ± 214.703
2026-01-23 06:56:10,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7456.3945, 6969.2646, 7525.1465, 7644.055, 7485.0586, 7500.0664, 7378.654, 7568.5586, 7465.463, 7017.0576]
2026-01-23 06:56:10,622 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:56:10,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 14 minutes, 39 seconds)
2026-01-23 06:59:41,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:59:54,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7097.10449 ± 361.913
2026-01-23 06:59:54,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7115.1714, 6930.3276, 7159.859, 7497.855, 7076.2417, 7445.012, 7304.0723, 7042.6533, 7263.1113, 6136.743]
2026-01-23 06:59:54,659 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:59:54,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 10 minutes, 55 seconds)
2026-01-23 07:03:26,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:03:38,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7471.69385 ± 144.384
2026-01-23 07:03:38,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7579.1665, 7129.651, 7617.302, 7546.6504, 7446.767, 7549.8867, 7470.3184, 7366.744, 7633.8784, 7376.576]
2026-01-23 07:03:38,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:03:38,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7471.69) for latency DatasetOffice
2026-01-23 07:03:38,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 7 minutes, 12 seconds)
2026-01-23 07:07:10,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:07:23,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7330.28125 ± 98.479
2026-01-23 07:07:23,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7297.079, 7436.294, 7318.0723, 7302.285, 7318.6714, 7378.0107, 7118.9536, 7433.3247, 7462.3896, 7237.7373]
2026-01-23 07:07:23,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:07:23,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 3 minutes, 28 seconds)
2026-01-23 07:10:54,033 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:11:06,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6978.11719 ± 836.199
2026-01-23 07:11:06,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7375.678, 7251.765, 6941.2773, 7080.1914, 7371.288, 4525.2314, 7394.2593, 7529.2783, 7290.3184, 7021.883]
2026-01-23 07:11:06,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:11:06,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 85/100 (estimated time remaining: 59 minutes, 44 seconds)
2026-01-23 07:14:38,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:14:51,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7302.40869 ± 266.091
2026-01-23 07:14:51,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7365.0283, 6580.0464, 7214.8022, 7646.0664, 7393.1807, 7262.395, 7318.586, 7339.632, 7461.9097, 7442.444]
2026-01-23 07:14:51,010 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:14:51,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 86/100 (estimated time remaining: 56 minutes, 1 second)
2026-01-23 07:18:22,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:18:35,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7110.74902 ± 272.885
2026-01-23 07:18:35,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7277.4355, 6744.49, 7304.7627, 7337.843, 6688.083, 7165.759, 7211.063, 6678.299, 7364.085, 7335.6694]
2026-01-23 07:18:35,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:18:35,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 87/100 (estimated time remaining: 52 minutes, 17 seconds)
2026-01-23 07:22:06,474 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:22:19,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7414.33203 ± 127.745
2026-01-23 07:22:19,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7359.8003, 7261.7, 7589.8916, 7460.297, 7151.9736, 7504.624, 7545.4204, 7365.8438, 7403.7, 7500.0684]
2026-01-23 07:22:19,309 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:22:19,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 88/100 (estimated time remaining: 48 minutes, 32 seconds)
2026-01-23 07:25:50,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:26:03,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7537.01807 ± 137.854
2026-01-23 07:26:03,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7453.062, 7608.285, 7588.6836, 7726.79, 7645.7754, 7408.645, 7269.73, 7462.6265, 7491.1406, 7715.4463]
2026-01-23 07:26:03,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:26:03,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7537.02) for latency DatasetOffice
2026-01-23 07:26:03,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 89/100 (estimated time remaining: 44 minutes, 49 seconds)
2026-01-23 07:29:34,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:29:47,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7045.45215 ± 811.301
2026-01-23 07:29:47,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7505.805, 7428.9883, 6892.8135, 7330.604, 7306.368, 4677.0483, 7461.186, 7424.4824, 7397.9854, 7029.2446]
2026-01-23 07:29:47,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:29:47,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 90/100 (estimated time remaining: 41 minutes, 6 seconds)
2026-01-23 07:33:19,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:33:32,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7467.90381 ± 242.574
2026-01-23 07:33:32,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7456.291, 6827.8164, 7386.8477, 7607.151, 7446.256, 7704.2573, 7469.7466, 7768.7856, 7543.735, 7468.15]
2026-01-23 07:33:32,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:33:32,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 91/100 (estimated time remaining: 37 minutes, 22 seconds)
2026-01-23 07:37:03,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:37:16,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7317.59619 ± 201.362
2026-01-23 07:37:16,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7497.5024, 7229.6807, 7401.7905, 7549.1304, 6879.847, 7305.5, 7507.4897, 7059.5728, 7418.486, 7326.961]
2026-01-23 07:37:16,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:37:16,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 92/100 (estimated time remaining: 33 minutes, 37 seconds)
2026-01-23 07:40:46,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:41:00,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7528.12988 ± 144.677
2026-01-23 07:41:00,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7731.229, 7308.9204, 7580.0405, 7534.2407, 7475.7744, 7642.537, 7370.638, 7345.89, 7734.452, 7557.5747]
2026-01-23 07:41:00,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:41:00,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 93/100 (estimated time remaining: 29 minutes, 53 seconds)
2026-01-23 07:44:31,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:44:44,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7491.93457 ± 137.497
2026-01-23 07:44:44,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7567.733, 7679.486, 7418.5454, 7491.979, 7401.3643, 7642.5693, 7179.2754, 7509.3335, 7594.4497, 7434.6055]
2026-01-23 07:44:44,194 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:44:44,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 94/100 (estimated time remaining: 26 minutes, 8 seconds)
2026-01-23 07:48:15,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:48:28,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 6820.13525 ± 1733.371
2026-01-23 07:48:28,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7390.53, 7638.469, 6766.215, 7627.7715, 7452.3687, 1683.3485, 7430.885, 7652.456, 7525.7705, 7033.54]
2026-01-23 07:48:28,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:48:28,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 95/100 (estimated time remaining: 22 minutes, 24 seconds)
2026-01-23 07:51:59,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:52:12,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7454.73975 ± 183.116
2026-01-23 07:52:12,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7549.578, 6990.566, 7396.7915, 7537.3013, 7532.8594, 7541.6963, 7561.681, 7678.312, 7294.9756, 7463.6357]
2026-01-23 07:52:12,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:52:12,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 96/100 (estimated time remaining: 18 minutes, 40 seconds)
2026-01-23 07:55:41,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:55:54,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7400.50000 ± 186.447
2026-01-23 07:55:54,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7538.5327, 7249.964, 7506.7925, 7511.6807, 7023.9023, 7493.367, 7513.4834, 7130.941, 7614.8223, 7421.515]
2026-01-23 07:55:54,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:55:54,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 97/100 (estimated time remaining: 14 minutes, 54 seconds)
2026-01-23 07:59:22,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:59:35,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7530.87402 ± 116.520
2026-01-23 07:59:35,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7651.6836, 7371.23, 7504.23, 7593.3477, 7345.1016, 7737.979, 7515.3135, 7438.3057, 7596.122, 7555.4263]
2026-01-23 07:59:35,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:59:35,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 9 seconds)
2026-01-23 08:03:04,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:03:17,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7608.44043 ± 105.181
2026-01-23 08:03:17,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7742.8784, 7635.244, 7741.419, 7513.7563, 7491.631, 7514.6445, 7462.079, 7568.0825, 7699.057, 7715.6133]
2026-01-23 08:03:17,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:03:17,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7608.44) for latency DatasetOffice
2026-01-23 08:03:17,396 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 25 seconds)
2026-01-23 08:06:45,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:06:58,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7240.43115 ± 856.821
2026-01-23 08:06:58,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7699.5605, 7491.858, 7090.4243, 7490.565, 7678.737, 4731.4897, 7591.1787, 7779.706, 7506.5747, 7344.2207]
2026-01-23 08:06:58,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:06:58,811 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1247 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 42 seconds)
2026-01-23 08:10:27,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:10:40,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1269 [DEBUG]: Total Reward: 7656.90234 ± 208.557
2026-01-23 08:10:40,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1270 [DEBUG]: All rewards: [7756.9956, 7116.9385, 7585.448, 7646.567, 7819.3945, 7634.305, 7835.489, 7837.828, 7533.1606, 7802.899]
2026-01-23 08:10:40,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:10:40,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1274 [INFO]: New best (7656.90) for latency DatasetOffice
2026-01-23 08:10:40,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1299 [DEBUG]: Training session finished
