2026-01-23 01:58:30,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-hopper/DatasetOffice-bpql-mda-mem2
2026-01-23 01:58:30,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-hopper/DatasetOffice-bpql-mda-mem2
2026-01-23 01:58:30,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x145b83dfc8d0>}
2026-01-23 01:58:30,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1159 [DEBUG]: using device: cuda
2026-01-23 01:58:31,130 baseline-bpql-mda-noisy-hopper:91 [WARNING]: args.assumed_delay != args.horizon: 2 != 32
2026-01-23 01:58:31,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1181 [INFO]: Creating new trainer
2026-01-23 01:58:31,146 baseline-bpql-mda-noisy-hopper:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2026-01-23 01:58:31,147 baseline-bpql-mda-noisy-hopper:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:58:31,152 baseline-bpql-mda-noisy-hopper:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(3, 384, batch_first=True)
)
2026-01-23 01:58:32,012 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1242 [DEBUG]: Starting training session...
2026-01-23 01:58:32,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 1/100
2026-01-23 02:01:51,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:01:52,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 109.58927 ± 11.074
2026-01-23 02:01:52,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [98.73455, 128.69412, 117.76133, 120.83742, 108.62117, 107.543915, 107.425125, 103.014015, 115.24087, 88.020195]
2026-01-23 02:01:52,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [57.0, 86.0, 79.0, 81.0, 74.0, 72.0, 72.0, 60.0, 77.0, 51.0]
2026-01-23 02:01:52,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (109.59) for latency DatasetOffice
2026-01-23 02:01:52,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 30 minutes, 10 seconds)
2026-01-23 02:05:21,618 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:23,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 276.37155 ± 73.270
2026-01-23 02:05:23,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [331.88538, 159.13374, 263.61984, 354.6904, 359.62894, 174.24611, 193.3927, 271.8472, 304.84488, 350.4262]
2026-01-23 02:05:23,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [151.0, 103.0, 142.0, 168.0, 163.0, 103.0, 116.0, 144.0, 160.0, 153.0]
2026-01-23 02:05:23,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (276.37) for latency DatasetOffice
2026-01-23 02:05:23,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 36 minutes, 4 seconds)
2026-01-23 02:08:55,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:57,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 294.05731 ± 111.918
2026-01-23 02:08:57,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [362.37915, 244.25293, 486.10422, 382.30096, 345.6896, 227.36238, 116.15882, 222.7466, 156.38062, 397.19806]
2026-01-23 02:08:57,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [161.0, 186.0, 267.0, 224.0, 233.0, 178.0, 87.0, 113.0, 107.0, 192.0]
2026-01-23 02:08:57,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (294.06) for latency DatasetOffice
2026-01-23 02:08:57,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 37 minutes, 15 seconds)
2026-01-23 02:12:38,948 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:40,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 200.28081 ± 133.433
2026-01-23 02:12:40,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [102.66215, 472.59558, 117.00503, 125.91826, 97.53659, 101.38121, 395.87753, 319.97632, 149.56715, 120.288376]
2026-01-23 02:12:40,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [91.0, 325.0, 103.0, 114.0, 87.0, 90.0, 288.0, 141.0, 136.0, 108.0]
2026-01-23 02:12:40,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 39 minutes, 34 seconds)
2026-01-23 02:16:03,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:05,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 387.77319 ± 6.814
2026-01-23 02:16:05,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [394.32538, 384.58496, 389.6859, 383.67035, 382.55237, 394.1862, 375.85507, 399.36902, 382.0773, 391.4254]
2026-01-23 02:16:05,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [143.0, 142.0, 141.0, 142.0, 140.0, 142.0, 137.0, 145.0, 140.0, 145.0]
2026-01-23 02:16:05,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (387.77) for latency DatasetOffice
2026-01-23 02:16:05,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 33 minutes, 32 seconds)
2026-01-23 02:19:37,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:19:39,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 435.96631 ± 128.944
2026-01-23 02:19:39,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [596.36707, 610.14044, 434.49817, 435.39264, 416.47662, 381.49268, 155.03812, 368.69986, 579.2418, 382.31528]
2026-01-23 02:19:39,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [199.0, 209.0, 161.0, 163.0, 158.0, 147.0, 80.0, 154.0, 195.0, 151.0]
2026-01-23 02:19:39,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (435.97) for latency DatasetOffice
2026-01-23 02:19:39,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 34 minutes, 33 seconds)
2026-01-23 02:23:13,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:15,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 515.94019 ± 53.194
2026-01-23 02:23:15,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [532.55194, 541.5243, 561.6108, 554.5891, 402.1052, 526.00995, 571.21313, 437.07248, 543.2904, 489.43442]
2026-01-23 02:23:15,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [163.0, 165.0, 177.0, 170.0, 138.0, 182.0, 175.0, 145.0, 165.0, 155.0]
2026-01-23 02:23:15,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (515.94) for latency DatasetOffice
2026-01-23 02:23:15,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 32 minutes, 22 seconds)
2026-01-23 02:26:47,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:49,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 526.11633 ± 6.301
2026-01-23 02:26:49,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [536.82697, 519.9256, 530.36346, 533.50055, 523.3372, 520.6261, 533.14795, 522.83026, 521.9842, 518.62006]
2026-01-23 02:26:49,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [173.0, 174.0, 169.0, 171.0, 169.0, 174.0, 168.0, 169.0, 168.0, 170.0]
2026-01-23 02:26:49,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (526.12) for latency DatasetOffice
2026-01-23 02:26:49,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 28 minutes, 44 seconds)
2026-01-23 02:30:19,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:20,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 302.30029 ± 9.786
2026-01-23 02:30:20,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [306.82568, 303.7595, 302.5502, 302.95462, 305.17648, 295.5514, 292.10913, 321.75296, 283.45187, 308.8709]
2026-01-23 02:30:20,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [139.0, 136.0, 136.0, 139.0, 138.0, 136.0, 137.0, 135.0, 137.0, 136.0]
2026-01-23 02:30:20,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 21 minutes, 32 seconds)
2026-01-23 02:33:51,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:54,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 544.15417 ± 145.078
2026-01-23 02:33:54,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [638.8515, 119.12009, 595.5858, 536.35583, 596.5859, 574.61523, 571.74994, 630.753, 622.0952, 555.8295]
2026-01-23 02:33:54,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [204.0, 63.0, 204.0, 190.0, 203.0, 217.0, 203.0, 202.0, 207.0, 183.0]
2026-01-23 02:33:54,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (544.15) for latency DatasetOffice
2026-01-23 02:33:54,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 20 minutes, 42 seconds)
2026-01-23 02:37:23,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:37:26,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 450.53775 ± 341.599
2026-01-23 02:37:26,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [396.38583, 176.74298, 169.28264, 1031.1318, 455.66995, 183.36136, 1097.5283, 176.50615, 637.77423, 180.99423]
2026-01-23 02:37:26,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [168.0, 94.0, 93.0, 344.0, 168.0, 95.0, 373.0, 94.0, 208.0, 94.0]
2026-01-23 02:37:26,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 16 minutes, 21 seconds)
2026-01-23 02:40:56,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:40:58,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 502.86630 ± 136.741
2026-01-23 02:40:58,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [93.85211, 554.5605, 549.9234, 569.2213, 526.45483, 552.2049, 549.0129, 540.2632, 551.88776, 541.2821]
2026-01-23 02:40:58,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [54.0, 184.0, 173.0, 182.0, 175.0, 180.0, 183.0, 180.0, 172.0, 176.0]
2026-01-23 02:40:58,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 11 minutes, 51 seconds)
2026-01-23 02:44:33,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:44:36,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 719.39056 ± 108.962
2026-01-23 02:44:36,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [943.25366, 688.2251, 691.09235, 784.07477, 824.36743, 694.8311, 507.73657, 725.66437, 687.02374, 647.63654]
2026-01-23 02:44:36,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [319.0, 209.0, 224.0, 249.0, 267.0, 211.0, 207.0, 237.0, 228.0, 213.0]
2026-01-23 02:44:36,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (719.39) for latency DatasetOffice
2026-01-23 02:44:36,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 9 minutes, 25 seconds)
2026-01-23 02:48:04,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:48:07,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 605.65143 ± 166.462
2026-01-23 02:48:07,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [797.56494, 692.9386, 691.7709, 355.61456, 699.85474, 359.09344, 734.2184, 659.62897, 355.40222, 710.42706]
2026-01-23 02:48:07,099 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [263.0, 227.0, 223.0, 141.0, 228.0, 140.0, 244.0, 230.0, 139.0, 223.0]
2026-01-23 02:48:07,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 5 minutes, 37 seconds)
2026-01-23 02:51:42,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:51:45,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 645.68622 ± 278.344
2026-01-23 02:51:45,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [725.9079, 1014.1332, 802.1239, 555.65155, 792.6828, 766.7009, 606.8563, 866.40955, 317.98145, 8.415154]
2026-01-23 02:51:45,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [231.0, 345.0, 257.0, 175.0, 265.0, 253.0, 189.0, 316.0, 132.0, 11.0]
2026-01-23 02:51:45,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 3 minutes, 30 seconds)
2026-01-23 02:55:11,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:55:15,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 772.25854 ± 45.096
2026-01-23 02:55:15,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [799.6135, 698.4404, 819.3472, 825.0671, 734.6418, 705.4219, 820.2395, 798.2476, 758.61206, 762.95465]
2026-01-23 02:55:15,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [264.0, 225.0, 270.0, 274.0, 242.0, 229.0, 280.0, 262.0, 247.0, 253.0]
2026-01-23 02:55:15,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (772.26) for latency DatasetOffice
2026-01-23 02:55:15,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 59 minutes, 16 seconds)
2026-01-23 02:58:46,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:58:49,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 688.06848 ± 236.505
2026-01-23 02:58:49,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [691.2593, 698.7047, 127.19921, 712.2923, 704.5939, 700.2694, 1181.116, 671.5002, 689.34827, 704.40137]
2026-01-23 02:58:49,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [230.0, 226.0, 64.0, 235.0, 217.0, 226.0, 369.0, 201.0, 212.0, 226.0]
2026-01-23 02:58:49,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 56 minutes, 15 seconds)
2026-01-23 03:02:22,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:02:25,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 657.67023 ± 165.493
2026-01-23 03:02:25,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [716.9367, 433.8902, 266.5428, 603.438, 767.818, 783.5792, 781.143, 731.03253, 739.2923, 753.02954]
2026-01-23 03:02:25,262 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [242.0, 166.0, 117.0, 196.0, 246.0, 251.0, 249.0, 231.0, 234.0, 239.0]
2026-01-23 03:02:25,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 52 minutes, 2 seconds)
2026-01-23 03:05:55,338 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:05:58,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 876.47363 ± 37.323
2026-01-23 03:05:58,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [965.4816, 850.74945, 865.15283, 842.9415, 865.2419, 843.17334, 843.00464, 882.36816, 891.8467, 914.77655]
2026-01-23 03:05:58,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [300.0, 265.0, 270.0, 250.0, 256.0, 251.0, 252.0, 275.0, 261.0, 281.0]
2026-01-23 03:05:58,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (876.47) for latency DatasetOffice
2026-01-23 03:05:58,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 49 minutes, 22 seconds)
2026-01-23 03:09:31,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:09:33,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 467.44305 ± 583.608
2026-01-23 03:09:33,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1846.3756, 176.78516, 82.39046, 172.42523, 957.94135, 174.37466, 1074.1442, 52.692493, 61.957935, 75.343285]
2026-01-23 03:09:33,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [580.0, 87.0, 51.0, 86.0, 293.0, 89.0, 325.0, 38.0, 42.0, 48.0]
2026-01-23 03:09:33,381 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 44 minutes, 46 seconds)
2026-01-23 03:13:03,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:13:06,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 766.23938 ± 82.925
2026-01-23 03:13:06,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [610.19324, 781.4188, 715.69867, 924.6598, 816.4932, 766.7943, 854.4738, 756.7643, 745.04694, 690.8512]
2026-01-23 03:13:06,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [181.0, 248.0, 229.0, 289.0, 260.0, 244.0, 268.0, 242.0, 237.0, 224.0]
2026-01-23 03:13:06,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 42 minutes, 15 seconds)
2026-01-23 03:16:36,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:16:39,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 623.63525 ± 44.115
2026-01-23 03:16:39,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [674.5954, 577.3545, 636.07495, 633.1093, 673.5783, 582.67737, 686.6848, 629.0929, 550.5695, 592.6156]
2026-01-23 03:16:39,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [204.0, 178.0, 189.0, 188.0, 208.0, 173.0, 216.0, 186.0, 164.0, 176.0]
2026-01-23 03:16:39,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 38 minutes, 5 seconds)
2026-01-23 03:20:08,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:20:10,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 309.25394 ± 316.178
2026-01-23 03:20:10,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [692.7431, 660.13776, 646.8311, 770.8352, 64.91872, 17.04024, 14.7949915, 17.3014, 128.17918, 79.757515]
2026-01-23 03:20:10,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [219.0, 210.0, 205.0, 245.0, 41.0, 16.0, 13.0, 16.0, 85.0, 47.0]
2026-01-23 03:20:10,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 33 minutes, 22 seconds)
2026-01-23 03:23:42,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:23:46,672 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1092.37231 ± 315.510
2026-01-23 03:23:46,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1646.8478, 1409.7079, 790.5747, 711.128, 927.3149, 1490.2374, 1086.5963, 1079.0273, 1096.2711, 686.01746]
2026-01-23 03:23:46,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [500.0, 435.0, 249.0, 233.0, 293.0, 461.0, 336.0, 335.0, 333.0, 219.0]
2026-01-23 03:23:46,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (1092.37) for latency DatasetOffice
2026-01-23 03:23:46,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 30 minutes, 30 seconds)
2026-01-23 03:27:19,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:27:22,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 679.50134 ± 76.980
2026-01-23 03:27:22,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [706.52313, 701.1974, 696.31995, 764.1316, 684.30945, 672.17175, 677.60675, 735.3754, 462.3167, 695.06146]
2026-01-23 03:27:22,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [227.0, 229.0, 222.0, 252.0, 221.0, 213.0, 215.0, 240.0, 167.0, 227.0]
2026-01-23 03:27:22,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 27 minutes, 15 seconds)
2026-01-23 03:30:51,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:54,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 766.75653 ± 117.217
2026-01-23 03:30:54,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [687.62305, 872.13495, 693.5147, 897.3425, 867.7091, 813.68176, 677.4581, 695.145, 544.8896, 918.06647]
2026-01-23 03:30:54,336 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [218.0, 281.0, 223.0, 285.0, 271.0, 263.0, 199.0, 214.0, 199.0, 288.0]
2026-01-23 03:30:54,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 23 minutes, 17 seconds)
2026-01-23 03:34:30,316 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:34:34,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 882.42206 ± 132.031
2026-01-23 03:34:34,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [790.66003, 794.1286, 956.325, 1144.5656, 797.12067, 806.60223, 777.7737, 736.35333, 956.1101, 1064.5814]
2026-01-23 03:34:34,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [243.0, 248.0, 294.0, 349.0, 247.0, 250.0, 238.0, 226.0, 309.0, 406.0]
2026-01-23 03:34:34,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 21 minutes, 33 seconds)
2026-01-23 03:38:01,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:38:04,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 725.41467 ± 89.294
2026-01-23 03:38:04,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [718.86554, 752.608, 714.20123, 795.31134, 804.2884, 471.74802, 772.73126, 734.5448, 732.91125, 756.9371]
2026-01-23 03:38:04,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [226.0, 236.0, 228.0, 251.0, 260.0, 173.0, 244.0, 228.0, 230.0, 233.0]
2026-01-23 03:38:04,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 17 minutes, 45 seconds)
2026-01-23 03:41:38,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:41:44,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1385.22583 ± 273.536
2026-01-23 03:41:44,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1344.9934, 1988.1368, 1217.1525, 1006.01794, 1140.1069, 1312.5153, 1761.6514, 1367.2777, 1324.8616, 1389.5455]
2026-01-23 03:41:44,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [430.0, 630.0, 371.0, 314.0, 353.0, 421.0, 584.0, 442.0, 421.0, 435.0]
2026-01-23 03:41:44,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (1385.23) for latency DatasetOffice
2026-01-23 03:41:44,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 15 minutes, 5 seconds)
2026-01-23 03:45:13,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:45:19,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1472.78174 ± 571.786
2026-01-23 03:45:19,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1146.3732, 1328.5309, 2885.1448, 1424.9045, 1282.2852, 2074.9465, 1266.8438, 1173.125, 1456.2109, 689.4529]
2026-01-23 03:45:19,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [358.0, 409.0, 1000.0, 434.0, 388.0, 643.0, 383.0, 360.0, 446.0, 256.0]
2026-01-23 03:45:19,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (1472.78) for latency DatasetOffice
2026-01-23 03:45:19,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 11 minutes, 25 seconds)
2026-01-23 03:48:54,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:48:58,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 911.23206 ± 242.780
2026-01-23 03:48:58,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [612.227, 877.7499, 874.99426, 838.4813, 994.72705, 1061.3081, 433.11514, 927.6194, 1315.7899, 1176.3087]
2026-01-23 03:48:58,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [225.0, 277.0, 275.0, 269.0, 303.0, 316.0, 182.0, 297.0, 419.0, 348.0]
2026-01-23 03:48:58,053 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 9 minutes, 15 seconds)
2026-01-23 03:52:27,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:52:33,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1213.32349 ± 438.863
2026-01-23 03:52:33,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1332.6273, 589.9845, 1087.63, 962.9263, 1134.6625, 1381.1116, 1097.1206, 2360.1848, 1256.0887, 930.89954]
2026-01-23 03:52:33,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [405.0, 225.0, 326.0, 291.0, 351.0, 405.0, 331.0, 732.0, 388.0, 290.0]
2026-01-23 03:52:33,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 4 minutes, 33 seconds)
2026-01-23 03:56:05,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:56:10,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1233.56299 ± 184.022
2026-01-23 03:56:10,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1116.422, 1184.8013, 1154.1293, 1169.2007, 1055.5637, 1117.2045, 1718.8289, 1392.5682, 1264.8927, 1162.0188]
2026-01-23 03:56:10,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [354.0, 362.0, 368.0, 377.0, 330.0, 360.0, 543.0, 436.0, 394.0, 375.0]
2026-01-23 03:56:10,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 2 minutes, 33 seconds)
2026-01-23 03:59:37,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:59:42,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1027.97437 ± 374.988
2026-01-23 03:59:42,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [961.373, 1346.8174, 934.6557, 1467.8315, 1087.4674, 1166.1418, 1133.03, 1273.9985, 869.71594, 38.712765]
2026-01-23 03:59:42,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [310.0, 404.0, 291.0, 437.0, 326.0, 355.0, 346.0, 397.0, 278.0, 30.0]
2026-01-23 03:59:42,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 57 minutes, 5 seconds)
2026-01-23 04:03:11,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:03:14,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 859.89990 ± 211.130
2026-01-23 04:03:14,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [896.2417, 911.8578, 270.63455, 777.88354, 1002.6751, 942.1136, 1095.79, 897.18036, 901.5151, 903.10657]
2026-01-23 04:03:14,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [279.0, 286.0, 112.0, 234.0, 304.0, 291.0, 327.0, 283.0, 285.0, 284.0]
2026-01-23 04:03:14,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 52 minutes, 50 seconds)
2026-01-23 04:06:41,798 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:06:45,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 831.67010 ± 143.046
2026-01-23 04:06:45,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [809.3301, 685.5344, 691.6799, 952.6633, 717.95416, 699.38855, 917.60706, 849.0335, 1163.9471, 829.56244]
2026-01-23 04:06:45,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [258.0, 211.0, 213.0, 287.0, 222.0, 219.0, 276.0, 268.0, 351.0, 263.0]
2026-01-23 04:06:45,219 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 47 minutes, 39 seconds)
2026-01-23 04:10:13,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:10:15,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 700.91919 ± 9.444
2026-01-23 04:10:15,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [686.0103, 705.594, 686.6317, 701.2808, 704.83936, 706.7033, 697.128, 693.7725, 715.4517, 711.78033]
2026-01-23 04:10:15,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [222.0, 217.0, 222.0, 213.0, 218.0, 218.0, 221.0, 217.0, 219.0, 220.0]
2026-01-23 04:10:15,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 43 minutes, 12 seconds)
2026-01-23 04:13:44,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:13:47,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 766.71130 ± 128.964
2026-01-23 04:13:47,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [713.8442, 826.5807, 706.16876, 714.46686, 736.4109, 705.91797, 716.48267, 709.0973, 699.2312, 1138.9127]
2026-01-23 04:13:47,861 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [217.0, 267.0, 218.0, 224.0, 227.0, 219.0, 228.0, 214.0, 212.0, 346.0]
2026-01-23 04:13:47,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 38 minutes, 31 seconds)
2026-01-23 04:17:13,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:17:15,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 474.78351 ± 302.066
2026-01-23 04:17:15,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [854.735, 557.36993, 685.84467, 919.4826, 372.2782, 41.88688, 740.13513, 257.0187, 112.32151, 206.7628]
2026-01-23 04:17:15,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [273.0, 196.0, 220.0, 286.0, 144.0, 52.0, 248.0, 108.0, 64.0, 107.0]
2026-01-23 04:17:15,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 34 minutes, 15 seconds)
2026-01-23 04:20:42,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:20:46,519 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 972.28595 ± 156.300
2026-01-23 04:20:46,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [739.77386, 1042.6638, 982.79236, 1122.846, 1085.3779, 865.59894, 807.0171, 1065.0062, 782.10583, 1229.6781]
2026-01-23 04:20:46,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [231.0, 322.0, 298.0, 334.0, 325.0, 272.0, 254.0, 320.0, 244.0, 374.0]
2026-01-23 04:20:46,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 30 minutes, 22 seconds)
2026-01-23 04:24:18,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:24:21,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 783.69965 ± 216.725
2026-01-23 04:24:21,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [768.4131, 633.3631, 634.1024, 840.4739, 629.7033, 637.13055, 651.12494, 1233.488, 652.66046, 1156.5367]
2026-01-23 04:24:21,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [238.0, 191.0, 195.0, 264.0, 190.0, 193.0, 196.0, 385.0, 200.0, 351.0]
2026-01-23 04:24:21,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 27 minutes, 46 seconds)
2026-01-23 04:27:50,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:27:54,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1017.37225 ± 492.303
2026-01-23 04:27:54,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [55.47666, 977.3667, 1211.835, 2036.2245, 1177.2971, 1123.0477, 1181.3309, 744.00586, 1161.4291, 505.71]
2026-01-23 04:27:54,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [62.0, 293.0, 367.0, 627.0, 352.0, 337.0, 355.0, 227.0, 383.0, 183.0]
2026-01-23 04:27:54,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 24 minutes, 38 seconds)
2026-01-23 04:31:16,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:31:20,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 945.44226 ± 756.151
2026-01-23 04:31:20,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [629.7024, 727.31354, 642.7078, 671.33185, 677.8301, 1015.0444, 627.9458, 624.6174, 3189.3794, 648.54987]
2026-01-23 04:31:20,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [198.0, 222.0, 200.0, 208.0, 209.0, 308.0, 198.0, 197.0, 1000.0, 201.0]
2026-01-23 04:31:20,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 19 minutes, 58 seconds)
2026-01-23 04:34:47,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:34:51,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 986.13269 ± 452.623
2026-01-23 04:34:51,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [100.73646, 958.1615, 1360.2316, 1936.8177, 1215.1356, 960.899, 973.5006, 694.81934, 700.7461, 960.2796]
2026-01-23 04:34:51,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [59.0, 291.0, 413.0, 597.0, 359.0, 298.0, 298.0, 214.0, 218.0, 291.0]
2026-01-23 04:34:51,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 16 minutes, 57 seconds)
2026-01-23 04:38:22,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:38:25,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 945.36340 ± 142.404
2026-01-23 04:38:25,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [780.5413, 1165.3268, 923.1481, 1083.2136, 949.2553, 1165.9175, 787.94995, 966.89795, 825.9782, 805.40466]
2026-01-23 04:38:25,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [239.0, 354.0, 289.0, 327.0, 294.0, 350.0, 244.0, 291.0, 249.0, 247.0]
2026-01-23 04:38:25,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 14 minutes, 13 seconds)
2026-01-23 04:41:49,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:41:55,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1339.15503 ± 827.763
2026-01-23 04:41:55,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2175.1416, 2359.5632, 959.9931, 1873.388, 1617.2601, 1168.0237, 26.197083, 315.72717, 2391.7932, 504.4634]
2026-01-23 04:41:55,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [683.0, 732.0, 298.0, 586.0, 498.0, 401.0, 22.0, 129.0, 744.0, 184.0]
2026-01-23 04:41:55,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 9 minutes, 38 seconds)
2026-01-23 04:45:35,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:45:45,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2240.90259 ± 1019.402
2026-01-23 04:45:45,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3089.8333, 2551.9675, 80.14387, 3069.2068, 2629.7297, 1219.5359, 3088.9675, 3070.3687, 2591.1272, 1018.14435]
2026-01-23 04:45:45,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 849.0, 49.0, 1000.0, 818.0, 362.0, 1000.0, 1000.0, 843.0, 311.0]
2026-01-23 04:45:45,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (2240.90) for latency DatasetOffice
2026-01-23 04:45:45,458 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 9 minutes, 13 seconds)
2026-01-23 04:49:05,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:49:08,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 628.63489 ± 245.260
2026-01-23 04:49:08,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [699.665, 832.59827, 740.95935, 776.20404, 310.988, 13.478602, 728.5893, 710.9276, 760.4348, 712.5046]
2026-01-23 04:49:08,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [217.0, 258.0, 225.0, 238.0, 121.0, 13.0, 244.0, 220.0, 234.0, 218.0]
2026-01-23 04:49:08,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 5 minutes, 9 seconds)
2026-01-23 04:52:34,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:52:40,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1532.36951 ± 596.038
2026-01-23 04:52:40,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1327.9934, 1406.0592, 3033.4353, 1582.8555, 773.53107, 1776.2263, 1604.6154, 1349.4792, 794.2842, 1675.216]
2026-01-23 04:52:40,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [422.0, 445.0, 1000.0, 504.0, 238.0, 573.0, 516.0, 413.0, 289.0, 533.0]
2026-01-23 04:52:40,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 1 minute, 52 seconds)
2026-01-23 04:56:09,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:56:11,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 486.33563 ± 859.826
2026-01-23 04:56:11,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2311.6184, 36.20383, 43.510303, 89.88332, 45.19781, 30.88891, 133.21977, 30.455307, 49.71888, 2092.6602]
2026-01-23 04:56:11,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [752.0, 26.0, 27.0, 56.0, 43.0, 24.0, 72.0, 34.0, 57.0, 651.0]
2026-01-23 04:56:11,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 57 minutes, 37 seconds)
2026-01-23 04:59:48,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:59:53,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1098.58179 ± 458.447
2026-01-23 04:59:53,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1359.1786, 1401.6516, 446.4055, 1322.4135, 1329.2367, 1282.2913, 1272.6716, 1508.2446, 1051.0511, 12.673816]
2026-01-23 04:59:53,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [427.0, 444.0, 173.0, 429.0, 420.0, 443.0, 388.0, 478.0, 370.0, 13.0]
2026-01-23 04:59:53,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 56 minutes, 5 seconds)
2026-01-23 05:03:13,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:03:19,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1424.86316 ± 412.576
2026-01-23 05:03:19,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2191.556, 1574.4052, 1155.2985, 1745.1906, 964.28674, 867.157, 964.013, 1815.5491, 1588.4705, 1382.7048]
2026-01-23 05:03:19,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [665.0, 477.0, 361.0, 533.0, 297.0, 307.0, 302.0, 558.0, 494.0, 424.0]
2026-01-23 05:03:19,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 48 minutes, 36 seconds)
2026-01-23 05:06:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:06:48,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1393.24316 ± 645.706
2026-01-23 05:06:48,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [980.17194, 1523.369, 1023.46124, 2284.0845, 1193.6327, 979.7771, 1088.5215, 953.46594, 2934.761, 971.18646]
2026-01-23 05:06:48,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [317.0, 486.0, 327.0, 731.0, 372.0, 294.0, 344.0, 281.0, 923.0, 300.0]
2026-01-23 05:06:48,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 46 minutes, 7 seconds)
2026-01-23 05:10:16,447 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:10:21,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1047.16992 ± 1034.762
2026-01-23 05:10:21,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [93.426384, 807.3848, 550.1883, 51.293045, 49.403053, 36.34755, 1914.1681, 3050.7993, 2123.3555, 1795.3345]
2026-01-23 05:10:21,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [83.0, 280.0, 217.0, 48.0, 52.0, 28.0, 594.0, 1000.0, 661.0, 542.0]
2026-01-23 05:10:21,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 42 minutes, 33 seconds)
2026-01-23 05:13:53,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:13:57,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1032.26648 ± 174.727
2026-01-23 05:13:57,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1093.8732, 761.7789, 1058.2644, 916.7518, 1106.4248, 1019.7625, 839.333, 1445.3467, 1058.785, 1022.3445]
2026-01-23 05:13:57,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [335.0, 236.0, 316.0, 281.0, 339.0, 305.0, 257.0, 459.0, 314.0, 316.0]
2026-01-23 05:13:57,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 39 minutes, 55 seconds)
2026-01-23 05:17:24,206 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:17:27,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 911.54425 ± 82.946
2026-01-23 05:17:27,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1094.0541, 813.74335, 816.0398, 940.05634, 932.19116, 931.7011, 928.06885, 960.48627, 800.53815, 898.5634]
2026-01-23 05:17:27,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [329.0, 250.0, 251.0, 279.0, 278.0, 278.0, 279.0, 284.0, 243.0, 268.0]
2026-01-23 05:17:27,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 34 minutes, 38 seconds)
2026-01-23 05:21:12,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:21:21,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2102.30249 ± 1038.329
2026-01-23 05:21:21,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3176.885, 1334.6633, 955.72614, 3120.2007, 3070.9106, 3106.287, 1448.0791, 3112.5845, 1122.0995, 575.5897]
2026-01-23 05:21:21,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [971.0, 391.0, 288.0, 1000.0, 1000.0, 1000.0, 436.0, 1000.0, 333.0, 200.0]
2026-01-23 05:21:21,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 35 minutes, 8 seconds)
2026-01-23 05:24:30,826 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:24:35,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1202.53394 ± 526.663
2026-01-23 05:24:35,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2760.471, 937.3991, 948.8001, 1175.543, 924.71704, 1145.0487, 1124.956, 1031.6055, 1031.617, 945.18243]
2026-01-23 05:24:35,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [845.0, 283.0, 287.0, 344.0, 277.0, 341.0, 332.0, 309.0, 310.0, 282.0]
2026-01-23 05:24:35,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 29 minutes, 20 seconds)
2026-01-23 05:28:07,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:28:11,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 902.39246 ± 62.972
2026-01-23 05:28:11,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [766.09265, 960.8766, 907.80536, 968.6982, 841.87427, 883.4241, 897.5787, 979.16174, 949.20276, 869.2105]
2026-01-23 05:28:11,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [237.0, 286.0, 273.0, 294.0, 254.0, 268.0, 272.0, 295.0, 286.0, 263.0]
2026-01-23 05:28:11,443 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 26 minutes, 16 seconds)
2026-01-23 05:31:34,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:31:37,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 932.32410 ± 116.589
2026-01-23 05:31:37,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [956.4724, 956.96405, 977.1127, 806.2461, 1235.541, 829.2895, 875.2624, 887.1586, 958.09, 841.10486]
2026-01-23 05:31:37,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [288.0, 291.0, 297.0, 245.0, 366.0, 252.0, 266.0, 268.0, 289.0, 257.0]
2026-01-23 05:31:37,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 21 minutes, 19 seconds)
2026-01-23 05:35:06,299 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:35:09,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 919.72839 ± 321.270
2026-01-23 05:35:09,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [957.0002, 1121.4427, 950.6167, 1316.532, 916.0677, 946.97375, 972.57886, 981.39844, 17.10297, 1017.5712]
2026-01-23 05:35:09,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [288.0, 337.0, 289.0, 389.0, 272.0, 283.0, 285.0, 291.0, 15.0, 312.0]
2026-01-23 05:35:09,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 18 minutes, 4 seconds)
2026-01-23 05:38:32,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:38:36,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 830.44763 ± 415.758
2026-01-23 05:38:36,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1063.6434, 1720.075, 839.464, 946.55945, 916.84393, 889.67865, 867.49207, 577.40515, 18.674004, 464.6407]
2026-01-23 05:38:36,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [368.0, 533.0, 255.0, 281.0, 276.0, 267.0, 262.0, 199.0, 17.0, 155.0]
2026-01-23 05:38:36,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 11 minutes, 2 seconds)
2026-01-23 05:42:05,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:42:09,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1071.91943 ± 690.700
2026-01-23 05:42:09,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [9.566386, 1090.5446, 960.6427, 1145.644, 953.3521, 895.5609, 867.5731, 938.2742, 916.62354, 2941.4133]
2026-01-23 05:42:09,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [11.0, 352.0, 290.0, 338.0, 285.0, 269.0, 263.0, 281.0, 271.0, 1000.0]
2026-01-23 05:42:09,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 9 minutes, 59 seconds)
2026-01-23 05:45:35,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:45:38,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 792.18689 ± 146.155
2026-01-23 05:45:38,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [709.14636, 915.22516, 807.82697, 741.8634, 777.5976, 862.80743, 924.3688, 809.05023, 955.0518, 418.93094]
2026-01-23 05:45:38,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [219.0, 273.0, 247.0, 227.0, 236.0, 264.0, 273.0, 245.0, 282.0, 157.0]
2026-01-23 05:45:38,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 5 minutes, 37 seconds)
2026-01-23 05:49:05,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:49:06,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 306.58490 ± 387.435
2026-01-23 05:49:06,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [994.6681, 886.20374, 794.6727, 12.736734, 26.033796, 42.90465, 72.04042, 152.92567, 44.945633, 38.717762]
2026-01-23 05:49:06,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [299.0, 268.0, 241.0, 13.0, 26.0, 35.0, 49.0, 94.0, 28.0, 28.0]
2026-01-23 05:49:06,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 2 minutes, 20 seconds)
2026-01-23 05:52:37,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:52:41,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 981.36707 ± 112.962
2026-01-23 05:52:41,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [869.798, 799.8655, 1134.0857, 1154.5728, 930.6033, 882.12354, 925.47815, 1064.6534, 1060.708, 991.7822]
2026-01-23 05:52:41,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [266.0, 245.0, 333.0, 324.0, 280.0, 268.0, 280.0, 324.0, 313.0, 297.0]
2026-01-23 05:52:41,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 59 minutes, 11 seconds)
2026-01-23 05:56:03,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:56:07,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 818.56628 ± 49.847
2026-01-23 05:56:07,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [951.44257, 812.5418, 840.4464, 746.89197, 812.4632, 795.0435, 813.96735, 790.02124, 813.67194, 809.1728]
2026-01-23 05:56:07,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [281.0, 246.0, 255.0, 232.0, 248.0, 245.0, 249.0, 239.0, 249.0, 247.0]
2026-01-23 05:56:07,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 55 minutes, 35 seconds)
2026-01-23 05:59:35,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:59:39,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 979.84387 ± 696.893
2026-01-23 05:59:39,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [753.334, 831.44543, 820.5698, 600.688, 3027.7563, 797.77625, 841.2642, 900.58014, 826.62573, 398.39865]
2026-01-23 05:59:39,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [231.0, 250.0, 248.0, 222.0, 1000.0, 238.0, 252.0, 270.0, 244.0, 147.0]
2026-01-23 05:59:39,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 52 minutes, 2 seconds)
2026-01-23 06:03:09,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:03:15,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1294.14648 ± 817.931
2026-01-23 06:03:15,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1811.1494, 2956.1763, 677.9455, 907.7558, 2033.2968, 1198.0586, 1031.5216, 434.98288, 71.97441, 1818.604]
2026-01-23 06:03:15,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [636.0, 1000.0, 234.0, 283.0, 629.0, 355.0, 303.0, 158.0, 42.0, 621.0]
2026-01-23 06:03:15,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 49 minutes, 12 seconds)
2026-01-23 06:06:45,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:06:50,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1369.73901 ± 679.201
2026-01-23 06:06:50,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2303.0886, 946.1902, 985.7536, 1148.8256, 897.7386, 1053.95, 3047.6226, 1136.3071, 1179.499, 998.4154]
2026-01-23 06:06:50,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [712.0, 285.0, 295.0, 344.0, 268.0, 312.0, 1000.0, 339.0, 355.0, 301.0]
2026-01-23 06:06:50,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 46 minutes, 26 seconds)
2026-01-23 06:10:12,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:10:16,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 940.56750 ± 50.091
2026-01-23 06:10:16,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [918.41174, 863.366, 899.3032, 971.6393, 1002.19006, 913.3418, 917.3009, 911.26575, 973.7899, 1035.0665]
2026-01-23 06:10:16,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [275.0, 259.0, 269.0, 284.0, 297.0, 272.0, 273.0, 275.0, 292.0, 303.0]
2026-01-23 06:10:16,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 41 minutes, 58 seconds)
2026-01-23 06:13:47,397 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:13:54,636 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1700.39526 ± 971.961
2026-01-23 06:13:54,636 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3037.5479, 1296.4894, 2390.363, 2975.801, 954.32227, 933.25653, 757.3109, 785.7762, 3034.415, 838.6701]
2026-01-23 06:13:54,636 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 387.0, 795.0, 916.0, 290.0, 280.0, 237.0, 236.0, 1000.0, 253.0]
2026-01-23 06:13:54,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 39 minutes, 38 seconds)
2026-01-23 06:17:13,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:17:16,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 773.97076 ± 59.522
2026-01-23 06:17:16,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [835.1505, 757.295, 761.41547, 749.38403, 875.5236, 637.62177, 786.41187, 792.98676, 796.6243, 747.29376]
2026-01-23 06:17:16,116 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [254.0, 232.0, 231.0, 229.0, 264.0, 204.0, 241.0, 239.0, 243.0, 227.0]
2026-01-23 06:17:16,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 35 minutes, 3 seconds)
2026-01-23 06:20:52,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:21:02,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2134.29883 ± 938.481
2026-01-23 06:21:02,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3028.8335, 3001.4688, 3095.0479, 965.31006, 3047.3013, 1008.75653, 1305.7305, 1071.8032, 1739.0892, 3079.6465]
2026-01-23 06:21:02,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 292.0, 1000.0, 306.0, 389.0, 320.0, 540.0, 1000.0]
2026-01-23 06:21:02,037 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 32 minutes, 28 seconds)
2026-01-23 06:24:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:24:27,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1884.68384 ± 827.527
2026-01-23 06:24:27,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1911.0405, 2849.021, 2517.6006, 3083.3054, 1229.3123, 2684.4333, 1291.6017, 1200.5944, 421.8422, 1658.0856]
2026-01-23 06:24:27,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [575.0, 934.0, 775.0, 1000.0, 370.0, 896.0, 413.0, 355.0, 162.0, 538.0]
2026-01-23 06:24:27,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 28 minutes, 1 second)
2026-01-23 06:27:53,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:27:58,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1013.97052 ± 924.031
2026-01-23 06:27:58,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1219.3726, 881.74286, 3085.5806, 1319.6345, 1968.3337, 1153.6952, 108.25092, 251.08829, 102.80358, 49.20276]
2026-01-23 06:27:58,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [406.0, 270.0, 1000.0, 444.0, 694.0, 368.0, 60.0, 114.0, 65.0, 44.0]
2026-01-23 06:27:58,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 24 minutes, 55 seconds)
2026-01-23 06:31:38,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:31:45,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1662.19019 ± 959.490
2026-01-23 06:31:45,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3056.3567, 3058.643, 779.4395, 3102.6753, 1027.1525, 813.66534, 1539.967, 661.5408, 1143.8348, 1438.6261]
2026-01-23 06:31:45,098 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 238.0, 1000.0, 308.0, 250.0, 464.0, 209.0, 340.0, 436.0]
2026-01-23 06:31:45,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 22 minutes, 4 seconds)
2026-01-23 06:34:56,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:35:05,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2187.73096 ± 1036.312
2026-01-23 06:35:05,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3117.7568, 1850.7181, 2460.4927, 3126.8738, 2245.4277, 2903.3975, 3142.2437, 2468.0818, 160.1564, 402.16168]
2026-01-23 06:35:05,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 550.0, 748.0, 1000.0, 727.0, 896.0, 1000.0, 747.0, 83.0, 155.0]
2026-01-23 06:35:05,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 18 minutes, 26 seconds)
2026-01-23 06:38:34,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:38:40,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1292.26172 ± 1375.151
2026-01-23 06:38:40,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3039.3381, 3056.2695, 3056.3115, 781.97314, 2654.7214, 53.220745, 20.765778, 27.0671, 105.45531, 127.49425]
2026-01-23 06:38:40,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 239.0, 869.0, 31.0, 24.0, 26.0, 76.0, 67.0]
2026-01-23 06:38:40,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 14 minutes, 5 seconds)
2026-01-23 06:42:05,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:42:08,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 749.46643 ± 352.398
2026-01-23 06:42:08,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [798.0551, 769.5252, 878.8115, 1155.156, 737.82245, 136.18147, 1189.0654, 842.9175, 911.03314, 76.09612]
2026-01-23 06:42:08,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [242.0, 236.0, 264.0, 336.0, 227.0, 72.0, 350.0, 254.0, 271.0, 46.0]
2026-01-23 06:42:08,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 10 minutes, 43 seconds)
2026-01-23 06:45:47,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:45:52,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1337.76685 ± 761.405
2026-01-23 06:45:52,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [803.25195, 979.607, 1001.2518, 3110.1763, 2480.5762, 866.43304, 793.8242, 796.01135, 1311.0508, 1235.4855]
2026-01-23 06:45:52,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [245.0, 288.0, 300.0, 1000.0, 772.0, 257.0, 242.0, 245.0, 390.0, 361.0]
2026-01-23 06:45:52,673 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 8 minutes, 3 seconds)
2026-01-23 06:49:23,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:49:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 764.14410 ± 100.751
2026-01-23 06:49:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [903.678, 691.77014, 975.9167, 731.2834, 812.4552, 680.83417, 665.1176, 665.9436, 725.1892, 789.2526]
2026-01-23 06:49:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [273.0, 217.0, 291.0, 224.0, 245.0, 224.0, 232.0, 211.0, 243.0, 239.0]
2026-01-23 06:49:26,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 3 minutes, 42 seconds)
2026-01-23 06:52:35,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:52:38,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 801.99860 ± 121.214
2026-01-23 06:52:38,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [855.24396, 864.7837, 819.1345, 868.7181, 820.96893, 853.02216, 758.888, 454.98358, 900.42914, 823.81366]
2026-01-23 06:52:38,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [259.0, 263.0, 251.0, 262.0, 249.0, 259.0, 234.0, 169.0, 273.0, 249.0]
2026-01-23 06:52:38,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 84/100 (estimated time remaining: 59 minutes, 39 seconds)
2026-01-23 06:56:18,838 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:56:29,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2547.83398 ± 754.811
2026-01-23 06:56:29,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1846.8456, 3250.1765, 1497.224, 3038.5327, 1090.3678, 2364.0754, 3097.3767, 3094.568, 3085.3901, 3113.781]
2026-01-23 06:56:29,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [614.0, 982.0, 441.0, 1000.0, 394.0, 780.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:56:29,983 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (2547.83) for latency DatasetOffice
2026-01-23 06:56:29,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 85/100 (estimated time remaining: 57 minutes, 2 seconds)
2026-01-23 06:59:42,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:59:45,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 730.24109 ± 70.023
2026-01-23 06:59:45,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [659.19165, 736.758, 860.2761, 713.49536, 799.39966, 650.7515, 661.605, 757.727, 660.2578, 802.9493]
2026-01-23 06:59:45,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [208.0, 223.0, 254.0, 221.0, 239.0, 206.0, 206.0, 230.0, 209.0, 241.0]
2026-01-23 06:59:45,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 86/100 (estimated time remaining: 52 minutes, 52 seconds)
2026-01-23 07:03:17,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:03:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2320.14111 ± 957.101
2026-01-23 07:03:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1434.5092, 26.80196, 1859.0765, 3056.426, 3065.1885, 3040.4832, 2436.7668, 2082.5173, 3132.4253, 3067.2158]
2026-01-23 07:03:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [433.0, 21.0, 560.0, 1000.0, 1000.0, 1000.0, 733.0, 609.0, 1000.0, 1000.0]
2026-01-23 07:03:27,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 87/100 (estimated time remaining: 49 minutes, 14 seconds)
2026-01-23 07:07:01,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:07:05,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 889.99982 ± 126.198
2026-01-23 07:07:05,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [992.2031, 743.7818, 1177.6675, 916.6905, 775.4719, 908.0796, 773.29236, 963.8845, 862.25824, 786.6683]
2026-01-23 07:07:05,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [295.0, 231.0, 340.0, 276.0, 235.0, 271.0, 236.0, 288.0, 260.0, 239.0]
2026-01-23 07:07:05,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 88/100 (estimated time remaining: 45 minutes, 51 seconds)
2026-01-23 07:10:22,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:10:34,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2874.40088 ± 418.844
2026-01-23 07:10:34,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2676.4534, 3178.8147, 3121.0696, 3029.7905, 2400.3652, 3135.183, 3198.6077, 3035.9675, 1848.066, 3119.6921]
2026-01-23 07:10:34,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [822.0, 1000.0, 944.0, 1000.0, 734.0, 1000.0, 1000.0, 1000.0, 554.0, 1000.0]
2026-01-23 07:10:34,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (2874.40) for latency DatasetOffice
2026-01-23 07:10:34,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 89/100 (estimated time remaining: 43 minutes, 3 seconds)
2026-01-23 07:14:04,127 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:14:08,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1056.17261 ± 120.298
2026-01-23 07:14:08,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1231.5753, 1180.8315, 856.49774, 969.2248, 972.3413, 1094.102, 1132.193, 1196.4144, 973.63525, 954.9125]
2026-01-23 07:14:08,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [357.0, 339.0, 256.0, 292.0, 289.0, 317.0, 324.0, 345.0, 291.0, 295.0]
2026-01-23 07:14:08,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 90/100 (estimated time remaining: 38 minutes, 48 seconds)
2026-01-23 07:17:38,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:17:40,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 582.83710 ± 355.115
2026-01-23 07:17:40,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [14.414273, 77.428276, 73.97532, 926.3711, 972.5813, 739.3508, 766.5185, 827.4037, 732.4338, 697.8948]
2026-01-23 07:17:40,597 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [14.0, 69.0, 62.0, 311.0, 289.0, 228.0, 236.0, 254.0, 225.0, 218.0]
2026-01-23 07:17:40,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 91/100 (estimated time remaining: 35 minutes, 49 seconds)
2026-01-23 07:21:16,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:21:27,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2438.08960 ± 754.200
2026-01-23 07:21:27,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3215.9478, 1545.1412, 2101.5513, 3130.3416, 3114.7712, 1965.7095, 3055.2532, 2146.762, 3113.2385, 992.1808]
2026-01-23 07:21:27,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 488.0, 626.0, 965.0, 1000.0, 605.0, 1000.0, 696.0, 1000.0, 342.0]
2026-01-23 07:21:27,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 92/100 (estimated time remaining: 32 minutes, 23 seconds)
2026-01-23 07:24:53,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:24:59,004 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1226.93127 ± 640.433
2026-01-23 07:24:59,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3096.3542, 969.477, 833.6959, 1210.943, 890.0992, 1175.0775, 1293.5049, 867.4017, 963.0504, 969.7089]
2026-01-23 07:24:59,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 294.0, 253.0, 359.0, 266.0, 346.0, 380.0, 261.0, 299.0, 287.0]
2026-01-23 07:24:59,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 93/100 (estimated time remaining: 28 minutes, 38 seconds)
2026-01-23 07:28:32,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:28:35,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 790.54529 ± 27.666
2026-01-23 07:28:35,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [763.7851, 784.43994, 799.4779, 824.41583, 745.3047, 747.4132, 822.88385, 807.9478, 803.1228, 806.6619]
2026-01-23 07:28:35,387 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [232.0, 236.0, 241.0, 247.0, 228.0, 229.0, 247.0, 245.0, 244.0, 244.0]
2026-01-23 07:28:35,396 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 94/100 (estimated time remaining: 25 minutes, 12 seconds)
2026-01-23 07:32:08,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:32:12,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 923.55927 ± 70.762
2026-01-23 07:32:12,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [914.94916, 949.8547, 909.7995, 1099.8469, 920.55096, 861.6652, 946.5985, 816.44867, 878.7129, 937.1672]
2026-01-23 07:32:12,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [273.0, 283.0, 274.0, 319.0, 275.0, 257.0, 286.0, 247.0, 261.0, 279.0]
2026-01-23 07:32:12,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 95/100 (estimated time remaining: 21 minutes, 40 seconds)
2026-01-23 07:35:50,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:35:54,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1008.51593 ± 102.848
2026-01-23 07:35:54,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [947.8928, 971.46075, 1195.4203, 961.5735, 922.27356, 1074.781, 1115.935, 884.51794, 1117.0796, 894.2257]
2026-01-23 07:35:54,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [289.0, 304.0, 349.0, 291.0, 279.0, 317.0, 323.0, 271.0, 330.0, 271.0]
2026-01-23 07:35:54,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 96/100 (estimated time remaining: 18 minutes, 14 seconds)
2026-01-23 07:39:16,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:39:26,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2228.55957 ± 813.926
2026-01-23 07:39:26,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3048.1287, 3085.009, 2166.3813, 1188.4028, 1919.8632, 2700.7607, 3124.744, 2865.8572, 1212.0292, 974.4199]
2026-01-23 07:39:26,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 660.0, 343.0, 627.0, 843.0, 1000.0, 897.0, 374.0, 298.0]
2026-01-23 07:39:26,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 97/100 (estimated time remaining: 14 minutes, 23 seconds)
2026-01-23 07:42:57,301 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:43:00,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 722.77594 ± 646.009
2026-01-23 07:43:00,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [987.9219, 68.037254, 1804.6724, 996.1456, 379.83093, 56.386944, 68.84631, 81.098816, 1648.5189, 1136.3002]
2026-01-23 07:43:00,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [338.0, 40.0, 598.0, 396.0, 144.0, 36.0, 59.0, 71.0, 528.0, 356.0]
2026-01-23 07:43:00,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 98/100 (estimated time remaining: 10 minutes, 49 seconds)
2026-01-23 07:46:40,460 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:46:48,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1928.56213 ± 975.220
2026-01-23 07:46:48,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2051.4072, 3259.769, 3111.9036, 1205.3928, 1555.6567, 1099.3949, 3123.4944, 2501.826, 422.93246, 953.8463]
2026-01-23 07:46:48,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [616.0, 1000.0, 1000.0, 405.0, 473.0, 337.0, 1000.0, 749.0, 165.0, 298.0]
2026-01-23 07:46:48,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 17 seconds)
2026-01-23 07:50:10,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:50:14,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 818.15686 ± 260.110
2026-01-23 07:50:14,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [821.3888, 858.96405, 845.92596, 946.48645, 1199.7366, 856.0553, 941.7805, 901.9093, 134.14044, 675.1804]
2026-01-23 07:50:14,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [249.0, 259.0, 256.0, 281.0, 370.0, 258.0, 286.0, 270.0, 70.0, 208.0]
2026-01-23 07:50:14,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 36 seconds)
2026-01-23 07:53:46,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:53:52,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1536.00830 ± 797.729
2026-01-23 07:53:52,506 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2653.182, 870.9956, 1489.884, 1072.2664, 1207.8402, 2232.1533, 832.4218, 1004.07056, 3151.4265, 845.8419]
2026-01-23 07:53:52,506 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [798.0, 266.0, 451.0, 319.0, 350.0, 679.0, 251.0, 297.0, 1000.0, 253.0]
2026-01-23 07:53:52,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1299 [DEBUG]: Training session finished
